Hey, happy to help. Quick question first: what’s pulling you toward LiteLLM here? Modal’s vLLM endpoint already speaks the OpenAI spec, so for a single endpoint you can wire it up directly with langchain-openai and skip LiteLLM entirely. If you’re using LiteLLM because you want unified routing across multiple providers (model-aliases, fallbacks, cost tracking, the proxy server), totally valid — the config is just a bit different. Knowing which case it is will save back-and-forth.
In the meantime, here’s what I’d try.
Option 1 — Direct ChatOpenAI (recommended for a single Modal endpoint)
The CLI lets you define a custom provider in ~/.deepagents/config.toml that points at any BaseChatModel subclass. For an OpenAI-compatible endpoint like Modal’s vLLM server, ChatOpenAI is the simplest fit:
[models]
default = "modal:my-llm"
[models.providers.modal]
class_path = "langchain_openai.chat_models:ChatOpenAI"
base_url = "https://<workspace>--<app>-serve.modal.run/v1"
api_key_env = "MODAL_API_KEY"
models = ["my-llm"] # whatever id vLLM serves
[models.providers.modal.params]
temperature = 0 # as needed
Then:
export MODAL_API_KEY=anything # ChatOpenAI requires a non-empty value; vLLM ignores it unless you've added auth
deepagents
# /model → modal:my-llm
Under the hood the CLI calls ChatOpenAI(model="my-llm", base_url=..., api_key=..., temperature=0), so requests actually hit Modal — not OpenAI.
Option 2 — If you do want LiteLLM
The reason /model looked like it switched but kept hitting OpenAI: LiteLLM’s ChatLiteLLM class uses api_base, not base_url. The CLI’s top-level base_url field is a ChatOpenAI convention and gets silently dropped by LiteLLM, so traffic falls through to its default routing. Put the URL under params.api_base and prefix the model id so LiteLLM picks the OpenAI-compatible adapter:
[models.providers.litellm]
api_key_env = "LITELLM_API_KEY"
models = ["openai/my-llm"]
[models.providers.litellm.params]
api_base = "https://<workspace>--<app>-serve.modal.run/v1"
Verifying it actually routed
Set LANGSMITH_TRACING=true and look at the trace’s invocation_params/base URL. If you still see api.openai.com, the kwargs didn’t reach the client.
Let me know which path fits and we’ll iron out the rest.