Modal Inference

I am using modal to host an LLM that uses the OpenAI API spec.

Has anyone had success configuring the deepagents CLI to use inference for model(s) hosted on modal?

My latest attempt was a swing at using litellm as the provider. It didn’t throw any error, and even though the model was switched within the interface via /model, it calls were hitting my default LLM which is an OpenAI model.

Hey, happy to help. Quick question first: what’s pulling you toward LiteLLM here? Modal’s vLLM endpoint already speaks the OpenAI spec, so for a single endpoint you can wire it up directly with langchain-openai and skip LiteLLM entirely. If you’re using LiteLLM because you want unified routing across multiple providers (model-aliases, fallbacks, cost tracking, the proxy server), totally valid — the config is just a bit different. Knowing which case it is will save back-and-forth.

In the meantime, here’s what I’d try.

Option 1 — Direct ChatOpenAI (recommended for a single Modal endpoint)

The CLI lets you define a custom provider in ~/.deepagents/config.toml that points at any BaseChatModel subclass. For an OpenAI-compatible endpoint like Modal’s vLLM server, ChatOpenAI is the simplest fit:

[models]
default = "modal:my-llm"

[models.providers.modal]
class_path  = "langchain_openai.chat_models:ChatOpenAI"
base_url    = "https://<workspace>--<app>-serve.modal.run/v1"
api_key_env = "MODAL_API_KEY"
models      = ["my-llm"]   # whatever id vLLM serves

[models.providers.modal.params]
temperature = 0  # as needed

Then:

export MODAL_API_KEY=anything   # ChatOpenAI requires a non-empty value; vLLM ignores it unless you've added auth
deepagents
# /model → modal:my-llm

Under the hood the CLI calls ChatOpenAI(model="my-llm", base_url=..., api_key=..., temperature=0), so requests actually hit Modal — not OpenAI.

Option 2 — If you do want LiteLLM

The reason /model looked like it switched but kept hitting OpenAI: LiteLLM’s ChatLiteLLM class uses api_base, not base_url. The CLI’s top-level base_url field is a ChatOpenAI convention and gets silently dropped by LiteLLM, so traffic falls through to its default routing. Put the URL under params.api_base and prefix the model id so LiteLLM picks the OpenAI-compatible adapter:

[models.providers.litellm]
api_key_env = "LITELLM_API_KEY"
models      = ["openai/my-llm"]

[models.providers.litellm.params]
api_base = "https://<workspace>--<app>-serve.modal.run/v1"

Verifying it actually routed

Set LANGSMITH_TRACING=true and look at the trace’s invocation_params/base URL. If you still see api.openai.com, the kwargs didn’t reach the client.

Let me know which path fits and we’ll iron out the rest.

Thanks. LiteLLM was an attempt to try to stay inside the providers and “fake out” the call for modal.

Your config suggestions allowed me to hit modal and quickly switch within the cli. However it’s throwing a 400 error, but should note that I can get a successful hit modal with the following script:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(

    model="QuantTrio/Qwen3.6-35B-A3B-AWQ",

    base_url="https://MYMODAL-serve.modal.run/v1",

    api_key="dummy",

    temperature=0,

    max_tokens=50,

)

print(llm.invoke("Say hello in one sentence").content)

Can you paste the full 400 body from Modal’s app logs?

My hunch is that vLLM tool-calling might not be enabled on your end. The CLI binds tools on every call, so requests go out with tools=[...] and an implicit tool_choice="auto".

Can you update your standalone invoke("Say hello") to bind a tool?