Modal Inference

Btibert3 · April 30, 2026, 5:43pm

I am using modal to host an LLM that uses the OpenAI API spec.

Has anyone had success configuring the deepagents CLI to use inference for model(s) hosted on modal?

My latest attempt was a swing at using litellm as the provider. It didn’t throw any error, and even though the model was switched within the interface via /model, it calls were hitting my default LLM which is an OpenAI model.

mdrxy · April 30, 2026, 6:01pm

Hey, happy to help. Quick question first: what’s pulling you toward LiteLLM here? Modal’s vLLM endpoint already speaks the OpenAI spec, so for a single endpoint you can wire it up directly with langchain-openai and skip LiteLLM entirely. If you’re using LiteLLM because you want unified routing across multiple providers (model-aliases, fallbacks, cost tracking, the proxy server), totally valid — the config is just a bit different. Knowing which case it is will save back-and-forth.

In the meantime, here’s what I’d try.

Option 1 — Direct `ChatOpenAI` (recommended for a single Modal endpoint)

The CLI lets you define a custom provider in ~/.deepagents/config.toml that points at any BaseChatModel subclass. For an OpenAI-compatible endpoint like Modal’s vLLM server, ChatOpenAI is the simplest fit:

[models]
default = "modal:my-llm"

[models.providers.modal]
class_path  = "langchain_openai.chat_models:ChatOpenAI"
base_url    = "https://<workspace>--<app>-serve.modal.run/v1"
api_key_env = "MODAL_API_KEY"
models      = ["my-llm"]   # whatever id vLLM serves

[models.providers.modal.params]
temperature = 0  # as needed

Then:

export MODAL_API_KEY=anything   # ChatOpenAI requires a non-empty value; vLLM ignores it unless you've added auth
deepagents
# /model → modal:my-llm

Under the hood the CLI calls ChatOpenAI(model="my-llm", base_url=..., api_key=..., temperature=0), so requests actually hit Modal — not OpenAI.

Option 2 — If you do want LiteLLM

The reason /model looked like it switched but kept hitting OpenAI: LiteLLM’s ChatLiteLLM class uses api_base, not base_url. The CLI’s top-level base_url field is a ChatOpenAI convention and gets silently dropped by LiteLLM, so traffic falls through to its default routing. Put the URL under params.api_base and prefix the model id so LiteLLM picks the OpenAI-compatible adapter:

[models.providers.litellm]
api_key_env = "LITELLM_API_KEY"
models      = ["openai/my-llm"]

[models.providers.litellm.params]
api_base = "https://<workspace>--<app>-serve.modal.run/v1"

Verifying it actually routed

Set LANGSMITH_TRACING=true and look at the trace’s invocation_params/base URL. If you still see api.openai.com, the kwargs didn’t reach the client.

Let me know which path fits and we’ll iron out the rest.

Btibert3 · April 30, 2026, 11:39pm

Thanks. LiteLLM was an attempt to try to stay inside the providers and “fake out” the call for modal.

Your config suggestions allowed me to hit modal and quickly switch within the cli. However it’s throwing a 400 error, but should note that I can get a successful hit modal with the following script:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(

    model="QuantTrio/Qwen3.6-35B-A3B-AWQ",

    base_url="https://MYMODAL-serve.modal.run/v1",

    api_key="dummy",

    temperature=0,

    max_tokens=50,

)

print(llm.invoke("Say hello in one sentence").content)

mdrxy · May 2, 2026, 9:14pm

Can you paste the full 400 body from Modal’s app logs?

My hunch is that vLLM tool-calling might not be enabled on your end. The CLI binds tools on every call, so requests go out with tools=[...] and an implicit tool_choice="auto".

Can you update your standalone invoke("Say hello") to bind a tool?

Topic		Replies	Views
Langchiain deep_agent very slowly LangChain intro-to-langgraph , python-help	12	710	February 24, 2026
Cache disable in Deepagent Deep Agents	4	223	April 22, 2026
Help - How to use Google Gemini Models with createDeepAgent LangGraph js-help	1	325	January 20, 2026
Langchain model providing adapters Deep Agents self-hosted , cloud , python-help , feature-request	2	61	June 22, 2026
Deep Agents in Cloud LangGraph self-hosted , guidelines , intro-to-langgraph , python-help	0	169	October 6, 2025

Modal Inference

Option 1 — Direct ChatOpenAI (recommended for a single Modal endpoint)

Option 2 — If you do want LiteLLM

Verifying it actually routed

Related topics

Option 1 — Direct `ChatOpenAI` (recommended for a single Modal endpoint)