How can I disable caching while using deepagents? I'm using LiteLLM for that.
deepagents automatically adds cache headers, but the Haiku model does not support them:
{"error_code":"BAD_REQUEST","message":"{\"message\":\"cache_control: Extra inputs are not permitted\"}"
hi @AmitPZepto
What I can see from the source code is that there is no public flag on create_deep_agent to turn the prompt-cache middleware off. It is appended unconditionally to the middleware stack (deepagents/graph.py:462, :520, :591):
```python
# graph.py (deepagents)
gp_middleware.append(AnthropicPromptCachingMiddleware(unsupported_model_behavior="ignore"))
# ...
subagent_middleware.append(AnthropicPromptCachingMiddleware(unsupported_model_behavior="ignore"))
# ...
deepagent_middleware.append(AnthropicPromptCachingMiddleware(unsupported_model_behavior="ignore"))
```
The good news: that middleware is a no-op for any model that is not a ChatAnthropic instance - including ChatLiteLLM. So the right fix depends on why you are seeing cache_control headers at all.
AnthropicPromptCachingMiddleware._should_apply_caching gates on an isinstance check:
```python
# langchain_anthropic/middleware/prompt_caching.py
def _should_apply_caching(self, request: ModelRequest) -> bool:
    if not isinstance(request.model, ChatAnthropic):
        msg = (
            "AnthropicPromptCachingMiddleware caching middleware only supports "
            f"Anthropic models, not instances of {type(request.model)}"
        )
        if self.unsupported_model_behavior == "raise":
            raise ValueError(msg)
        if self.unsupported_model_behavior == "warn":
            warn(msg, stacklevel=3)
        return False
    ...
```
Because deepagents constructs it with unsupported_model_behavior="ignore", a non-Anthropic model silently skips caching - no cache_control block, no warning, no error. A real langchain_litellm.ChatLiteLLM instance would not get cache headers added.
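To see which side of that gate a given model object falls on, a quick check like the one below can help. It is only a sketch: `is_chat_anthropic` is a hypothetical helper, it matches on the class name rather than doing the real `isinstance` check (so it runs without langchain installed), and the three classes are stand-ins for illustration.

```python
def is_chat_anthropic(model: object) -> bool:
    """Approximate the middleware gate: True if any class in the
    object's MRO is named ChatAnthropic (name match, not isinstance)."""
    return any(cls.__name__ == "ChatAnthropic" for cls in type(model).__mro__)


# Stand-in classes so the sketch runs on its own; in a real project
# you would pass the model object you hand to create_deep_agent.
class ChatAnthropic:
    pass


class ChatLiteLLM:
    pass


class ProxiedAnthropic(ChatAnthropic):
    """E.g. a ChatAnthropic pointed at a proxy: still ChatAnthropic."""


print(is_chat_anthropic(ChatLiteLLM()))       # False
print(is_chat_anthropic(ProxiedAnthropic()))  # True
```

If this prints True for the object you pass in, the caching middleware will consider it an Anthropic model regardless of where the requests are actually sent.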
So if you are seeing cache headers reach the wire, one of the following is true:
"anthropic:claude-haiku-4-5") to create_deep_agent. Internally resolve_model() calls init_chat_model() (deepagents/_models.py:45), which constructs a ChatAnthropic - not LiteLLM. The middleware then does fireChatAnthropic at a LiteLLM proxy via base_url=.... It is still a ChatAnthropic instance, so cache headers are injected; whether the proxy forwards them correctly is a separate problemFix 1 - actually use
If your intent is “talk to Haiku through LiteLLM”, instantiate ChatLiteLLM yourself and pass the instance (not a string). The cache middleware will see it is not ChatAnthropic and silently skip (unsupported_model_behavior="ignore").
```python
from langchain_litellm import ChatLiteLLM
from deepagents import create_deep_agent

llm = ChatLiteLLM(model="claude-3-5-haiku-20241022", temperature=0)

agent = create_deep_agent(
    model=llm,  # pass the instance, NOT a string like "anthropic:..."
    tools=[...],
    system_prompt="...",
)
```
Docs: ChatLiteLLM integration.
With this setup there are no cache_control blocks in the outbound payload at all - verify by enabling LiteLLM debug logging (litellm._turn_on_debug()). If you still see them, your model is not what you think it is; inspect type(agent.nodes[...].runnable.model) or just print the bound chat model.
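If you capture the outbound request body from the debug logs, a small recursive scan can confirm whether any cache_control key is present. This is a minimal sketch: `find_cache_control` is a hypothetical helper, and the sample payload is made up for illustration (it follows the Anthropic-style content-block shape, not an actual captured request).

```python
from typing import Any, List


def find_cache_control(node: Any, path: str = "$") -> List[str]:
    """Return the path of every 'cache_control' key in a nested payload."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "cache_control":
                hits.append(child)
            hits.extend(find_cache_control(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_cache_control(item, f"{path}[{index}]"))
    return hits


# Illustrative payload only; paste your own logged request body here.
payload = {
    "model": "claude-3-5-haiku-20241022",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}
            ],
        }
    ],
}

print(find_cache_control(payload))
# ['$.messages[0].content[0].cache_control']
```

An empty list means nothing in the payload carries cache_control, so the 400 would have to come from somewhere else.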
Fix 2 - if you must keep ChatAnthropic but do not want caching
No public API exists today, so your options are:
(a) Subclass / replace the middleware. Build your own no-op class and ship it; you still cannot remove the deepagents-appended one, but you can try to post-process the request in your own middleware:
```python
from langchain.agents.middleware.types import AgentMiddleware


class StripCacheControl(AgentMiddleware):
    def wrap_model_call(self, request, handler):
        # Remove cache_control that the Anthropic middleware just injected
        ms = dict(request.model_settings or {})
        ms.pop("cache_control", None)
        request = request.override(model_settings=ms)
        # also strip from system + tools if you need to be thorough
        return handler(request)
```
However, AnthropicPromptCachingMiddleware is appended after the user-supplied middleware= list (graph.py:580-591), so your middleware wraps it from the outside. In the LangChain agents middleware model, wrap_model_call composes like an onion: the last-appended middleware runs closest to the model. Your middleware therefore sees the request before the cache middleware does and the response after it, so anything you strip on the way in is re-added by the Anthropic middleware on the way down. The practical way to kill it is a monkey-patch:
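A minimal sketch of such a patch, assuming the private `_should_apply_caching` gate quoted earlier is what the middleware consults before injecting cache_control. It is a private method, so this may break between releases; `disable_prompt_caching` is a hypothetical helper name, not part of any library.

```python
def disable_prompt_caching(middleware_cls: type) -> type:
    """Replace the class's caching gate so it always answers 'no'.

    Assumes the class exposes a private _should_apply_caching(self, request)
    method, as in the excerpt above; raises if it is missing so a
    library change does not go unnoticed.
    """
    if not hasattr(middleware_cls, "_should_apply_caching"):
        raise AttributeError(
            f"{middleware_cls.__name__} has no _should_apply_caching; "
            "the library internals may have changed"
        )

    def _never_cache(self, *args, **kwargs) -> bool:
        return False

    middleware_cls._should_apply_caching = _never_cache
    return middleware_cls


try:
    # Import path taken from the file location quoted earlier in the thread.
    from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware
except ImportError:
    # langchain_anthropic is not installed; nothing to patch.
    AnthropicPromptCachingMiddleware = None
else:
    disable_prompt_caching(AnthropicPromptCachingMiddleware)
```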
Apply this once at import time, before calling create_deep_agent. Ugly, but it is the only reliable switch today.
(b) Open a feature request. A disable_prompt_cache: bool = False kwarg on create_deep_agent (or better, letting the caller fully replace the tail middleware) is a reasonable ask - track or file it at https://github.com/langchain-ai/deepagents/issues.
Worth sanity-checking the premise. Per Anthropic’s prompt-caching docs, Claude 3 Haiku, Claude 3.5 Haiku and Claude Haiku 4.5 all support prompt caching (2,048-token minimum for the Haiku family). If your LiteLLM call is failing with a cache-related error, the likely culprit is not the model - it is that:
[...] cache_control blocks on the Anthropic path, or [...]

If you share the actual error message from LiteLLM, the root cause is usually identifiable without disabling caching at all. But if you just want it gone: use Fix 1.
AnthropicPromptCachingMiddleware (isinstance(ChatAnthropic) gate): https://github.com/langchain-ai/langchain/blob/master/libs/partners/anthropic/langchain_anthropic/middleware/prompt_caching.py
ChatLiteLLM integration: https://docs.langchain.com/oss/python/integrations/chat/litellm

Would love to know more about your use case @AmitPZepto - what does LiteLLM caching do better? Are you using ChatLiteLLM?
We have customization in the pipeline. Knowing more will help us get it right.
But I still get the same error here:
error: Bad request (400): Error code: 400 - {'error': {'message': 'litellm.BadRequestError: DatabricksException - {"error_code":"BAD_REQUEST","message":"{\"message\":\"cache_control: Extra inputs are not permitted\"}
```python
import logging

from deepagents import create_deep_agent
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

load_dotenv()

# Databricks in LiteLLM proxy rejects cache_control - disable caching
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s | %(message)s",
)
log = logging.getLogger(__name__)

model = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)


@tool
def spawn_agent(task_type: str, task_description: str) -> str:
    """Spawn a new specialized agent for any task dynamically."""
    log.info(
        "supervisor → spawn_agent | task_type=%r | task_description=%r",
        task_type,
        task_description,
    )
    prompt_response = model.invoke(
        f"Write a short system prompt (2-3 lines) for an AI agent specialized in: {task_type}. "
        f"Task it needs to handle: {task_description}. "
        f"Return ONLY the system prompt, nothing else."
    )
    system_prompt = prompt_response.content
    print(f"\n[SPAWNED] {task_type} agent")
    print(f"[PROMPT] {system_prompt}\n")
    agent = create_deep_agent(
        name=f"{task_type}_agent",
        tools=[],
        system_prompt=system_prompt,
    )
    result = agent.invoke(
        {"messages": [{"role": "user", "content": task_description}]}
    )
    return result["messages"][-1].content


supervisor = create_deep_agent(
    name="supervisor",
    tools=[spawn_agent],
    system_prompt="""You are a supervisor agent.
For EVERY task from the user:
on the basis of user input, you need to decide the task_type and task_description.
""",
)

while True:
    user_input = input("\nYou: ").strip()
    if not user_input or user_input == "exit":
        break
    result = supervisor.invoke(
        {"messages": [{"role": "user", "content": user_input}]}
    )
    print("result: ", result)
    print(f"\nSupervisor: {result['messages'][-1].content}")
```
Error when using the deepagents SDK: Bad request (400): Error code: 400 - {'error': {'message': 'litellm.BadRequestError: DatabricksException - {"error_code":"BAD_REQUEST","message":"{\"message\":\"cache_control: Extra inputs are not permitted\"}