Hi @razaullah
1) What is stored inside state["messages"]?
state["messages"] is the agent's conversation history as a list of LangChain message objects.
In the agent state schema, it’s defined as messages: list[AnyMessage] (with LangGraph’s add_messages reducer).
What kinds of messages are in it
AnyMessage includes the normal chat message types you see in agent loops, e.g.:
- Human messages (user inputs)
- AI messages (model outputs; may include tool calls)
- Tool messages (tool results; appended after tools execute)
- System messages (these can exist as message objects in general, but in create_agent the system prompt is handled separately)
You can see in create_agent’s docstring that the loop works by:
- model produces an AIMessage with tool_calls
- tools run and their outputs are added as ToolMessage objects
- model is called again with the updated message list
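The loop above can be sketched in plain Python. This is a simplified stand-in, not the real create_agent implementation: the dataclasses mimic LangChain's AIMessage/ToolMessage, and fake_model is a stub standing in for the chat model.

```python
# Simplified sketch of the create_agent tool loop (stand-in classes, not the
# real LangChain types; fake_model is a stub standing in for the chat model).
from dataclasses import dataclass, field

@dataclass
class AIMessage:
    content: str
    tool_calls: list = field(default_factory=list)

@dataclass
class ToolMessage:
    content: str
    tool_call_id: str

def fake_model(messages):
    # First call: request a tool; once a ToolMessage exists: final answer.
    if not any(isinstance(m, ToolMessage) for m in messages):
        return AIMessage(content="", tool_calls=[{"id": "call_1", "name": "search"}])
    return AIMessage(content="final answer")

def run_loop(messages):
    while True:
        ai = fake_model(messages)      # model produces an AIMessage
        messages.append(ai)
        if not ai.tool_calls:          # no tool calls -> loop ends
            return messages
        for call in ai.tool_calls:     # run tools, append ToolMessage results
            messages.append(ToolMessage(content="search result",
                                        tool_call_id=call["id"]))

history = run_loop([{"role": "user", "content": "hi"}])
```

After the run, history holds exactly the shape described above: the user input, an AIMessage with tool_calls, a ToolMessage, then the final AIMessage.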
Why system prompts / tool schemas often aren't counted in state["messages"]
Two key implementation details:
- The middleware (including SummarizationMiddleware) reads only state["messages"]
- The model call uses a separate system_message field in the ModelRequest and prepends it only at call time; ModelRequest.messages is explicitly documented as "excluding system message"
So: state["messages"] is the conversation-plus-tool-results history, but the system prompt may live outside it, and tool schemas are not messages at all (they're passed via the tools binding at model-call time).
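Conceptually, the request assembly looks like this (build_model_input is an illustrative name, not a real LangChain function): the system prompt is kept outside the state and only prefixed when the model input is built.

```python
# Conceptual sketch: the system prompt lives outside state["messages"] and is
# prepended only when the model request is built (function name is illustrative).
def build_model_input(system_prompt, state_messages):
    prefix = [{"role": "system", "content": system_prompt}] if system_prompt else []
    return prefix + state_messages

state_messages = [{"role": "user", "content": "hi"}]
model_input = build_model_input("You are a helpful assistant.", state_messages)
# state_messages is untouched; only model_input carries the system message
```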
2) Tool responses increase context: best ways to handle it
A) Use Context Editing to clear older tool outputs (best “drop-in” fix)
LangChain provides ContextEditingMiddleware with ClearToolUsesEdit, which is specifically designed to remove older tool outputs while preserving the most recent N tool results.
Docs: Built-in middleware → Context editing (section “Context editing” / ClearToolUsesEdit).
This directly targets your stated pain: large ToolMessage content bloating context.
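A rough sketch of the idea behind ClearToolUsesEdit (this is a conceptual stand-in, not the real implementation or its parameter names): keep the most recent N tool results and replace older tool outputs with a short placeholder.

```python
# Conceptual sketch of "clear older tool outputs, keep the last N"
# (not the real ClearToolUsesEdit implementation).
def clear_old_tool_outputs(messages, keep=3, placeholder="[tool output cleared]"):
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_clear = set(tool_idx[:-keep]) if keep else set(tool_idx)
    return [
        {**m, "content": placeholder} if i in to_clear else m
        for i, m in enumerate(messages)
    ]

msgs = [{"role": "user", "content": "q"}] + [
    {"role": "tool", "content": f"big result {i}"} for i in range(5)
]
trimmed = clear_old_tool_outputs(msgs, keep=2)
```

Here the three oldest tool results are replaced by the placeholder while the two newest survive intact.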
B) Use SummarizationMiddleware, but recognize what it can/can’t shrink
SummarizationMiddleware summarizes older messages (and then keeps only a configured suffix). It triggers by counting tokens over state["messages"] (the default token counting is approximate; the docs call this out).
This helps with tool results insofar as they live inside state["messages"] (they do), but it won't reduce tool schema overhead.
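The trigger logic can be sketched like this (a conceptual stand-in, not the real middleware; the ~4-chars-per-token heuristic mimics the spirit of an approximate counter, and the summary line is a placeholder rather than an actual LLM summary):

```python
# Conceptual sketch of a summarization trigger over state["messages"].
def approx_tokens(messages):
    # Rough ~4-characters-per-token heuristic (approximate, like the default).
    return sum(len(m["content"]) for m in messages) // 4

def maybe_summarize(messages, max_tokens=100, keep_last=2):
    if approx_tokens(messages) <= max_tokens:
        return messages                      # under threshold: no change
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "system",
               "content": f"[summary of {len(head)} earlier messages]"}
    return [summary] + tail                  # summary replaces the old prefix

msgs = [{"role": "user", "content": "x" * 200} for _ in range(5)]
compacted = maybe_summarize(msgs, max_tokens=100, keep_last=2)
```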
C) Reduce tool schema and selection overhead (prevents tool bloat from being injected every turn)
If you have many tools, passing all tool definitions every call can be expensive. A strong pattern is selecting only a small subset of tools per turn.
LangChain supports this via LLMToolSelectorMiddleware.
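The selection step can be illustrated like so (LLMToolSelectorMiddleware uses an LLM to pick the subset; a naive keyword scorer stands in for that call here, and the tool definitions are made up for the example):

```python
# Conceptual sketch of per-turn tool selection: score each tool against the
# user query and bind only the top few for the next model call (a keyword
# scorer stands in for the real LLM selection step).
def select_tools(query, tools, max_tools=2):
    words = query.lower().split()
    scored = sorted(
        tools,
        key=lambda t: -sum(w in t["description"].lower() for w in words),
    )
    return scored[:max_tools]   # only these schemas get sent with the request

tools = [
    {"name": "search_web", "description": "search the web for pages"},
    {"name": "run_sql", "description": "run a sql query on the warehouse"},
    {"name": "send_email", "description": "send an email to a contact"},
]
picked = select_tools("search for recent pages about langchain", tools)
```

Only the schemas of the picked tools are sent with the request, so per-turn tool overhead stays bounded even as your tool catalog grows.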
D) Make tool outputs “reference-based” (architectural best practice)
Even with middleware, the cheapest tokens are the ones you never generate:
- Have tools store large payloads externally (DB / object storage / vector store)
- Return a short summary + an ID/link, not the full raw text
- For retrieval tools, return top-k small snippets, not entire documents
Note about your model (openai/gpt-oss-120b)
Your provider change doesn't change what state["messages"] contains. It may, however, affect how accurately token-based triggers fire, because SummarizationMiddleware defaults to approximate counting unless you override token_counter.