Hi @razaullah
1) What is stored inside state["messages"]?
state["messages"] is the agent's conversation history as a list of LangChain message objects.
In the agent state schema, it’s defined as messages: list[AnyMessage] (with LangGraph’s add_messages reducer).
What kinds of messages are in it
AnyMessage includes the normal chat message types you see in agent loops, e.g.:
- Human messages (user inputs)
- AI messages (model outputs; may include tool calls)
- Tool messages (tool results; appended after tools execute)
- System messages (these can exist as message objects in general, but in create_agent the system prompt is handled separately)
You can see in create_agent’s docstring that the loop works by:
- model produces an AIMessage with tool_calls
- tools run and their outputs are added as ToolMessage objects
- model is called again with the updated message list
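The loop above can be sketched in plain Python. This is a simplified stand-in, not the real create_agent implementation: the dataclasses mimic LangChain's AIMessage/ToolMessage, and fake_model is a stub standing in for the chat model.

```python
# Simplified sketch of the create_agent tool loop (stand-in classes, not the
# real LangChain types; fake_model is a stub standing in for the chat model).
from dataclasses import dataclass, field

@dataclass
class AIMessage:
    content: str
    tool_calls: list = field(default_factory=list)

@dataclass
class ToolMessage:
    content: str
    tool_call_id: str

def fake_model(messages):
    # First call: request a tool; once a ToolMessage exists: final answer.
    if not any(isinstance(m, ToolMessage) for m in messages):
        return AIMessage(content="", tool_calls=[{"id": "call_1", "name": "search"}])
    return AIMessage(content="final answer")

def run_loop(messages):
    while True:
        ai = fake_model(messages)      # model produces an AIMessage
        messages.append(ai)
        if not ai.tool_calls:          # no tool calls -> loop ends
            return messages
        for call in ai.tool_calls:     # run tools, append ToolMessage results
            messages.append(ToolMessage(content="search result",
                                        tool_call_id=call["id"]))

history = run_loop([{"role": "user", "content": "hi"}])
```

After the run, history holds exactly the shape described above: the user input, an AIMessage with tool_calls, a ToolMessage, then the final AIMessage.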
Why system prompts / tool schemas often aren't counted in state["messages"]
Two key implementation details:
- The middleware (including SummarizationMiddleware) reads only state["messages"]
- The model call uses a separate system_message field in the ModelRequest and prepends it only at call time; ModelRequest.messages is explicitly documented as "excluding system message"
So: state["messages"] is the conversation-plus-tool-results history, but the system prompt may live outside it, and tool schemas are not messages at all (they're passed via the tools binding at model-call time).
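Conceptually, the request assembly looks like this (build_model_input is an illustrative name, not a real LangChain function): the system prompt is kept outside the state and only prefixed when the model input is built.

```python
# Conceptual sketch: the system prompt lives outside state["messages"] and is
# prepended only when the model request is built (function name is illustrative).
def build_model_input(system_prompt, state_messages):
    prefix = [{"role": "system", "content": system_prompt}] if system_prompt else []
    return prefix + state_messages

state_messages = [{"role": "user", "content": "hi"}]
model_input = build_model_input("You are a helpful assistant.", state_messages)
# state_messages is untouched; only model_input carries the system message
```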
2) Tool responses increase context: best ways to handle it
A) Use Context Editing to clear older tool outputs (best “drop-in” fix)
LangChain provides ContextEditingMiddleware with ClearToolUsesEdit, which is specifically designed to remove older tool outputs while preserving the most recent N tool results.
Docs: Built-in middleware → Context editing (section “Context editing” / ClearToolUsesEdit).
This directly targets your stated pain: large ToolMessage content bloating context.
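A rough sketch of the idea behind ClearToolUsesEdit (this is a conceptual stand-in, not the real implementation or its parameter names): keep the most recent N tool results and replace older tool outputs with a short placeholder.

```python
# Conceptual sketch of "clear older tool outputs, keep the last N"
# (not the real ClearToolUsesEdit implementation).
def clear_old_tool_outputs(messages, keep=3, placeholder="[tool output cleared]"):
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_clear = set(tool_idx[:-keep]) if keep else set(tool_idx)
    return [
        {**m, "content": placeholder} if i in to_clear else m
        for i, m in enumerate(messages)
    ]

msgs = [{"role": "user", "content": "q"}] + [
    {"role": "tool", "content": f"big result {i}"} for i in range(5)
]
trimmed = clear_old_tool_outputs(msgs, keep=2)
```

Here the three oldest tool results are replaced by the placeholder while the two newest survive intact.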
B) Use SummarizationMiddleware, but recognize what it can/can’t shrink
SummarizationMiddleware summarizes older messages (and then keeps only a configured suffix). It triggers by counting tokens over state["messages"] (the default token counting is approximate; the docs call this out).
This helps with tool results insofar as they live inside state["messages"] (they do), but it won't reduce tool schema overhead.
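The trigger logic can be sketched like this (a conceptual stand-in, not the real middleware; the ~4-chars-per-token heuristic mimics the spirit of an approximate counter, and the summary line is a placeholder rather than an actual LLM summary):

```python
# Conceptual sketch of a summarization trigger over state["messages"].
def approx_tokens(messages):
    # Rough ~4-characters-per-token heuristic (approximate, like the default).
    return sum(len(m["content"]) for m in messages) // 4

def maybe_summarize(messages, max_tokens=100, keep_last=2):
    if approx_tokens(messages) <= max_tokens:
        return messages                      # under threshold: no change
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "system",
               "content": f"[summary of {len(head)} earlier messages]"}
    return [summary] + tail                  # summary replaces the old prefix

msgs = [{"role": "user", "content": "x" * 200} for _ in range(5)]
compacted = maybe_summarize(msgs, max_tokens=100, keep_last=2)
```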
C) Reduce tool schema and selection overhead (prevents tool bloat from being injected every turn)
If you have many tools, passing all tool definitions every call can be expensive. A strong pattern is selecting only a small subset of tools per turn.
LangChain supports this via LLMToolSelectorMiddleware.
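The selection step can be illustrated like so (LLMToolSelectorMiddleware uses an LLM to pick the subset; a naive keyword scorer stands in for that call here, and the tool definitions are made up for the example):

```python
# Conceptual sketch of per-turn tool selection: score each tool against the
# user query and bind only the top few for the next model call (a keyword
# scorer stands in for the real LLM selection step).
def select_tools(query, tools, max_tools=2):
    words = query.lower().split()
    scored = sorted(
        tools,
        key=lambda t: -sum(w in t["description"].lower() for w in words),
    )
    return scored[:max_tools]   # only these schemas get sent with the request

tools = [
    {"name": "search_web", "description": "search the web for pages"},
    {"name": "run_sql", "description": "run a sql query on the warehouse"},
    {"name": "send_email", "description": "send an email to a contact"},
]
picked = select_tools("search for recent pages about langchain", tools)
```

Only the schemas of the picked tools are sent with the request, so per-turn tool overhead stays bounded even as your tool catalog grows.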
D) Make tool outputs “reference-based” (architectural best practice)
Even with middleware, the cheapest tokens are the ones you never generate:
- Have tools store large payloads externally (DB / object storage / vector store)
- Return a short summary + an ID/link, not the full raw text
- For retrieval tools, return top-k small snippets, not entire documents
Note about your model (openai/gpt-oss-120b)
Your provider change doesn't change what state["messages"] contains. It may, however, affect how accurately token-based triggers fire, because SummarizationMiddleware defaults to approximate counting unless you override token_counter.