I’m using ContextEditingMiddleware and SummarizationMiddleware together in create_agent, and I’d like to understand the exact execution semantics when they’re combined. The docs describe each one individually but don’t explain how they interact.
Execution order — Which one runs first? Does the order in the middleware=[] list matter, or is it determined by hook type (before_model vs wrap_model_call)?
Data flow — Does one middleware’s output feed into the other?
Does the cleanup affect what the summary LLM sees?
Does the summary output become the input that cleanup operates on?
Both triggered at once — If the conversation exceeds both thresholds in the same call:
Do both run?
Which result reaches the main LLM?
Are the effects combined, or does one override the other?
State vs request — I noticed SummarizationMiddleware uses before_model and ContextEditingMiddleware uses wrap_model_call. Does that mean summarization persistently modifies state["messages"] while context editing only modifies the per-call request? What are the multi-turn implications?
Recommended pattern — Is the “cheap cleanup first, summarization as fallback” pattern (often suggested in community tutorials) actually supported by this combination, or does it require custom middleware?
A concrete walkthrough showing the message list before and after each middleware runs — especially for the “both triggered” case — would be really helpful.
I think this answer is addressing a different layer of the problem.
I’m not trying to solve middleware coordination in general — I’m specifically asking how LangChain currently executes these two middlewares internally.
Hello @rushant001
Good question, the interaction is subtle because these two middleware use completely different hooks.
Hook types and what that means
SummarizationMiddleware → before_model: compiled into a graph node that runs before the model node, writes back to state["messages"] persistently via the add_messages reducer.
ContextEditingMiddleware → wrap_model_call: a function wrapper inside the model node, operates on a deepcopy of request.messages only, never touches state.
Execution order (always, regardless of list position)
SummarizationMiddleware.before_model
→ model_node starts
→ ContextEditingMiddleware.wrap_model_call
→ actual LLM call
List position matters within each hook type, but since these two use different hooks, summarization always runs first. ContextEditingMiddleware listed first in middleware=[] doesn’t move it before SummarizationMiddleware’s node.
Data flow
Summarization’s output feeds context editing’s input, in one direction only. The ModelRequest is built from state["messages"] after before_model has already run:
So context editing sees the already-summarized message list. Summarization never sees what context editing does.
Both triggered at once
Both run and effects compound - neither overrides the other. The LLM sees: summarized messages, with tool results cleared within that already-reduced set. If the post-summarization list is already under context editing’s trigger threshold, context editing simply does nothing (it re-evaluates token count on its deepcopy and returns early).
State persistence - the most important distinction
SummarizationMiddleware
ContextEditingMiddleware
Hook
before_model
wrap_model_call
Scope
Persistent: writes RemoveMessage + summary + preserved to state["messages"]
Ephemeral: deepcopy only, state is never touched
Next turn
Agent starts with the compressed history
Agent starts with the original, uncleared tool messages
Cost
One LLM call to the summary model
Zero LLM calls (might also depend on your other custom edits)
Context editing’s cleared "[cleared]" content exists only for the duration of that one model call. On the next turn, state["messages"] still has the original tool results, context editing re-evaluates from scratch on each call.
“Cheap cleanup first, summarization as fallback” - is it real?
Partially. These two don’t have a conditional relationship, they always run independently and their effects compound when both thresholds are exceeded. The list order doesn’t make context editing a gate that prevents summarization from firing.
What they actually are is complementary layers:
Context editing: cheap, stateless, runs on every call - keeps each individual LLM call’s effective token count low.
Summarization: expensive (LLM call), stateful - fires occasionally to compress the persistent history when it gets too large.
If you want true “run A, only run B if A wasn’t enough,” that requires a single custom middleware that checks both conditions in sequence. The built-in primitives don’t express conditional chaining between separate middleware instances.