How do ContextEditingMiddleware and SummarizationMiddleware interact when used together?
Combining ContextEditingMiddleware + SummarizationMiddleware: execution order and behavior when both trigger?

rushant001 · April 20, 2026, 1:09pm

Hi everyone,

I’m using ContextEditingMiddleware and SummarizationMiddleware together in create_agent, and I’d like to understand the exact execution semantics when they’re combined. The docs describe each one individually but don’t explain how they interact.

My code:

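from langchain.agents import create_agent
from langchain.agents.middleware import (
    ClearToolUsesEdit,
    ContextEditingMiddleware,
    SummarizationMiddleware,
)

agent_executor = create_agent(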
    model=self.llm_service.llm,
    tools=self.tools,
    system_prompt=system_prompt,
    middleware=[
        # Layer 1: clean up old tool results (lightweight, no LLM call)
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(
                    trigger=self.clear_trigger,
                    keep=self.clear_tool_keep,
                ),
            ],
        ),
        # Layer 2: summarization (heavier, uses a cheap LLM)
        SummarizationMiddleware(
            model=LLMService(model_name=self.summary_model_name).llm,
            trigger=self.summary_trigger,
            keep=self.summary_keep,
            trim_tokens_to_summarize=self.summary_trim_tokens,
        ),
    ],
)

Questions I’d love help with:

1. Execution order: Which one runs first? Does the order in the middleware=[] list matter, or is it determined by hook type (before_model vs wrap_model_call)?
2. Data flow: Does one middleware’s output feed into the other?
   - Does the cleanup affect what the summary LLM sees?
   - Does the summary output become the input that cleanup operates on?
3. Both triggered at once: If the conversation exceeds both thresholds in the same call:
   - Do both run?
   - Which result reaches the main LLM?
   - Are the effects combined, or does one override the other?
4. State vs request: I noticed SummarizationMiddleware uses before_model and ContextEditingMiddleware uses wrap_model_call. Does that mean summarization persistently modifies state["messages"] while context editing only modifies the per-call request? What are the multi-turn implications?
5. Recommended pattern: Is the “cheap cleanup first, summarization as fallback” pattern (often suggested in community tutorials) actually supported by this combination, or does it require custom middleware?

A concrete walkthrough showing the message list before and after each middleware runs — especially for the “both triggered” case — would be really helpful.

Using langchain 1.2.15.

Thanks!

hipvlady · April 22, 2026, 8:27pm

This is a classic shared-state synchronization problem solved by coherence protocols.

When multiple middleware edit the same context artifact without write-ordering semantics, you get race conditions.

Solution: Apply MESI-style lazy invalidation:

First middleware writes → marks context “clean”
Second middleware checks: valid? → reuse (skip re-fetch)
Only re-fetch if another middleware modified it

Reduces redundant context updates by 84–95% in multi-middleware pipelines.

Reference: “Token Coherence: Adapting MESI Cache Protocols…” (arXiv:2603.15183)
Implementation: pip install agent-coherence (LangGraph adapter included)

rushant001 · April 23, 2026, 5:48am

I think this answer is addressing a different layer of the problem.

I’m not trying to solve middleware coordination in general — I’m specifically asking how LangChain currently executes these two middlewares internally.

keenborder786 · April 23, 2026, 9:52am

Hello @rushant001
Good question. The interaction is subtle because these two middleware use completely different hooks.

Hook types and what that means

SummarizationMiddleware → before_model: compiled into a graph node that runs before the model node and writes its result back to state["messages"] persistently via the add_messages reducer.
ContextEditingMiddleware → wrap_model_call: a wrapper around the LLM call inside the model node; it operates only on a deepcopy of request.messages and never touches state.

Execution order (always, regardless of list position)

SummarizationMiddleware.before_model
  → model_node starts
    → ContextEditingMiddleware.wrap_model_call
      → actual LLM call

List position matters within each hook type, but since these two use different hooks, summarization always runs first. Listing ContextEditingMiddleware first in middleware=[] does not move it ahead of SummarizationMiddleware’s node.
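
To make the ordering concrete, here is a toy simulation in plain Python. It is not LangChain’s code: `ToySummarization`, `ToyContextEditing` and `run_agent_step` are hypothetical stand-ins that only mimic the two phases described above (all `before_model` hooks first, then the model node running the LLM call through the `wrap_model_call` chain, with the first listed wrapper outermost, which is my understanding of how create_agent nests them).

```python
from typing import Callable

Trace = list[str]
Handler = Callable[[dict], str]


class ToySummarization:
    """Hypothetical stand-in for SummarizationMiddleware: only defines before_model."""

    def before_model(self, state: dict, trace: Trace) -> None:
        trace.append("summarization.before_model")


class ToyContextEditing:
    """Hypothetical stand-in for ContextEditingMiddleware: only defines wrap_model_call."""

    def wrap_model_call(self, request: dict, handler: Handler, trace: Trace) -> str:
        trace.append("context_editing.wrap_model_call")
        return handler(request)


def _wrap(mw, inner: Handler, trace: Trace) -> Handler:
    # Bind the current middleware and inner handler into a new handler.
    return lambda request: mw.wrap_model_call(request, inner, trace)


def run_agent_step(middleware: list, state: dict) -> Trace:
    trace: Trace = []
    # Phase 1: every before_model hook runs (in list order) before the model node.
    for mw in middleware:
        if hasattr(mw, "before_model"):
            mw.before_model(state, trace)

    # Phase 2: the model node builds its request from the already-updated state ...
    request = {"messages": list(state["messages"])}

    def call_llm(req: dict) -> str:
        trace.append("llm")
        return "response"

    # ... and sends it through the wrap_model_call chain (first listed = outermost).
    handler: Handler = call_llm
    for mw in reversed(middleware):
        if hasattr(mw, "wrap_model_call"):
            handler = _wrap(mw, handler, trace)
    handler(request)
    return trace


for order in ([ToyContextEditing(), ToySummarization()],
              [ToySummarization(), ToyContextEditing()]):
    print(run_agent_step(order, {"messages": []}))
```

Both list orders print the same trace, `['summarization.before_model', 'context_editing.wrap_model_call', 'llm']`, because the hook type, not the list position, decides which phase each middleware runs in.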

Data flow

Summarization’s output feeds context editing’s input, in one direction only. The ModelRequest is built from state["messages"] after before_model has already run:

    def model_node(state: AgentState[Any], runtime: Runtime[ContextT]) -> list[Command[Any]]:
        request = ModelRequest(
            ...
            messages=state["messages"],  # already post-summarization
            ...
        )

So context editing sees the already-summarized message list. Summarization never sees what context editing does.

Both triggered at once

Both run, and their effects compound; neither overrides the other. The main LLM sees the summarized message list, with older tool results cleared within that already-reduced set. If the post-summarization list is already under context editing’s trigger, context editing does nothing: it counts tokens on its deepcopy and returns early without changing anything.
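
Here is the walkthrough you asked for, as a toy simulation rather than the real middleware. `Msg`, `toy_summarize_before_model` and `toy_clear_tool_uses` are hypothetical; token costs are fixed numbers instead of a tokenizer, the “summary” is a 20-token placeholder instead of an LLM call, and the thresholds (summary trigger 1000 / keep 5, clear trigger 500 / keep 1) are made up so that both fire.

```python
from copy import deepcopy
from dataclasses import dataclass, replace


@dataclass
class Msg:
    role: str     # "human", "ai" or "tool"
    content: str
    tokens: int   # hypothetical fixed token cost instead of a real tokenizer


def count(messages: list[Msg]) -> int:
    return sum(m.tokens for m in messages)


def toy_summarize_before_model(state: dict, trigger: int, keep: int) -> None:
    """Toy stand-in for SummarizationMiddleware: rewrites state["messages"] (persistent)."""
    messages = state["messages"]
    if count(messages) <= trigger:
        return
    # The real middleware calls the summary model and avoids splitting AI/tool pairs;
    # here the summary is a fixed 20-token placeholder and the cut is a plain slice.
    summary = Msg("human", f"Summary of {len(messages) - keep} earlier messages", 20)
    state["messages"] = [summary] + messages[-keep:]


def toy_clear_tool_uses(messages: list[Msg], trigger: int, keep: int) -> list[Msg]:
    """Toy stand-in for ContextEditingMiddleware + ClearToolUsesEdit: edits a deepcopy only."""
    edited = deepcopy(messages)
    if count(edited) <= trigger:
        return edited
    tool_positions = [i for i, m in enumerate(edited) if m.role == "tool"]
    # Clear every tool result except the newest `keep` ones.
    for i in tool_positions[: max(len(tool_positions) - keep, 0)]:
        edited[i] = replace(edited[i], content="[cleared]", tokens=1)
    return edited


state = {"messages": [
    Msg("human", "q1", 10),
    Msg("ai", "call search", 10), Msg("tool", "big result 1", 400),
    Msg("ai", "call search", 10), Msg("tool", "big result 2", 400),
    Msg("ai", "call search", 10), Msg("tool", "big result 3", 400),
    Msg("human", "q2", 10),
]}
original = list(state["messages"])

# Step 1: the before_model node (persistent write to state).
toy_summarize_before_model(state, trigger=1000, keep=5)
# Step 2: inside the model node (ephemeral copy sent to the LLM).
request = toy_clear_tool_uses(state["messages"], trigger=500, keep=1)

for label, msgs in [("state", state["messages"]), ("request", request)]:
    print(label, count(msgs), [f"{m.role}:{m.content}" for m in msgs])
```

The history starts at 1250 tokens, so summarization fires and rewrites state to the summary plus the last 5 messages (850 tokens, all tool results intact). Context editing then sees those 850 tokens, still over its 500 trigger, and clears the older of the two remaining tool results in its copy, so the main LLM gets 451 tokens while state keeps 850. Clearing the original list alone would have yielded 452 tokens, below the summary trigger, but summarization has already run by that point.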

State persistence - the most important distinction

| | `SummarizationMiddleware` | `ContextEditingMiddleware` |
|---|---|---|
| Hook | `before_model` | `wrap_model_call` |
| Scope | Persistent: writes `RemoveMessage` + summary + preserved messages to `state["messages"]` | Ephemeral: edits a `deepcopy` only; state is never touched |
| Next turn | Agent starts with the compressed history | Agent starts with the original, uncleared tool messages |
| Cost | One LLM call to the summary model | Zero LLM calls with `ClearToolUsesEdit` (custom edits may differ) |

The "[cleared]" placeholder that context editing inserts exists only for the duration of that one model call. On the next turn, state["messages"] still holds the original tool results, so context editing re-evaluates from scratch on each call.
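
A small sketch of the multi-turn consequence, again using hypothetical toy code: `toy_context_edit` mimics clearing on a deepcopy but skips the trigger check, and the AI tool-call messages that would normally precede each tool result are omitted for brevity.

```python
from copy import deepcopy


def toy_context_edit(messages: list[dict], keep: int, placeholder: str = "[cleared]") -> list[dict]:
    """Hypothetical stand-in for ClearToolUsesEdit (no trigger): edits a deepcopy, never the input."""
    edited = deepcopy(messages)
    tool_positions = [i for i, m in enumerate(edited) if m["role"] == "tool"]
    for i in tool_positions[: max(len(tool_positions) - keep, 0)]:
        edited[i]["content"] = placeholder
    return edited


state = {"messages": [
    {"role": "human", "content": "find A"},
    {"role": "tool", "content": "result A"},
    {"role": "human", "content": "find B"},
    {"role": "tool", "content": "result B"},
]}

turn1_request = toy_context_edit(state["messages"], keep=1)

# Next turn: the persistent history grows; nothing from turn 1's edit was saved.
state["messages"] += [
    {"role": "human", "content": "find C"},
    {"role": "tool", "content": "result C"},
]
turn2_request = toy_context_edit(state["messages"], keep=1)

print([m["content"] for m in turn1_request])
print([m["content"] for m in turn2_request])
print([m["content"] for m in state["messages"]])
```

Turn 1’s request shows `result A` as `[cleared]`; turn 2 clears both `result A` and `result B`; state still holds all three original results, which is also what the summarization trigger keeps counting.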

“Cheap cleanup first, summarization as fallback” - is it real?

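Partially. These two don’t have a conditional relationship: each checks its own trigger independently, and their effects compound when both thresholds are exceeded. The list order doesn’t make context editing a gate that prevents summarization from firing. It can’t be one, because summarization runs first and counts tokens on the uncleared state["messages"], so the ephemeral clearing never lowers the count that summarization’s trigger sees.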

What they actually are is complementary layers:

Context editing: cheap and stateless, evaluated on every model call; it keeps each individual LLM call’s effective token count low.
Summarization: expensive (an LLM call) and stateful; it fires occasionally to compress the persistent history when it gets too large.

If you want true “run A, only run B if A wasn’t enough,” that requires a single custom middleware that checks both conditions in sequence. The built-in primitives don’t express conditional chaining between separate middleware instances.
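
A sketch of the decision logic such a custom middleware could use, written as plain Python so it runs on its own. `cleanup_then_maybe_summarize`, its parameters and `fake_summarizer` are hypothetical names, and the summarizer is a stub instead of an LLM call. To make it actually gate summarization in LangChain, you would call something like this from a single custom middleware’s `before_model` hook and write the result back to `state["messages"]` (as SummarizationMiddleware does), so the summary threshold is checked against the cleaned history.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Msg:
    role: str    # "human", "ai" or "tool"
    content: str
    tokens: int  # hypothetical fixed token cost


def count(messages: list[Msg]) -> int:
    return sum(m.tokens for m in messages)


def cleanup_then_maybe_summarize(
    messages: list[Msg],
    *,
    clear_trigger: int,
    clear_keep: int,
    summary_trigger: int,
    summary_keep: int,
    summarize: Callable[[list[Msg]], Msg],
) -> tuple[list[Msg], bool]:
    """Hypothetical combined policy: cheap cleanup first, summarize only if still too large."""
    if summary_keep < 1:
        raise ValueError("summary_keep must be at least 1")
    result = list(messages)
    # Step 1: clear all but the newest `clear_keep` tool results if over clear_trigger.
    if count(result) > clear_trigger:
        tool_positions = [i for i, m in enumerate(result) if m.role == "tool"]
        for i in tool_positions[: max(len(tool_positions) - clear_keep, 0)]:
            result[i] = replace(result[i], content="[cleared]", tokens=1)
    # Step 2: summarize only when the *cleaned* list still exceeds summary_trigger.
    if count(result) <= summary_trigger:
        return result, False
    # A real version should also avoid separating an AI tool call from its tool result.
    older, recent = result[:-summary_keep], result[-summary_keep:]
    return [summarize(older)] + recent, True


def fake_summarizer(older: list[Msg]) -> Msg:
    # Stub in place of a call to the summary model.
    return Msg("human", f"summary of {len(older)} messages", 20)


history = [
    Msg("human", "q1", 10),
    Msg("ai", "call search", 10), Msg("tool", "result 1", 400),
    Msg("ai", "call search", 10), Msg("tool", "result 2", 400),
    Msg("human", "q2", 10),
]

cleaned, summarized = cleanup_then_maybe_summarize(
    history, clear_trigger=500, clear_keep=1,
    summary_trigger=600, summary_keep=3, summarize=fake_summarizer,
)
print(summarized, count(cleaned))
```

With `summary_trigger=600`, cleanup alone brings the 840-token history down to 441 tokens, so no summary call is made; lower it to 300 and the cleaned list gets summarized down to 440 tokens. The trade-off of doing the cleanup in `before_model` is that cleared tool results are gone from state for good, unlike with ContextEditingMiddleware.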
