Hello @prakash, welcome to the LangChain community!
Why Your UI Is Breaking
Here’s the core issue: you’re using the wrong data source for your UI.
The official docs at Context Engineering state this directly:
“The summarized conversation history is permanently updated — future turns will see the summary instead of the original messages.”
When SummarizationMiddleware fires, it issues a `RemoveMessage(id=REMOVE_ALL_MESSAGES)`, which permanently wipes all messages from the LangGraph checkpointer state and replaces them with `[summary] + [last N messages]`. That's by design for the LLM. But if your UI reads from that same checkpointer state, the history disappears from the user's view.
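To see why the history "disappears", here's a plain-Python sketch of what the compaction does to checkpointer state. The `compact` function and its summary string are illustrative stand-ins, not LangChain APIs:

```python
def compact(messages: list[str], keep_last: int = 3) -> list[str]:
    """Mimic the effect of summarization on checkpointer state:
    everything older than the last `keep_last` messages is replaced
    by a single summary entry."""
    if len(messages) <= keep_last:
        return messages
    # In the real middleware, an LLM writes this summary; here we fake it.
    summary = f"[summary of {len(messages) - keep_last} earlier messages]"
    return [summary] + messages[-keep_last:]

state = [f"msg {i}" for i in range(10)]
state = compact(state)  # the original 7 oldest messages are gone from state
print(state)            # ['[summary of 7 earlier messages]', 'msg 7', 'msg 8', 'msg 9']
```

If your UI reads `state` after this runs, it only ever sees the summary plus the tail, which is exactly the breakage you're describing.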
The Context Engineering guide draws the key distinction:
| Type | What it is | Docs term |
| --- | --- | --- |
| Transient context | What the LLM sees for a single call — modified without changing state | "Per-call, not saved" |
| Persistent context | What gets saved in state across turns — life-cycle hooks modify this permanently | "Saved for all future turns" |
SummarizationMiddleware is a persistent life-cycle operation. The checkpointer is not a message store; it is an LLM context-management tool.
The Fix: Two Separate Storage Layers
| Feature | LangGraph Checkpointer (PostgresSaver) | Your Own `messages` Table |
| --- | --- | --- |
| Contents | `[summary_msg]` + last N messages | Every message ever sent |
| Managed by | LangGraph automatically | Your application code |
| Used by | LLM (working context) | UI (full history) |
| Mutability | Rewritten on summarization | Append-only, never deleted |
| Query for UI? | Never | Always |
| Query for LLM? | Always (via checkpointer) | Never |
Step-by-Step Implementation (just an example you can take inspiration from for your own architecture)
1. Install dependencies
Per the Short-term memory → In production section:
```shell
pip install langgraph-checkpoint-postgres "psycopg[binary,pool]"
```
2. Create your DB tables
```sql
-- YOUR table: powers the UI; never delete rows
CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    thread_id TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('human', 'ai', 'tool')),
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON messages (thread_id, created_at);

-- LangGraph creates its own tables automatically via checkpointer.setup():
--   checkpoints, checkpoint_blobs, checkpoint_writes
-- Leave those alone; don't query or modify them directly.
```
3. Set up your agent with PostgresSaver and SummarizationMiddleware
Combining the Short-term memory → In production and Summarize messages sections:
```python
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

DB_URI = "postgresql://user:password@localhost:5432/mydb"

async with AsyncPostgresSaver.from_conn_string(DB_URI) as checkpointer:
    await checkpointer.setup()  # run once to create LangGraph tables

    agent = create_agent(
        model="openai:gpt-4o",
        tools=[...],
        checkpointer=checkpointer,
        middleware=[
            SummarizationMiddleware(
                model="openai:gpt-4o-mini",  # cheaper model for summarization
                trigger=("fraction", 0.75),  # fire when 75% of the context window is used
                keep=("messages", 20),       # keep the 20 most recent messages
            )
        ],
    )
```
The docs note: “You need to call checkpointer.setup() the first time you’re using the Postgres checkpointer.”
4. Your chat handler — write to YOUR table on every turn
```python
import asyncpg

async def chat(thread_id: str, user_text: str, db_conn: asyncpg.Connection) -> str:
    # Step 1: Write the user message to YOUR table immediately
    # (the UI can render it before the LLM even responds)
    await db_conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES ($1, 'human', $2)",
        thread_id, user_text,
    )

    # Step 2: Run the agent.
    # The checkpointer loads [summary + recent N] for the LLM; you don't touch this.
    # SummarizationMiddleware may fire and compact the checkpointer state,
    # but YOUR messages table is completely untouched.
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": user_text}]},
        config={"configurable": {"thread_id": thread_id}},
    )

    # Step 3: Write the AI response to YOUR table
    ai_response = result["messages"][-1].content
    await db_conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES ($1, 'ai', $2)",
        thread_id, ai_response,
    )
    return ai_response
```
5. Your UI query — reads from YOUR table, never the checkpointer
```python
async def get_full_chat_history(thread_id: str, db_conn: asyncpg.Connection) -> list[dict]:
    rows = await db_conn.fetch(
        """
        SELECT role, content, created_at
        FROM messages
        WHERE thread_id = $1
        ORDER BY created_at ASC
        """,
        thread_id,
    )
    return [{"role": r["role"], "content": r["content"]} for r in rows]
```
Answering Your Specific Questions
Q: Should I NEVER delete messages from the UI table?
Correct: never delete from your `messages` table. It is an append-only audit log. The Short-term memory → Delete messages section covers `RemoveMessage` usage, but that API is for managing LangGraph state (the checkpointer), not your own application table. Keep those two operations completely separate.
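To make "append-only" concrete, here's a hypothetical stdlib `sqlite3` stand-in for the Postgres `messages` table. Note the design choice: the data-access layer exposes only insert and select, so there is no code path that could ever delete UI history (the helper names are illustrative, not from LangChain):

```python
import sqlite3

# In-memory stand-in for your Postgres `messages` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        thread_id TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL
    )"""
)

def append_message(thread_id: str, role: str, content: str) -> None:
    """The ONLY write operation the UI layer has: an INSERT."""
    conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
        (thread_id, role, content),
    )

def full_history(thread_id: str) -> list[dict]:
    """The ONLY read operation the UI layer has: a SELECT in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    )
    return [{"role": r, "content": c} for r, c in rows]
```

Whatever `RemoveMessage` does to the checkpointer, nothing here can touch these rows.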
Q: How should I structure the DB tables?
- `messages` → your append-only full history (UI source of truth)
- `checkpoints` → LangGraph managed, do not touch
- `checkpoint_blobs` → LangGraph managed, do not touch
- `checkpoint_writes` → LangGraph managed, do not touch
If you also want long-term memory across sessions (user preferences, facts about the user), the Long-term memory docs show how to add a PostgresStore alongside the checkpointer:
```python
from langgraph.store.postgres import PostgresStore

with PostgresStore.from_conn_string(DB_URI) as store:
    store.setup()

    agent = create_agent(
        "openai:gpt-4o",
        tools=[...],
        checkpointer=checkpointer,
        store=store,  # separate from the checkpointer!
    )
```
Q: When is the best time to trigger summarization?
The Summarize messages section and the Context Engineering guide both show all three trigger modes. The ("fraction", ...) mode is the most robust because it automatically reads model.profile.max_input_tokens and adapts to whatever model you use:
```python
# Best: automatic, adapts to any model's context window
SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("fraction", 0.75),  # fire when 75% of the context window is used
    keep=("fraction", 0.25),     # after compaction, only 25% of the window is used
)

# Explicit token count: use this if your model has no profile
SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("tokens", 100_000),
    keep=("messages", 20),
)

# Fire on EITHER condition, whichever hits first
SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=[("fraction", 0.80), ("messages", 100)],
    keep=("messages", 20),
)
```
Per the docs: “See SummarizationMiddleware for more configuration options.” → Short-term memory
Q: How do I avoid infinite loops?
The `keep` parameter is what prevents this. The gap between `trigger` and `keep` determines how long it takes until the next summarization fires. With `trigger=("fraction", 0.75)` and `keep=("fraction", 0.25)`, the context drops from 75% to roughly 25% after each cycle, giving you a large buffer before the next cycle fires.
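The buffer arithmetic is worth spelling out. Assuming a hypothetical model with a 128k-token input window (the concrete number is just for illustration; in practice it comes from the model's profile):

```python
# Buffer arithmetic for trigger=("fraction", 0.75), keep=("fraction", 0.25)
# on an assumed 128_000-token context window.
max_input_tokens = 128_000  # e.g. what the model's profile would report

trigger_at = int(max_input_tokens * 0.75)  # summarization fires at 96_000 tokens
keep_to = int(max_input_tokens * 0.25)     # compaction brings usage down to ~32_000
buffer = trigger_at - keep_to              # 64_000 tokens of headroom per cycle

print(buffer)  # 64000
```

If `trigger` and `keep` were close together (say 0.75 and 0.74), each compaction would free almost nothing and summarization would fire on nearly every turn; the wide gap is what keeps it rare.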
Q: Where can I read more?
All the relevant official docs pages:
The Mental Model to Remember
The Memory concepts page describes it clearly:
“Thread-scoped memory tracks the ongoing conversation by maintaining message history within a single session.”
That thread-scoped memory (the checkpointer) is the LLM's working RAM; it gets compacted when it fills up. Your `messages` table is the permanent disk record that never shrinks. Once you separate these two concerns, the whole system becomes straightforward: the UI never breaks, the LLM never overflows, and summarization runs silently in the background.
I hope this helps. If you have more concerns, don't hesitate to reach out.