LangGraph + PostgreSQL: Chat history and summarization best practice

Hi everyone,

I am building a chatbot using LangGraph with PostgreSQL as the checkpointer and storage layer.

Current setup:

  • Using LangGraph agent with tools and memory
  • PostgreSQL to store chat history (messages table)
  • UI displays chat conversations similar to ChatGPT

Problem:
I am confused about how to handle long conversations and summarization.

My requirements:

  1. UI should display the full chat history (like ChatGPT sidebar/chat window)
  2. For LLM context, I want to send:
    • summarized older messages
    • recent messages
  3. Avoid token overflow and infinite loops

Issue I am facing:

  • When I summarize older messages and delete them from the database,
    the UI breaks because full chat history is lost
  • If I don’t delete messages, context becomes too large for the model
  • I am not sure what the correct architecture is to:
    • maintain full history for UI
    • and optimized context for LLM

Questions:

  1. What is the recommended pattern in LangGraph for handling:
    • PostgreSQL chat history
    • summarization
    • recent message window?
  2. Should I NEVER delete messages and instead store summary separately?
  3. How should I structure DB tables (messages vs summary)?
  4. When is the best time to update summary (after N messages or token limit)?
  5. Any example repo or reference architecture for this?

Goal:
To implement a ChatGPT-like system where:

  • UI shows full conversation
  • LLM uses summary + recent context efficiently

Any guidance would be really helpful 🙏

Hello @prakash, welcome to the LangChain community!

Why Your UI Is Breaking

Here’s the core issue: you’re using the wrong data source for your UI.

The official docs at Context Engineering state this directly:

“The summarized conversation history is permanently updated — future turns will see the summary instead of the original messages.”

When SummarizationMiddleware fires, it issues a RemoveMessage(id=REMOVE_ALL_MESSAGES), which permanently wipes all messages from the LangGraph checkpointer state and replaces them with [summary] + [last N messages]. That’s by design for the LLM. But if your UI reads from that same checkpointer state, the history disappears from the user’s view.
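To make the wipe-and-replace concrete, here is a tiny pure-Python stand-in for that behavior. The sentinel and reducer names are illustrative only; the real logic lives in LangGraph’s add_messages reducer:

```python
# Pure-Python stand-in for the wipe-and-replace that SummarizationMiddleware
# performs on checkpointer state. Names here are illustrative, not the real API.
REMOVE_ALL_MESSAGES = "__remove_all__"

def apply_update(state, update):
    out = list(state)
    for msg in update:
        if msg.get("remove_id") == REMOVE_ALL_MESSAGES:
            out = []  # wipe the entire persisted message list
        else:
            out.append(msg)
    return out

history = [
    {"role": "human", "content": "hi"},
    {"role": "ai", "content": "hello"},
    {"role": "human", "content": "tell me more"},
]

# What summarization effectively writes back: remove everything, then
# append [summary] + [last N messages]
update = [
    {"remove_id": REMOVE_ALL_MESSAGES},
    {"role": "ai", "content": "Summary: greeting, then a request for details."},
    {"role": "human", "content": "tell me more"},
]

new_state = apply_update(history, update)
print(len(new_state))  # 2 — the original three messages are gone
```

If your UI reads this state, it only ever sees the post-compaction two messages, which is exactly the breakage described above.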

The Context Engineering guide draws the key distinction:

  • Transient context — what the LLM sees for a single call, modified without changing state (docs term: “Per-call, not saved”)
  • Persistent context — what gets saved in state across turns; life-cycle hooks modify this permanently (docs term: “Saved for all future turns”)

SummarizationMiddleware is a persistent life-cycle operation. The checkpointer is not a message store; it is an LLM context-management tool.


The Fix: Two Separate Storage Layers

Feature          LangGraph Checkpointer (PostgresSaver)   Your Own messages Table
Contents         [summary_msg] + last N messages          Every message ever sent
Managed by       LangGraph automatically                  Your application code
Used by          LLM (working context)                    UI (full history)
Mutability       Rewritten on summarization               Append-only, never deleted
Query for UI?    Never                                    Always
Query for LLM?   Always (via checkpointer)                Never

Step-by-Step Implementation (Just an example so you can take inspiration for your architecture)

1. Install dependencies

Per the Short-term memory → In production section:

pip install langgraph-checkpoint-postgres "psycopg[binary,pool]"

2. Create your DB tables

-- YOUR table: powers the UI; never delete rows
CREATE TABLE messages (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    thread_id   TEXT NOT NULL,
    role        TEXT NOT NULL CHECK (role IN ('human', 'ai', 'tool')),
    content     TEXT NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX ON messages (thread_id, created_at);

-- LangGraph creates its own tables automatically via checkpointer.setup():
--   checkpoints, checkpoint_blobs, checkpoint_writes
-- Leave those alone; don't query or modify them directly.

3. Set up your agent with PostgresSaver and SummarizationMiddleware

Combining the Short-term memory → In production and Summarize messages sections:

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

DB_URI = "postgresql://user:password@localhost:5432/mydb"

async with AsyncPostgresSaver.from_conn_string(DB_URI) as checkpointer:
    await checkpointer.setup()  # run once to create LangGraph tables
    # Keep this context manager open for the app's lifetime; the agent
    # below holds a reference to the checkpointer.

    agent = create_agent(
        model="openai:gpt-4o",
        tools=[...],
        checkpointer=checkpointer,
        middleware=[
            SummarizationMiddleware(
                model="openai:gpt-4o-mini",  # cheaper model for summarization
                trigger=("fraction", 0.75),  # fire when 75% of context window is used
                keep=("messages", 20),       # keep the 20 most recent messages
            )
        ],
    )

The docs note: “You need to call checkpointer.setup() the first time you’re using the Postgres checkpointer.”

4. Your chat handler — write to YOUR table on every turn

import asyncpg

async def chat(thread_id: str, user_text: str, db_conn: asyncpg.Connection) -> str:
    # Step 1: Write user message to YOUR table immediately
    # (UI can render it before the LLM even responds)
    await db_conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES ($1, 'human', $2)",
        thread_id, user_text,
    )

    # Step 2: Run the agent
    # Checkpointer loads [summary + recent N] for the LLM - you don't touch this
    # SummarizationMiddleware may fire and compact the checkpointer state
    # but YOUR messages table is completely untouched
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": user_text}]},
        config={"configurable": {"thread_id": thread_id}},
    )

    # Step 3: Write AI response to YOUR table
    ai_response = result["messages"][-1].content
    await db_conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES ($1, 'ai', $2)",
        thread_id, ai_response,
    )

    return ai_response

5. Your UI query — reads from YOUR table, never the checkpointer

async def get_full_chat_history(thread_id: str, db_conn: asyncpg.Connection) -> list[dict]:
    rows = await db_conn.fetch(
        """
        SELECT role, content, created_at
        FROM messages
        WHERE thread_id = $1
        ORDER BY created_at ASC
        """,
        thread_id,
    )
    return [{"role": r["role"], "content": r["content"]} for r in rows]
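Since your requirements also mention a ChatGPT-style sidebar, the same append-only table can drive it. A self-contained sketch using sqlite3 in place of Postgres/asyncpg (illustrative only, so it runs anywhere; the SQL shape carries over):

```python
import sqlite3

# In-memory stand-in for the Postgres messages table above
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, thread_id TEXT, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
    [("t1", "human", "hi"), ("t1", "ai", "hello"),
     ("t2", "human", "plan my trip"), ("t2", "ai", "sure")],
)

# Sidebar query: one entry per thread, titled by its first human message,
# ordered by most recent activity
sidebar = conn.execute("""
    SELECT thread_id,
           (SELECT content FROM messages m2
            WHERE m2.thread_id = m.thread_id AND m2.role = 'human'
            ORDER BY m2.id LIMIT 1) AS title,
           MAX(id) AS last_activity
    FROM messages m
    GROUP BY thread_id
    ORDER BY last_activity DESC
""").fetchall()
print(sidebar)  # most recently active thread first
```

In Postgres you would order by MAX(created_at) instead of the integer id, but the pattern is the same: the sidebar, like the chat window, never touches the checkpointer.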

Answering Your Specific Questions

Q: Should I NEVER delete messages from the UI table?

Correct, never delete from your messages table. It is an append-only audit log. The Short-term memory → Delete messages section covers RemoveMessage usage; that API is for managing LangGraph state (the checkpointer), not your own application table. Keep those two operations completely separate.


Q: How should I structure the DB tables?

messages               → your append-only full history (UI source of truth)
checkpoints            → LangGraph managed, do not touch
checkpoint_blobs       → LangGraph managed, do not touch
checkpoint_writes      → LangGraph managed, do not touch

If you also want long-term memory across sessions (user preferences, facts about the user), the Long-term memory docs show how to add a PostgresStore alongside the checkpointer:

from langgraph.store.postgres import PostgresStore

with PostgresStore.from_conn_string(DB_URI) as store:
    store.setup()
    agent = create_agent(
        "openai:gpt-4o",
        tools=[...],
        checkpointer=checkpointer,
        store=store,           # separate from checkpointer!
    )

Q: When is the best time to trigger summarization?

The Summarize messages section and the Context Engineering guide both show all three trigger modes. The ("fraction", ...) mode is the most robust because it automatically reads model.profile.max_input_tokens and adapts to whatever model you use:

# Best: automatic, adapts to any model's context window
SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("fraction", 0.75),   # fire when 75% of context window is used
    keep=("fraction", 0.25),      # after compaction, only 25% of window is used
)

# Explicit token count, use this if your model has no profile
SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("tokens", 100_000),
    keep=("messages", 20),
)

# Fire on EITHER condition - whichever hits first
SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=[("fraction", 0.80), ("messages", 100)],
    keep=("messages", 20),
)

Per the docs: “See SummarizationMiddleware for more configuration options.” (Short-term memory)


Q: How do I avoid infinite loops?

The keep parameter is what prevents this. The gap between trigger and keep determines how long until the next summarization fires. With trigger=("fraction", 0.75) and keep=("fraction", 0.25), the context drops from 75% to ~25% after each cycle, giving you a large buffer before the next cycle fires.
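The arithmetic behind that buffer, using an assumed 128k-token context window for illustration (the middleware reads the real value from the model profile):

```python
# Back-of-envelope token budget for trigger=("fraction", 0.75)
# and keep=("fraction", 0.25). The 128k window is an assumption
# for illustration, not a value read from any real model.
window = 128_000
trigger_tokens = int(window * 0.75)  # summarization fires here
keep_tokens = int(window * 0.25)     # context size right after compaction
headroom = trigger_tokens - keep_tokens
print(trigger_tokens, keep_tokens, headroom)  # 96000 32000 64000
```

As long as trigger is meaningfully larger than keep, each cycle frees a big block of headroom, so the middleware cannot fire again immediately and loop.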


Q: Where can I read more?

All the relevant official docs pages:

  • Short-term memory (checkpointer, summarization, trim, delete): oss/python/langchain/short-term-memory
  • Long-term memory (cross-session store): oss/python/langchain/long-term-memory
  • Context engineering (transient vs persistent, summarization): oss/python/langchain/context-engineering
  • Memory concepts (short vs long term, types): oss/python/concepts/memory
  • LangGraph add-memory (PostgresSaver examples): oss/python/langgraph/add-memory
  • Middleware overview: oss/python/langchain/middleware/overview

The Mental Model to Remember

The Memory concepts page describes it clearly:

“Thread-scoped memory tracks the ongoing conversation by maintaining message history within a single session.”

That thread-scoped memory (the checkpointer) is the LLM’s working RAM; it gets compacted when it fills up. Your messages table is the permanent disk record that never shrinks. Once you separate these two concerns, the whole system becomes straightforward: the UI never breaks, the LLM never overflows, and summarization runs silently in the background.

I hope this helps. If you have more concerns, do not hesitate to reach out.