Help Needed: MongoDB Checkpoints Collection Growing Too Large

Hi @Nikfury

Is this expected?

Yes. LangGraph persists a checkpoint at every “super-step” of graph execution, not once per user message. In a typical agent graph, a single user turn can span several super-steps (e.g., input ingest, LLM/tool nodes, reducer/merge, finalization), so multiple checkpoints per message is expected. Additionally, each checkpoint can have multiple “writes” stored in a separate checkpoint_writes collection (interrupts, errors, scheduled tasks, per-channel writes), which increases “records per message.”
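
For a quick sanity check, you can count checkpoints per thread directly in Mongo. This is a minimal sketch with the Node.js driver, assuming the saver's default `checkpoints` collection name and its top-level `thread_id` field (the function name is just illustrative):

```typescript
import { MongoClient } from "mongodb";

async function countCheckpointsPerThread(uri: string, dbName: string) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const perThread = await client
      .db(dbName)
      .collection("checkpoints")
      .aggregate([
        // One document per checkpoint; group them by conversation thread.
        { $group: { _id: "$thread_id", checkpointCount: { $sum: 1 } } },
        { $sort: { checkpointCount: -1 } },
        { $limit: 20 },
      ])
      .toArray();
    console.table(perThread);
  } finally {
    await client.close();
  }
}
```

If the per-thread counts roughly equal (messages × super-steps per turn), growth is behaving as designed and the question becomes purely one of retention.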

Is there built-in retention?

AFAIK, not in the MongoDB checkpointer. The implementation exposes put, putWrites, getTuple, list, and deleteThread, but does not include any max-per-thread, TTL, or pruning configuration. You’ll need to implement manual cleanup or wrap/extend the saver to add TTL fields.

Best practices to manage growth

  1. Trim/summarize conversation state to shrink each checkpoint payload.

Use reducers to delete old messages or summarize history so fewer/lighter messages are saved in channel_values each step.
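
As a rough sketch (assuming the prebuilt `MessagesAnnotation` state and its built-in messages reducer; `trimHistory` and `KEEP_LAST` are names I've made up), a node placed early in the graph can drop old messages so each checkpoint stores less:

```typescript
import { RemoveMessage } from "@langchain/core/messages";
import { MessagesAnnotation } from "@langchain/langgraph";

const KEEP_LAST = 10; // tune to your cost/quality needs

// Node that drops everything except the most recent messages, so the
// channel_values persisted at each checkpoint stay small.
function trimHistory(state: typeof MessagesAnnotation.State) {
  const excess = state.messages.slice(0, -KEEP_LAST);
  if (excess.length === 0) return {};
  // Returning RemoveMessage instances asks the built-in messages reducer to
  // delete those entries from the stored state rather than append to it.
  return {
    messages: excess
      .filter((m) => m.id !== undefined)
      .map((m) => new RemoveMessage({ id: m.id as string })),
  };
}
```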

  2. Periodic pruning job (recommended): keep the latest N checkpoints per thread and delete older ones from both checkpoints and checkpoint_writes.

Checkpoints are ordered by checkpoint_id (a time-sortable UUIDv6), and the MongoDB saver itself sorts by checkpoint_id descending to fetch the latest, so you can safely prune older IDs.
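
Here is a sketch of such a pruning job with the Node.js driver, assuming the default checkpoints/checkpoint_writes collection names and their top-level thread_id/checkpoint_id fields (`pruneThread` and `keepLatest` are illustrative names):

```typescript
import { Db } from "mongodb";

// Keep the newest `keepLatest` checkpoints for a thread and delete the rest
// from both collections.
async function pruneThread(db: Db, threadId: string, keepLatest = 5) {
  const checkpoints = db.collection("checkpoints");
  const writes = db.collection("checkpoint_writes");

  // checkpoint_id is a time-sortable UUIDv6, so a descending sort gives the
  // newest checkpoints first.
  const kept = await checkpoints
    .find({ thread_id: threadId })
    .sort({ checkpoint_id: -1 })
    .limit(keepLatest)
    .project({ checkpoint_id: 1 })
    .toArray();

  if (kept.length < keepLatest) return; // nothing old enough to prune

  const oldestKeptId = kept[kept.length - 1].checkpoint_id;
  const filter = { thread_id: threadId, checkpoint_id: { $lt: oldestKeptId } };

  // Delete older writes first, then checkpoints, so writes never point at a
  // checkpoint that has already been removed.
  await writes.deleteMany(filter);
  await checkpoints.deleteMany(filter);
}
```

Run this from a cron job or scheduled function over your known thread IDs.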

  3. Optional TTL (Cosmos DB/MongoDB): viable only if a top-level Date field exists.

The current saver stores checkpoint.ts and metadata inside serialized blobs, so there’s no top-level Date field you can index for TTL out of the box. If you require TTL-based expiry, wrap/extend the saver to add an expiresAt (or createdAt) top-level field on both collections and create TTL indexes in Cosmos DB.
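
Here's a rough sketch of that wrapping approach. The subclass name, TTL length, constructor shape, and the assumption that your Cosmos DB/MongoDB tier supports TTL indexes on an arbitrary Date field are all mine, so verify them against your deployment; a similar override of putWrites would be needed to stamp checkpoint_writes documents:

```typescript
import { MongoClient, Db } from "mongodb";
import { MongoDBSaver } from "@langchain/langgraph-checkpoint-mongodb";

const TTL_DAYS = 30; // illustrative retention window

class ExpiringMongoDBSaver extends MongoDBSaver {
  private readonly ttlDb: Db;

  constructor(client: MongoClient, dbName: string) {
    super({ client, dbName });
    this.ttlDb = client.db(dbName);
  }

  // After the normal save, stamp the new checkpoint document with a
  // top-level Date so a TTL index can expire it.
  async put(...args: Parameters<MongoDBSaver["put"]>) {
    const [config, checkpoint] = args;
    const saved = await super.put(...args);
    const expiresAt = new Date(Date.now() + TTL_DAYS * 24 * 60 * 60 * 1000);
    await this.ttlDb.collection("checkpoints").updateOne(
      { thread_id: config.configurable?.thread_id, checkpoint_id: checkpoint.id },
      { $set: { expiresAt } }
    );
    return saved;
  }
}

// One-time setup: expireAfterSeconds: 0 means "expire at the expiresAt time".
async function ensureTtlIndexes(db: Db) {
  await db.collection("checkpoints").createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
  await db.collection("checkpoint_writes").createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
}
```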

Configuration options in @langchain/langgraph-checkpoint-mongodb

There are no built-in options to cap per-thread checkpoints or auto-prune. The only deletion helper is deleteThread(threadId). Use a scheduled cleanup job or extend the saver for TTL/retention.
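
For example, a scheduled job can simply call deleteThread for threads your application considers stale. This is only a sketch; how you decide staleness is up to you, since the saver itself has no notion of thread age (e.g., use a sessions table with a last-active timestamp):

```typescript
import { MongoClient } from "mongodb";
import { MongoDBSaver } from "@langchain/langgraph-checkpoint-mongodb";

async function cleanupStaleThreads(uri: string, dbName: string, staleThreadIds: string[]) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const saver = new MongoDBSaver({ client, dbName });
    for (const threadId of staleThreadIds) {
      // Removes all persisted checkpoint data for this thread.
      await saver.deleteThread(threadId);
    }
  } finally {
    await client.close();
  }
}
```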

Migration concerns when deleting old checkpoints

  • Deleting old checkpoints does not break resuming from the latest checkpoint (that’s what getTuple returns by default). You will, however, lose the ability to “time-travel” or replay to older steps you delete.

  • To be safe, keep at least the last few checkpoints per thread (e.g., N = 3–10), and always delete from both checkpoints and the matching checkpoint_writes documents so the two collections stay consistent. Avoid pruning a thread that has an active human-in-the-loop interrupt unless you retain the parent checkpoint for that interrupt.