Hi @Nikfury
Is this expected?
Yes. LangGraph persists a checkpoint at every “super-step” of graph execution, not once per user message. In a typical agent graph, a single user turn can span several super-steps (e.g., input ingest, LLM/tool nodes, reducer/merge, finalization), so multiple checkpoints per message is expected. Additionally, each checkpoint can have multiple “writes” stored in a separate checkpoint_writes collection (interrupts, errors, scheduled tasks, per-channel writes), which increases “records per message.”
Is there built-in retention?
AFAIK, not in the MongoDB checkpointer. The implementation exposes put, putWrites, getTuple, list, and deleteThread, but does not include any max-per-thread, TTL, or pruning configuration. You’ll need to implement manual cleanup or wrap/extend the saver to add TTL fields.
Best practices to manage growth
- Trim/summarize conversation state to shrink each checkpoint payload. Use reducers to delete old messages or summarize history so fewer/lighter messages are saved in `channel_values` each step.
- Periodic pruning job (recommended): keep the latest N checkpoints per thread and delete older ones from both `checkpoints` and `checkpoint_writes`.
Checkpoints are ordered by checkpoint_id (UUIDv6) which is time-sortable. The MongoDB saver itself sorts by checkpoint_id descending to fetch latest, so you can safely prune older IDs.
- Optional TTL (Cosmos DB/MongoDB): viable only if a top-level Date field exists.
The current saver stores checkpoint.ts and metadata inside serialized blobs, so there’s no top-level Date field you can index for TTL out of the box. If you require TTL-based expiry, wrap/extend the saver to add an expiresAt (or createdAt) top-level field on both collections and create TTL indexes in Cosmos DB.
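A minimal sketch of the pruning step, assuming `checkpoint_id` values are UUIDv6 strings (which sort lexicographically in time order, the same property the saver relies on when it sorts descending to fetch the latest). The helper name `selectPrunableIds` and the scheduled-job wiring are hypothetical, not part of the library:

```typescript
// Given all checkpoint_ids for one thread, return the ids that a cleanup
// job may safely delete, keeping the newest `keepLatest` checkpoints.
// Assumes ids are UUIDv6 strings, so a plain string sort is time order.
function selectPrunableIds(checkpointIds: string[], keepLatest: number): string[] {
  // Sort descending (newest first), mirroring how the saver fetches the
  // latest checkpoint; everything past the first `keepLatest` is prunable.
  const sorted = [...checkpointIds].sort().reverse();
  return sorted.slice(keepLatest);
}

// In the scheduled job you would then delete the returned ids from BOTH
// collections (mongodb driver calls shown as comments, not executed here):
//   await db.collection("checkpoints")
//     .deleteMany({ thread_id, checkpoint_id: { $in: ids } });
//   await db.collection("checkpoint_writes")
//     .deleteMany({ thread_id, checkpoint_id: { $in: ids } });
```

Deleting from both collections in the same job keeps checkpoints and their writes consistent, per the migration notes below.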
Configuration options in @langchain/langgraph-checkpoint-mongodb
There are no built-in options to cap per-thread checkpoints or auto-prune. The only deletion helper is deleteThread(threadId). Use a scheduled cleanup job or extend the saver for TTL/retention.
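If you go the TTL route, the wrapper only needs to stamp a top-level `Date` field on every document it writes. A hypothetical sketch of that piece: `RETENTION_DAYS`, `withExpiry`, and the subclassing plan are illustrative names and assumptions, not part of `@langchain/langgraph-checkpoint-mongodb`:

```typescript
// Hypothetical retention window for checkpoint documents.
const RETENTION_DAYS = 30;

// Stamp a top-level `expiresAt` Date onto a document so a MongoDB /
// Cosmos DB TTL index can expire it. Pure function, easy to unit-test.
function withExpiry<T extends object>(doc: T, now: Date = new Date()): T & { expiresAt: Date } {
  const expiresAt = new Date(now.getTime() + RETENTION_DAYS * 24 * 60 * 60 * 1000);
  return { ...doc, expiresAt };
}

// A subclass of MongoDBSaver would apply `withExpiry` to the documents it
// upserts in `put` / `putWrites`, and you would create the TTL indexes once
// (mongodb driver calls shown as comments, not executed here):
//   await db.collection("checkpoints")
//     .createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
//   await db.collection("checkpoint_writes")
//     .createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
```

With `expireAfterSeconds: 0`, MongoDB deletes each document once its `expiresAt` timestamp passes, so retention is controlled entirely by the stamped date.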
Migration concerns when deleting old checkpoints
- Deleting old checkpoints does not break resuming from the latest checkpoint (which is what `getTuple` returns by default). You will, however, lose the ability to "time-travel" or replay to the older steps you delete.
- To be safe, keep at least the last few checkpoints per thread (e.g., N = 3–10), and always delete from both `checkpoints` and the matching `checkpoint_writes` documents to keep data consistent. Avoid pruning when you have an active human-in-the-loop interrupt on a thread unless you retain the parent checkpoint for that interrupt.