Hi @Huimin-station,
@Bitcot_Kaushal is right that retrieved documents are meant to be temporary - but that’s only fully true for one of the three patterns you might be using, and the mechanics are worth spelling out precisely.
Mechanics clarification
In create_agent, the messages field uses the add_messages reducer. If retrieval results land there every turn - which is exactly what happens in the standard agentic RAG setup, where the retriever runs as a tool and ToolNode returns a ToolMessage - they will grow like conversation history. So the answer to your original question is actually “yes, they do accumulate” for this pattern.
Important distinction
RAG does not “by definition” concatenate everything. Whether retrieved context accumulates is an architectural decision: it depends on which state key stores that context and which reducer the key uses:
- messages with add_messages → append semantics, accumulates across turns
- a separate context field with no reducer → LastValue semantics (LangGraph’s default), overwrites on each retrieval, no accumulation
- a plain LCEL chain with no StateGraph → retrieved documents are local Python variables inside the chain, never written to any state, gone after the call returns
The accumulation behavior is not a RAG property - it is a state schema choice.
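To make the difference between the first two patterns concrete, here is a minimal pure-Python sketch of the two update semantics. It is a simplified model, not LangGraph's actual implementation: apply_update and append are hypothetical helpers invented for illustration, and plain strings stand in for messages and documents.

```python
def append(old, new):
    # Append-style reducer: keep everything already stored, add the new items.
    return old + new


def apply_update(state, update, reducers):
    # Hypothetical helper: merge one node's output into the state.
    # Keys with a reducer are combined; keys without one are overwritten.
    new_state = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is None:
            new_state[key] = value  # last value wins
        else:
            new_state[key] = reducer(state.get(key, []), value)
    return new_state


# "messages" has an append reducer; "context" deliberately has none.
reducers = {"messages": append}
state = {"messages": [], "context": []}

for turn in range(1, 4):
    update = {
        "messages": [f"user question {turn}", f"tool result {turn}"],
        "context": [f"chunk for turn {turn}"],
    }
    state = apply_update(state, update, reducers)

print(len(state["messages"]))  # 6: two entries per turn, all retained
print(state["context"])        # ['chunk for turn 3']: only the latest write
```

After three turns the append-reduced key holds everything ever written to it, while the key without a reducer holds only the most recent retrieval.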
Concrete architecture recommendation
Keep retrieval as ephemeral context for the current model call and persist only high-value artifacts: summaries, extracted facts, citations, user preferences. Raw retrieved chunks rarely need to survive beyond the turn they were fetched for.
Implementation pattern
Use a separate state key for retrieval context with overwrite semantics instead of appending to messages:
    class RAGState(TypedDict):
        messages: Annotated[list[BaseMessage], add_messages]  # this accumulates - intentional
        context: list[Document]  # this overwrites on each retrieval
The context field uses LastValue by default (no reducer annotation needed). Each time the retrieve node writes to it, the previous value is replaced. Note: if retrieval is conditional and doesn’t run in a given turn, the previous value persists in the checkpoint - it is not auto-cleared. If you need the field to clear itself when retrieval is skipped, use EphemeralValue instead.
Operational safety valve for long threads
Combine a checkpointer with trim, delete, or summarization strategies to control the overall context budget. SummarizationMiddleware (available in langchain.agents.middleware) replaces old message history with a rolling summary, keeping the window bounded regardless of conversation length.