Does RAG store retrieved information in the state the same way as historical memory, by directly concatenating it?

Below is the code I used to implement historical memory. Does RAG also directly concatenate to the historical dialogue like this?



import asyncio

from langchain_core.messages import AIMessage, AIMessageChunk, HumanMessage

history = []

async def chat_loop():
    while True:
        user = input("\nusr_input:").strip()
        ai_content = ""
        async for chunk in agent.astream(
            {"messages": history + [HumanMessage(user)]},
            stream_mode=["messages"],
        ):
            msg = chunk[-1][0]
            # Note: ("False" or "True") evaluates to just "False", so the original
            # condition never filtered out "True". Use a membership test instead.
            # Accumulating only inside this check also keeps tool output out of ai_content.
            if isinstance(msg, AIMessageChunk) and msg.content not in ("False", "True"):
                ai_content += msg.content
                print(msg.content, end="", flush=True)
        print("\nUSR:", user)
        print("\nAI:", ai_content)
        history.append(HumanMessage(user))
        history.append(AIMessage(ai_content))

asyncio.run(chat_loop())  # `agent` is assumed to be defined elsewhere


Hey @Huimin-station ,

You’re basically saving the whole chat history and sending it back with each request. That is what conversation memory is. In RAG, the retrieved documents are not added to the history permanently: they are pulled in for that one question, and the model uses them as context to answer it better. Think of retrieval results as temporary reference material rather than as something that stays in the conversation.

hi @Huimin-station

@Bitcot_Kaushal is right that retrieved documents are meant to be temporary - but that’s only fully true for one of the three patterns you might be using, and the mechanics are worth spelling out precisely.

Mechanics clarification

In create_agent, the messages field uses the add_messages reducer. If retrieval results land there every turn - which is exactly what happens in the standard agentic RAG setup, where the retriever runs as a tool and ToolNode returns a ToolMessage - they will grow like conversation history. So the answer to your original question is actually “yes, they do accumulate” for this pattern.
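The accumulation can be seen with a toy model of the reducer semantics (plain Python stand-ins, not the real LangGraph API): an append-style reducer makes each turn's retrieval ToolMessage pile up in state exactly like chat history.

```python
# Toy model of add_messages-style append semantics (plain Python, not the
# real LangGraph reducer): each turn appends its ToolMessage to state.

def add_messages_like(left, right):
    # simplified stand-in for add_messages: concatenate message lists
    return left + right

state = {"messages": []}

def run_turn(state, question, retrieved, answer):
    # one agentic-RAG turn writes: human question, retrieval ToolMessage, AI answer
    updates = [("human", question), ("tool", retrieved), ("ai", answer)]
    state["messages"] = add_messages_like(state["messages"], updates)
    return state

state = run_turn(state, "What is RAG?", "chunk A", "RAG is ...")
state = run_turn(state, "And memory?", "chunk B", "Memory is ...")

# retrieval results from BOTH turns are still in state, just like history
tool_msgs = [m for m in state["messages"] if m[0] == "tool"]
print(len(tool_msgs))  # 2
```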

Important distinction

RAG does not “by definition” concatenate everything. This is an architectural decision based on which state key stores retrieved context and which reducer that key uses:

  • messages with add_messages → append semantics, accumulates across turns
  • a separate context field with no reducer → LastValue semantics (LangGraph’s default), overwrites on each retrieval, no accumulation
  • a plain LCEL chain with no StateGraph → retrieved documents are local Python variables inside the chain, never written to any state, gone after the call returns
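The difference between the first two patterns can be sketched with toy channel classes (illustrative stand-ins, not LangGraph's actual implementations):

```python
# Illustrative stand-ins for the two channel behaviors (not the real classes)

class AddChannel:
    def __init__(self):
        self.value = []

    def update(self, new):
        self.value = self.value + new  # append semantics

class LastValueChannel:
    def __init__(self):
        self.value = None

    def update(self, new):
        self.value = new  # overwrite semantics

msgs, ctx = AddChannel(), LastValueChannel()
for docs in (["doc A"], ["doc B"]):
    msgs.update(docs)
    ctx.update(docs)

print(msgs.value)  # ['doc A', 'doc B'] - accumulates across turns
print(ctx.value)   # ['doc B'] - only the latest retrieval survives
```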

The accumulation behavior is not a RAG property - it is a state schema choice.

Concrete architecture recommendation

Keep retrieval as ephemeral context for the current model call and persist only high-value artifacts: summaries, extracted facts, citations, user preferences. Raw retrieved chunks rarely need to survive beyond the turn they were fetched for.
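One way to read "persist only high-value artifacts" in code. This is a hedged sketch: `distill` and the field names are made up for illustration, not a library API.

```python
# Hedged sketch of persisting a compact artifact instead of raw chunks;
# `distill` and its fields are made-up names, not a library API.

def distill(chunks: list[str], answer: str) -> dict:
    # keep a compact record instead of the raw retrieved chunks
    return {
        "summary": answer[:200],  # in practice, an LLM-written summary
        "citations": [c.split(":")[0] for c in chunks],  # e.g. source ids
    }

chunks = [
    "doc-1: RAG combines retrieval with generation",
    "doc-2: reducers control how state updates merge",
]
artifact = distill(chunks, "RAG retrieves documents per query ...")
print(artifact["citations"])  # ['doc-1', 'doc-2']
```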

Implementation pattern

Use a separate state key for retrieval context with overwrite semantics instead of appending to messages:

from typing import Annotated

from typing_extensions import TypedDict

from langchain_core.documents import Document
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class RAGState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]  # this accumulates - intentional
    context: list[Document]                               # this overwrites on each retrieval

The context field uses LastValue by default (no reducer annotation needed). Each time the retrieve node writes to it, the previous value is replaced. Note: if retrieval is conditional and doesn’t run in a given turn, the previous value persists in the checkpoint - it is not auto-cleared. If you need the field to clear itself when retrieval is skipped, use EphemeralValue instead.
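The checkpoint behavior described above can be modeled in a few lines. LastValue and EphemeralValue are real LangGraph channel types, but the code below is only a toy simulation of their described overwrite/auto-clear behavior.

```python
# Toy simulation of channel behavior when the retrieve node is skipped
# (plain Python, mimicking the described semantics, not the real classes)

checkpoint = {"context": None}

def turn(checkpoint, retrieved=None, ephemeral=False):
    if retrieved is not None:
        checkpoint["context"] = retrieved  # retrieve node ran: overwrite
    elif ephemeral:
        checkpoint["context"] = None       # EphemeralValue-style: auto-clear
    # LastValue-style: node skipped, previous value persists untouched
    return checkpoint

turn(checkpoint, retrieved=["doc A"])
turn(checkpoint)                  # retrieval skipped
print(checkpoint["context"])      # ['doc A'] - stale context persists

turn(checkpoint, ephemeral=True)  # retrieval skipped, ephemeral channel
print(checkpoint["context"])      # None - cleared automatically
```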

Operational safety valve for long threads

Combine a checkpointer with trim, delete, or summarization strategies to control the overall context budget. SummarizationMiddleware (available in langchain.agents.middleware) replaces old message history with a rolling summary, keeping the window bounded regardless of conversation length.
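A rough model of what such a strategy does to the window. The real SummarizationMiddleware writes an LLM-generated summary; the placeholder string below is just a stand-in.

```python
# Rough model of a bounded context window; the LLM-generated summary that
# SummarizationMiddleware would produce is stubbed out as a placeholder string.

def trim_with_summary(messages: list, max_recent: int = 6) -> list:
    if len(messages) <= max_recent:
        return messages
    dropped = len(messages) - max_recent
    summary = ("system", f"[summary of {dropped} earlier messages]")
    return [summary] + messages[-max_recent:]

history = [("human", f"question {i}") for i in range(10)]
window = trim_with_summary(history)
print(len(window))  # 7 - bounded no matter how long the conversation gets
```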


@pawel-twardziak Thank you very much for your answer. @Bitcot_Kaushal I also really appreciate your answer.

So you are suggesting that I make the RAG function into a node, put each retrieval result into a unique key in the state, and overwrite it each time it’s called?


hi @Huimin-station

You don’t need a new key for each retrieval (it can be reasonable, but it is not a must). You need one single key (e.g. context) that gets replaced on every retrieval. “Unique” in the sense of “separate from messages”, not “unique per invocation”.

The whole point is that by putting retrieval in its own node that writes to a dedicated state key, you take the retrieved documents out of messages entirely. They no longer travel through add_messages, so they don’t pile up.
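Putting it together, a toy two-step pipeline (hypothetical function names, no LangGraph required) shows the recommended shape: retrieval overwrites `context` while only `messages` grows.

```python
# Toy two-node pipeline modeling the recommendation: `retrieve` overwrites
# state["context"], `generate` appends to state["messages"].

def retrieve(state, query):
    state["context"] = [f"docs for: {query}"]  # overwritten every turn
    return state

def generate(state, query):
    answer = f"answer to {query!r} using {state['context']}"
    state["messages"] += [("human", query), ("ai", answer)]  # accumulates
    return state

state = {"messages": [], "context": []}
for q in ("first question", "second question"):
    state = generate(retrieve(state, q), q)

print(len(state["messages"]))  # 4 - the conversation keeps growing
print(len(state["context"]))   # 1 - only the latest retrieval remains
```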