I recently learned about vector databases. When I recall relevant information and feed it to the model for a response, I'm using LangGraph. Do I need to create a RAG node that fetches knowledge and appends it directly to the global state as historical memory? Is that common practice? Or would it be better to write the recalled information only into the system prompt of the response-model node when generating the reply, without adding it to the state context?
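To make the distinction concrete, here is a minimal sketch of the per-call injection idea in plain Python (no LangGraph imports; `retrieve`, `fake_llm`, and the message dicts are illustrative stand-ins, not a real API):

```python
# Sketch: inject recalled memory into the prompt for one call only,
# without appending it to the persistent message history.

def retrieve(query: str) -> list[str]:
    # placeholder for a real vector-store lookup
    return ["user prefers concise answers"]

def fake_llm(prompt: list[dict]) -> str:
    # stand-in for model.invoke(prompt)
    return f"(answered using {len(prompt)} prompt messages)"

def rag_respond_node(state: dict) -> dict:
    query = state["messages"][-1]["content"]
    recalled = "\n".join(retrieve(query))
    # Build the prompt for THIS call only: recalled facts go into the
    # system message, not into state["messages"].
    prompt = [{"role": "system", "content": f"Relevant memory:\n{recalled}"}]
    prompt += state["messages"]
    reply = fake_llm(prompt)
    # Only the user turn and the reply persist; the recalled text does not.
    return {"messages": state["messages"] + [{"role": "assistant", "content": reply}]}
```

The trade-off: the state stays small, but the recalled facts must be re-retrieved on every turn instead of riding along in history.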
If you’re writing memory information into the system prompt of a model node, I think you still need to integrate it yourself. I view the system prompt more as a direction corrector or role definer. You can check out “The persona selection model” article by Anthropic. I’d suggest avoiding putting anything into the model node’s system prompt—even memory keywords—as I believe it can cause task drift.
I’m relatively new to LangGraph, but I maintain context manually in my projects. For context compression or forced memory persistence, I only use RAG storage for non-structured data (for fuzzy matching or multimodal memories). For structured data, I use relational databases. In complex scenarios, I employ hybrid retrieval strategies with appropriate chunking.
I currently use Qdrant as my vector database. For multimodal inputs that need persistent memory, you can also use relational databases for mapping. However, Qdrant’s payload feature allows attaching structured conditions during vector search, enabling hybrid retrieval.
I think it’s reasonable to treat RAG as an external tool—placing retrieved content after the user’s input but before the model’s output. If you append it directly to the global state as historical memory, the context will bloat. A better approach might be to only store the user’s query, then use payload keywords for attribute pre-filtering during retrieval.
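A toy in-memory version of that filter-then-rank pattern (mimicking Qdrant's payload filtering, but with made-up field names and tiny 2-d vectors, no real client involved):

```python
import math

# In-memory mock of attribute pre-filtering followed by vector ranking.
points = [
    {"vector": [1.0, 0.0], "payload": {"topic": "billing", "text": "refund policy"}},
    {"vector": [0.9, 0.1], "payload": {"topic": "billing", "text": "invoice dates"}},
    {"vector": [0.0, 1.0], "payload": {"topic": "support", "text": "reset password"}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, must_match: dict, top_k: int = 2):
    # 1) cheap structured pre-filter on the payload
    candidates = [p for p in points
                  if all(p["payload"].get(k) == v for k, v in must_match.items())]
    # 2) similarity ranking only over the survivors
    candidates.sort(key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return [p["payload"]["text"] for p in candidates[:top_k]]
```

In real Qdrant this corresponds to passing a `query_filter` alongside the query vector, so the expensive similarity search only runs over points whose payload matches.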
This is just my rough idea. LangGraph probably has more elegant solutions—I’m just not familiar enough with it yet. Would appreciate insights from more experienced folks.
Here’s a lesser-known fact: when storing data in a vector database, the payload must include the original text chunk. Embeddings cannot be reversed—there’s no way to reconstruct the original text from a vector. Current retrieval methods are all based on similarity matching, not one-to-one restoration. If you need to return the original content, you have to store it explicitly in the payload for mapping.
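A tiny demonstration of that point (the toy `embed` function and collection layout are invented for illustration; real embedding models are not invertible either):

```python
# The store only maps vector -> payload; nothing can turn a vector back
# into text, so the original chunk must live in the payload.
collection = []

def embed(text: str) -> list[float]:
    # toy deterministic "embedding" for demo purposes
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def upsert(text: str):
    collection.append({
        "vector": embed(text),
        "payload": {"text": text},   # keep the original chunk here
    })

def nearest_text(query: str) -> str:
    qv = embed(query)
    best = min(collection,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p["vector"], qv)))
    # the only way to return readable content is via the stored payload
    return best["payload"]["text"]
```

Drop the `"text"` key from the payload and the search still finds the right vector, but there is nothing human-readable left to return.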
First of all, thank you for your reply—I hope we can keep exchanging ideas. What I specifically mean is: in a model response node, if you pass the system prompt directly at invocation time, it only affects that particular call and is never added to the state messages. Wouldn’t that reduce the token count?
from langchain_core.messages import SystemMessage

async def model_messages_get_node(state: dict):
    model_list_get = await model_list()
    # Prepend the system prompt for this call only; it is never written to state.
    reply = await model_list_get[2].ainvoke(
        [SystemMessage(content=model_messages_get_prompt)] + state["messages"]
    )
    return {"messages": [reply]}
By the way, are you also a beginner with LangGraph? I hope we can interact and collaborate more.
Or it’s not a system prompt, but changed to another type of prompt. What I mean is to briefly inform the model in this way, rather than forming a permanent memory in the context.
Yes, I think it would reduce tokens. I see the conversation memory between user and model as a unique list. Whether it’s sub-agents, tool calls, or structured outputs—they all operate around this list (reading from it or writing to it). Not sure if this analogy is accurate, haha.
> Or it’s not a system prompt, but changed to another type of prompt. What I mean is to briefly inform the model in this way, rather than forming a permanent memory in the context.
Just as you said.
Sure, I think so too.
I’m also a beginner—I wasn’t even clear on how LangGraph’s state appending works. I actually had to ask an AI what the difference is between that and my own manual context management, just so I could reply to you naturally, haha.
May I ask where you are from? If there’s a chance, we could work on some small projects together to strengthen our knowledge.
I’m from China.
Then that makes things even easier, because I am too, bro.
hehe guys, speak English
Of course, haha, I’m really happy to see your comment.