Hi,
I’m working on a project (for educational purposes) and trying to create a chatbot where I can upload files and ask questions about each file.
When selecting a file and querying “summarize the content of the file”, the chatbot correctly returns a response.
When uploading another file with the same query, the chatbot repeats the previous response.
When disabling short-term memory, the issue is resolved.
I’m using a supervisor agent with a sub-agent that performs RAG on a file. Since the LLM I’ve selected for the supervisor agent doesn’t support tool-calling, I’m using a middleware with the @dynamic_prompt decorator hook. The middleware passes the query and file to the sub-agent, and the sub-agent’s response is returned through the middleware to the supervisor agent.
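For context, here is a minimal framework-free sketch of that flow. All names here (`sub_agent_rag`, `build_prompt`) are illustrative stand-ins for my setup, not actual LangChain APIs:

```python
# Sketch of the supervisor/sub-agent flow: the middleware calls the
# sub-agent and injects its answer into the supervisor's system prompt.
# All names are illustrative, not real LangChain APIs.

def sub_agent_rag(query: str, file_id: str) -> str:
    """Hypothetical sub-agent: retrieves chunks from one file and answers."""
    # A real implementation would embed, retrieve, and generate here.
    return f"[RAG answer for {query!r} over {file_id}]"

def build_prompt(query: str, file_id: str) -> str:
    """@dynamic_prompt-style hook: call the sub-agent, then inject its
    answer into the supervisor's system prompt text."""
    context = sub_agent_rag(query, file_id)
    return (
        "You are a supervisor agent. Use the context below to answer.\n"
        f"Context:\n{context}"
    )

print(build_prompt("summarize the content of the file", "file_2"))
```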
Have you looked at the state transitions in your agent? The fact that disabling short-term memory “fixes” it means your agent’s state is getting polluted between queries.
You can use the before_model and after_model decorators to print the states. You’ll likely see the context from the first file sticking around and confusing the supervisor on the second query.
I would use the before_model decorator to invoke the sub-agent and update the state with the subagent’s response. That way the supervisor has the necessary context to synthesize an answer.
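Here’s a toy illustration of the idea in plain Python (not the actual middleware API): if the injected context accumulates in the persisted state, the model keeps seeing the first file’s context; overwriting it on every turn, before the model call, avoids that.

```python
# Toy model of short-term memory pollution: checkpointed state keeps the
# context injected for file 1, so a query about file 2 still "sees" it
# unless the hook overwrites it each turn.
state = {"messages": [], "rag_context": None}

def before_model_hook(state: dict, query: str, file_id: str) -> None:
    """before_model-style hook (illustrative signature): refresh the
    sub-agent context for the CURRENT file instead of letting the
    previous file's context linger."""
    state["rag_context"] = f"[RAG answer over {file_id}]"  # overwrite, don't append
    state["messages"].append({"role": "user", "content": query})

before_model_hook(state, "summarize the content of the file", "file_1")
before_model_hook(state, "summarize the content of the file", "file_2")
print(state["rag_context"])  # now reflects file_2, not file_1
```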
Using @dynamic_prompt with short-term memory does mess up the supervisor agent, since I’ve injected the sub-agent’s response into the system prompt text.
I’ve also tested using the @before_model hook. In the hook function, I’ve tried injecting the sub-agent response once as a HumanMessage and once as an AIMessage. I’m seeing better results with the AIMessage.
Now, I’m still wondering what I should do. Is this the correct approach? Or should I inject the sub-agent response as a content block in the user’s query HumanMessage?
UPDATE:
After running the code again, injecting context as an AIMessage causes problems and confuses the supervisor. I’ve now switched to @wrap_model_call so the sub-agent response isn’t kept in chat history, and I inject the sub-agent response as a HumanMessage instead.
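For anyone hitting the same issue, the key property of @wrap_model_call here is that you can rewrite the message list for a single model invocation without persisting the injection. A rough framework-free sketch of that idea (the hook signature and `call_model` are illustrative, not the exact LangChain ones):

```python
# Sketch: inject the sub-agent answer as a transient HumanMessage for one
# model call only, leaving the persisted chat history untouched.

history = [{"role": "user", "content": "summarize the content of the file"}]

def call_model(messages: list) -> str:
    """Stand-in for the actual model handler."""
    return f"answer based on {len(messages)} messages"

def wrap_model_call_hook(history: list, rag_context: str) -> str:
    """wrap_model_call-style hook: copy the messages, append the sub-agent
    context as a user-role message, and call the model. The original
    history list is never mutated, so nothing leaks into the next turn."""
    transient = list(history)  # shallow copy; injection is per-call only
    transient.append({"role": "user", "content": f"Context:\n{rag_context}"})
    return call_model(transient)

response = wrap_model_call_hook(history, "[RAG answer over file_2]")
print(len(history))  # still 1 -- the injected context was not persisted
```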