Hello friends, I am working on a multi-agent system and wanted to ask how you would approach a challenge I’m facing.
I’m building a ReAct agent with a web search tool, and a single research topic can require around 40 tool calls. After each call, the agent appends the tool result to the message history and passes the full history back to the LLM, so context grows with every step. I need a way to handle this accumulation efficiently to keep latency and token usage down.
One idea I had is to compress the web search results after each tool call. However, LangChain seems to require that every AIMessage containing a tool call be immediately followed in the message history by a ToolMessage with the matching tool_call_id, generated in the tool_node. That seems to rule out inserting a separate compression node that emits a compressed ToolMessage in between.
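For context, here is roughly the shape of what I mean. This is a hedged, plain-Python sketch (not real LangChain/LangGraph classes; `compress`, `web_search`, and the dict-based messages are all stand-ins I made up): the idea is to compress the raw result *inside* the tool node, before it ever becomes a ToolMessage, so the AIMessage → ToolMessage pairing and the tool_call_id are never broken.

```python
# Hypothetical sketch, NOT real LangChain API: messages are plain dicts,
# and compress() stands in for an LLM summarization call.

def compress(text: str, max_chars: int = 200) -> str:
    """Stand-in for summarization; here we simply truncate."""
    return text if len(text) <= max_chars else text[:max_chars] + "...[truncated]"

def web_search(query: str) -> str:
    """Stand-in for the real web search tool."""
    return "lots of raw search result text about " + query + " " * 500

def tool_node(ai_message: dict) -> list[dict]:
    """Emit one ToolMessage per tool call, with already-compressed content."""
    tool_messages = []
    for call in ai_message["tool_calls"]:
        raw = web_search(call["args"]["query"])
        tool_messages.append({
            "role": "tool",
            # Reusing the same id keeps the AIMessage/ToolMessage pairing valid.
            "tool_call_id": call["id"],
            "content": compress(raw),
        })
    return tool_messages

msgs = tool_node({"tool_calls": [{"id": "call_1", "args": {"query": "x"}}]})
```

This avoids a separate compression node entirely, at the cost of compressing eagerly even when the raw result might have been useful later. I’m not sure whether that trade-off is the right one, which is part of my question.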
How would you handle this situation? Are there any resources or design patterns you’d recommend for managing this kind of context growth in a ReAct workflow?