hi @george32
The callbacks config in LangChain/LangGraph is an inheritable property: any callback handler added to a parent runnable automatically propagates to all child runnables via getChild(). When the parent and the subagent each add their own StreamMessagesHandler (because both call .stream() with "messages" mode), two handlers end up listening to the same token events, so every token comes out twice. Breaking the chain with callbacks: [] and restructuring the subagent as a subgraph node are both valid fixes.
Why does callbacks: [] fix the issue?
By passing callbacks: [] in the subagent’s .stream() config you break the inheritance chain: the empty array replaces the inherited callback list, so the subagent starts fresh with only the StreamMessagesHandler it creates internally. Result: each token is emitted exactly once.
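To make the mechanics concrete, here is a dependency-free TypeScript toy model of the behavior described above. None of these names are LangChain’s real internals; it just sketches how an inherited handler list produces duplicates and how an explicit empty list breaks the chain:

```typescript
// Toy model of callback inheritance -- illustrative only, not LangChain's API.
type TokenHandler = (token: string) => void;

interface RunConfig {
  callbacks?: TokenHandler[];
}

// Mimics getChild(): the child inherits the parent's handlers unless the
// caller explicitly passes its own callbacks array (even an empty one).
function childConfig(parent: RunConfig, override?: RunConfig): RunConfig {
  return { callbacks: override?.callbacks ?? [...(parent.callbacks ?? [])] };
}

// Mimics a child streaming in "messages" mode: it registers its own handler
// alongside whatever arrived through the config.
function streamTokens(config: RunConfig, ownHandler: TokenHandler, tokens: string[]): void {
  const handlers = [...(config.callbacks ?? []), ownHandler];
  for (const token of tokens) {
    for (const h of handlers) h(token);
  }
}

const emitted: string[] = [];
const parentHandler: TokenHandler = (t) => emitted.push(t);

// Parent streams in "messages" mode, so its handler sits in the config.
const parent: RunConfig = { callbacks: [parentHandler] };

// Subagent also streams: the inherited handler fires next to its own.
streamTokens(childConfig(parent), (t) => emitted.push(t), ["a", "b"]);
console.log(emitted.length); // 4 -- every token emitted twice

// Passing callbacks: [] replaces the inherited list, so only the
// subagent's own handler fires.
emitted.length = 0;
streamTokens(childConfig(parent, { callbacks: [] }), (t) => emitted.push(t), ["a", "b"]);
console.log(emitted.length); // 2 -- each token exactly once
```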
This is a well-known footgun in LangGraph JS when nesting agents or subgraphs that both stream in "messages" mode. The root cause is callback handler inheritance: the parent’s StreamMessagesHandler propagates into child runnables and listens to the same LLM token events as the child’s own handler, producing every token twice.
Solutions
1: Pass callbacks: [] on the subagent stream (your workaround)
2: Use .invoke() instead of .stream() on the subagent
If you only need the parent’s stream for the final UI and the subagent’s output can be returned as a complete message, call .invoke() on the subagent inside the tool. The parent’s StreamMessagesHandler still captures the subagent’s LLM tokens (since it is inherited by the child), giving you streaming in the parent’s output without any duplication:
import { z } from "zod";
import { tool } from "@langchain/core/tools";

// Named askSubagent so it doesn't shadow the imported tool() helper
// (const tool = tool(...) is a ReferenceError).
const askSubagent = tool(async (input) => {
  const subAgent = createAgent({ model, tools: subTools });
  const result = await subAgent.invoke({
    messages: [{ role: "user", content: input.query }],
  });
  return result.messages.at(-1)?.content ?? "";
}, { name: "ask_subagent", schema: z.object({ query: z.string() }) });
3: Model the subagent as a subgraph node (for complex setups)
Instead of spawning a subagent inside a tool, add it as a proper subgraph node:
import { StateGraph, START } from "@langchain/langgraph";

const subAgent = createAgent({ model, tools: subTools });

const parentGraph = new StateGraph(AgentState)
  .addNode("main_agent", mainAgentNode)
  .addNode("sub_agent", subAgent) // subgraph node
  .addEdge(START, "main_agent") // entry point, required before compile()
  .addEdge("main_agent", "sub_agent")
  .compile();
// Stream with subgraphs: true to get events from both agents.
// With an array of stream modes, each chunk is [namespace, mode, chunk].
for await (const [ns, mode, chunk] of await parentGraph.stream(input, {
  streamMode: ["messages"],
  subgraphs: true,
})) {
  // ns tells you which agent emitted the chunk
}
LangGraph handles callback scoping correctly for subgraph nodes, and the subgraphs: true flag gives you namespaced events so you can tell which agent produced each chunk.
4: Use the nostream tag to suppress streaming from internal LLM calls
If the subagent’s LLM is called inside a tool and you don’t want its tokens leaking into the parent stream at all, tag it:
const quietModel = model.withConfig({
  tags: ["langsmith:nostream"], // or just "nostream"
});
const subAgent = createAgent({ model: quietModel, tools: subTools });
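As a rough sketch of what the tag does downstream, here is a dependency-free toy filter (not LangGraph’s real handler code): a messages-mode consumer simply drops token events whose run tags include the nostream marker, so the quiet model’s tokens never reach the parent stream:

```typescript
// Toy sketch of tag-based suppression -- illustrative only.
interface TokenEvent {
  token: string;
  tags: string[];
}

const NOSTREAM_TAGS = ["langsmith:nostream", "nostream"];

// A messages-mode consumer would skip events from runs tagged nostream.
function visibleTokens(events: TokenEvent[]): string[] {
  return events
    .filter((e) => !e.tags.some((t) => NOSTREAM_TAGS.includes(t)))
    .map((e) => e.token);
}

const events: TokenEvent[] = [
  { token: "Hello", tags: [] },                      // main agent
  { token: "secret", tags: ["langsmith:nostream"] }, // quiet subagent model
  { token: "world", tags: [] },
];

console.log(visibleTokens(events)); // ["Hello", "world"]
```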