If I use subgraphs=True + stream_mode='messages' when calling stream(), the arguments of the tool call become incorrect

    from typing import Literal

    from langgraph.graph import StateGraph, MessagesState, START, END
    from langgraph.prebuilt import create_react_agent
    from langgraph.types import Command

    data_agent = create_react_agent(
        model=llm,
        tools=[get_cards_by_panel_id],
        prompt=template,
        name="data manager"
    )

    supervisor_agent = create_react_agent(
        model=supervisor_llm,
        tools=[],
        prompt=supervisor_template,
        name="business manager"
    )

    def call_data_agent(
        state: MessagesState,
    ) -> Command[Literal["supervisor_agent"]]:
        response = data_agent.invoke(state)
        update = {**response}
        return Command(update=update, goto="supervisor_agent")

    def manager_node(
        state: MessagesState, config
    ) -> Command[Literal["data_agent", END]]:
        response = supervisor_agent.invoke(state)

        content = response["messages"][-1].content
        if "next_agent:data_agent" in content or "next_agent: data_agent" in content:
            goto = "data_agent"
        else:
            goto = END
        update = {**response}

        return Command(
            update=update,
            goto=goto,
        )

    builder = StateGraph(MessagesState)
    builder.add_node("supervisor_agent", manager_node)
    builder.add_node("data_agent", call_data_agent)

    builder.add_edge(START, "supervisor_agent")
    builder.add_edge("supervisor_agent", END)
    graph = builder.compile()

and the tool `get_cards_by_panel_id` looks like this:

@tool
def get_cards_by_panel_id(id: int) -> str:
    ...

My question is something like “give me the information of panel which id is 35563”. If I call graph.stream() like this:

for event in graph.stream({"messages": [{"role": "user", "content": user_input}]}, config, context=ctx):
    for value in event.values():
        print("Assistant:", value["messages"][-1].content)

the tool call is OK (I printed the messages of data_agent, and the argument received by `get_cards_by_panel_id` is 35563). But if I call graph.stream() like this:

for namespace, data in graph.stream({"messages": [{"role": "user", "content": user_input}]}, config, context=ctx, subgraphs=True, stream_mode="messages"):
    if data[0].content:
        print(data[0].content, end="", flush=True)

the argument becomes 3563 (the tokens produced by the LLM are 35563, but the argument passed to the tool is 3563).

How can I resolve this problem?

Python version: 3.9.22

LangGraph version: 0.6.7

What happens if you don’t use subgraphs=True but keep messages as the stream mode? Does the behavior only happen when both are set?

If I just set stream_mode (without subgraphs=True), the LLM won’t produce the message token by token, but the tool argument is right. So I think it has some association with streaming?

I also tried a single agent: if I set stream_mode='messages', the argument of the tool call is incorrect too. That makes me guess it’s caused by token streaming, so that LangGraph cannot get the complete content?

Could you provide more detail on how you’re providing the tool result? I’d be surprised if streaming missed tokens generally, since we would expect to see similar bugs in a lot of places.

Hi @mogitozhang

I figured this out. Let me know if it works.

You’re reading partial LLM token chunks. In stream_mode="messages", the stream yields incremental message pieces (tokens), not finalized structured tool calls. When you also enable subgraphs=True, the chunk boundaries and namespacing can make this more visible. The tool itself still receives the correct argument (e.g., 35563), but mid-stream chunks can show transient, incomplete JSON like 3563.
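You can see this for yourself with a minimal sketch (assuming `llm` is the same chat model from your snippet, bound to the tool) that prints the raw `tool_call_chunks` fragments as they stream:

llm_with_tools = llm.bind_tools([get_cards_by_panel_id])

for chunk in llm_with_tools.stream("give me the information of panel which id is 35563"):
    # Each AIMessageChunk carries only a fragment of the tool-call JSON;
    # "args" here is a partial string, e.g. '{"id"', ': 355', '63}'
    for tc in chunk.tool_call_chunks:
        print(repr(tc["args"]))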

Instead (pick one):

  • Use a state stream for finalized data: rely on updates (or values) to read the finalized tool call and arguments rather than token chunks.

for ns, update in graph.stream(
    {"messages": [{"role": "user", "content": user_input}]},
    config,
    subgraphs=True,
    stream_mode="updates",
):
    # update is a dict of node_name -> state_delta
    node_update = next(iter(update.values()))
    if node_update and "messages" in node_update:
        last = node_update["messages"][-1]
        # When the AI requests a tool, the arguments here are finalized;
        # LangChain tool_calls entries are {"name", "args", "id", ...}
        tool_calls = getattr(last, "tool_calls", None)
        if tool_calls:
            print("Final tool args:", tool_calls[0]["args"])
  • Keep token streaming, but stream the real args from the tool: emit a custom event from inside the tool to expose the exact parsed arguments you’re executing with.

from langgraph.config import get_stream_writer
from langchain_core.tools import tool

@tool
def get_cards_by_panel_id(id: int) -> str:
    writer = get_stream_writer()
    writer({"tool": "get_cards_by_panel_id", "args": {"id": id}})
    # ... run the tool
    return "ok"

# With subgraphs=True and multiple stream modes, stream() yields
# (namespace, mode, chunk) triples
for ns, mode, chunk in graph.stream(
    {"messages": [{"role": "user", "content": user_input}]},
    config,
    subgraphs=True,
    stream_mode=["messages", "custom"],
):
    if mode == "messages":
        msg, meta = chunk
        if msg.content:
            print(msg.content, end="", flush=True)
    elif mode == "custom":
        print("\nTool args:", chunk)
  • Disable token streaming on the specific model: if you don’t need tokens from that agent/model, initialize it with disable_streaming=True so only finalized updates come through (see the sketch below).
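A minimal sketch of that third option, assuming Qwen_32B is served through an OpenAI-compatible endpoint (the model name and base_url are placeholders):

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

data_llm = ChatOpenAI(
    model="qwen-32b",        # placeholder model name
    base_url="...",          # your endpoint
    disable_streaming=True,  # emit only finalized messages, no token chunks
)

data_agent = create_react_agent(
    model=data_llm,
    tools=[get_cards_by_panel_id],
    prompt=template,
    name="data manager",
)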

Why this happens: per the docs, messages mode streams LLM outputs token by token; tool-call JSON (including arguments) is constructed incrementally and only becomes reliable once the message is complete. See the LangGraph docs on streaming LLM tokens and streaming subgraph outputs.
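As an illustration of “only becomes reliable once complete”: AIMessageChunk supports +, so summing every chunk of one LLM call reconstructs the full message, whose tool_calls then contain the parsed, finalized arguments. This sketch reuses the llm_with_tools from the earlier example:

full = None
for chunk in llm_with_tools.stream("give me the information of panel which id is 35563"):
    # Adding chunks merges content and tool_call_chunks into one message
    full = chunk if full is None else full + chunk

# Only on the completed message are the tool-call args valid parsed JSON
print(full.tool_calls)  # e.g. [{"name": "get_cards_by_panel_id", "args": {"id": 35563}, ...}]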

Optional: upgrade LangGraph/LangChain to the latest versions; streaming + tools has seen fixes and polish over time. But the core guidance above (don’t parse tool-call args from token chunks) remains the recommended approach.

I tried other custom LLM models and found that only Qwen_32B (developed by Alibaba) has this problem. After replacing it with DeepSeek-V3, the id passed to the tool became correct with streaming enabled!

If I still use Qwen_32B, then as @pawel-twardziak suggested, I can only set disable_streaming=True for the LLM of `data_agent`.

Hi @pawel-twardziak ! Thanks so much for your reply!

The first and third options work (the second one still returns 3563)! The third one may be the best choice for me: because I want to stream as many tokens as possible, I just set disable_streaming on the LLM of data_agent.

But after some tests, I found that the problem only occurs when I use Qwen_32B (developed by Alibaba) as the model. If I use DeepSeek instead, the arguments are correct!

hi @mogitozhang yeah, I faced that before, some OSS models still struggle with tool calling :worried:

Gotcha, that’s a really interesting insight. Will note with the team that Qwen_32B has this issue. Glad to hear you found a workaround!