Unexpected Multiple content Blocks Returned by LLM Inside LangGraph
I’m experiencing a behavior I can’t fully understand.
I have a LangGraph project with a main graph and multiple subgraphs, each consisting of an agent node and a tools node.
Each agent node eventually invokes an LLM (AzureChatOpenAI; I'm seeing this with gpt-5-mini), and the responses are automatically streamed out by LangGraph.
I catch the LLM responses outside and stream them on according to some logic.
The Behavior
Occasionally, the LLM returns more than one content block, and both are streamed back to my UI. I'm seeing it in my Langfuse traces, for example:
{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Some text:\n- true: ...\n- false: ...\n\n....",
      "index": 0
    },
    {
      "type": "text",
      "text": "You're right — duplicate final. Need to respond concisely. The user asked to proceed to analyze ....",
      "index": 1
    }
  ],
  "additional_kwargs": {
    "__openai_function_call_ids__": {
      "call_Wsp5cOJi62N9ASGCjHcLzmNj": "fc_0c4a29f5fc4536db016924a12a7c208193bbd15168a0ef97cc"
    }
  }
}
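For context on how a message like this ends up with two blocks: as I understand it, when streamed chunks are aggregated, list-type content parts that share the same "index" are merged into one block, and a part with a new "index" starts a new block. A dependency-free sketch of that merge (this mirrors my understanding of the merge-by-index behavior; merge_chunks is an illustrative helper, not library code):

```python
def merge_chunks(blocks_so_far, new_parts):
    """Merge streamed content parts into accumulated blocks by 'index'."""
    merged = [dict(b) for b in blocks_so_far]
    for part in new_parts:
        existing = next(
            (b for b in merged if b.get("index") == part.get("index")), None
        )
        if existing is None:
            merged.append(dict(part))  # a new "index" opens a new block
        else:
            existing["text"] += part.get("text", "")  # same "index": append text
    return merged

content = []
for chunk in (
    [{"type": "text", "text": "Some ", "index": 0}],
    [{"type": "text", "text": "text", "index": 0}],
    [{"type": "text", "text": "You're right", "index": 1}],
):
    content = merge_chunks(content, chunk)

print(content)
# → [{'type': 'text', 'text': 'Some text', 'index': 0},
#    {'type': 'text', 'text': "You're right", 'index': 1}]
```

So two blocks in the final trace means the model emitted parts under two different indices during the stream.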
My Streaming Logic
Now the weird thing is that I expected this to be filtered out by my streaming mechanism, which looks like this:
for _, stream_mode, llm_response in response:
    if stream_mode == StreamModes.MESSAGES:
        message_obj, metadata = llm_response
        yield from _stream_messages(out_model, message_obj, metadata)

def _stream_messages(...):
    ...
    elif message_obj.content:
        out_model.content += (
            message_obj.content
            if isinstance(message_obj.content, str)
            else message_obj.content[0].get("text", "")
        )
        output_dict = out_model.model_dump()
        data = json.dumps(jsonable_encoder(output_dict))
        yield data + "\n"
You can see that when the content isn't a string, I stream back only content[0], which on its face should have resulted in only the first block being streamed back to the user.
I can confirm that out_model doesn't contain anything else relevant, and that I don't alter its content attribute anywhere else.
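One likely explanation for why the filter doesn't work (assumption: the messages stream mode yields one chunk per delta, and each chunk's content list carries only that delta's part): content[0] indexes into the current chunk, not into the final aggregated message, so a chunk belonging to block index 1 still has its text at position 0. A minimal sketch:

```python
# Each streamed chunk carries only its own delta, so its content list
# has a single element even when that element belongs to the second block.
chunk_1 = [{"type": "text", "text": "Some text ", "index": 0}]
chunk_2 = [{"type": "text", "text": "You're right - duplicate final. ", "index": 1}]

out = ""
for content in (chunk_1, chunk_2):
    # content[0] is the only element of *this chunk* - it is not
    # necessarily part of block 0 of the final message.
    out += content[0].get("text", "")

print(out)  # → Some text You're right - duplicate final.
```

In other words, the [0] filter only works on the fully aggregated message, not on per-delta chunks.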
My Questions
- Why are reasoning tokens being returned? Can I disable them from being returned via a parameter to AzureChatOpenAI?
- Why are these tokens being streamed? I thought my streaming mechanism would only take the first content block.
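On the first question, I haven't found a switch that is guaranteed to suppress this, but since the second block reads like leaked reasoning from a reasoning-capable model, lowering the reasoning effort is one thing worth trying (assumption: your installed langchain-openai version exposes reasoning_effort, and the deployment/version values below are placeholders, not known-good values):

```python
from langchain_openai import AzureChatOpenAI

# Assumption: recent langchain-openai versions accept reasoning_effort
# for reasoning-capable models; verify against your installed version.
llm = AzureChatOpenAI(
    azure_deployment="my-gpt-5-mini-deployment",  # hypothetical deployment name
    api_version="2024-12-01-preview",             # hypothetical API version
    reasoning_effort="low",
)
```

This changes how much the model reasons rather than what gets returned, so treat it as mitigation, not a fix.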
I tried to reproduce it locally so I could debug, but unfortunately it doesn't happen often, so I haven't been able to.
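Until the root cause is clear, one workaround I'm considering is filtering by each part's own "index" field instead of by its position in the chunk (a sketch; extract_block_zero_text is my own helper name, not a LangChain API):

```python
def extract_block_zero_text(content) -> str:
    """Return only the text that belongs to content-block index 0."""
    if isinstance(content, str):
        return content
    return "".join(
        part.get("text", "")
        for part in content
        if isinstance(part, dict)
        and part.get("type") == "text"
        and part.get("index", 0) == 0
    )

# Chunks from block index 1 contribute nothing:
print(extract_block_zero_text([{"type": "text", "text": "keep", "index": 0}]))  # keep
print(extract_block_zero_text([{"type": "text", "text": "drop", "index": 1}]))  # (empty)
```

This way the filter holds per-chunk during streaming, not just on the aggregated message.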