Reasoning tokens being returned in langgraph stream

Unexpected Multiple Content Blocks Returned by the LLM Inside LangGraph

I’m experiencing a behavior I can’t fully understand.

I have a LangGraph project with a main graph and multiple subgraphs, each consisting of an agent node and a tools node.

Each agent node eventually invokes an LLM (AzureChatOpenAI; I’m seeing this with gpt-5-mini), whose responses are automatically streamed out by LangGraph.

I catch the LLM responses outside the graph and stream them onward according to some logic.


The Behavior

Occasionally, the LLM returns more than one content block, which in turn gets streamed back to my UI. I can see it in my Langfuse traces, for example:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Some text:\n- true: ...\n- false: ...\n\n....",
      "index": 0
    },
    {
      "type": "text",
      "text": "You're right — duplicate final. Need to respond concisely. The user asked to proceed to analyze ....",
      "index": 1
    }
  ],
  "additional_kwargs": {
    "__openai_function_call_ids__": {
      "call_Wsp5cOJi62N9ASGCjHcLzmNj": "fc_0c4a29f5fc4536db016924a12a7c208193bbd15168a0ef97cc"
    }
  }
}


My Streaming Logic

Now the weird thing is that I expected this to be filtered out by my streaming mechanism, which looks like this:

for _, stream_mode, message_obj in response:
    if stream_mode == StreamModes.MESSAGES:
        yield from _stream_messages(out_model, message_obj, metadata)

def _stream_messages(...):
    ...
    elif message_obj.content:
        # If content is a list of blocks, keep only the first block's text
        out_model.content += (
            message_obj.content
            if isinstance(message_obj.content, str)
            else message_obj.content[0].get("text", "")
        )
        output_dict = out_model.model_dump()
        data = json.dumps(jsonable_encoder(output_dict))
        yield data + "\n"

You can see that when the content isn’t a string, I stream back only content[0], which on its face should have meant only the first block was streamed back to the user.

I can confirm that out_model doesn’t contain anything else relevant, and that I don’t alter its content attribute anywhere else.
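To make the expectation concrete, here’s the same selection logic run standalone against a two-block content list modelled on the trace above (made-up text). On a full message, content[0] really does pick only the first block:

```python
# Hypothetical content list, modelled on the Langfuse trace above.
message_content = [
    {"type": "text", "text": "Some text...", "index": 0},
    {"type": "text", "text": "duplicate final...", "index": 1},
]

# Same selection expression as in _stream_messages:
selected = (
    message_content
    if isinstance(message_content, str)
    else message_content[0].get("text", "")
)

print(selected)  # only the first block's text
```

So on the complete message the filter behaves as intended; the question is what the logic sees per streamed chunk.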


My Questions

  1. Why are reasoning tokens being returned?
    Can I disable them somehow via a param to AzureChatOpenAI?

  2. Why are these tokens being streamed?
    I thought my streaming mechanism took only the first content block.

I tried to reproduce it locally so I could debug it, but unfortunately it doesn’t happen often, so I wasn’t able to.

Hi @sagi.l

It seems to be expected behaviour when you use GPT-5 or GPT-5.1

Hi, thanks for the reply
Unfortunately it doesn’t really answer my question.

  1. I don’t pass a summary to the reasoning attribute in AzureChatOpenAI.
  2. Even if it’s passed by default, Langfuse logged this in the trace as another item in the content list, not as an independent attribute under “summary”, so how can I know it’s coming from there?
  3. Regarding the second question in my post: why isn’t it being filtered out?

I don’t know the answers yet; let me investigate to understand this better.

Maybe filtering by the index position would help?

for block in message_obj.content:
    if isinstance(block, dict):
        # Keep only the first (index 0) text block
        if block.get("index", 0) == 0 and block.get("type") == "text":
            out_model.content += block.get("text", "")

Most probably the regular content has index 0 and the reasoning content has index 1.
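As a quick illustration with made-up blocks (assuming the streamed blocks keep the index values shown in the trace), the filter would keep only the index-0 text block:

```python
# Hypothetical blocks modelled on the trace; real chunk payloads may differ.
blocks = [
    {"type": "text", "text": "Some text...", "index": 0},
    {"type": "text", "text": "You're right -- duplicate final...", "index": 1},
]

kept = ""
for block in blocks:
    if isinstance(block, dict):
        # Keep only the first (index 0) text block
        if block.get("index", 0) == 0 and block.get("type") == "text":
            kept += block.get("text", "")

print(kept)  # only the index-0 text survives
```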

Naah, that might be a bad idea … still investigating