Reasoning tokens being returned in langgraph stream

Unexpected Multiple Content Blocks Returned by the LLM Inside LangGraph

I’m experiencing a behavior I can’t fully understand.

I have a LangGraph project with a main graph and multiple subgraphs, each consisting of an agent node and a tools node.

Each agent node eventually invokes an LLM (AzureChatOpenAI; I’m seeing this with gpt-5-mini), whose responses are automatically streamed out by LangGraph.

I catch the LLM responses outside the graph and stream them onward according to some logic.


The Behavior

Occasionally, the LLM returns more than one content block, which in turn gets streamed back to my UI. I can see it in my Langfuse traces, for example:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Some text:\n- true: ...\n- false: ...\n\n....",
      "index": 0
    },
    {
      "type": "text",
      "text": "You're right — duplicate final. Need to respond concisely. The user asked to proceed to analyze ....",
      "index": 1
    }
  ],
  "additional_kwargs": {
    "__openai_function_call_ids__": {
      "call_Wsp5cOJi62N9ASGCjHcLzmNj": "fc_0c4a29f5fc4536db016924a12a7c208193bbd15168a0ef97cc"
    }
  }
}


My Streaming Logic

Now the weird thing is that I expected this to be filtered out by my streaming mechanism, which looks like this:

for _, stream_mode, message_obj in response:
    if stream_mode == StreamModes.MESSAGES:
        yield from _stream_messages(out_model, message_obj, metadata)

def _stream_messages(...):
    ...
    elif message_obj.content:
        # If content is a list of blocks, keep only the first block's text
        out_model.content += (
            message_obj.content
            if isinstance(message_obj.content, str)
            else message_obj.content[0].get("text", "")
        )
        output_dict = out_model.model_dump()
        data = json.dumps(jsonable_encoder(output_dict))
        yield data + "\n"

You can see that when the content isn’t a string, I stream back only content[0], which on its face should have meant only the first block was streamed back to the user.

I can confirm that out_model doesn’t contain anything else relevant, and that I don’t alter its content attribute anywhere else.
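To make the expectation concrete, here’s the same selection logic run standalone against a two-block content list modelled on the trace above (made-up text). On a full message, content[0] really does pick only the first block:

```python
# Hypothetical content list, modelled on the Langfuse trace above.
message_content = [
    {"type": "text", "text": "Some text...", "index": 0},
    {"type": "text", "text": "duplicate final...", "index": 1},
]

# Same selection expression as in _stream_messages:
selected = (
    message_content
    if isinstance(message_content, str)
    else message_content[0].get("text", "")
)

print(selected)  # only the first block's text
```

So on the complete message the filter behaves as intended; the question is what the logic sees per streamed chunk.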


My Questions

  1. Why are reasoning tokens being returned?
    Can I disable them somehow via a param to AzureChatOpenAI?

  2. Why are these tokens being streamed?
    I thought my streaming mechanism took only the first content block.

I tried to reproduce it locally so I could debug it, but unfortunately it doesn’t happen often, so I wasn’t able to.

Hi @sagi.l

It seems to be expected behaviour when you use GPT-5 or GPT-5.1

Hi, thanks for the reply
Unfortunately it doesn’t really answer my question.

  1. I don’t pass a summary to the reasoning attribute in AzureChatOpenAI.
  2. Even if it’s passed by default, Langfuse logged this in the trace as another item in the content list, not as an independent attribute under “summary”, so how can I know it’s coming from there?
  3. Regarding the second question in my post: why isn’t it being filtered out?

I don’t know the answers yet; let me investigate to understand this better.

Maybe filtering by the index position would help?

for block in message_obj.content:
    if isinstance(block, dict):
        # Keep only the first (index 0) text block
        if block.get("index", 0) == 0 and block.get("type") == "text":
            out_model.content += block.get("text", "")

Most probably the regular content has index 0 and the reasoning content has index 1.
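As a quick illustration with made-up blocks (assuming the streamed blocks keep the index values shown in the trace), the filter would keep only the index-0 text block:

```python
# Hypothetical blocks modelled on the trace; real chunk payloads may differ.
blocks = [
    {"type": "text", "text": "Some text...", "index": 0},
    {"type": "text", "text": "You're right -- duplicate final...", "index": 1},
]

kept = ""
for block in blocks:
    if isinstance(block, dict):
        # Keep only the first (index 0) text block
        if block.get("index", 0) == 0 and block.get("type") == "text":
            kept += block.get("text", "")

print(kept)  # only the index-0 text survives
```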

Naah, that might be a bad idea … still investigating