Hi team! I’m building a streaming agent UI with LangChain and Azure OpenAI (GPT-5.2 with reasoning mode). I’m using `astream_events(version="v2")` to capture token-by-token streaming.
Setup:
- `langchain-openai` with `AzureChatOpenAI`, `reasoning={"effort": "high", "summary": "detailed"}`, and `output_version="responses/v1"`
- Using `astream_events` to capture `on_chat_model_stream` events
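For reference, a minimal sketch of the model configuration described above. The deployment name is a placeholder, and the endpoint/key are assumed to come from the standard Azure environment variables:

```python
from langchain_openai import AzureChatOpenAI

# Placeholder deployment name; AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY
# are picked up from the environment.
model = AzureChatOpenAI(
    azure_deployment="my-gpt-deployment",  # placeholder
    reasoning={"effort": "high", "summary": "detailed"},
    output_version="responses/v1",
)
```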
What works:
- Initial reasoning tokens stream correctly via `content_blocks` with `type="reasoning"`
- Tool calls and results are captured via `on_tool_start`/`on_tool_end`
- Final reasoning after all tools complete streams correctly
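The event handling behind these three behaviors can be sketched as a small dispatcher. Real `astream_events(version="v2")` events carry message chunk objects; below they are stood in for by plain dicts of the same shape (the `{"type": "reasoning", "reasoning": ...}` block layout is my assumption based on LangChain's standard content blocks), so the routing logic itself is runnable:

```python
def route_event(event: dict) -> list[tuple[str, str]]:
    """Classify a streamed event into (kind, payload) pairs,
    mirroring the three cases above."""
    out: list[tuple[str, str]] = []
    kind = event.get("event")
    if kind == "on_chat_model_stream":
        chunk = event["data"]["chunk"]
        # Assumed block shape: {"type": "reasoning", "reasoning": "..."}
        for block in chunk.get("content_blocks", []):
            if block.get("type") == "reasoning":
                out.append(("reasoning", block.get("reasoning", "")))
            elif block.get("type") == "text":
                out.append(("text", block.get("text", "")))
    elif kind == "on_tool_start":
        out.append(("tool_start", event.get("name", "")))
    elif kind == "on_tool_end":
        out.append(("tool_end", event.get("name", "")))
    return out

# Simulated events in the shapes described above (values illustrative).
events = [
    {"event": "on_chat_model_stream",
     "data": {"chunk": {"content_blocks": [
         {"type": "reasoning", "reasoning": "Plan: "}]}}},
    {"event": "on_tool_start", "name": "search", "data": {}},
    {"event": "on_tool_end", "name": "search", "data": {}},
]

stream = [item for e in events for item in route_event(e)]
print(stream)  # → [('reasoning', 'Plan: '), ('tool_start', 'search'), ('tool_end', 'search')]
```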
Issue:
After a tool result returns, the model often goes directly to the next tool call (`tool_args` tokens) without emitting visible reasoning text. I see this pattern:
reasoning tokens → tool_call → tool_result → tool_args (next tool) → tool_result → ... → reasoning tokens (final)
I expected intermediate reasoning between each tool call, similar to how ChatGPT/Claude show “thinking” text between tool invocations.
Questions:
- Is there a way to force/encourage the model to emit reasoning summaries between tool calls, not just at the start and end?
- Are there different `astream_events` event types I should be listening for to capture this intermediate content?
- Is this a model behavior limitation with the Responses API, or is there a configuration I’m missing?
Any guidance appreciated!