Feature Request: Auto-detect streaming in create_react_agent for real-time token feedback

## Problem

Currently, `create_react_agent` in LangGraph always calls `model.invoke()` (lines 657, 659, 685, 687 in `chat_agent_executor.py`), which returns the complete response at once and never exercises the streaming capabilities of models that support them.

**Impact:**
- ❌ No real-time feedback during response generation
- ❌ Poor UX for long responses (users see nothing until completion)
- ❌ Callbacks don't receive `on_llm_new_token` events
- ❌ Underutilizes native streaming APIs (Bedrock's `converse_stream`, OpenAI streaming, etc)

**Example:** AWS Bedrock Nova 2 Lite has `_stream()` fully implemented using `converse_stream` API, but when used with `create_react_agent`, it falls back to `converse()` (non-streaming) because LangGraph calls `invoke()`.

---

## Proposed Solution

Add automatic streaming detection in the agent's `call_model` function:

```python
from typing import Any, cast

from langchain_core.language_models import LanguageModelInput
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.runnables import Runnable, RunnableConfig


def _should_use_streaming(model: Runnable) -> bool:
    """Check if model has _stream implemented (not just inherited)."""
    if hasattr(model, '_stream'):
        from langchain_core.language_models import BaseChatModel
        return type(model)._stream != BaseChatModel._stream
    return False

def _invoke_with_streaming(
    model: Runnable[LanguageModelInput, BaseMessage],
    model_input: Any,
    config: RunnableConfig
) -> AIMessage:
    """Use streaming if available, otherwise fallback to invoke."""
    if _should_use_streaming(model):
        chunks = []
        for chunk in model.stream(model_input, config):
            chunks.append(chunk)
        if chunks:
            final_chunk = chunks[0]
            for chunk in chunks[1:]:
                final_chunk = final_chunk + chunk
            return cast(AIMessage, final_chunk)
    
    return cast(AIMessage, model.invoke(model_input, config))

# Similar for async version (accumulate chunks from model.astream()).
```

Changes in `call_model` (around line 657):

```python
# Before:
response = cast(AIMessage, static_model.invoke(model_input, config))

# After:
response = _invoke_with_streaming(static_model, model_input, config)
```

## Benefits

- ✅ **Backward compatible** - falls back to `invoke()` if streaming is not available
- ✅ **Zero config** - works automatically, with no user code changes
- ✅ **Better UX** - real-time token streaming via callbacks
- ✅ **Performance** - leverages native streaming APIs
- ✅ **Safe** - detects an actual `_stream` implementation (avoids infinite loops)
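The last point is the crux of the detection: `hasattr` alone is not enough, because every `BaseChatModel` subclass inherits a default `_stream`. A minimal, library-free illustration of the class-attribute comparison (stand-in class names, not langchain's):

```python
class Base:
    """Stand-in for BaseChatModel; _stream here is the inherited default."""

    def _stream(self, *args):
        raise NotImplementedError


class NoStreamModel(Base):
    pass  # inherits Base._stream unchanged


class StreamModel(Base):
    def _stream(self, *args):  # a real override
        yield "token"


def overrides_stream(model) -> bool:
    # Compare class attributes by identity: hasattr() would be True for
    # every subclass, overridden or not.
    return type(model)._stream is not Base._stream
```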


## Current Workaround

We're currently using this wrapper as a workaround:

```python
from langchain_core.messages import AIMessage
from langchain_aws import ChatBedrockConverse


class StreamingChatBedrockConverse(ChatBedrockConverse):
    """Forces streaming even when invoke() is called."""

    def invoke(self, input, config=None, **kwargs):
        chunks = []
        for chunk in self.stream(input, config, **kwargs):
            chunks.append(chunk)

        if not chunks:
            return AIMessage(content="")

        # stream() yields AIMessageChunk objects directly, so merge them with +
        final_message = chunks[0]
        for chunk in chunks[1:]:
            final_message = final_message + chunk

        return final_message
```

This works but feels like something LangGraph should handle natively.
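Both snippets fold the chunks together with `+`, relying on message chunks defining `__add__` so that content (and tool-call deltas) accumulate. A toy stand-in for that fold, with a simplified chunk type that only concatenates content:

```python
from dataclasses import dataclass


@dataclass
class ToyChunk:
    """Toy stand-in for AIMessageChunk: '+' concatenates content."""

    content: str

    def __add__(self, other: "ToyChunk") -> "ToyChunk":
        return ToyChunk(self.content + other.content)


def merge(chunks):
    """Left-fold chunks with '+', as the snippets above do."""
    final = chunks[0]
    for chunk in chunks[1:]:
        final = final + chunk
    return final
```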


## Alternative Approach

If you prefer explicit control, an optional parameter could be added:

```python
def create_react_agent(
    model: ...,
    tools: ...,
    *,
    enable_streaming: bool = True,  # New param
    ...
):
    # Use streaming only if enabled and supported
    ...
```
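Hypothetical wiring for such a flag, kept dependency-free (`pick_invoker` is illustrative only, not an existing LangGraph API): stream-and-accumulate only when the flag is on and the model exposes a `stream` method, otherwise plain `invoke`:

```python
def pick_invoker(model, enable_streaming: bool):
    """Return a callable that streams-and-accumulates, or plain-invokes."""
    if enable_streaming and callable(getattr(model, "stream", None)):

        def streaming_invoke(model_input):
            final = None
            for chunk in model.stream(model_input):
                # Chunks support '+', so fold them into one message.
                final = chunk if final is None else final + chunk
            # Fall back to invoke() if the stream produced nothing.
            return final if final is not None else model.invoke(model_input)

        return streaming_invoke
    return model.invoke
```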

Would love to hear thoughts from the team! Happy to contribute a PR if this approach makes sense. :rocket:



---

## More information

### Problem

When using `create_agent` with Amazon Nova 2 models, streaming doesn't work because `invoke()` doesn't emit chunks via callbacks.

### Solution

Override `_generate()` to use `_stream()` internally:

```python
from langchain_aws import ChatBedrockConverse


class StreamingChatBedrockConverse(ChatBedrockConverse):
    """Wrapper that forces streaming behavior even when invoke() is called.

    PROBLEM:
    --------
    LangGraph's create_agent calls model.invoke() which returns complete AIMessage.
    This bypasses streaming even though the model and LangGraph support it.

    SOLUTION:
    ---------
    Override _generate() (called by invoke()) to use _stream() internally.
    The key insight: _generate() is called within the callback manager context,
    so when we call _stream() internally, the chunks are emitted via callbacks
    AND LangGraph's astream() picks them up automatically!

    Result: invoke() still returns AIMessage (satisfying LangGraph), but chunks
    are streamed in real-time via the callback system!
    """

    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
        """Override _generate to force streaming even when called via invoke()."""

        # Check if streaming is disabled for this model
        if self.disable_streaming:
            return super()._generate(
                messages, stop=stop, run_manager=run_manager, **kwargs
            )

        from langchain_core.messages import message_chunk_to_message
        from langchain_core.outputs import ChatGeneration, ChatResult

        # Use _stream() which emits chunks via run_manager callbacks
        chunks = []
        for chunk in self._stream(
            messages, stop=stop, run_manager=run_manager, **kwargs
        ):
            chunks.append(chunk)
            # Chunk is automatically emitted via run_manager.on_llm_new_token()
            # which LangGraph's astream() monitors!


        if not chunks:
            from langchain_core.messages import AIMessage

            return ChatResult(
                generations=[ChatGeneration(message=AIMessage(content=""))]
            )

        # Combine chunks for final result
        final_chunk = chunks[0]
        for chunk in chunks[1:]:
            final_chunk = final_chunk + chunk

        # Convert to proper message
        final_message = message_chunk_to_message(final_chunk.message)

        return ChatResult(
            generations=[
                ChatGeneration(
                    message=final_message,
                    generation_info=final_chunk.generation_info,
                )
            ]
        )
```

### Why it works

- LangGraph monitors callbacks for streaming
- `_stream()` emits chunks via `run_manager.on_llm_new_token()`
- `invoke()` still returns a complete `AIMessage`
- Token-by-token streaming works ✅
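A dependency-free toy of that contract: tokens go out through a callback in real time (the role `on_llm_new_token` plays), while the caller still receives the complete message (the `invoke()` contract). Names here are illustrative, not langchain APIs:

```python
from typing import Callable, List


def generate_with_callbacks(chunks: List[str], on_token: Callable[[str], None]) -> str:
    """Emit each chunk via a callback while still returning the full message."""
    received = []
    for chunk in chunks:
        on_token(chunk)  # real-time side channel (what callbacks provide)
        received.append(chunk)
    return "".join(received)  # complete message (what invoke()'s caller expects)
```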

### Tested with

- Amazon Nova 2 Lite (`amazon.nova-2-lite-v1:0`)
- LangChain 1.0+
- langchain-aws 1.0