## Problem
Currently, `create_react_agent` in LangGraph always uses `model.invoke()` (lines 657, 659, 685, 687 in `chat_agent_executor.py`), which returns the complete response at once. This entirely ignores the streaming capabilities of models that support them.
**Impact:**
- ❌ No real-time feedback during response generation
- ❌ Poor UX for long responses (users see nothing until completion)
- ❌ Callbacks don't receive `on_llm_new_token` events
- ❌ Underutilizes native streaming APIs (Bedrock's `converse_stream`, OpenAI streaming, etc.)
**Example:** AWS Bedrock Nova 2 Lite has `_stream()` fully implemented using `converse_stream` API, but when used with `create_react_agent`, it falls back to `converse()` (non-streaming) because LangGraph calls `invoke()`.
---
## Proposed Solution
Add automatic streaming detection in the agent's `call_model` function:
```python
from typing import Any, cast

from langchain_core.language_models import BaseChatModel, LanguageModelInput
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.runnables import Runnable, RunnableConfig


def _should_use_streaming(model: Runnable) -> bool:
    """Check if model has _stream implemented (not just inherited)."""
    if hasattr(model, "_stream"):
        return type(model)._stream != BaseChatModel._stream
    return False


def _invoke_with_streaming(
    model: Runnable[LanguageModelInput, BaseMessage],
    model_input: Any,
    config: RunnableConfig,
) -> AIMessage:
    """Use streaming if available, otherwise fall back to invoke()."""
    if _should_use_streaming(model):
        chunks = []
        for chunk in model.stream(model_input, config):
            chunks.append(chunk)
        if chunks:
            final_chunk = chunks[0]
            for chunk in chunks[1:]:
                final_chunk = final_chunk + chunk
            return cast(AIMessage, final_chunk)
    return cast(AIMessage, model.invoke(model_input, config))


# Similar for the async version (_astream / astream / ainvoke)
```

Changes in `call_model` (line ~657):

```python
# Before:
response = cast(AIMessage, static_model.invoke(model_input, config))

# After:
response = _invoke_with_streaming(static_model, model_input, config)
```
## Benefits

- **Backward compatible**: falls back to `invoke()` if streaming is not available
- **Zero config**: works automatically, with no user code changes
- **Better UX**: real-time token streaming via callbacks
- **Performance**: leverages native streaming APIs
- **Safe**: detects an actual `_stream` implementation (avoids infinite loops)
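The last point is worth spelling out: every `BaseChatModel` subclass inherits a default `_stream`, so `hasattr(model, '_stream')` is true even for models with no native streaming. A stand-in demonstration (plain classes, no langchain imports) of why the class-level comparison is the right check:

```python
class Base:
    def _stream(self, text):
        # default implementation that every subclass inherits
        raise NotImplementedError


class PlainModel(Base):  # does NOT override _stream
    pass


class StreamingModel(Base):  # DOES override _stream
    def _stream(self, text):
        yield from text.split()


def overrides_stream(model) -> bool:
    # hasattr() is true for both classes, so compare the class
    # attribute against the base-class default instead
    return type(model)._stream is not Base._stream


print(hasattr(PlainModel(), "_stream"))    # True, even without an override
print(overrides_stream(PlainModel()))      # False
print(overrides_stream(StreamingModel()))  # True
```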
## Current Workaround

We're currently using this wrapper:

```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import AIMessage


class StreamingChatBedrockConverse(ChatBedrockConverse):
    """Forces streaming even when invoke() is called."""

    def invoke(self, input, config=None, **kwargs):
        chunks = []
        for chunk in self.stream(input, config, **kwargs):
            chunks.append(chunk)
        if not chunks:
            return AIMessage(content="")
        # stream() yields AIMessageChunk objects, which merge with `+`
        final_message = chunks[0]
        for chunk in chunks[1:]:
            final_message = final_message + chunk
        return final_message
```

This works, but it feels like something LangGraph should handle natively.
## Alternative Approach

If explicit control is preferred, an optional parameter could be added:

```python
def create_react_agent(
    model: ...,
    tools: ...,
    *,
    enable_streaming: bool = True,  # New param
    ...
):
    # Use streaming only if enabled and supported
```
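A sketch of how such a flag could gate path selection (the stand-in model and function names here are hypothetical, purely for illustration):

```python
class FakeModel:
    """Stand-in chat model exposing both a streaming and a blocking path."""

    def stream(self, text):
        for token in text.split():
            yield token

    def invoke(self, text):
        return text


def call_model(model, text, *, enable_streaming=True):
    """Use streaming only if enabled; otherwise fall back to invoke()."""
    if enable_streaming:
        # the real agent would also check _should_use_streaming(model)
        return " ".join(model.stream(text)), "streamed"
    return model.invoke(text), "invoked"


print(call_model(FakeModel(), "hello world"))
print(call_model(FakeModel(), "hello world", enable_streaming=False))
```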
Would love to hear thoughts from the team! Happy to contribute a PR if this approach makes sense.