LLM Invoked in Tool Streams to Main Chat Panel – Cannot Redirect or Filter (Next.js Client)

Purpose: Document streaming issue for LangGraph development team

Issue: When invoking an LLM inside a tool with streaming enabled, the content streams directly to the main chat UI and cannot be intercepted, redirected, or filtered.

Executive Summary

We’re building a document creation tool that uses an LLM to generate content. When the LLM is invoked inside our tool with streaming enabled (disable_streaming=False), the generated content streams directly to the main chat panel in our UI. We’ve tried multiple approaches to intercept, redirect, or filter these tokens but have been unsuccessful.

We need a way to prevent LLM content generated inside a tool from appearing in the main message stream.

System Architecture

Stack

  • Backend: LangGraph Platform with Python tools

  • Frontend: Next.js with LangGraph SDK

  • LLM: Claude (via langchain_anthropic)

  • Stream Modes: messages-tuple, updates, custom (configured); server logs show that values chunks also arrive
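Because the server sends values chunks we never requested, incoming (mode, chunk) pairs have to be demultiplexed against the modes we actually configured. A minimal framework-agnostic guard for that (a sketch; names are ours, not from the SDK):

```typescript
// Stream modes we request from the client.
const requestedModes = ["messages-tuple", "updates", "custom"] as const;
type RequestedMode = (typeof requestedModes)[number];

// Guard used when demultiplexing incoming (mode, chunk) pairs;
// chunks for modes we never requested (e.g. "values") are dropped.
function isRequestedMode(mode: string): mode is RequestedMode {
  return (requestedModes as readonly string[]).includes(mode);
}

console.log(isRequestedMode("custom")); // true
console.log(isRequestedMode("values")); // false
```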

Tool Implementation

# /agent/tools/document.py

from langchain_core.tools import tool
from langchain_anthropic import ChatAnthropic
from langgraph.config import get_stream_writer, get_config
import os

def get_claude_client(enable_streaming=False):
    """Get Claude client with streaming configuration"""
    api_key = os.getenv("ANTHROPIC_API_KEY")
    return ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        api_key=api_key,
        temperature=0.7,
        max_tokens=4096,
        disable_streaming=not enable_streaming
    )

@tool(response_format="content_and_artifact")
def create_document(query: str, title: str = "") -> tuple[str, dict]:
    """Create a document using AI generation"""
    config = get_config()
    tool_call_id = config.get("tool_call_id", "unknown")
    writer = get_stream_writer()

    # Feature flag for streaming
    enable_streaming = os.getenv("FEATURE_DOCUMENT_STREAMING", "false").lower() == "true"

    # Generate document content
    claude = get_claude_client(enable_streaming=enable_streaming)

    # THIS IS WHERE THE PROBLEM OCCURS
    # When streaming is enabled, this content goes directly to the chat UI
    response = claude.invoke([
        {"role": "system", "content": "You are a document creator..."},
        {"role": "user", "content": query}
    ])

    document_content = response.content

    # Return tuple for LangGraph
    llm_content = f"I've created a document titled '{title}'"
    artifact = {
        "tool_name": "create_document",
        "tool_call_id": tool_call_id,
        "result": {"content": document_content}
    }
    return (llm_content, artifact)

The Problem in Detail

What Happens

  1. User requests document creation

  2. Tool is invoked by LangGraph

  3. Tool calls claude.invoke() with streaming enabled

  4. Content streams directly to the chat UI (undesired behavior)

  5. Content also appears in the tool’s artifact result

What We Want

  1. User requests document creation

  2. Tool is invoked by LangGraph

  3. Tool generates content with LLM

  4. Content streams ONLY to a document panel (not main chat)

  5. Only the summary message appears in chat

Attempted Solutions

1. Message Filtering (Failed)

// /client/src/hooks/useChat.ts
const filteredMessages = rawMessages.filter((message) => {
  if (message.type === 'ai' && activeDocuments.has(message.metadata?.tool_call_id)) {
    // Try to filter out document content
    return false;
  }
  return true;
});

Result: Filtering happens too late — UI has already rendered the content.

2. onMessage Handler (Failed)

const { messages } = useChat({
  onMessage: (message, metadata) => {
    if (metadata?.tool_name === 'create_document') {
      // Try to intercept and redirect
      return; // This doesn’t prevent rendering
    }
  }
});

Result: onMessage is not called when using messages-tuple stream mode.

3. Custom Events Approach (Partially Working)

# Emit custom events alongside normal streaming
if enable_streaming and writer:
    writer({
        "type": "document_token",
        "data": {
            "document_id": doc_id,
            "content": chunk
        }
    })

Result:

  • ✅ Custom events are received by the client

  • ✅ We may handle them in a separate UI component (TBD)

  • ❌ Content still appears in the main chat (duplicate display)
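On the client, routing these custom events into a document panel reduces to a pure accumulator, independent of React or the SDK. A minimal sketch (names are ours; the event shape matches the writer() payload emitted above):

```typescript
// Accumulates streamed document tokens by document_id.
type DocumentTokenEvent = {
  type: string;
  data: { document_id: string; content: string };
};

function reduceDocumentTokens(
  buffers: Record<string, string>,
  event: DocumentTokenEvent,
): Record<string, string> {
  // Ignore unrelated custom events.
  if (event.type !== "document_token") return buffers;
  const { document_id, content } = event.data;
  return { ...buffers, [document_id]: (buffers[document_id] ?? "") + content };
}

// Example: three events across two documents.
let state: Record<string, string> = {};
for (const ev of [
  { type: "document_token", data: { document_id: "d1", content: "Hello " } },
  { type: "document_token", data: { document_id: "d1", content: "world" } },
  { type: "document_token", data: { document_id: "d2", content: "Hi" } },
]) {
  state = reduceDocumentTokens(state, ev);
}
console.log(state); // { d1: "Hello world", d2: "Hi" }
```

This part works; the unresolved half of the problem is that the same tokens also arrive through the message stream.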

4. .stream() Method Instead of .invoke() (Failed)

# Attempt to use stream() to control output
full_content = []
for chunk in claude.stream(messages):
    # Emit only as custom events; chunk.content is the token text
    writer({"type": "document_token", "data": {"content": chunk.content}})
    full_content.append(chunk.content)
document_content = "".join(full_content)
Result: Same issue — content still goes to message stream.

Core Issue

The fundamental problem is that when an LLM is invoked inside a LangGraph tool with streaming enabled, the tokens are automatically added to the message stream. There doesn’t appear to be a way to:

  1. Prevent tokens from entering the message stream

  2. Redirect tokens to custom events only

  3. Filter tokens before they reach the UI

  4. Mark tokens as “internal” or “hidden”

What We Need

Option 1: Prevent Stream Injection

# Hypothetical API
response = claude.invoke(messages, add_to_stream=False)

Option 2: Stream Interception

# Hypothetical API
response = claude.invoke(messages, stream_handler=custom_handler)
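To make Option 2 concrete: the handler we have in mind would receive each token and re-route it as a custom event. This is entirely hypothetical — neither the stream_handler parameter nor custom_handler exists in the current API; writer stands in for the tool's get_stream_writer() result:

```python
def custom_handler(token: str, writer) -> None:
    """Hypothetical: receive each token instead of letting it enter
    the default message stream, and re-emit it as a custom event."""
    writer({"type": "document_token", "data": {"content": token}})

# Stubbing writer with list.append shows the payload shape such a
# handler would produce.
emitted = []
custom_handler("Hello", emitted.append)
print(emitted[0]["data"]["content"])  # Hello
```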

Questions for LangGraph Team

  1. Is there a recommended pattern for using LLMs inside tools without streaming to the main chat?

  2. Can we intercept or redirect the stream before it reaches the message stream?

  3. Are we missing a configuration option or API that would solve this?

Reproduction Steps

  1. Create a tool that invokes an LLM with streaming enabled

  2. Use the tool in a LangGraph agent

  3. Connect a client with messages-tuple stream mode

  4. Observe that LLM content streams to the main chat UI

Environment

{
  "langgraph": "^0.2.0",
  "langchain": "^0.3.0",
  "langchain_anthropic": "^0.2.0",
  "@langchain/langgraph-sdk": "^0.0.21"
}

Related Issues

  • Streaming content to specific UI components (not main chat)

However, we haven’t found any working solutions or official guidance on this pattern.

Contact

We’re happy to provide more code examples, test different approaches, or work with the team to find a solution. This is a critical feature for our document creation workflow, where we want to show document content in a dedicated panel — not mixed with the conversation.

