Hi, I’m quite new to LangGraph and have no idea how to properly handle the following:
I have an agent built with create_react_agent and a few tools:
from typing import Annotated

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.config import get_stream_writer
from langgraph.prebuilt import InjectedState, create_react_agent

# UserContext (my state schema), SomeModel (a Pydantic model) and
# post_process (my post-processing helper) are defined elsewhere.
@tool
async def my_proposed_tool(state: Annotated[UserContext, InjectedState], prompt: str):
    """Call an inner LLM and stream post-processed structured output."""
    llm = ChatOpenAI(
        model="gpt-4.1-mini",
        streaming=True,
    )
    messages = [...]
    structured_llm = llm.with_structured_output(schema=SomeModel.model_json_schema())
    async for chunk in structured_llm.astream_events(input=messages):
        current_data = chunk["data"]
        post_processed_data = post_process(current_data)
        print(post_processed_data)  # must be yield / `get_stream_writer` here
agent = create_react_agent(
    model="openai:gpt-4.1-mini",
    tools=[
        # other tools..
        my_proposed_tool,
    ],
    # other props
)

# stream agent
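For context, this is roughly how I consume the stream at the moment (a simplified sketch; the input payload is a placeholder, and handle_custom / handle_token are stand-ins for my own handling). With "messages" in stream_mode I get every raw token of the LLM called inside my_proposed_tool in addition to my custom events:

async for mode, chunk in agent.astream(
    {"messages": [("user", "...")]},
    stream_mode=["custom", "messages"],
):
    if mode == "custom":
        # whatever I pass to the writer returned by get_stream_writer()
        handle_custom(chunk)
    elif mode == "messages":
        # (message_chunk, metadata) pairs -- this also includes the raw
        # tokens of the inner LLM, which is what I want to suppress
        message_chunk, metadata = chunk
        handle_token(message_chunk)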
Inside the agent, I want a function/tool (e.g. my_proposed_tool) that, when invoked, calls an LLM that streams partial JSON (structured output). I need to intercept each partial chunk, run custom post-processing, and only then stream the processed chunks.
My current idea is to add a tool that calls the LLM internally and use astream/astream_events plus get_stream_writer to emit my own events. BUT when I do this, the compiled agent also streams the inner LLM’s tokens alongside my custom events. Is there a way to suppress the inner LLM streaming so only my processed output goes out? Or is it even bad practice to put an LLM inside a tool? But then how do I manage this flow…
Maybe somehow with subgraphs?
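One more thing I considered, though I’m not sure it’s idiomatic: tagging the inner LLM and filtering its events out on the consumer side when using astream_events. A rough sketch of the idea (the "internal" tag is just a name I made up):

# inside the tool: mark the inner LLM so its events can be identified
structured_llm = llm.with_structured_output(
    schema=SomeModel.model_json_schema()
).with_config(tags=["internal"])

# when consuming the agent: drop events coming from the tagged runnable
async for event in agent.astream_events(
    {"messages": [("user", "...")]},
    version="v2",
    exclude_tags=["internal"],
):
    ...

Is something like this the intended way, or is the subgraph route cleaner here?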