I have a question and would appreciate a simple demo for the following requirement: the front end calls a back-end agent, the agent generates an image, and the image URL is returned over WebSocket. The message carrying the URL must be the result of a tool invocation. The reason is that I want structured data alongside the AI's conversational responses (for example, to change the front-end page background dynamically through dialogue). However, I am running into difficulties in the code. I am using FastAPI. I would be very grateful for your help!
- Use LangGraph/LangChain multi-mode streaming:
    agent.astream(
        {"messages": [{"role": "user", "content": user_text}]},
        stream_mode=["messages", "updates", "custom"],
        version="v2",
    )
messages gives you LLM token chunks, updates gives completed per-node updates such as tool calls and ToolMessage results, and custom gives data emitted from inside tools.
- Have the tool emit a structured custom event:
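    from langchain.tools import ToolRuntime, tool

    @tool
    async def generate_image(prompt: str, runtime: ToolRuntime) -> dict:
        """Generate an image and return its URL."""
        # generate_and_upload_image is your own helper; it is not defined here.
        url = await generate_and_upload_image(prompt)
        # Push a structured event into the "custom" stream before returning.
        runtime.stream_writer({
            "type": "image_ready",
            "url": url,
            "prompt": prompt,
            "tool_call_id": runtime.tool_call_id,
        })
        # The return value still goes back to the model as the tool result.
        return {"url": url, "prompt": prompt}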
This custom event is emitted by the tool invocation itself. In the agent stream it arrives wrapped as:
    {"type": "custom", "data": {"type": "image_ready", "url": "..."}}
- Forward each stream chunk as a small JSON envelope over WebSocket:
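    async for chunk in agent.astream(...):
        if chunk["type"] == "messages":
            # LLM token chunk plus its metadata.
            token, metadata = chunk["data"]
            if token.text:
                await websocket.send_json({
                    "type": "assistant_delta",
                    "text": token.text,
                })
        elif chunk["type"] == "custom":
            # Structured event emitted by the tool, forwarded as-is.
            await websocket.send_json(chunk["data"])
        elif chunk["type"] == "updates":
            # Note: this payload may hold message objects; make it
            # JSON-serializable first if send_json rejects it.
            await websocket.send_json({
                "type": "agent_update",
                "data": chunk["data"],
            })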
Then the browser branches on type: append assistant_delta to the chat bubble, and use image_ready to mutate document.body.style.backgroundImage.
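To try the forwarding logic without a running agent or socket, the branching above can be pulled into a pure function and fed a fake stream. This is only a sketch under assumptions: `to_envelope`, `fake_stream`, and `collect` are hypothetical names, the fake chunks merely imitate the chunk shape shown above, and the `default=str` fallback is a crude stand-in for proper serialization of whatever objects the updates payload may contain.

```python
import asyncio
import json
from types import SimpleNamespace
from typing import AsyncIterator, Optional


def to_envelope(chunk: dict) -> Optional[dict]:
    """Map one stream chunk to a JSON-safe envelope; None means skip it."""
    kind, data = chunk["type"], chunk["data"]
    if kind == "messages":
        token, _metadata = data
        # Skip chunks that carry no visible text.
        if not token.text:
            return None
        return {"type": "assistant_delta", "text": token.text}
    if kind == "custom":
        # Already a plain dict emitted from inside the tool.
        return data
    if kind == "updates":
        # Assumption: stringify anything json cannot encode on its own.
        safe = json.loads(json.dumps(data, default=str))
        return {"type": "agent_update", "data": safe}
    return None


async def fake_stream() -> AsyncIterator[dict]:
    """Hypothetical stand-in for agent.astream(...), for local testing only."""
    yield {"type": "messages", "data": (SimpleNamespace(text="Here is "), {})}
    yield {"type": "custom",
           "data": {"type": "image_ready", "url": "https://example.com/a.png"}}
    yield {"type": "messages", "data": (SimpleNamespace(text=""), {})}
    yield {"type": "updates",
           "data": {"tools": {"messages": [SimpleNamespace(content="done")]}}}


async def collect(stream: AsyncIterator[dict]) -> list:
    """Gather the envelopes that would have been sent over the socket."""
    sent = []
    async for chunk in stream:
        envelope = to_envelope(chunk)
        if envelope is not None:
            sent.append(envelope)
    return sent


if __name__ == "__main__":
    for envelope in asyncio.run(collect(fake_stream())):
        print(json.dumps(envelope))
```

In the real endpoint, the loop body would call `await websocket.send_json(envelope)` instead of appending to a list; keeping the mapping separate from the socket makes it easy to unit test.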
Is using WebSocket for the entire conversation the optimal solution for my requirement?
My original idea was to use SSE for agent dialogue, and when the agent calls tools, use WebSocket to send tool-related messages.
Haha, is this approach actually not recommended?
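For the SSE idea, one point that may help: the typed envelopes do not depend on the transport, so tool events and assistant text could share a single stream instead of being split across SSE and WebSocket. A minimal sketch of framing one envelope as a Server-Sent Events message follows; `sse_frame` is a hypothetical helper name, and it assumes the envelope's `type` value contains no newline characters.

```python
import json


def sse_frame(envelope: dict) -> str:
    """Encode one envelope as a single Server-Sent Events frame."""
    # Use the envelope type as the SSE event name so the browser can
    # register one listener per event type.
    event = envelope.get("type", "message")
    # json.dumps escapes newlines inside strings, so the payload
    # always fits on one "data:" line.
    payload = json.dumps(envelope, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


if __name__ == "__main__":
    print(sse_frame({"type": "assistant_delta", "text": "Hello"}), end="")
    print(sse_frame({"type": "image_ready", "url": "https://example.com/a.png"}), end="")
```

SSE only flows from server to client, so the user's message would still go up through an ordinary HTTP request; WebSocket remains the simpler fit if both directions should share one connection.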
Try CopilotKit. It's actually what you need, and it has a LangGraph integration too.
I built an entire platform for board games with LangGraph and CopilotKit: https://aiboardgames.cloud/