I have a multi-agent system with a supervisor that can call on several other ReAct agents. I have configured some Online Evaluators with Custom Code, but I find that the top-level Runs do not contain the tool calls, or any other actions taken by my sub-agents, beyond the message each one writes back to the supervisor. As a result, those Runs don’t show how many tool calls were made, how many tokens were consumed (and of what type), or the results of any tool call. Of course, I can run Evals over the sub-agent Runs too, but I can’t aggregate the results over the Trace without seeing everything in one place.
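To make the gap concrete: a Custom Code evaluator over the top-level Run can only count what appears in the supervisor’s message thread. A counting helper like this (my own sketch, over message dicts shaped like the excerpt later in this post) only ever sees the supervisor’s handoff tool calls, never the sub-agents’ own tool use:

```python
def count_tool_calls(messages: list[dict]) -> int:
    """Count tool calls visible in a list of message dicts.

    Over the top-level Run this undercounts badly: the only AI messages
    in the supervisor's thread are the supervisor's own, so the count
    covers handoffs (transfer_to_*) but none of the sub-agents' tool use.
    """
    return sum(
        len(m.get("tool_calls", []))
        for m in messages
        if m.get("type") == "ai"
    )
```

Run over the thread excerpted below, this returns 1 (the transfer_to_planner_expert handoff), however many tool calls the planner sub-agent actually made.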
I’m using the built-in create_react_agent and the langgraph-supervisor library (which is itself built on create_react_agent); part of the setup is shown in the code below.
If I create an eval for a top-level Trace (correct me if I’m wrong, but I think this is the case where trace_id == run_id), I can then look at the trace for the eval itself and see how it received the run (I can’t include a picture because of forum restrictions).
In that Run’s outputs I can only see the main message thread as seen by the supervisor, plus metadata and the rest of the final State. That includes messages showing transfers to sub-agents, like the following:
{
    "additional_kwargs": {},
    "content": "",
    "id": "lc_run--412923f1-c17d-409c-9616-606bea4c5259",
    "invalid_tool_calls": [],
    "name": "supervisor",
    "response_metadata": {
        "finish_reason": "tool_calls",
        "model_name": "gpt-5-mini-2025-08-07",
        "model_provider": "openai"
    },
    "tool_calls": [
        {
            "args": {},
            "id": "call_3L7pxKSSTtIek0rwo2SLhxad",
            "name": "transfer_to_planner_expert",
            "type": "tool_call"
        }
    ],
    "type": "ai",
    "usage_metadata": {
        "input_token_details": {
            "audio": 0,
            "cache_read": 0
        },
        "input_tokens": 9849,
        "output_token_details": {
            "audio": 0,
            "reasoning": 64
        },
        "output_tokens": 88,
        "total_tokens": 9937
    }
},
{
    "additional_kwargs": {},
    "content": "Successfully transferred to planner_expert",
    "id": "9589376d-ea0e-423a-ae8c-e3233734b296",
    "name": "transfer_to_planner_expert",
    "response_metadata": {
        "__handoff_destination": "planner_expert"
    },
    "status": "success",
    "tool_call_id": "call_3L7pxKSSTtIek0rwo2SLhxad",
    "type": "tool"
},
But the work of the sub-agent itself is not there.
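For what it’s worth, the aggregation I want is straightforward once all the runs of a trace are in one place. The helper below just sums tool calls and tokens over simplified run records; the commented-out SDK call is my assumption of how I’d fetch them (I haven’t verified that list_runs accepts a trace_id filter in my version):

```python
from collections import Counter


def aggregate_trace(runs: list[dict]) -> dict:
    """Sum tool calls and token usage over all runs in one trace.

    Each run is a plain dict with "run_type", "name", and an optional
    "total_tokens" -- a simplification of the real LangSmith Run object.
    """
    stats = {"tool_calls": 0, "total_tokens": 0, "tools": Counter()}
    for run in runs:
        if run.get("run_type") == "tool":
            stats["tool_calls"] += 1
            stats["tools"][run.get("name", "unknown")] += 1
        stats["total_tokens"] += run.get("total_tokens") or 0
    return stats


# Assumed SDK usage (untested; parameter and field names may differ):
# from langsmith import Client
# client = Client()
# runs = client.list_runs(project_name="my-project", trace_id=trace_id)
# stats = aggregate_trace([{"run_type": r.run_type, "name": r.name,
#                           "total_tokens": r.total_tokens} for r in runs])
```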
Below is how I’m calling some of the built-in agents, in case there is configuration I can add that would improve this.
supervisor_with_planner = create_supervisor(
    agents=[
        planner_agent.graph,
        description_writer_agent.graph,
        # OTHERS
    ],
    model=LLM,
    prompt=build_supervisor_prompt,  # <--- Builds a dynamic prompt that injects some things from state
    state_schema=MyCustomState,
    add_handoff_back_messages=True,  # My sub-agents send a message back to the supervisor
    output_mode="last_message",
).compile()
# In another file...
_description_writer_agent_graph = create_react_agent(
    model=LLM,
    tools=DESCRIPTION_WRITER_TOOLS,
    state_schema=MyCustomState,
    prompt=build_description_writer_prompt,  # <--- Builds a dynamic prompt that injects some things from state
    name="description_writer",
)
I have looked at the multi-turn evaluators, and they are not what I need. I am aware that create_react_agent is deprecated, but I had issues migrating and will wait until (hopefully) the langgraph-supervisor library migrates.
Question: How can I see the runs from all the sub-agents and the supervisor in one place and run an Online Eval on that?
