Unexpected latency in minimal Deep Agent setup (create_deep_agent) + confusing @traceable runs

Hi all, I’m debugging latency in a LangGraph/LangSmith setup and could use guidance. I’m building an assistant (chat_agent) that will eventually delegate to subagents, but to isolate latency I reduced it to a minimal graph:

  • Build Bedrock chat model
  • Pass it to create_deep_agent
  • Minimal system prompt

What I observe

  • End-to-end latency is ~13–15s, first token is ~10–12s
  • In LangSmith waterfall, middleware + actual model call seem much shorter (roughly ~3s)
  • So there appears to be ~10s missing from visible child spans

To investigate, I wrapped graph() and its key steps (build model, create_deep_agent) with @traceable. This revealed multiple extra traces named chat_agent:graph (around 7 per request).

Each trace’s internal timing shows a couple of seconds spent building the model.

Questions

  1. Why would graph() be traced multiple times per apparent single user interaction?
  2. Could repeated graph factory calls explain the large gap between total latency and visible model/middleware time (first screenshot)?

Any pointers on best-practice tracing for this would be super helpful.

Hello @formertheorist, based on what you’ve provided, here is my analysis:

Why graph() is traced multiple times

Looking at your waterfall screenshot, I see a single chat_agent trace at 15.18s with nested middleware spans. If you’re seeing 7 separate chat_agent:graph traces per request, this suggests your graph() factory function is being invoked multiple times. Common causes:

  1. Retry/reconnection logic in your Bedrock client or HTTP layer
  2. Framework-level calls (health checks, warmup, middleware initialization)
  3. Multiple concurrent requests hitting the same endpoint
  4. The @traceable decorator applied at multiple levels (class + method)

Check if your graph() function is called from multiple places or wrapped in retry logic.


The ~10s gap between visible spans and total latency

Your waterfall shows:

  • Total trace: ~15s
  • Middleware spans starting around 3s
  • ChatBedrockConverse model call: ~2.87s

The “missing” ~10s is from agent construction inside create_deep_agent(), which is not traced separately. When you call create_deep_agent(), it performs significant synchronous work:

  1. Builds the general-purpose subagent with a full middleware stack (TodoListMiddleware, FilesystemMiddleware, SummarizationMiddleware, PatchToolCallsMiddleware, AnthropicPromptCachingMiddleware)

  2. Calls create_summarization_middleware() multiple times — once for the main agent, once for the general-purpose subagent, and once per custom subagent. Each may initialize models.

  3. Compiles all subagent graphs: SubAgentMiddleware.__init__() calls create_agent() for each subagent, which compiles a LangGraph state graph

  4. Resolves string model specs: if you pass "provider:model" strings, init_chat_model() is called to resolve each one

None of this appears in your trace because it happens during object construction, not during the traced ainvoke() call.
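You can confirm this empirically by timing each phase yourself before adding any tracing. A minimal sketch using only the standard library; the commented usage shows where your own build_model / create_deep_agent calls would go (those names are assumptions about your setup):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

# Hypothetical usage against the setup described above:
#
#   results = {}
#   with timed("build_model", results):
#       model = build_model()
#   with timed("create_deep_agent", results):
#       agent = create_deep_agent(model, system_prompt="...")
#   with timed("ainvoke", results):
#       response = await agent.ainvoke({"messages": [...]})
#   print(results)
```

If the diagnosis above is right, the create_deep_agent entry should dominate and the ainvoke entry should roughly match the ~3s of visible spans in LangSmith.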


Recommendations

1. Create the agent once at startup, not per-request:

from langsmith import traceable
from deepagents import create_deep_agent

# Create once at module/app startup
agent = create_deep_agent(model, system_prompt="...")

@traceable(name="chat_agent")
async def handle_request(messages):
    return await agent.ainvoke({"messages": messages})

2. If you must create per-request, trace the construction separately:

from langsmith import traceable
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage
from deepagents import create_deep_agent

@traceable(name="build_model")
def build_model():
    return ChatBedrock(...)

@traceable(name="create_agent")
def create_agent(model):
    return create_deep_agent(model, ...)

@traceable(name="chat_agent")
async def graph(user_input):
    model = build_model()
    agent = create_agent(model)
    return await agent.ainvoke({"messages": [HumanMessage(user_input)]})

This will show you exactly where the 10s is going.

3. Verify your factory isn’t being called multiple times:

Add logging at the top of your graph() function to confirm it’s only called once per user interaction:

import logging
logger = logging.getLogger(__name__)

@traceable(name="chat_agent")
def graph():
    logger.info("graph() called", stack_info=True)
    # ...

Easy Summary

  • 7 traces per request: factory function called multiple times (retries, framework behavior)
  • ~10s untraced latency: create_deep_agent() construction (subagent compilation, middleware init, model resolution)
  • Middleware spans ~3s each: these wrap the model call, so they overlap