Unexpected latency in minimal Deep Agent setup (create_deep_agent) + confusing @traceable runs

Hi all, I’m debugging latency in a LangGraph/LangSmith setup and could use guidance. I’m building an assistant (chat_agent) that will eventually delegate to subagents, but to isolate latency I reduced it to a minimal graph:

  • Build Bedrock chat model
  • Pass it to create_deep_agent
  • Minimal system prompt

What I observe

  • End-to-end latency is ~13–15s, first token is ~10–12s
  • In LangSmith waterfall, middleware + actual model call seem much shorter (roughly ~3s)
  • So there appears to be ~10s missing from visible child spans

To investigate, I wrapped graph() and its key steps (build model, create_deep_agent) with @traceable. This revealed multiple extra traces named chat_agent:graph (around 7 per request).

Each trace’s internal timing shows a couple of seconds spent building the model.

Questions

  1. Why would graph() be traced multiple times per apparent single user interaction?
  2. Could repeated graph factory calls explain the large gap between total latency and visible model/middleware time (first screenshot)?

Any pointers on best-practice tracing for this would be super helpful.

Hello @formertheorist, based on what you’ve provided, here is my analysis:

Why graph() is traced multiple times

Looking at your waterfall screenshot, I see a single chat_agent trace at 15.18s with nested middleware spans. If you’re seeing 7 separate chat_agent:graph traces per request, this suggests your graph() factory function is being invoked multiple times. Common causes:

  1. Retry/reconnection logic in your Bedrock client or HTTP layer
  2. Framework-level calls (health checks, warmup, middleware initialization)
  3. Multiple concurrent requests hitting the same endpoint
  4. The @traceable decorator applied at multiple levels (class + method)

Check if your graph() function is called from multiple places or wrapped in retry logic.


The ~10s gap between visible spans and total latency

Your waterfall shows:

  • Total trace: ~15s
  • Middleware spans starting around 3s
  • ChatBedrockConverse model call: ~2.87s

The “missing” ~10s is from agent construction inside create_deep_agent(), which is not traced separately. When you call create_deep_agent(), it performs significant synchronous work:

  1. Builds the general-purpose subagent with a full middleware stack (TodoListMiddleware, FilesystemMiddleware, SummarizationMiddleware, PatchToolCallsMiddleware, AnthropicPromptCachingMiddleware)

  2. Calls create_summarization_middleware() multiple times — once for the main agent, once for the general-purpose subagent, and once per custom subagent. Each may initialize models.

  3. Compiles all subagent graphs: SubAgentMiddleware.__init__() calls create_agent() for each subagent, which compiles a LangGraph state graph

  4. Resolves string model specs: if you pass "provider:model" strings, init_chat_model() is called to resolve each one

None of this appears in your trace because it happens during object construction, not during the traced ainvoke() call.
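You can confirm this empirically by timing each phase yourself before adding any tracing. A minimal sketch using only the standard library; the commented usage shows where your own build_model / create_deep_agent calls would go (those names are assumptions about your setup):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

# Hypothetical usage against the setup described above:
#
#   results = {}
#   with timed("build_model", results):
#       model = build_model()
#   with timed("create_deep_agent", results):
#       agent = create_deep_agent(model, system_prompt="...")
#   with timed("ainvoke", results):
#       response = await agent.ainvoke({"messages": [...]})
#   print(results)
```

If the diagnosis above is right, the create_deep_agent entry should dominate and the ainvoke entry should roughly match the ~3s of visible spans in LangSmith.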


Recommendations

1. Create the agent once at startup, not per-request:

from langsmith import traceable
from deepagents import create_deep_agent

# Create once at module/app startup
agent = create_deep_agent(model, system_prompt="...")

@traceable(name="chat_agent")
async def handle_request(messages):
    return await agent.ainvoke({"messages": messages})

2. If you must create per-request, trace the construction separately:

from langsmith import traceable
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage
from deepagents import create_deep_agent

@traceable(name="build_model")
def build_model():
    return ChatBedrock(...)

@traceable(name="create_agent")
def create_agent(model):
    return create_deep_agent(model, ...)

@traceable(name="chat_agent")
async def graph(user_input):
    model = build_model()
    agent = create_agent(model)
    return await agent.ainvoke({"messages": [HumanMessage(user_input)]})

This will show you exactly where the 10s is going.

3. Verify your factory isn’t being called multiple times:

Add logging at the top of your graph() function to confirm it’s only called once per user interaction:

import logging
logger = logging.getLogger(__name__)

@traceable(name="chat_agent")
def graph():
    logger.info("graph() called", stack_info=True)
    # ...

Easy Summary

  • 7 traces per request: factory function called multiple times (retries, framework behavior)
  • ~10s untraced latency: create_deep_agent() construction (subagent compilation, middleware init, model resolution)
  • Middleware spans ~3s each: these wrap the model call, so they overlap