Hello @formertheorist, based on what you've provided, here is my analysis:

## Why `graph()` is traced multiple times
Looking at your waterfall screenshot, I see a single `chat_agent` trace at 15.18s with nested middleware spans. If you're seeing 7 separate `chat_agent:graph` traces per request, your `graph()` factory function is most likely being invoked multiple times. Common causes:
- Retry/reconnection logic in your Bedrock client or HTTP layer
- Framework-level calls (health checks, warmup, middleware initialization)
- Multiple concurrent requests hitting the same endpoint
- The `@traceable` decorator applied at multiple levels (class + method)
Check whether your `graph()` function is called from multiple places or wrapped in retry logic.
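A quick way to confirm programmatically whether something upstream is re-invoking your factory is to count actual invocations. A minimal sketch (the `count_calls` wrapper is hypothetical; wrap your real `graph()` with it):

```python
import functools

def count_calls(fn):
    """Wrap a function and count how many times it is actually invoked."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def graph():
    return "compiled-graph"  # stand-in for your real factory body

graph()
# If this climbs to 7 per user request, retries or framework behavior
# are re-entering the factory.
assert graph.calls == 1
```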
## The ~10s gap between visible spans and total latency
Your waterfall shows:
- Total trace: ~15s
- Middleware spans starting around 3s
- `ChatBedrockConverse` model call: ~2.87s
The "missing" ~10s comes from agent construction inside `create_deep_agent()`, which is not traced separately. When you call `create_deep_agent()`, it performs significant synchronous work:
- Builds the general-purpose subagent with a full middleware stack (`TodoListMiddleware`, `FilesystemMiddleware`, `SummarizationMiddleware`, `PatchToolCallsMiddleware`, `AnthropicPromptCachingMiddleware`)
- Calls `create_summarization_middleware()` multiple times: once for the main agent, once for the general-purpose subagent, and once per custom subagent. Each call may initialize models.
- Compiles all subagent graphs: `SubAgentMiddleware.__init__()` calls `create_agent()` for each subagent, which compiles LangGraph state graphs
- Resolves string model specs: if you pass `"provider:model"` strings, `init_chat_model()` is called to resolve them
None of this appears in your trace because it happens during object construction, not during the traced `ainvoke()` call.
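You can verify this without touching your tracing setup by timing construction directly. A minimal sketch with a generic timing decorator (the `timed` helper is hypothetical; apply it to your real `create_deep_agent` call instead of the stand-in shown):

```python
import functools
import time

def timed(label):
    """Print the wall-clock duration of a callable; handy for untraced startup work."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                print(f"{label}: {time.perf_counter() - start:.2f}s")
        return wrapper
    return decorator

@timed("create_deep_agent")
def build_agent():
    time.sleep(0.1)  # stand-in for the real construction work
    return "agent"

agent = build_agent()  # prints the elapsed construction time
```

If this prints ~10s around your real `create_deep_agent()` call, the hypothesis is confirmed.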
## Recommendations
1. Create the agent once at startup, not per-request:
```python
# Create once at module/app startup
agent = create_deep_agent(model, system_prompt="...")

@traceable(name="chat_agent")
async def handle_request(messages):
    return await agent.ainvoke({"messages": messages})
```
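If your framework insists on calling a factory per request and you cannot easily move construction to startup, you can still amortize the cost with a cached factory. A sketch using `functools.lru_cache` (the `expensive_build` stand-in replaces your real `create_deep_agent(...)` call):

```python
from functools import lru_cache

builds = 0

def expensive_build():
    """Stand-in for create_deep_agent(model, ...); substitute your real construction."""
    global builds
    builds += 1
    return object()

@lru_cache(maxsize=1)
def get_agent():
    # First call pays the full construction cost; later calls return the cached agent.
    return expensive_build()

first, second = get_agent(), get_agent()
assert first is second and builds == 1  # constructed exactly once
```

This assumes the agent is safe to share across requests; compiled LangGraph graphs are generally reusable across invocations, so per-request state should live in the `ainvoke()` input, not in the agent object.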
2. If you must create per-request, trace the construction separately:
```python
@traceable(name="build_model")
def build_model():
    return ChatBedrock(...)

@traceable(name="create_agent")
def create_agent(model):
    return create_deep_agent(model, ...)

@traceable(name="chat_agent")
async def graph(user_input):
    model = build_model()
    agent = create_agent(model)
    return await agent.ainvoke({"messages": [HumanMessage(user_input)]})
```
This will show you exactly where the 10s is going.
3. Verify your factory isn't being called multiple times:
Add logging at the top of your `graph()` function to confirm it's only called once per user interaction:
```python
import logging

logger = logging.getLogger(__name__)

@traceable(name="chat_agent")
def graph():
    logger.info("graph() called", stack_info=True)
    # ...
```
## Summary

| Symptom | Likely cause |
| --- | --- |
| 7 traces per request | Factory function called multiple times (retries, framework behavior) |
| ~10s untraced latency | `create_deep_agent()` construction (subagent compilation, middleware init, model resolution) |
| Middleware spans ~3s each | They wrap the model call, so their durations overlap |