IMHO you have a fixed, well-defined workflow with a verification loop (deploy → run → count → diagnose → fix; max ~3 attempts) and a human approval gate between Phase A and Phase B. That’s much closer to a state machine / workflow graph than an open-ended “delegate and summarize” assistant.
AFAIK from the source code, deepagents subagents are built for context quarantine. In the deepagents source, the task tool intentionally does not pass structured_response between the main agent and the subagent:
```python
# State keys that are excluded when passing state to subagents and when returning
# updates from subagents.
# When returning updates:
# 1. The messages key is handled explicitly to ensure only the final message is included
# 2. The todos and structured_response keys are excluded as they do not have a defined reducer
#    and no clear meaning for returning them from a subagent to the main agent.
_EXCLUDED_STATE_KEYS = {"messages", "todos", "structured_response"}
```
And for a CompiledSubAgent, the main agent gets only the final message back (wrapped as a ToolMessage), not the full internal state:
```python
class CompiledSubAgent(TypedDict):
    """A pre-compiled agent spec.

    !!! note
        The runnable's state schema must include a 'messages' key.
        This is required for the subagent to communicate results back to the main agent.
        When the subagent completes, the final message in the 'messages' list will be
        extracted and returned as a `ToolMessage` to the parent agent.
    """
```
So if your mapping spec is produced as a validated structured object (e.g., ProviderStrategy(…, strict=True)), that structured object won’t automatically propagate to the supervisor via subagents unless you explicitly serialize/persist it (e.g., write it to a file and have the main agent read it).
Therefore, my recommendation: dedicated LangGraph workflow (graph) with explicit state keys
Use a graph that keeps shared state across phases (e.g., params, mapping_spec, schema_metadata, sql_script, attempt, verification_results). This directly addresses “SQL generator can’t find data” because Phase B always has the full Phase A outputs + any runtime observations.
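As a concrete sketch, the shared state could be a single TypedDict; all key names below are just illustrations taken from the phases above, not a prescribed schema:

```python
from typing import Any, TypedDict


class PipelineState(TypedDict, total=False):
    """Illustrative graph state; every node reads/writes these keys."""
    params: dict[str, Any]                 # validated (German) user inputs, Phase A
    schema_metadata: dict[str, Any]        # table columns/constraints from introspection
    mapping_spec: dict[str, Any]           # validated structured mapping spec
    sql_script: str                        # generated SQL, Phase B
    attempt: int                           # retry counter for the fix loop (max 3)
    verification_results: dict[str, Any]   # row counts, constraint checks
```

Because every node returns updates into this one state, the SQL generator always sees the full Phase A outputs instead of a quarantined summary.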
Quick draft: a shape that, in my opinion, would match your diagram:
- Node: parameter_collection → validates German inputs
- Node: schema_introspection (tools) → fetch table columns/constraints + optional row-count sanity checks
- Node: create_mapping_spec (LLM structured output, strict) → writes mapping_spec into graph state
- Interrupt / HITL gate → human corrections/approval
- Node: generate_sql (LLM) → uses mapping_spec + schema_metadata (+ any sampled evidence)
- Node: deploy_and_run (tools) → execute DDL, run staging function, collect errors
- Node: verify (tools) → row count checks, constraints checks
- Conditional routing:
- success → finalize/save
- failure and attempt < 3 → diagnose (tools) → generate_sql
- failure and attempt == 3 → stop with report
This is exactly what LangGraph is good at: deterministic routing + loops + interrupts + persisted state.
When a Deep Agent still makes sense
Keep a Deep Agent when the work is not a predictable workflow (researchy, exploratory, lots of “figure it out”), or when you truly want context quarantine. But your ETL pipeline is mostly “known steps + tooling + verification”, so a graph is the more reliable base.
If you want to stay on Deep Agents anyway
You can still make it work, but you must explicitly persist canonical artifacts between subagents:
- Mapping spec subagent must write the spec to the filesystem (e.g., /artifacts/mapping_spec.json) and the SQL generator must read that file, because structured_response won’t be returned through task.
- Alternatively, don’t make mapping spec a subagent: run it in the main agent (or as a tool) so the structured result is in the main run’s state.
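The file-based handoff from the first bullet is trivial but has to be explicit in both subagents' instructions. A minimal sketch, assuming the subagents have filesystem access (the path and field names are illustrative):

```python
import json
from pathlib import Path

DEFAULT_ARTIFACT = Path("/artifacts/mapping_spec.json")


def persist_mapping_spec(spec: dict, path: Path = DEFAULT_ARTIFACT) -> None:
    """Last step of the mapping-spec subagent: write the validated spec to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, indent=2))


def load_mapping_spec(path: Path = DEFAULT_ARTIFACT) -> dict:
    """First step of the SQL-generator subagent: read the spec back."""
    return json.loads(path.read_text())
```

This works, but note you are reimplementing by hand what a graph with a shared `mapping_spec` state key gives you for free.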
Does this make sense to you, @SeccoMaracuja?