Our workflow heavily uses fanout/map-reduce style graphs. A single thread might have 70 parallel nodes. We expect to have multiple concurrent threads in production, so we would like to understand any limitations of LangGraph’s parallel execution model, as well as associated best practices. Thanks!
LangGraph executes in discrete super-steps; nodes that are scheduled together run in parallel, then the system synchronizes before the next step. This is inspired by Pregel’s message-passing model.
Use Send for dynamic fan-out, Command for update+route
For map-reduce/orchestrator-worker patterns, return a list of Send(node, arg) to dynamically spawn workers. Use Command(update=..., goto=...) when you need to update state and route in one place.
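For illustration, here is a minimal sketch of that pattern; the `topics`/`results` keys and the `orchestrator`/`worker` node names are placeholders, not anything prescribed by LangGraph:

```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send


class State(TypedDict):
    topics: list[str]
    results: Annotated[list, operator.add]  # reducer: parallel writes are concatenated


class WorkerInput(TypedDict):
    topic: str


def orchestrator(state: State):
    # Nothing to update here; the fan-out happens in the conditional edge below.
    return {}


def fan_out(state: State):
    # One Send per topic -> one worker instance in the next super-step.
    return [Send("worker", {"topic": t}) for t in state["topics"]]


def worker(payload: WorkerInput):
    # Each worker receives only the payload passed in its Send.
    return {"results": [f"summary of {payload['topic']}"]}


builder = StateGraph(State)
builder.add_node("orchestrator", orchestrator)
builder.add_node("worker", worker)
builder.add_edge(START, "orchestrator")
builder.add_conditional_edges("orchestrator", fan_out, ["worker"])
builder.add_edge("worker", END)
graph = builder.compile()
```

A node can also return Command(update={...}, goto="worker") to combine a state write with routing, and in recent langgraph versions goto can be a list of Send objects, which gives you the same fan-out from inside the node itself.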
Define reducers for any keys written from parallel branches
Parallel nodes that write to the same state key need a reducer that tells LangGraph how to merge their updates; otherwise you’ll hit an INVALID_CONCURRENT_GRAPH_UPDATE error.
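Reducers are declared on the state schema via Annotated, and any pairwise merge function works. A sketch for illustration (the merge_counts helper and the key names are made up):

```python
import operator
from typing import Annotated, TypedDict


def merge_counts(left: dict, right: dict) -> dict:
    # Reducer: called pairwise to combine writes arriving from parallel branches.
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


class State(TypedDict):
    results: Annotated[list, operator.add]   # lists from parallel workers are concatenated
    counts: Annotated[dict, merge_counts]    # dicts are combined by the custom reducer
    status: str                              # no reducer: two branches writing this key in the
                                             # same super-step raise INVALID_CONCURRENT_GRAPH_UPDATE
```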
Control concurrency with max_concurrency
There’s no hard-coded fanout limit, but you should throttle concurrent tasks to match host resources and provider rate limits via max_concurrency in the call config.
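Assuming the compiled `graph` from the sketch above, the cap is passed per call:

```python
# Limit how many node executions run concurrently within each super-step.
# The same config key applies to invoke/ainvoke/stream/astream.
result = graph.invoke(
    {"topics": ["a", "b", "c"]},
    config={"max_concurrency": 10},
)
```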
Use retries at the node level (especially for LLM/tool calls)
Attach RetryPolicy to nodes that can fail transiently. This keeps super-steps healthy under occasional provider errors.
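For example, when registering the worker node from the sketch above; depending on your langgraph version the import path may be langgraph.types or langgraph.pregel, and the keyword may be retry or retry_policy:

```python
from langgraph.types import RetryPolicy

builder.add_node(
    "worker",
    worker,
    retry=RetryPolicy(max_attempts=3),  # transient failures are retried before the super-step fails
)
```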
Aggregate on a dedicated key and avoid cross-branch mutation
Have workers write to an append-only key (e.g., results: Annotated[list, operator.add]) and do synthesis in a downstream aggregator node. Avoid multiple parallel nodes overwriting the same scalar key.
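Continuing the sketch, a downstream aggregator that only reads the accumulated results key (the aggregate node and final_report key are illustrative; final_report would also need to be added to the State schema):

```python
def aggregate(state: State):
    # Runs after the barrier: every worker's write to `results` has already
    # been merged by the reducer, so this node only reads and synthesizes.
    return {"final_report": "\n".join(state["results"])}


builder.add_node("aggregate", aggregate)
builder.add_edge("worker", "aggregate")   # replaces the worker -> END edge in the sketch above
builder.add_edge("aggregate", END)
```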
Prefer Send over imperative subgraph calls inside a single node
If you need multiple subgraph runs, don’t imperatively invoke a subgraph several times inside one node when checkpointing is enabled; use a Send fan-out instead, otherwise you may hit the MULTIPLE_SUBGRAPHS naming/namespace limits.
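A sketch of the Send-based alternative, wired the same way as the worker fan-out above: compile the subgraph once, add it as a regular node, and fan out to it (sub_builder stands in for whatever subgraph you have; its state schema should share the results reducer key with the parent):

```python
subgraph = sub_builder.compile()                   # your subgraph, compiled once
builder.add_node("research_subgraph", subgraph)    # a compiled graph can be added as a node


def fan_out_to_subgraph(state: State):
    # Each Send starts one subgraph run with its own checkpoint namespace.
    return [Send("research_subgraph", {"topic": t}) for t in state["topics"]]


builder.add_conditional_edges("orchestrator", fan_out_to_subgraph, ["research_subgraph"])
builder.add_edge("research_subgraph", END)
```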
Mind recursion and loops
Default recursion_limit is 25 super-steps; increase it per run if your workflow loops through many barrier synchronizations.
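Per run, the limit lives in the same config dict as max_concurrency (the value 100 here is arbitrary):

```python
# Each super-step counts toward the limit, so deep loops or long
# fan-out -> aggregate chains can exceed the default of 25.
result = graph.invoke(
    {"topics": ["a", "b"]},
    config={"recursion_limit": 100, "max_concurrency": 10},
)
```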
Performance and resource tips
Size max_concurrency to protect model rate limits, DB connection pools, HTTP connection pools, and CPU-bound work, and watch memory and file-descriptor usage under load.
Use node-level caching for pure, expensive tasks to avoid recomputation where inputs repeat (see the caching sketch after these tips).
If single nodes do heavy CPU, consider pushing that work to a separate service or a worker pool to keep the event loop responsive.
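For the node-level caching mentioned above, recent langgraph releases expose a CachePolicy plus a cache backend; a sketch, assuming an in-memory cache and treating the worker node as pure (verify the exact names against your installed version):

```python
from langgraph.cache.memory import InMemoryCache
from langgraph.types import CachePolicy

builder.add_node(
    "worker",
    worker,
    cache_policy=CachePolicy(ttl=300),   # cache hits keyed on the node's input; expire after 5 min
)
graph = builder.compile(cache=InMemoryCache())
```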
Testing and observability
For heavy fanouts, add canary tests that simulate N workers and verify merge semantics with your reducers.
Stream with stream_mode="updates" during load tests to ensure branches complete and aggregate as expected.
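For example, with the graph and keys from the sketch above:

```python
# Each chunk maps node name -> the state update it returned, so you can
# count worker completions and check the aggregated output during a load test.
for chunk in graph.stream(
    {"topics": ["a", "b", "c"]},
    config={"max_concurrency": 10},
    stream_mode="updates",
):
    print(chunk)
```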
Hopefully, other users of the forum can help improve/expand this best-practices list.
Thanks Pawel, this is super helpful! On the recursion_limit question: do you happen to know how to increase that limit when running via langgraph.json, as opposed to client.runs.stream? See this question
Hi @rkauf
Actually, I don’t know about that possibility - I don’t think you can use langgraph.json to modify the limit for now, can you?
And honestly, why would that be useful if you can still modify it via the per-run config?