LangGraph Parallelism

I’m seeing a discrepancy between graph topology and execution reality. My LangGraph has a clear fan-out from initialize to four parallel paths (including the guardrail and preprocess subgraphs), yet my Langfuse traces show a “staircase” effect instead of overlapping spans.

Despite using precompiled graphs to eliminate overhead, the nodes aren’t executing concurrently. Specifically:

  • The Execution Gap: In the preprocess subgraph, topic_selection (9.50s) starts only after extraction (3.84s) finishes.

  • Subgraph Overhead: Is this “staircase” an inherent LangGraph mechanism where the orchestrator must checkpoint state before starting the next parallel task?

  • Non-Blocking Traces: I’ve confirmed Langfuse tracing isn’t the bottleneck, so the delay is happening within the graph’s task coordination.

Has anyone else faced this sequential execution issue with parallel nodes?

I’m looking for recommendations on how to achieve true 0-gap concurrency.

  • Is this a State-locking behavior or a Python GIL/Async limitation?

  • Any specific search terms or configurations to fix this?

  • If you have a Langfuse trace where parallel bars actually overlap, I’d love to see your setup.

Hi @theodevs,

Parallel edges in LangGraph define only the topology; whether branches actually overlap depends on the runtime (async vs. sync) and on the nodes being truly async and non-blocking. Parallel edges alone do not ensure parallel execution.

You can also search for “LangGraph parallel edges async execution”, “RunnableParallel”, and “async node execution LangGraph” in the LangChain/LangGraph async execution docs. Avoid blocking code and make sure all LLM/tool calls are fully async for true 0-gap overlap; otherwise, the staircase is typically runtime behavior rather than state-locking.
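To see why, the effect can be reproduced without LangGraph at all. Here is a stdlib-only asyncio sketch (not LangGraph code) contrasting the two cases: branches that truly await overlap, while branches whose bodies block the event loop serialize into exactly the staircase described above.

```python
import asyncio
import time

async def fake_llm_call(delay: float) -> str:
    # Non-blocking: yields the event loop while "waiting" on I/O.
    await asyncio.sleep(delay)
    return "ok"

def blocking_llm_call(delay: float) -> str:
    # Blocking: holds the event loop for the whole duration.
    time.sleep(delay)
    return "ok"

async def blocking_branch(delay: float) -> str:
    # Looks async on the outside, but the body never awaits.
    return blocking_llm_call(delay)

async def demo() -> tuple[float, float]:
    # Fan-out of two truly async branches: total ~= max(delays).
    t0 = time.perf_counter()
    await asyncio.gather(fake_llm_call(0.2), fake_llm_call(0.2))
    overlapped = time.perf_counter() - t0

    # Same fan-out with blocking bodies: total ~= sum(delays),
    # i.e. the "staircase" seen in the trace.
    t0 = time.perf_counter()
    await asyncio.gather(blocking_branch(0.2), blocking_branch(0.2))
    staircase = time.perf_counter() - t0
    return overlapped, staircase

overlapped, staircase = asyncio.run(demo())
print(f"overlapping: {overlapped:.2f}s  staircase: {staircase:.2f}s")
```

The same principle applies inside LangGraph nodes: one hidden synchronous call anywhere in a branch is enough to serialize the whole fan-out.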

Reference: Durable execution - Docs by LangChain
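If some step inside a node genuinely must stay synchronous (a sync SDK, CPU-bound parsing), one common pattern is to offload it to a worker thread so sibling branches can still overlap. A stdlib-only sketch, where `blocking_work` is a hypothetical stand-in for the sync call:

```python
import asyncio
import time

def blocking_work(delay: float) -> str:
    # Stand-in for a synchronous SDK or library call.
    time.sleep(delay)
    return "done"

async def node(delay: float) -> str:
    # Run the blocking call in a worker thread so the event
    # loop stays free and sibling branches can still overlap.
    return await asyncio.to_thread(blocking_work, delay)

async def run_fanout() -> tuple[list[str], float]:
    t0 = time.perf_counter()
    results = await asyncio.gather(node(0.2), node(0.2))
    return results, time.perf_counter() - t0

results, elapsed = asyncio.run(run_fanout())
print(results, f"{elapsed:.2f}s")
```

With the offload, two 0.2s branches finish in roughly 0.2s total instead of 0.4s.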

hi @Bitcot_Kaushal

As a reference, here is the simple code structure I’m using:

  1. Guardrail Subgraph

from typing import TypedDict

from langchain_openai import ChatOpenAI
from langfuse import observe  # langfuse v3; on v2: from langfuse.decorators import observe
from langgraph.graph import StateGraph, START, END

class GuardrailInput(TypedDict):
    question: str

class GuardrailOutput(TypedDict):
    output: str

@observe()
async def guardrail(state: GuardrailInput) -> GuardrailOutput:
    # Write to OverallState
    model = ChatOpenAI(
        model=settings.local_chat.large_model,
        api_key=settings.local_chat.large_api_key,
        base_url=settings.local_chat.large_api_base,
    )
    result = await model.ainvoke(f"Return true or false: is this user query negative? {state.get('question')}")
    output = result.content
    return {"output": output}


guardrail_graph = StateGraph(GuardrailInput, output_schema=GuardrailOutput)
guardrail_graph.add_node("guardrail", guardrail)
guardrail_graph.add_edge(START, "guardrail")
guardrail_graph.add_edge("guardrail", END)

guardrail_graph = guardrail_graph.compile()

  2. Short Answer Subgraph
class shortInput(TypedDict):
    question: str

class shortOutput(TypedDict):
    output: str


@observe()
async def short(state: shortInput) -> shortOutput:
    model = ChatOpenAI(
        model=settings.local_chat.large_model,
        api_key=settings.local_chat.large_api_key,
        base_url=settings.local_chat.large_api_base,
    )
    # Write to OverallState
    result = await model.ainvoke(f"Answer this user query in 30 sentences: {state.get('question')}")
    output = result.content
    return {"output": output}


short_graph = StateGraph(shortInput, output_schema=shortOutput)
short_graph.add_node("short", short)

short_graph.add_edge(START, "short")
short_graph.add_edge("short", END)

short_graph = short_graph.compile()

  3. Main Graph Calling the Subgraphs
class MainInput(TypedDict):
    question: str
    guardrail: str
    short: str

class MainOutput(TypedDict):
    guardrail: str
    short: str

@observe()
async def guardrail(state: MainInput) -> MainOutput:
    # Write to OverallState

    result = await guardrail_graph.ainvoke({'question': state.get('question')})

    # The subgraph returns a GuardrailOutput dict; store the string, not the dict
    return {"guardrail": result["output"]}


@observe()
async def short(state: MainInput) -> MainOutput:
    # Write to OverallState
    result = await short_graph.ainvoke({'question': state.get('question')})
    return {"short": result["output"]}

@observe()
async def orchestrator(state: MainInput) -> MainOutput:
    # "state" is not a key in MainOutput; pass the merged results through
    print(state)
    return {"guardrail": state.get("guardrail"), "short": state.get("short")}


main_graph = StateGraph(MainInput, output_schema=MainOutput)
main_graph.add_node("guardrail", guardrail)
main_graph.add_node("short", short)
main_graph.add_node("orchestrator", orchestrator)

main_graph.add_edge(START, "short")
main_graph.add_edge(START, "guardrail")
main_graph.add_edge('short', 'orchestrator')
main_graph.add_edge('guardrail', "orchestrator")
main_graph.add_edge("orchestrator", END)

main_graph = main_graph.compile()

This is how I start my code:

@observe()
async def main():
    result = await main_graph.ainvoke({"question": 'hi'})
    return result

# Top-level await works in a notebook / async REPL; in a plain script use asyncio.run(main())
result = await main()

This is the running result:

I’m seeing a significant latency gap in my short node within this LangGraph implementation. I’ve confirmed that Langfuse is not the cause, and I am already using asynchronous ainvoke throughout the process.

  1. Orchestration Overhead: Is this delay an inherent behavior of nesting StateGraph objects, or is there a bottleneck in how I’ve structured these parallel transitions to the orchestrator?

  2. Architecture Feedback: Do you see any issues with how MainInput and MainOutput are managing the merged state that could be causing a processing lag?

Hi @theodevs

What local provider and model are you using? Ollama?