Seeking help with some merge message issues when LangGraph is called in parallel

When I use LangGraph to build a graph, I want to run two decision processes at the same time, for example one LLM call that checks the weather and another that checks locations. I set up two separate LLM decision nodes, each of which can call its own tool node; the tool nodes invoke the tools and append the resulting ToolMessages to the state. But it seems the appends conflict somehow, and I don’t understand why it raises an error. I used add_node to add the two parallel nodes directly. If anyone is kind enough to help me understand this, I can upload my code to GitHub. Please help me, experts!

hi @Huimin-station

hmm you’re very likely hitting a state merge conflict on the same key in the same superstep.

In LangGraph, parallel branches can run in one step, and if both branches write to one state key that has default semantics, that key behaves like “last write wins / single value per step.” If more than one update arrives, LangGraph raises InvalidUpdateError with INVALID_CONCURRENT_GRAPH_UPDATE.
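
As a rough mental model (plain Python, not the actual LangGraph internals), the per-superstep merge behaves like this:

```python
# Simplified sketch of how updates from parallel branches are merged
# in one superstep. Illustrative only -- not LangGraph's real code.
import operator


def apply_updates(state, updates, reducers):
    """Merge the updates emitted by all branches in one superstep."""
    new_state = dict(state)
    for key in {k for upd in updates for k in upd}:
        values = [upd[key] for upd in updates if key in upd]
        reducer = reducers.get(key)
        if reducer is None:
            if len(values) > 1:
                # Mirrors InvalidUpdateError / INVALID_CONCURRENT_GRAPH_UPDATE
                raise ValueError(f"concurrent updates to key {key!r} without a reducer")
            new_state[key] = values[0]
        else:
            for value in values:
                new_state[key] = reducer(new_state[key], value)
    return new_state


# Two parallel branches each append a message in the same step.
updates = [{"messages": ["weather result"]}, {"messages": ["location result"]}]

# With a reducer, both updates merge safely:
merged = apply_updates({"messages": []}, updates, {"messages": operator.add})
assert merged["messages"] == ["weather result", "location result"]

# Without a reducer, the same two updates conflict:
try:
    apply_updates({"messages": []}, updates, {})
except ValueError as e:
    print(e)
```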

Are you seeing an INVALID_CONCURRENT_GRAPH_UPDATE error?


Hello, first of all, thank you for your reply; I often see your comments. Since I am a beginner, I might be making some basic mistakes. I have a global state that contains an accumulating list of messages. My decision node emits a tool_call that is appended to the messages list in the state, and after the tool node runs, a ToolMessage is appended as well. It seems a ToolMessage must follow a tool_call? Because I am calling in parallel, could one tool_call end up followed by another tool_call, and is that causing the error? I don’t quite understand. Do I need to merge the decision node and the tool node, packaging the tool_call and ToolMessage together? Would that solve the problem?

Can’t parallel steps modify the same state key? Do they each need a separate key? In the end, is the data unified through a join or merge? Previously, I was just accumulating messages.

@Huimin-station

thanks for following up :slight_smile: Would it be possible to share your code? It’s hard to say without it.
How is your graph structured?

Does a ToolMessage have to follow a tool_call?

yes, each tool_call in an AIMessage must eventually have a matching ToolMessage (same tool_call_id).
LangGraph validates this (INVALID_CHAT_HISTORY) because most providers require that structure.

Can parallel calls make one tool_call receive another tool_call?

Yes, that can happen with a shared global messages list and naive tool-node routing.
A common pattern (including quickstart examples) reads tool calls from the last AI message in history.
If two branches append AI messages in parallel, “last message” can be from the other branch, so a tool node may execute the wrong call set or leave some calls unmatched.

Should I merge decision node + tool node into one node?

No, you don’t need to merge them into one giant node.

Better options:

  • one decision node (llm) emits multiple tool_calls in one AIMessage, then one tool-execution stage handles them
  • if you keep branch parallelism - pass the exact call payload to each tool execution (instead of letting each tool node infer from global messages[-1])

Also ensure messages uses reducer semantics (add_messages) so parallel updates to message history merge safely.

  1. keep messages: Annotated[..., add_messages]
  2. let the model produce tool calls
  3. route tool calls deterministically to tool execution (not by “whichever AIMessage is last”)
  4. ensure every tool call ID gets exactly one matching ToolMessage
  5. loop back to the model only after those tool results are in state

First of all, I want to express again that it is an honor to receive your guidance. I would be very grateful if you could help me check the code for errors. Here is the link to my repository: GitHub - Huimin-station/Graph-test: It's a test project about the langgraph . Since I am from China, there may be some Chinese comments that could affect your reading. However, I will also try my best to understand your guidance.


Can parallel steps modify the same state key?

Yes, if the key is reducer-annotated.
In StateGraph, each key can define a reducer (Value, Value) -> Value to aggregate concurrent updates.
Without a reducer, concurrent writes to the same key in one step raise INVALID_CONCURRENT_GRAPH_UPDATE.

Do they each need separate keys?

Not required, but often cleaner.

You have two valid designs:

  • Shared key + reducer
    Example: messages: Annotated[list[AnyMessage], add_messages]
    Good when branches contribute to one logical stream
  • Separate branch keys + explicit merge node
    Example: weather_msgs, location_msgs, then a downstream merge node combines them.
    Better when you need deterministic branch ownership or clearer debugging

Is unification via join or merge?

Both concepts matter:

  • State merge happens automatically each step via per-key reducers.
  • Join (fan-in) controls execution order: add_edge(["b", "c"], "d") means d runs only after both b and c complete.

reducer handles how values combine
join edge handles when downstream runs

For your “accumulating messages” case
Use add_messages on the messages key. That is the intended reducer for message histories.
Also keep the message protocol valid:

  • each AIMessage.tool_calls[i].id should get a matching ToolMessage(tool_call_id=...).

If your tool flow is standard, consider using create_agent to avoid many manual routing pitfalls.


alright, let me inspect that codebase. Will get back to you soon :slight_smile:


hi @Huimin-station

despite some issues described below, your graph decomposition (separate model/tool/route nodes) is a strong start, and your questions show you’re debugging at the right level.

The issues I found are very common in early agent workflows - you’re on a solid track bro :flexed_biceps:

Really nice work sharing your repo openly and iterating in public, I truly appreciate it! :heart:

some quick findings:

High: blank_node returns full state, which can duplicate message history

  • File: agent/nodes/tools_nodes.py (blank_node)
  • Problem: blank_node returns the entire state instead of a partial update
    With current messages reducer set to operator.add, this can append full history back into itself
  • Impact: Message list can grow incorrectly (duplicate history), increase token costs, and break logic relying on “last message”
  • Fix: Return {} from pass-through nodes (or explicit minimal updates only).
    Also switch message reducer to add_messages (see finding #2), which is designed for message lists
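
A plain-Python sketch of why returning the full state duplicates history under operator.add (illustrative, not the actual repo code):

```python
import operator

# With operator.add as the reducer, the existing channel value and a
# node's returned update are combined by raw list concatenation.
history = ["msg1", "msg2"]

# BAD: a pass-through node that returns the whole state re-emits the
# full history as an "update", so it gets concatenated onto itself.
full_state_update = {"messages": history}
after_bad = operator.add(history, full_state_update["messages"])
assert after_bad == ["msg1", "msg2", "msg1", "msg2"]   # duplicated!

# GOOD: a pass-through node returns no update for the messages key,
# so the channel value is left untouched.
empty_update = {}
after_good = (
    operator.add(history, empty_update["messages"])
    if "messages" in empty_update
    else history
)
assert after_good == ["msg1", "msg2"]
```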

High: Message state uses operator.add instead of add_messages

  • File: agent/messages_state/messages_state.py
  • Problem: messages uses Annotated[list[AnyMessage], operator.add].
  • Impact: Raw list concatenation does not provide message-aware merge semantics (id-aware updates, safe merge behavior for chat state). This is fragile for tool-calling loops and especially unsafe when branches are parallelized
  • Fix: Use LangGraph’s built-in message reducer:
    • from langgraph.graph.message import add_messages
    • messages: Annotated[list[AnyMessage], add_messages]

High: Tool dispatch is manual and unvalidated; vulnerable to call/result mismatch in parallel scenarios

  • Files: agent/nodes/tools_nodes.py, agent/nodes/choose.py, agent/main.py
  • Problem: Custom tool nodes always read from state["messages"][-1].tool_calls and do not validate tool name dispatch against a tool registry
  • Impact: If multiple AI messages/tool-call batches appear in close sequence (or if you enable parallel branches), tool-call/result pairing can drift, producing invalid chat history and hard-to-debug behavior
  • Fix: Use ToolNode/agent runtime defaults (prefer create_agent for standard agent loops), or at minimum:
    • route a specific tool call payload to each tool executor
    • validate tool name before invocation
    • guarantee 1 ToolMessage per tool_call_id

High: Repository is not runnable out of the box (missing dependency manifest and missing config module)

  • Files: project root, utils/model_builder.py
  • Problem: No requirements.txt/pyproject.toml in repo; runtime immediately fails on missing langchain_core
    utils/model_builder.py imports utils.key.deepseek, but utils/key.py is not committed
  • Impact: Reproducibility is broken; reviewers cannot run or verify behavior
  • Fix: Add packaging and setup docs:
    • committed dependency manifest
    • .env.example
    • load API key from environment via os.getenv, not a local ignored module
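
A hedged sketch of environment-based key loading (DEEPSEEK_API_KEY is an assumed variable name, not taken from the repo):

```python
import os


def load_api_key(var: str = "DEEPSEEK_API_KEY") -> str:
    """Read the API key from the environment instead of an uncommitted
    local module, failing fast with a clear message if it is missing."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"Set {var} in your environment or .env file")
    return key
```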

Medium: Control-flow logic is brittle ("True"/"False" exact string matching)

  • File: agent/nodes/choose.py
  • Problem: search_or_not checks exact text equality on model output (== "False").
  • Impact: Small output variation ("false", "False.", localized text) changes graph routing unexpectedly.
  • Fix: Use structured outputs / tool-call / strict schema for decision nodes, or normalize/parse robustly.
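
One way to normalize a yes/no decision robustly (an illustrative sketch; structured output is still the stronger fix):

```python
def parse_decision(text: str) -> bool:
    """Map free-form model output to a boolean, tolerating case,
    trailing punctuation, and surrounding whitespace."""
    normalized = text.strip().strip(".!").lower()
    if normalized in {"true", "yes"}:
        return True
    if normalized in {"false", "no"}:
        return False
    # Fail loudly rather than silently routing the wrong way.
    raise ValueError(f"unrecognized decision output: {text!r}")


assert parse_decision("False.") is False
assert parse_decision(" TRUE ") is True
```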

Medium: Entry script executes immediately on import

  • File: agent/main.py
  • Problem: Graph streaming runs at module import time.
  • Impact: Importing for tests or reuse triggers real execution side effects.
  • Fix: Wrap execution in if __name__ == "__main__":.

Medium: Incorrect boolean expression in stream filtering

  • File: agent/main.py
  • Problem: ("False" or "True") always evaluates to "False".
  • Impact: Intended filtering logic is incorrect.
  • Fix: Replace with explicit set check, e.g.:
    • chunk[-1][0].content not in {"False", "True"}.

Medium: Relative output path in PNG helper is unstable

  • File: utils/png_print.py
  • Problem: Writes to ../agent/graph_show/graph.png relative to CWD, not module path.
  • Impact: Output can go to wrong location depending on execution directory.
  • Fix: Use pathlib.Path(__file__)-based absolute path resolution.
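
A module-relative output path could be sketched like this (the graph_show/graph.png layout mirrors the repo structure described above):

```python
from pathlib import Path

# Resolve the output path relative to this module's location, not the
# current working directory, so the PNG lands in the same place no
# matter where the script is launched from.
OUTPUT_PATH = Path(__file__).resolve().parent.parent / "agent" / "graph_show" / "graph.png"


def save_graph_png(png_bytes: bytes) -> Path:
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT_PATH.write_bytes(png_bytes)
    return OUTPUT_PATH
```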

Medium: Tool API shape inconsistency

  • File: agent/tools/base_tools.py
  • Problem: search_local_position(city: str) requires a city argument even though description says it should fetch current user location.
  • Impact: Model may fail to provide required args; unnecessary invocation errors.
  • Fix: Align signature with intended behavior (search_local_position()) or rename and update prompt/docs.

Low: Dead/placeholder modules and weak tests

  • Files: agent/tools/all_tools.py, agent/tools/mcp_tools.py, test/test_01.py
  • Problem: Empty modules and a non-test script in test/.
  • Impact: Noise and no confidence from automated tests.
  • Fix: Remove placeholders or implement them; add real tests for:
    • graph routing,
    • tool-call/tool-result pairing,
    • reducer behavior under repeated/branch execution.

First of all, thank you for carefully reviewing my messy code—this is likely due to the fact that I’m still a beginner. I also appreciate the standardized guidance you’ve provided; I’ve gained a great deal from it. Additionally, due to the time difference, it’s time for me to rest, so please forgive me if I don’t reply promptly.

  1. At that time, I used blank_node to route to two parallel nodes, but I overlooked that it might also concatenate the context.

  2. I had followed some online tutorials, so I chose operator.add without being aware of add_messages.

  3. Could this be the reason why my parallel calls failed?

  4. I was eager to find a solution and only uploaded a demo. I apologize for the inconvenience this caused you when reviewing my code.

…… Thank you again for your guidance.

My pleasure @Huimin-station :slight_smile: Sleep well!

  1. Could this be the reason why my parallel calls failed?

imho, very likely yes - at least one major reason. Check it.

The combination of:

  • full-state return from blank_node, and
  • raw list concatenation (operator.add)

can corrupt message history quickly under branching/parallel flow, which then causes downstream tool-call logic to behave incorrectly

No need to apologize. This is exactly how good debugging happens: you shared a minimal reproducible project, asked precise follow-ups, and iterated quickly. That is excellent engineering behavior.

Hello there! It’s another wonderful day. Thank you so much for your guidance yesterday. I’ve optimized my code accordingly and also implemented parallel processing. :partying_face:

This is my previous flowchart:

This is the flowchart revised based on your suggestions:

Here are the changes I made:

1. Based on your suggestion, I removed my empty transfer nodes because they were unnecessary.

2. I stopped using custom tool calls and switched to ToolNode(), which saves me time and helps avoid mistakes.

3. I only kept one decision node, allowing decisions for multiple tools to be called in parallel.

Thinking about your suggestion, I do feel that multiple decision nodes are redundant. The purpose of running things in parallel is to save execution time and respond to users quickly, but separate decision nodes do not save time; only running the tool calls themselves in parallel does.

4. I will also provide the node that obtains the current time as a tool for the decision node to call.

Thank you for your help; you are a very good teacher, and I appreciate you taking the time to guide me. I enjoy programming and the feeling of having control over the rules, which is why I study it. However, due to my own circumstances, I don’t have access to many courses and have been relying on the official documentation, trying to understand it on my own, which inevitably leads to some misunderstandings. Thank you for your guidance. :grinning_face:


hi there @Huimin-station

yep, sunny day today :slight_smile: happy the optimization works! :rocket:

let me go through your last message and I’ll get back to you soonish


Awesome! This is a strong improvement, and your revised flow matches the right direction :flexed_biceps:

First, excellent work iterating this quickly.

Removing the empty transfer node

Good change.
That node was structurally redundant and could accidentally re-propagate state in unsafe ways.
Your new flow is cleaner and has fewer places for state merge errors.

Switching to ToolNode()

Also a very good change.
Using ToolNode reduces manual wiring mistakes around tool call execution and ToolMessage formatting/pairing etc.

One decision node + parallel tool execution

Your reasoning is correct.
Multiple decision (LLM) nodes usually add extra model hops (latency/cost), while the main speed-up comes from parallelizing tool execution after one decision step.
Your revised flowchart reflects this correctly.

Making “current time” a tool

This is valid, especially if “current time” is not always needed.
One nuance:

  • if time is needed for every request, inject it once in context/state (cheaper)
  • if time is conditional, exposing it as a tool is a good pattern

:flexed_biceps: You are doing exactly what strong engineers do: build, observe failures, simplify architecture, and iterate with evidence. Good job @Huimin-station :100: