We’re implementing a conversation branching feature where users can fork a thread at any point to explore alternative paths. However, we’ve encountered a critical issue with the thread copy API.
The Problem
When using the Python SDK’s thread copy endpoint (/threads/{thread_id}/copy), the forked thread throws a KeyError when trying to visit any subgraph node that wasn’t visited in the original thread:
KeyError: 'interrupt_agent'  # Or any other unvisited subgraph name
Root Cause
Our graph architecture uses subgraphs compiled with checkpointer=True:
def create_agent_node(agent: Agent, state: type[T] = GraphState):
    subgraph = StateGraph(state)
    # ... add nodes and edges ...
    subgraph = subgraph.compile(checkpointer=True)  # Creates namespace for this subgraph
    return AgentNode(agent=agent, subgraph=subgraph)
Each subgraph maintains its own checkpoint namespace. The issue is that the /copy endpoint only copies namespaces that exist (i.e., nodes that were visited) in the source thread. When the forked thread attempts to visit a new node, LangGraph tries to access a namespace that doesn’t exist, causing the KeyError.
Stack Trace
KeyError('interrupt_agent')
Traceback (most recent call last):
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/pregel/__init__.py", line 2655, in astream
    async for _ in runner.atick(
    ...<7 lines>...
        yield o
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langchain_core/tracers/event_stream.py", line 181, in tap_output_aiter
    first = await py_anext(output, default=sentinel)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langchain_core/utils/aiter.py", line 78, in anext_impl
    return await __anext__(iterator)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langchain_core/runnables/base.py", line 1485, in atransform
    async for ichunk in input:
    ...<14 lines>...
        final = ichunk
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/pregel/__init__.py", line 2596, in astream
    async with AsyncPregelLoop(
               ~~~~~~~~~~~~~~~^
        input,
        ^^^^^^
    ...<20 lines>...
        cache_policy=self.cache_policy,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ) as loop:
    ^
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/pregel/loop.py", line 1339, in __aenter__
    saved = await self.checkpointer.aget_tuple(self.checkpoint_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/checkpoint/memory/__init__.py", line 437, in aget_tuple
    return self.get_tuple(config)
           ~~~~~~~~~~~~~~^^^^^^^^
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph_runtime_inmem/checkpoint.py", line 109, in get_tuple
    return super().get_tuple(config)
           ~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/checkpoint/memory/__init__.py", line 178, in get_tuple
    if checkpoints := self.storage[thread_id][checkpoint_ns]:
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 'interrupt_agent'
During task with name 'interrupt_agent' and id '3820ef0c-10c4-db8a-45e1-614a97a5527f'
Example Scenario
- Original thread visits: ['', 'guide', 'audience'] namespaces
- Fork thread using /copy API
- Forked thread tries to visit interrupt_agent node
- Result: KeyError: 'interrupt_agent' because this namespace wasn't copied
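The scenario above can be reproduced in miniature without LangGraph. The following is only a toy model, assuming the store behaves like a plain nested dict keyed by thread ID and then checkpoint namespace (which is what the failing line in the stack trace suggests). The names copy_thread and latest_checkpoint are hypothetical helpers written for this illustration, not LangGraph APIs.

```python
import copy

# Toy stand-in for a checkpoint store: thread_id -> checkpoint_ns -> checkpoints.
# Plain dicts are used on purpose, so a missing namespace raises KeyError.
storage = {
    "original": {
        "": ["root-checkpoint"],
        "guide": ["guide-checkpoint"],
        "audience": ["audience-checkpoint"],
    }
}


def copy_thread(storage, source_id, target_id):
    """Hypothetical copy: duplicates only namespaces present in the source."""
    storage[target_id] = copy.deepcopy(storage[source_id])


def latest_checkpoint(storage, thread_id, checkpoint_ns):
    """Mimics the failing lookup: storage[thread_id][checkpoint_ns]."""
    checkpoints = storage[thread_id][checkpoint_ns]
    return checkpoints[-1] if checkpoints else None


copy_thread(storage, "original", "fork")

# A namespace that was visited in the original thread resolves fine.
visited = latest_checkpoint(storage, "fork", "guide")

# A namespace that was never visited is absent from the copy.
try:
    latest_checkpoint(storage, "fork", "interrupt_agent")
    error = None
except KeyError as exc:
    error = exc
```

In this model the fork carries exactly the three source namespaces, so the lookup for interrupt_agent fails the same way the forked thread does.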
Current Workaround
Instead of copying the thread, we create a new thread with only the state values:
# Get state from original thread
history = await client.threads.get_history(thread_id, limit=1)
current_state = history[0]["values"]  # thread states come back as dicts

new_thread = await client.threads.create(
    graph_id="conversation",
    metadata={"parent_thread": thread_id},
    # Pass state as initial values, not checkpoint data
    config={"configurable": {"thread_id": new_thread_id}},
    input={"messages": current_state["messages"], ...}
)
This works because the new thread initializes all subgraph namespaces on-demand as nodes are visited.
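To illustrate the on-demand behaviour described here, and what it costs, here is a second toy model. It assumes a fresh thread's store creates namespaces lazily on first access, modelled with nested defaultdicts. This is a sketch of the idea only, not LangGraph's actual implementation, and new_store is a hypothetical helper.

```python
from collections import defaultdict


def new_store():
    """Toy store: thread_id -> checkpoint_ns -> checkpoints, created lazily."""
    return defaultdict(lambda: defaultdict(list))


store = new_store()

# Seed the new thread with state values only, as in the workaround.
store["new-thread"][""].append({"messages": ["hello"]})

# First visit to a subgraph: the namespace is created on access, no KeyError.
unvisited = store["new-thread"]["interrupt_agent"]

# The root namespace holds only the seeded state, none of the parent's history.
root_history = store["new-thread"][""]
```

The lookup for interrupt_agent succeeds with an empty history, but the new thread starts with a single seeded entry rather than the parent's checkpoints.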
Notes
One important point for us: we believe we will need the checkpoint history to carry over when branching from one conversation to another, so that we can branch again, multiple times, from within the new threads.
Questions
- Is this a known limitation of the copy API with subgraph checkpointers?
- Is there a way to force initialization of all potential namespaces during copy?
- Would you recommend our workaround approach, or is there a better pattern for thread branching?
Environment
- LangGraph version: 0.3.33
- Python: 3.13
- Checkpointer: PostgreSQL-backed
Any guidance would be greatly appreciated as this is blocking our branching feature.