Thread copy API fails with KeyError when visiting previously unvisited subgraph nodes

We’re implementing a conversation branching feature where users can fork a thread at any point to explore alternative paths. However, we’ve encountered a critical issue with the thread copy API.

The Problem

When using the Python SDK’s thread copy endpoint (/threads/{thread_id}/copy), the forked thread throws a KeyError when trying to visit any subgraph node that wasn’t visited in the original thread:

KeyError: ‘interrupt_agent’ # Or any other unvisited subgraph name

Root Cause

Our graph architecture uses subgraphs compiled with checkpointer=True:

def create_agent_node(agent: Agent, state: type[T] = GraphState):
  subgraph = StateGraph(state)

  *# ... add nodes and edges ...*

  subgraph = subgraph.compile(checkpointer=True)  # Creates namespace for this subgraph

  return AgentNode(agent=agent, subgraph=subgraph)

Each subgraph maintains its own checkpoint namespace. The issue is that the /copy endpoint only copies namespaces that exist (i.e., nodes that were visited) in the source thread. When the forked thread attempts to visit a new node, LangGraph tries to access a namespace that doesn’t exist, causing the KeyError.

Stack Trace

KeyError(‘interrupt_agent’)Traceback (most recent call last):

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/pregel/init.py”, line 2655, in astream
async for _ in runner.atick(
…<7 lines>…
yield o

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langchain_core/tracers/event_stream.py”, line 181, in tap_output_aiter
first = await py_anext(output, default=sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langchain_core/utils/aiter.py”, line 78, in anext_impl
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langchain_core/runnables/base.py”, line 1485, in atransform
async for ichunk in input:
…<14 lines>…
final = ichunk

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/pregel/init.py”, line 2596, in astream
async with AsyncPregelLoop(
~~~~~~~~~~~~~~~^
input,
^^^^^^
…<20 lines>…
cache_policy=self.cache_policy,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
) as loop:
^

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/pregel/loop.py”, line 1339, in aenter
saved = await self.checkpointer.aget_tuple(self.checkpoint_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/checkpoint/memory/init.py”, line 437, in aget_tuple
return self.get_tuple(config)
~~~~~~~~~~~~~~^^^^^^^^

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph_runtime_inmem/checkpoint.py”, line 109, in get_tuple
return super().get_tuple(config)
~~~~~~~~~~~~~~~~~^^^^^^^^

File “/Users/reecemillsom/IdeaProjects/orchestra/.venv/lib/python3.13/site-packages/langgraph/checkpoint/memory/init.py”, line 178, in get_tuple
if checkpoints := self.storage[thread_id][checkpoint_ns]:
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^

KeyError: ‘interrupt_agent’

During task with name ‘interrupt_agent’ and id ‘3820ef0c-10c4-db8a-45e1-614a97a5527f’

Example Scenario

  1. Original thread visits: [‘’, ‘guide’, ‘audience’] namespaces

  2. Fork thread using /copy API

  3. Forked thread tries to visit interrupt_agent node

  4. Result: KeyError: ‘interrupt_agent’ because this namespace wasn’t copied

Current Workaround

Instead of copying the thread, we create a new thread with only the state values:

# Get state from original thread

history = await client.threads.get_history(thread_id, limit=1)

current_state = history[0].values

new_thread = await client.threads.create(
    graph_id="conversation",
    metadata={"parent_thread": thread_id},
    # Pass state as initial values, not checkpoint data
    config={"configurable": {"thread_id": new_thread_id}},
    input={"messages": current_state["messages"], ...}
)

This works because the new thread initializes all subgraph namespaces on-demand as nodes are visited.

Notes

One thing to note that is important for us, is we believe we will need checkpoint history when branching from one conversation to another. In order to be able to do this multiple times ‘in the new threads’.

Questions

  1. Is this a known limitation of the copy API with subgraph checkpointers?

  2. Is there a way to force initialization of all potential namespaces during copy?

  3. Would you recommend our workaround approach, or is there a better pattern for thread branching?

Environment

  • LangGraph version: 0.3.33

  • Python: 3.13

  • Checkpointer: PostgreSQL-backed

Any guidance would be greatly appreciated as this is blocking our branching feature.

Hi @reecemillsom is it a MUST that the copy has a new thread_id?

If so, then try this

Rehydration (supersteps)

history = await client.threads.get_history(src_thread_id)
supersteps = []
for step in history:
    supersteps.append({
        "updates": [{
            "values": step["values"],      # prune to keys you care about
            "as_node": step["metadata"].get("last_node", "__start__")
        }]
    })

new_thread = await client.threads.create(
    graph_id="conversation",
    metadata={"parent_thread": src_thread_id},
    supersteps=supersteps,
)

If not, then maybe this

Time-travel (branch) within the same thread

# start a new branch from a prior checkpoint within the SAME thread
cp = (<checkpoint_id you want to branch from>)
await client.runs.wait(
    thread_id=thread_id,
    assistant_id=assistant_id,   # or graph_id depending on your setup
    input=None,                  # resume from checkpoint
    config={"configurable": {"thread_id": thread_id, "checkpoint_id": cp}},
)

Let us know if any of those works :slight_smile:

Hi @pawel-twardziak ,

Thanks for helping here.

I will likely be looking at this today / tomorrow.

I will come back to you as soon as I have tried your suggestion.

We would like to try the new thread option due to some limitations we were seeing with a technology we are using with Langgraph.

Thanks,
Reece

Hi @pawel-twardziak

Firstly apologies for the slow reply.

Thank you so much for your help, this was insightful.

We are using the TS SDK on the client for this feature, but what you shared helped me regardless.

We are now able to branch a thread from the end, and we are also branching in various points in the thread.

Thanks for all your help :smiley:

Reece

Hi @reecemillsom
Great to hear! I am happy for your success :slight_smile: