LangGraph thread copy can take 12+ minutes: recommended production pattern?

Hi LangGraph team,

I opened a GitHub issue here with the full details: Thread copy endpoint can take 12+ minutes with no async/progress or shallow-copy option · Issue #7859 · langchain-ai/langgraph · GitHub

We use LangGraph (on a LangSmith US deploy) Platform’s thread copy endpoint:

POST /threads/{thread_id}/copy

to power a user-facing “fork chat” feature. For large conversations, we observed this call taking over 12 minutes.

I understand that a full thread copy may need to copy checkpoint history and writes, so the duration itself may be expected. The production challenge is that the endpoint appears to be synchronous/blocking, and we couldn’t find a documented way to:

  • run the copy asynchronously and poll for completion
  • receive progress/status while copying
  • copy only the latest state/current checkpoint instead of the full checkpoint history
  • estimate whether a thread will be expensive to copy before starting

This is tricky behind serverless/proxy/browser constraints. In our case, AWS Lambda has a 15-minute hard timeout, so long copies can get very close to the runtime ceiling.

My questions:

  1. Is /threads/{thread_id}/copy expected to copy the full checkpoint/write chain?
  2. Is there a supported “shallow copy” pattern for creating a new thread from latest state only?
  3. Is there a recommended production pattern for user-facing fork/copy flows when copies can take several minutes?
  4. Would an async copy API or copy-depth option be feasible?

Thanks!

We adjusted our application to tolerate the current behavior: async backend processing, longer polling, explicit user messaging, and Lambda timeout budgeting. That helps operationally, but it does not solve the underlying uncertainty around a long synchronous copy call.

Hi @gouveags

Going through your four questions with the relevant source/docs pointers:

1. Does /threads/{thread_id}/copy copy the full checkpoint/write chain?

Yes, by contract. BaseCheckpointSaver.copy_thread is explicit that implementations must copy the complete parent chain (all ancestor checkpoints and their checkpoint_writes) - copying only the head would silently break DeltaChannel reconstruction. That’s why there’s no built-in “copy depth” knob. The platform handler (libs/langgraph-api/src/api/threads.mtsstorage/ops.mts) is a thin wrapper that awaits checkpointer.copy(...) synchronously, and the REST schema confirms: no request body, no ?async, no progress endpoint.

One amplifier worth checking: if you’re on BYOC with a custom checkpointer that hasn’t implemented acopy_thread, the platform falls back to a slow path that “re-inserts checkpoints one by one” (custom-checkpointer docs). On a tool-heavy long thread that’s exactly the regime where you hit double-digit minutes.

2. Is there a supported shallow-copy pattern?

Yes - not via copy, but via threads.create(supersteps=...). The SDK docstring literally says “Used for copying a thread between deployments.” and the Prepopulated state doc covers the same shape. It’s O(1) in history size and avoids the DeltaChannel issue because channels get a fresh snapshot through __start__:

state = await client.threads.get_state(thread_id=source_thread_id)

forked = await client.threads.create(
    graph_id="agent",
    metadata={"forked_from": source_thread_id},
    supersteps=[{
        "updates": [{"values": state["values"], "as_node": "__start__"}]
    }],
)

Trade-off: you lose time-travel history and mid-interrupt context. For most “duplicate this conversation” UX that’s fine; for “rewind to any past turn” it isn’t.

3. Recommended production pattern

Two paths, picked by what the user actually needs:

  • Fork latest state (default, user-facing): threads.create(supersteps=...) as above. Single roundtrip, no Lambda timeout risk.
  • Fork with full history: move it off Lambda. Click → enqueue (SQS / Step Functions / Fargate) → return a copy_job_id immediately → worker calls threads.copy(...) with a long HTTP timeout → UI polls. A 12-min synchronous call inside a 15-min Lambda will eventually lose that race no matter how you budget it.

For an “estimate before firing” signal, threads.get_history(thread_id, limit=...) + the size of values is a decent heuristic - flip the UI to “Fork latest state” automatically past a checkpoint-count / payload-size threshold.

Also worth knowing: if you only want to branch from a past point on the same thread, threads.update_state(thread_id, values, checkpoint_id=...) does that without any copy.

4. Feasibility of async API / depth option

Both look feasible. An async mode is straightforward (enqueue + poll, like runs). A “latest only” depth flag would essentially need to go through the same write path as supersteps (write a fresh head, not slice the chain) to stay safe with DeltaChannel. Until that lands, superstepsmode=latest_state and a worker queue ≈ async=true, both supported and stable.

Hope this helps!