How to update graph state while preserving interrupts?

hi @tharshan-elvex

I am back :slight_smile:
Question: do any threads have pending tasks (interrupts) in checkpointer tables?

Hello @pawel-twardziak, yes if I run this query

SELECT * FROM "public"."checkpoint_writes" WHERE "channel"::TEXT LIKE '%__interrupt__%' LIMIT 100 OFFSET 0;

This returns 62 rows, which I believe tells us there are 62 threads currently at the interrupt state.
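A row count and a thread count may differ if a thread has interrupt writes on more than one checkpoint or task. Below is a minimal sketch for counting distinct threads. It assumes the rows have already been fetched as dicts with `thread_id` and `channel` keys; those column names are an assumption about the checkpointer schema and should be checked against your table. The helper name `interrupts_per_thread` is hypothetical.

```python
from collections import Counter

INTERRUPT_CHANNEL = "__interrupt__"


def interrupts_per_thread(rows):
    """Count interrupt writes per thread_id.

    `rows` is an iterable of dict-like rows from checkpoint_writes.
    The keys "thread_id" and "channel" are assumed column names.
    """
    counts = Counter()
    for row in rows:
        if row["channel"] == INTERRUPT_CHANNEL:
            counts[row["thread_id"]] += 1
    return counts


# Made-up sample rows, only to show the shape of the input.
rows = [
    {"thread_id": "t1", "channel": "__interrupt__"},
    {"thread_id": "t1", "channel": "__interrupt__"},
    {"thread_id": "t2", "channel": "messages"},
    {"thread_id": "t3", "channel": "__interrupt__"},
]
per_thread = interrupts_per_thread(rows)
print(len(per_thread))  # distinct threads with at least one interrupt write
```

If the two numbers match, each thread has exactly one interrupt row; if not, some threads carry more than one.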

Hi @tharshan-elvex
Ok, thanks for the answer, I will continue investigating :+1:

Hi @tharshan-elvex, I am checking in here to keep you in the loop - I still can’t find any native way to proceed with the migration. Every approach I have tried so far wipes out all pending tasks.

I will surely inform you if any findings emerge :slight_smile:


Thanks for the update @pawel-twardziak. What other options do we have if there is no native way?

Hi @tharshan-elvex

you mentioned that you already have a script that does the migration in the correct way, am I right?

There is an approach I’ve tried where we modify the LC checkpointer tables directly. But I am not sure it’s a good idea, as we don’t fully understand the structure and it’s not well documented.

You are absolutely right. Modifying the table directly is not the best pattern. But for now, I haven’t found any better solution. I might explore more in the near future.

There are some other ideas in my head, but I am short on time to give all of them a shot :pensive_face:

@pawel-twardziak hey Pawel - did you have any other ideas as an alternative to the raw SQL? Maybe we could explore them together to find a solution.

hi @tharshan-elvex

I would love to, but as I mentioned - always short on time :cry:

What I actually want to explore and maybe draw some inspiration from are the tests:

Have you happened to dig in there?

@pawel-twardziak okay thanks for the links! I will take a look. The last two links you posted are the same btw. Did you mean to share another file?

Yes, sorry, thanks for pointing it out. It should be https://github.com/langchain-ai/langgraph/blob/8b55dff7a52540fece79f4ebf62d5c1457d72377/libs/langgraph/tests/test_interrupt_migration.py

In general, there are some testing files in the folder worth exploring imho

Hi @tharshan-elvex

Since the shallow checkpointer will not preserve interrupts during the migration, I think this could be useful as a more native approach:

# 1) Capture existing interrupts
saved = graph.checkpointer.get_tuple(cfg)  # cfg has thread_id (and optionally checkpoint_ns)
# get_tuple returns None when no checkpoint exists for cfg
pending_writes = (saved.pending_writes if saved else None) or []
existing_interrupt_writes = [
    (task_id, val)
    for (task_id, chan, val) in pending_writes
    if chan == "__interrupt__"
]

# 2) Migrate your values (e.g., messages); migrate_messages is your own migration function
migrated = migrate_messages(graph.get_state(cfg).values["messages"])
next_cfg = graph.update_state(cfg, {"messages": migrated})  # returns config with new checkpoint_id

# 3) Re-attach interrupts to the new checkpoint (preserve original task_ids)
for task_id, interrupt_obj in existing_interrupt_writes:
    graph.checkpointer.put_writes(
        next_cfg,
        [("__interrupt__", interrupt_obj)],
        task_id=task_id,
    )

Inside migrate_messages, all AI and task messages can be modified as before.
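One way to check whether step 3 had the intended effect is to compare the interrupt writes seen before and after the migration. The following is a minimal sketch that works on plain `(task_id, channel, value)` tuples, the shape iterated over in step 1. The helper names `interrupt_task_ids` and `lost_interrupts` are hypothetical, and the sample data is made up.

```python
INTERRUPT_CHANNEL = "__interrupt__"


def interrupt_task_ids(pending_writes):
    """Return the set of task_ids that have an interrupt write."""
    return {
        task_id
        for task_id, chan, _ in (pending_writes or [])
        if chan == INTERRUPT_CHANNEL
    }


def lost_interrupts(before, after):
    """Task ids that had an interrupt before the migration but not after."""
    return interrupt_task_ids(before) - interrupt_task_ids(after)


# Made-up sample data in the (task_id, channel, value) shape.
before = [("task-a", "__interrupt__", "approve?"), ("task-a", "messages", "hi")]
after_ok = [("task-a", "__interrupt__", "approve?")]
after_bad = []

print(lost_interrupts(before, after_ok))
print(lost_interrupts(before, after_bad))
```

In practice, `before` would be the `pending_writes` captured in step 1 and `after` would presumably come from `graph.checkpointer.get_tuple(next_cfg).pending_writes`; an empty result only shows that the task ids are still present, not that the graph will resume correctly.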

Hey @pawel-twardziak

I've tried this approach. A new checkpoint does get generated, but some of the old state stays in the old checkpoints. Here is the final state of the checkpoint_writes table.

So this causes the new migrated interrupt to be orphaned.

Were you able to validate that this approach works and leaves a valid checkpoint state?

Let me clarify with a before and after of the migration with this new approach:

Before:

(I can only post one image at a time, so this post contains just one.)

After:

  • the branch:to:llm write is lost
  • two interrupt channel rows exist, instead of the new one replacing the old one
  • the task_path of the messages channel is lost