StateGraph does not continue state transitions after task cancellation

Hi LangGraph team,
I ran into something I’m not sure about regarding task cancellations in StateGraph.
I was wondering if you could help me understand whether this is intended behavior, or if there’s a recommended way to handle cancellations properly. Any guidance or suggestions would be greatly appreciated! :grinning_face:

Description

When using StateGraph with astream to build a state machine, if a stage catches an asyncio.CancelledError and returns a Command (e.g., to transition to another node), the state machine does not continue executing that Command.

In other words, when the outer task is cancelled, the astream generator immediately raises CancelledError, preventing the next stage from running. It is unclear whether this is intended behavior or a potential issue.


Minimal Reproducible Example

import asyncio
from asyncio import CancelledError

from langgraph.constants import START, END
from langgraph.graph import StateGraph
from langgraph.types import Command
from pydantic import BaseModel, Field


class GraphState(BaseModel):
    sleep_seconds: int = Field(..., description="sleep seconds", gt=0)
    result: str | None = Field(None, description="result")


async def work_stage(state: GraphState, writer):
    writer("# run stage: work")
    try:
        writer(f"---> sleep for total {state.sleep_seconds} seconds")
        for i in range(state.sleep_seconds):
            writer(f"---> sleeping: {i + 1}")
            await asyncio.sleep(1)
        writer("---> awake")
    except CancelledError:
        print("................ ️cancellation detected ................")
        writer("❗️cancellation detected, next goto cleanup stage")
        return Command(goto="cleanup_stage")

    return Command(goto=END, update={"result": "done"})


async def cleanup_stage(state: GraphState, writer):
    writer("# run stage: cleanup")
    return Command(goto=END, update={"result": "cleanup"})


graph_builder = StateGraph(GraphState)
graph_builder.add_node("work_stage", work_stage)
graph_builder.add_node("cleanup_stage", cleanup_stage)
graph_builder.add_edge(START, "work_stage")
graph = graph_builder.compile()


async def main():
    async def _task():
        async for chunk_type, chunk in graph.astream(
                input={'sleep_seconds': 10},
                stream_mode=['custom'],
        ):
            print(chunk)

    task_future = asyncio.create_task(_task())

    await asyncio.sleep(3.5)
    print("................ now cancel ................")
    task_future.cancel()

    await task_future


asyncio.run(main())


Actual Behavior

  • The except CancelledError block in work_stage is triggered, and the warning log is printed.

  • However, the state machine does not execute the returned Command.

  • Ultimately, the task raises CancelledError.

  • Console prints:

# run stage: work
---> sleep for total 10 seconds
---> sleeping: 1
---> sleeping: 2
---> sleeping: 3
---> sleeping: 4
................ now cancel ................
................ ️cancellation detected ................
Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1570, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhengyun/PycharmProjects/bit-Agent-server4p/demo.py", line 60, in <module>
    asyncio.run(main())
    ~~~~~~~~~~~^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 720, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/Users/zhengyun/PycharmProjects/bit-Agent-server4p/demo.py", line 57, in main
    await task_future
  File "/Users/zhengyun/PycharmProjects/bit-Agent-server4p/demo.py", line 45, in _task
    async for chunk_type, chunk in graph.astream(
    ...<3 lines>...
        print(chunk)
  File "/Users/zhengyun/Library/Caches/pypoetry/virtualenvs/bit-agent-server4p-pg9iQrBf-py3.13/lib/python3.13/site-packages/langgraph/pregel/__init__.py", line 2655, in astream
    async for _ in runner.atick(
    ...<7 lines>...
            yield o
  File "/Users/zhengyun/Library/Caches/pypoetry/virtualenvs/bit-agent-server4p-pg9iQrBf-py3.13/lib/python3.13/site-packages/langgraph/pregel/runner.py", line 368, in atick
    done, inflight = await asyncio.wait(
                     ^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/tasks.py", line 451, in wait
    return await _wait(fs, timeout, return_when, loop)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/tasks.py", line 537, in _wait
    await waiter
asyncio.exceptions.CancelledError

Expected Behavior

  • When a stage catches CancelledError and returns a Command, the state machine should execute that Command rather than being immediately interrupted by the asyncio cancellation.

  • This would allow “safe exit” or cleanup logic to run even when the task is cancelled.


Additional Information

  • Python version: 3.13

  • LangGraph version: 0.4.10

  • Environment: macOS + PyCharm

hi @2HENGYUN

are you using FastAPI?
If so, there is a general approach for controlling task cancellation with state persistence, described in the thread "Stopping endpoint for deep agents".

Thanks for the reply!

Just to clarify what I’m really trying to understand (separate from FastAPI or persistence):

If a node gets an asyncio.CancelledError because the outer task is hard-cancelled (for example via task.cancel()), and the node catches that exception and returns a Command (expecting to move on to the next node, like a cleanup node)

:backhand_index_pointing_right: is it expected that the graph will not continue to the next node?

From what I’m seeing, even though the node handles CancelledError and returns a Command, the astream task itself is already cancelled, so the graph never proceeds.

I’m mostly trying to understand whether:

  • this is the intended behavior / mental model in LangGraph, or

  • this is more of an edge case or limitation that might change in the future.

Not necessarily looking for a workaround right now — just want to make sure I understand the cancellation semantics correctly.

Thanks again, appreciate the help!

hi @2HENGYUN

Yes - with the current LangGraph implementation this behavior is expected.
Once you cancel the outer task that is running graph.astream(...), the whole run is treated as cancelled, and the graph will not schedule further steps (including a “cleanup” node) even if an inner node catches asyncio.CancelledError and returns a Command.

Your work_stage catching CancelledError is local cleanup for that node; it does not turn a hard cancellation of the run into a normal graph transition.

The core of what you’re seeing is driven by how asyncio.Task cancellation works in Python.

Very condensed version:

  • Cancelling a Task is “top‑level”

When you call task.cancel(), Python marks that Task as cancelled and injects an asyncio.CancelledError at its next await. That’s happening in the task that is running graph.astream(…), not just inside your node.

  • Catching CancelledError in an inner coroutine doesn’t “uncancel” the outer Task

Even if an inner coroutine (work_stage) catches CancelledError and returns normally, that only handles the exception inside the node. The Task that’s driving the whole run gets its own CancelledError at the await it is suspended on (in your traceback, the asyncio.wait call inside runner.atick), and nothing there catches it, so it unwinds the whole thing.

  • LangGraph chooses not to override that default

LangGraph doesn’t try to swallow CancelledError at the top level; it lets it propagate out of astream / ainvoke. That’s why your Command(goto="cleanup_stage") doesn’t get a chance to drive another tick: the outer Task has already been told “you are cancelled”, and LangGraph respects that.
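
The points above can be reproduced with plain asyncio, without LangGraph. The sketch below is only an illustration: `node` and `runner` are hypothetical stand-ins for work_stage and for the code that drives the graph, not LangGraph's actual implementation. It assumes the node runs as its own task while the run is suspended in asyncio.wait (as in the traceback), and that in-flight node tasks are cancelled while the run unwinds (inferred from the "cancellation detected" line in the output).

```python
import asyncio

events = []


async def node():
    """Hypothetical stand-in for work_stage: catches cancellation, returns a value."""
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("node caught CancelledError")
        return "goto cleanup_stage"
    return "done"


async def runner():
    """Hypothetical stand-in for the code driving the graph (not LangGraph's code)."""
    node_task = asyncio.create_task(node())
    try:
        # The run is suspended here, not inside node().
        await asyncio.wait([node_task])
    finally:
        # Assumption: in-flight node tasks are cancelled while the run unwinds.
        node_task.cancel()
        await asyncio.gather(node_task, return_exceptions=True)
        events.append(f"node returned {node_task.result()!r}")
    # Only reached if the run was not cancelled.
    events.append("next step scheduled")


async def main():
    run = asyncio.create_task(runner())
    await asyncio.sleep(0.1)
    run.cancel()  # hard-cancel the outer task, like task_future.cancel()
    try:
        await run
    except asyncio.CancelledError:
        events.append("run raised CancelledError")


asyncio.run(main())
print(events)
```

The node's return value is produced, but the line that would schedule the next step is never reached, because the CancelledError delivered to the outer task keeps propagating after the `finally` block.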

So: it’s the combination of Python’s Task.cancel() semantics (the CancelledError is delivered to the Task running astream, and a node catching its own CancelledError does not undo that) plus LangGraph’s decision to let that cancellation end the run rather than turning it into an in-graph transition.
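
If an in-graph transition to a cleanup node is what you want, one pattern that sidesteps Task.cancel() is a cooperative stop signal: the caller sets a flag and the node checks it between units of work, then returns normally. The sketch below is plain asyncio and rests on assumptions: the `stop` event and the string return values are hypothetical stand-ins, and in a real node the early return would be the Command(goto="cleanup_stage") from the example above. How the event reaches the node (closure, config, shared object) is left open.

```python
import asyncio


async def work_stage(stop: asyncio.Event, sleep_seconds: int) -> str:
    """Returns the name of the next stage (a stand-in for a LangGraph Command)."""
    for _ in range(sleep_seconds):
        try:
            # Wait up to 0.05s per unit of work, waking early on a stop request.
            await asyncio.wait_for(stop.wait(), timeout=0.05)
        except asyncio.TimeoutError:
            continue  # no stop request yet; keep working
        return "cleanup_stage"  # stop requested: a normal return, not an exception
    return "END"


async def main() -> str:
    stop = asyncio.Event()
    task = asyncio.create_task(work_stage(stop, sleep_seconds=100))
    await asyncio.sleep(0.12)
    stop.set()  # request a stop instead of calling task.cancel()
    return await task


next_stage = asyncio.run(main())
print(next_stage)
```

Because the outer task is never cancelled here, nothing propagates out of the run, and the node's return value can drive the next step as usual.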

Thanks for the explanation — that answers my question. Really appreciate it! :smiley:

hi @2HENGYUN

If that answers your question, could you do me a favor and mark the post as Solved, so others can find it as well? :slight_smile: