Free Self-hosted LangGraph Agent Server: how to separate API and queue workers using Docker

Hi everyone,

I’m currently running the LangGraph Agent Server in a self-hosted setup (free version), using the langchain/langgraph-api Docker image.

From the scaling documentation, I understand that in a production setup there is a separation between:

  • API servers (handling HTTP requests)

  • Queue workers (executing runs asynchronously via Redis)

However, in my case I only have a single Docker image configured via environment variables.

My question is:
How can I run the Agent Server in “worker mode” using the Docker image?

More specifically:

  • Is there a specific command, entrypoint, or environment variable to start the container as a queue worker instead of an API server?

  • Or is this mode only available in LangSmith-managed or Helm-based deployments?

  • If worker mode is supported, what is the expected way to connect it to Redis and have it consume jobs?

For context:

  • I already have Redis configured (REDIS_URI)

  • Currently, my container handles both API and execution

I’m trying to understand how to properly separate API and worker responsibilities in a self-hosted setup.

Thanks a lot!

Hi @gdrouet,

IMHO you don’t need Helm or any licensed image for this. The langchain/langgraph-api image already contains a standalone queue-worker entrypoint, so you can reproduce the “Split API and queue” runtime (from Agent Server → Runtime architecture) with plain docker compose:

  • Run the same image twice, with both containers pointed at the same REDIS_URI and POSTGRES_URI.

  • On the API container, set N_JOBS_PER_WORKER=0 so it stops claiming runs.

  • On the worker container, override the entrypoint with ["python", "-m", "langgraph_api.queue_entrypoint"].

That’s it. The fully Distributed runtime with langchain/langgraph-orchestrator-licensed + langchain/langgraph-executor (what langgraph dockerfile --engine-runtime-mode distributed emits) is a separate, paid tier and is not required for this.

Two caveats:

  • Keep at least one worker alive: pending runs with no listener become orphaned (this is explicitly called out in the docs).

  • Build a single image so that both pools have your graph code.

Happy to share a full compose snippet if helpful.
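A minimal sketch of that split, written as a small Python script that emits the compose file as JSON (JSON is valid YAML). Only N_JOBS_PER_WORKER=0, the langgraph_api.queue_entrypoint entrypoint and the shared REDIS_URI/POSTGRES_URI come from the answer above. The image name my-langgraph-app, the service names, the Redis/Postgres images and credentials, the 8123:8000 port mapping and the worker replica count are all assumptions for illustration. Any other environment variables your current container already uses would need to be added to both application services.

```python
import json

# Hypothetical image name: build it once from your project so that the API
# and worker containers run exactly the same graph code.
IMAGE = "my-langgraph-app"

# Shared connection settings (placeholder credentials); the hostnames match
# the hypothetical service names defined below.
SHARED_ENV = {
    "REDIS_URI": "redis://langgraph-redis:6379",
    "POSTGRES_URI": "postgres://postgres:postgres@langgraph-postgres:5432/postgres?sslmode=disable",
}


def build_compose(worker_replicas: int = 1) -> dict:
    """Return a compose document that separates the API server from queue workers."""
    if worker_replicas < 1:
        # Pending runs with no listening worker become orphaned.
        raise ValueError("keep at least one queue worker running")
    backing = ["langgraph-redis", "langgraph-postgres"]
    return {
        "services": {
            # Assumed backing-service images; use whatever you already run.
            "langgraph-redis": {"image": "redis:6"},
            "langgraph-postgres": {
                "image": "postgres:16",
                "environment": {
                    "POSTGRES_USER": "postgres",
                    "POSTGRES_PASSWORD": "postgres",
                    "POSTGRES_DB": "postgres",
                },
            },
            # API container: serves HTTP but claims no runs.
            "langgraph-api": {
                "image": IMAGE,
                "ports": ["8123:8000"],  # assumed host:container mapping
                "environment": {**SHARED_ENV, "N_JOBS_PER_WORKER": "0"},
                "depends_on": backing,
            },
            # Worker container: same image, entrypoint swapped to the queue worker.
            "langgraph-worker": {
                "image": IMAGE,
                "entrypoint": ["python", "-m", "langgraph_api.queue_entrypoint"],
                "environment": dict(SHARED_ENV),
                "depends_on": backing,
                "deploy": {"replicas": worker_replicas},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_compose(worker_replicas=2), indent=2))
```

Redirecting the output to a compose file (e.g. running a hypothetical make_compose.py as `python make_compose.py > compose.yaml`) and then running `docker compose up` would start one API container and two workers. The workers publish no ports, which is what lets them be scaled with deploy.replicas, and the ValueError guard encodes the "keep at least one worker alive" advice.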

It works perfectly, thanks!