Integrating Celery + Redis with LangGraph for heavy RAG indexing (chunking, embeddings) — best practices?

Hi everyone 👋,

I’m currently building a LangGraph-based RAG application and I’m looking for best practices / real-world advice on integrating Celery + Redis to offload heavy background work from LangGraph.

Current architecture (naive, probably wrong)

Right now, everything lives inside LangGraph:

  • LangGraph is used as the orchestrator

  • But it also performs heavy work:

    • PDF parsing (PyMuPDF)

    • chunking (Neural/Semantic chunkers)

    • embeddings (external APIs)

    • vector storage (Qdrant)

  • All of this runs inside LangGraph nodes (async Python)

This worked fine at first, but we’ve clearly made a naive architectural mistake.

The problem

LangGraph (especially in managed cloud environments) has limited CPU resources, and when:

  • 10–100 users upload documents at the same time

  • each upload triggers indexing + chunking + embeddings

we run into:

  • CPU contention

  • long blocking workflows

  • poor throughput

  • no real backpressure or queueing

In short: LangGraph is doing work it probably shouldn’t be doing.

What we want to achieve

We want to move heavy indexing work out of LangGraph and treat LangGraph as a control plane only.

Target direction:

  • LangGraph:

    • orchestration

    • state machine

    • decision-making (does this document need indexing?)

  • Celery + Redis:

    • background indexing

    • chunking

    • embeddings

    • vector DB writes

  • LangGraph should enqueue jobs and continue without blocking

Questions

  1. Is Celery + Redis a reasonable choice for offloading heavy RAG indexing from LangGraph?

  2. What is the recommended integration pattern?

    • Should LangGraph enqueue Celery jobs directly?

    • Should LangGraph poll for completion, or resume via callbacks/webhooks?

  3. How do you model state cleanly?

    • Keep all state in LangGraph store / DB?

    • Pass only IDs to Celery tasks?

  4. Are there any anti-patterns to avoid when combining:

    • LangGraph (workflow orchestration)

    • Celery (task execution)

    • Redis (queue / coordination)

  5. Has anyone implemented a similar setup for:

    • document indexing

    • RAG pipelines

    • multi-user concurrent uploads?
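To make questions 2 and 3 concrete, this is the kind of resume pattern we're weighing — a pure sketch with made-up node and status names. The status lookup is injected so it could be Celery's result backend (e.g. `AsyncResult(task_id).state`) or a row in our own jobs table; the graph itself only ever holds IDs and a status string:

```python
def check_indexing(state: dict, get_status) -> dict:
    # get_status is any lookup keyed by the stored task ID, e.g.
    # Celery's AsyncResult(task_id).state or a jobs-table query.
    return {**state, "indexing_status": get_status(state["celery_task_id"])}

def route_after_check(state: dict) -> str:
    # Conditional-edge style router: map worker status to the next node.
    # Node names here ("mark_indexed" etc.) are illustrative only.
    return {
        "SUCCESS": "mark_indexed",
        "FAILURE": "report_error",
    }.get(state.get("indexing_status"), "wait_and_recheck")
```

With this shape, polling is just re-entering `check_indexing` on a timer, while a callback/webhook variant would write the status into the store directly and resume the graph — the router stays the same either way.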

Summary

We’ve learned the hard way that:

Putting CPU-heavy indexing directly inside LangGraph does not scale well.

We’re now looking to decouple orchestration from execution, and any advice, examples, or architecture pointers would be hugely appreciated 🙏

Thanks in advance!