Hi everyone,
I’m currently building a LangGraph-based RAG application and I’m looking for best practices / real-world advice on integrating Celery + Redis to offload heavy background work from LangGraph.
Current architecture (naive, probably wrong)
Right now, everything lives inside LangGraph:

- LangGraph is used as the orchestrator
- But it also performs heavy work:
  - PDF parsing (PyMuPDF)
  - chunking (neural/semantic chunkers)
  - embeddings (external APIs)
  - vector storage (Qdrant)
- All of this runs inside LangGraph nodes (async Python)
This worked fine at first, but we’ve clearly made a naive architectural mistake.
The problem
LangGraph (especially in managed cloud environments) has limited CPU resources, and when:

- 10–100 users upload documents at the same time
- each upload triggers indexing + chunking + embeddings

we run into:

- CPU contention
- long blocking workflows
- poor throughput
- no real backpressure or queueing
In short: LangGraph is doing work it probably shouldn’t be doing.
What we want to achieve
We want to move heavy indexing work out of LangGraph and treat LangGraph as a control plane only.
Target direction:

- LangGraph:
  - orchestration
  - state machine
  - decision-making (does this document need indexing?)
- Celery + Redis:
  - background indexing
  - chunking
  - embeddings
  - vector DB writes
- LangGraph should enqueue jobs and continue without blocking
Questions
- Is Celery + Redis a reasonable choice for offloading heavy RAG indexing from LangGraph?
- What is the recommended integration pattern?
  - Should LangGraph enqueue Celery jobs directly?
  - Should LangGraph poll for completion, or resume via callbacks/webhooks?
- How do you model state cleanly?
  - Keep all state in the LangGraph store / DB?
  - Pass only IDs to Celery tasks?
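To illustrate what we mean by "pass only IDs": the graph state would hold references, never payloads, so nothing heavy crosses the Redis queue. A minimal sketch (field names are made up):

```python
# Sketch of an IDs-only state record. The heavy artifacts (parsed text,
# chunks, vectors) would live in object storage / Qdrant, keyed by these
# IDs; only references travel through LangGraph state and Celery args.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IndexingState:
    document_id: str                       # key into blob storage, not PDF bytes
    indexing_task_id: Optional[str] = None  # Celery task id, for status lookups
    chunk_ids: list = field(default_factory=list)  # Qdrant point ids
    status: str = "pending"                # pending -> queued -> indexed / failed

def to_task_args(state: IndexingState) -> dict:
    # Only the ID crosses the queue; the worker re-reads everything else
    # from storage.
    return {"document_id": state.document_id}
```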
- Are there any anti-patterns to avoid when combining:
  - LangGraph (workflow orchestration)
  - Celery (task execution)
  - Redis (queue / coordination)
- Has anyone implemented a similar setup for:
  - document indexing
  - RAG pipelines
  - multi-user concurrent uploads?
Summary
We’ve learned the hard way that putting CPU-heavy indexing directly inside LangGraph does not scale well.
We’re now looking to decouple orchestration from execution, and any advice, examples, or architecture pointers would be hugely appreciated!
Thanks in advance!