Hi everyone
I’ve been exploring LangGraph’s caching system and had an idea that might push it to the next level.
While LangGraph today supports Redis-based caching for deterministic calls (exact-prompt reuse), most agent graphs still end up re-reasoning semantically identical steps, costing both tokens and latency.
Idea: GraphCache, a semantic, state-aware cache for LangGraph
The goal is to cache reasoning patterns, not just string inputs.
Key Features
- Semantic caching: uses embeddings to match equivalent prompts across runs (e.g., “summarize this week’s Jira tickets” ≈ “get this week’s Jira summary”).
- State-aware keys: the cache key includes the graph node plus a fingerprint of its upstream state, so hits are context-safe (a rough sketch of both of these follows this list).
- Automatic cache warming: precomputes results for frequently hit subgraphs.
- Analytics layer: tracks cost savings, latency reduction, and cache-hit ratios (Prometheus/Grafana compatible).
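To make the first two features concrete, here is a minimal sketch of the state-aware semantic lookup. It assumes a sentence-transformers embedder, an in-process FAISS index, and a plain Python list standing in for Redis; the names (`state_fingerprint`, `cache_lookup`, `cache_store`, the similarity threshold) are illustrative, not an existing LangGraph API.

```python
import hashlib
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
DIM = embedder.get_sentence_embedding_dimension()

index = faiss.IndexFlatIP(DIM)   # cosine similarity over L2-normalized vectors
entries: list[dict] = []         # parallel store: fingerprint + cached output
SIM_THRESHOLD = 0.92             # tunable: higher = stricter semantic match


def state_fingerprint(node_name: str, upstream_state: dict) -> str:
    """Hash the node identity plus the upstream state it depends on (context-safe key).

    upstream_state should exclude the prompt text itself, so semantically
    equivalent prompts can still share a fingerprint."""
    payload = json.dumps({"node": node_name, "state": upstream_state},
                         sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


def embed(prompt: str) -> np.ndarray:
    vec = embedder.encode([prompt])  # shape (1, dim), float32
    faiss.normalize_L2(vec)
    return vec


def cache_lookup(node_name: str, prompt: str, upstream_state: dict):
    """Return a cached output if a semantically equivalent prompt was already
    seen for the same node + state fingerprint, else None."""
    if index.ntotal == 0:
        return None
    fp = state_fingerprint(node_name, upstream_state)
    scores, ids = index.search(embed(prompt), 5)
    for score, i in zip(scores[0], ids[0]):
        if i != -1 and score >= SIM_THRESHOLD and entries[i]["fingerprint"] == fp:
            return entries[i]["output"]
    return None


def cache_store(node_name: str, prompt: str, upstream_state: dict, output) -> None:
    """Record the prompt embedding alongside its fingerprint and output."""
    index.add(embed(prompt))
    entries.append({
        "fingerprint": state_fingerprint(node_name, upstream_state),
        "output": output,
    })
```

The key design point is that the fingerprint covers the node and its upstream context but not the prompt wording, so differently phrased but equivalent requests can hit, while the same wording under a different state cannot.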
Why it matters
- Agents often repeat 60–80% of their reasoning across users or sessions.
- Intelligent reuse could cut cost by 40–70% while substantially improving response speed.
Implementation direction
I’m considering building this as a plug-in layer:
- before_node_run hook → check the semantic cache (Redis + FAISS backend); see the sketch after this list
- after_node_run hook → store embeddings + output + state fingerprint
- Optional Prometheus metrics + Grafana dashboard
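Here is a sketch of how the node-decorator variant (one of the two options I ask about below) could wire those hooks together, reusing the `cache_lookup` / `cache_store` helpers sketched under Key Features. The hook placement, metric names, and `prompt_key` convention are all hypothetical; a middleware variant would do the same work at graph-compile time instead.

```python
import functools
import time

from prometheus_client import Counter, Histogram

CACHE_HITS = Counter("graphcache_hits_total", "Semantic cache hits", ["node"])
CACHE_MISSES = Counter("graphcache_misses_total", "Semantic cache misses", ["node"])
NODE_LATENCY = Histogram("graphcache_node_seconds", "Uncached node latency", ["node"])


def graph_cached(node_name: str, prompt_key: str = "prompt"):
    """Decorator acting as before_node_run / after_node_run hooks for one node."""
    def decorator(node_fn):
        @functools.wraps(node_fn)
        def wrapper(state: dict) -> dict:
            prompt = state.get(prompt_key, "")
            # Fingerprint the upstream state minus the prompt text, so different
            # phrasings of the same request can still hit the same entry.
            upstream = {k: v for k, v in state.items() if k != prompt_key}

            cached = cache_lookup(node_name, prompt, upstream)   # before_node_run
            if cached is not None:
                CACHE_HITS.labels(node=node_name).inc()
                return cached

            CACHE_MISSES.labels(node=node_name).inc()
            start = time.perf_counter()
            result = node_fn(state)
            NODE_LATENCY.labels(node=node_name).observe(time.perf_counter() - start)

            cache_store(node_name, prompt, upstream, result)     # after_node_run
            return result
        return wrapper
    return decorator


# Usage on an ordinary LangGraph-style node function (state in, partial state out):
@graph_cached("summarize_tickets")
def summarize_tickets(state: dict) -> dict:
    # ...expensive LLM call would go here...
    return {"summary": "(placeholder)"}
```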
Would love to hear thoughts on:
- Whether this aligns with LangGraph’s roadmap for caching/memory
- The best way to expose hooks (middleware vs. node decorator)
- Any prior work or open design discussions I can build on
If there’s interest and a green light from the maintainers, I’d be glad to contribute in whatever form is most useful.
Thanks,
Akilesh KR