I’ve been experimenting with RAG systems for evaluation-heavy tasks, where single-pass retrieval + generation often leads to shallow or biased outputs.
I built a small applied prototype that explores an intent-aware, multi-stage reasoning pipeline:
- queries are routed by intent (research / idea evaluation / hackathon vs. all other query types)
- evaluation-heavy intents follow the same deep reasoning path
- hybrid retrieval is used for grounding
- positives and negatives are reasoned about in separate LLM passes
- both are synthesized into structured guidance rather than a single free-form response
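To make the control flow concrete, here is a minimal sketch of the pipeline above. All function names, intent labels, and prompts are illustrative placeholders (they are not taken from the linked repo), and the LLM and retrieval calls are stubbed so only the routing and pass structure is shown:

```python
from dataclasses import dataclass

# Intents that follow the deep evaluation path (illustrative labels).
EVALUATION_INTENTS = {"research", "idea_evaluation", "hackathon"}

@dataclass
class Guidance:
    """Structured output instead of a single free-form response."""
    positives: list
    negatives: list
    recommendation: str

def route_intent(query: str) -> str:
    # Hypothetical keyword router; a real system would use an LLM classifier.
    if "evaluate" in query or "idea" in query:
        return "idea_evaluation"
    return "other"

def hybrid_retrieve(query: str) -> list:
    # Stub for hybrid (e.g., dense + sparse) retrieval used for grounding.
    return [f"passage relevant to: {query}"]

def llm_pass(instruction: str, context: list) -> list:
    # Stub for a single LLM call over the retrieved context.
    return [f"{instruction}: {c}" for c in context]

def answer(query: str):
    intent = route_intent(query)
    context = hybrid_retrieve(query)
    if intent not in EVALUATION_INTENTS:
        # Non-evaluation intents get a single free-form response.
        return llm_pass("answer", context)[0]
    # Evaluation-heavy intents: separate positive and negative passes,
    # then synthesis into structured guidance.
    positives = llm_pass("list strengths", context)
    negatives = llm_pass("list weaknesses", context)
    recommendation = f"{len(positives)} strengths vs {len(negatives)} weaknesses"
    return Guidance(positives, negatives, recommendation)
```

The point of the sketch is that the positive and negative passes never see each other's output; only the synthesis step combines them, which is the separation the questions below are about.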
The goal is not feature design but to understand whether structured reasoning improves evaluation quality and reduces bias compared to a single critique or reflection loop.
I’ve included a minimal, documented repo with one concrete end-to-end example showing the full reasoning flow: Samanvith1404/Intent-Aware-Multi-Stage-RAG-Reasoning-System on GitHub (an applied AI system using LLM routing, hybrid retrieval, and structured positive/negative reasoning for decision support).
From an architectural perspective, does separating positives and negatives before synthesis meaningfully improve evaluation quality in RAG systems? Are there obvious failure modes or trade-offs I may be overlooking?
Any feedback would be appreciated.