Hi everyone,
I’m building a local study assistant for university textbooks (mainly PDFs) using a fairly sophisticated RAG stack, but I’m struggling with two persistent issues that significantly hurt the user experience:
- Wrong or inconsistent page citations: the model often cites pages that don’t contain the claimed information, or the sidebar on the right shows different pages from the ones the model referenced in its answer.
- Occasional hallucinations and repetition: sometimes the model starts repeating words or phrases mid-sentence, or adds plausible but ungrounded information.
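One cheap guard against the repetition symptom is a post-generation check that flags an answer when some word n-gram occurs several times, so the answer can be regenerated or cut off. This is a heuristic sketch, not part of the pipeline described here; `repeated_ngrams`, `looks_degenerate`, and the thresholds (`n=4`, `min_count=3`) are assumptions to tune.

```python
from collections import Counter


def repeated_ngrams(text: str, n: int = 4, min_count: int = 3) -> dict[tuple[str, ...], int]:
    """Return the word n-grams that occur at least `min_count` times in `text`.

    Heuristic only: words are lower-cased and split on whitespace, punctuation is kept.
    """
    words = text.lower().split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


def looks_degenerate(text: str, n: int = 4, min_count: int = 3) -> bool:
    """True if any n-gram repeats often enough to suggest a generation loop."""
    return bool(repeated_ngrams(text, n, min_count))
```

For example, `looks_degenerate("a b c d a b c d a b c d")` is `True`, because the 4-gram `a b c d` appears three times. It only detects looping output; it says nothing about ungrounded content.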
My current architecture:
- Document processing: MinerU (quality mode) + PyMuPDF (fast mode) → Markdown with markers
- Chunking: custom ParentChildChunker using MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter
  - Parents: larger sections (~300-2400 chars)
  - Children: ~500-char chunks with overlap, used for retrieval
- Vector store: FAISS (multilingual-e5-base) + BM25 hybrid with RRF
- Reranking: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
- Context building: retrieve → rerank → parent expansion (using ParentStore) → capped at ~9000 chars
- Generation: LangGraph pipeline (rewrite → retrieve → rerank → expand → generate) with gemma3:4b (Ollama), temperature 0.0-0.1, repeat_penalty=1.15
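A sketch of one way to keep page information tied to character offsets after conversion: strip the page markers, remember where each page starts in the cleaned text, and look up which pages any chunk span overlaps. The `<!-- page: N -->` marker format and the function names are hypothetical; the pattern would need to match whatever markers the document processor actually writes.

```python
import bisect
import re

# Hypothetical marker format; change the pattern to match what the processor writes.
PAGE_MARKER = re.compile(r"<!--\s*page:\s*(\d+)\s*-->")


def strip_page_markers(markdown: str) -> tuple[str, list[int], list[int]]:
    """Remove page markers and record where each page starts in the cleaned text.

    Returns (clean_text, starts, pages): page pages[i] begins at offset starts[i].
    Text before the first marker is assigned to page 1 unless a marker sits at offset 0.
    """
    parts: list[str] = []
    starts, pages = [0], [1]
    pos = clean_len = 0
    for m in PAGE_MARKER.finditer(markdown):
        chunk = markdown[pos:m.start()]
        parts.append(chunk)
        clean_len += len(chunk)
        page = int(m.group(1))
        if clean_len == starts[-1]:
            pages[-1] = page  # no text since the last boundary: relabel, don't add an empty page
        else:
            starts.append(clean_len)
            pages.append(page)
        pos = m.end()
    parts.append(markdown[pos:])
    return "".join(parts), starts, pages


def pages_for_span(starts: list[int], pages: list[int], begin: int, end: int) -> list[int]:
    """Return the sorted page numbers overlapped by the clean-text span [begin, end)."""
    first = bisect.bisect_right(starts, begin) - 1
    last = bisect.bisect_right(starts, max(begin, end - 1)) - 1
    return sorted(set(pages[first:last + 1]))
```

If the chunks come from RecursiveCharacterTextSplitter, its `add_start_index=True` option stores each chunk's start offset in metadata; that offset is relative to the text that was split, so a parent section's own offset would have to be added before calling `pages_for_span`.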
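On the chunking side, here is a sketch of children that carry their parent's id and page list from the moment they are created, so retrieval, expansion, and citation all refer to the same pages. Plain fixed-size character windows stand in for RecursiveCharacterTextSplitter, and `Parent`, `Child`, and `make_children` are hypothetical names; the 500-char size follows the children described above, while the 100-char overlap is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Parent:
    parent_id: str
    text: str
    pages: list[int]  # pages the whole section spans, e.g. from a page-marker pass


@dataclass
class Child:
    child_id: str
    parent_id: str  # back-pointer used for parent expansion and for citations
    text: str
    pages: list[int] = field(default_factory=list)


def make_children(parent: Parent, size: int = 500, overlap: int = 100) -> list[Child]:
    """Split a parent into overlapping fixed-size character windows.

    Each child keeps its parent's id and a copy of the parent's page list, so a hit
    on any child resolves to the same pages the expanded parent will be cited with.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not parent.text:
        return []
    step = size - overlap
    # Windows start before the last `overlap` chars, so every window adds new text.
    stop = max(len(parent.text) - overlap, 1)
    return [
        Child(
            child_id=f"{parent.parent_id}:{i}",
            parent_id=parent.parent_id,
            text=parent.text[start:start + size],
            pages=list(parent.pages),
        )
        for i, start in enumerate(range(0, stop, step))
    ]
```

Copying the parent's pages is coarse (a child may sit on only one of them), but it guarantees the retrieved child and the expanded parent never disagree about which pages are cited.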
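For reference, Reciprocal Rank Fusion over the FAISS and BM25 result lists fits in a few lines: each document scores the sum of 1/(k + rank) over the lists it appears in. Fusing on a stable chunk id (rather than on chunk text) keeps page metadata attached to each hit. The function name and `k=60` (a commonly used RRF constant) are assumptions, not necessarily what this pipeline uses.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion of several ranked id lists (best first, ranks start at 1).

    Returns (doc_id, fused_score) pairs sorted by descending score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With dense results `["a", "b", "c"]` and BM25 results `["b", "c", "d"]`, the fused order is `b, c, a, d`: documents found by both retrievers move ahead of those found by only one.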
Main problems I see:
- Parent vs. child mismatch: when I expand to parents for better context, the source_docs passed to the UI still come from the child chunks, so citation filtering fails or shows the wrong pages.
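A sketch of one way to close that gap: do the parent expansion and the ~9000-char packing in a single step, and build source_docs from the parents that actually made it into the prompt instead of from the child hits. `Hit`, `build_context`, the `parents` dict layout, and the `[S1 | pages …]` labels are hypothetical stand-ins for the ParentStore and graph state in the pipeline.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    child_id: str
    parent_id: str
    score: float  # reranker score of the child chunk


def build_context(hits: list[Hit], parents: dict[str, dict], budget: int = 9000) -> tuple[str, list[dict]]:
    """Expand reranked child hits to unique parents and pack them into a char budget.

    Returns the prompt context and source records for exactly the parents that were
    packed. `parents` maps parent_id -> {"text": str, "pages": list[int]}.
    """
    blocks: list[str] = []
    sources: list[dict] = []
    seen: set[str] = set()
    used_chars = 0  # counts parent text only; the labels add a little on top
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.parent_id in seen:
            continue  # several children of one parent: include the parent once
        parent = parents[hit.parent_id]
        if used_chars + len(parent["text"]) > budget:
            continue  # does not fit; a smaller, lower-ranked parent still might
        seen.add(hit.parent_id)
        used_chars += len(parent["text"])
        label = f"S{len(sources) + 1}"
        page_list = ", ".join(str(p) for p in parent["pages"])
        blocks.append(f"[{label} | pages {page_list}]\n{parent['text']}")
        sources.append({"source": label, "parent_id": hit.parent_id, "pages": parent["pages"]})
    return "\n\n".join(blocks), sources
```

Because the source list is produced by the same loop that writes the context, a parent dropped for lack of budget can no longer show up in the sidebar.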
Questions:
- Where is the biggest weakness in this setup: the chunking strategy, the parent expansion logic, the citation post-processing, or the prompt itself?
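On citation post-processing, one option is to have the model cite short source labels instead of page numbers and map those labels back to pages afterwards, so the sidebar can only show pages the model cited from its own context. The `[S1]` citation style, the `sources` layout, and `check_citations` are hypothetical; this assumes each source dict carries a `"source"` label and its `"pages"`.

```python
import re

# Hypothetical citation style: the prompt asks the model to cite as [S1], [S2], ...
CITATION = re.compile(r"\[S(\d+)\]")


def check_citations(answer: str, sources: list[dict]) -> tuple[list[dict], list[str]]:
    """Split the model's citations into known sources and invented labels.

    Returns (cited_sources, unknown_labels): the sources to show in the sidebar, in
    first-citation order, and any labels that were never part of the context.
    """
    by_label = {s["source"]: s for s in sources}
    cited: list[dict] = []
    unknown: list[str] = []
    seen: set[str] = set()
    for num in CITATION.findall(answer):
        label = f"S{num}"
        if label in seen:
            continue  # count each label once, keep first-citation order
        seen.add(label)
        if label in by_label:
            cited.append(by_label[label])
        else:
            unknown.append(label)
    return cited, unknown
```

An unknown label (for example a cited `S5` when only `S1` and `S2` were in the context) is a direct sign of an ungrounded citation and could trigger a regeneration or be stripped before display.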
Any insights, similar experiences, or suggested improvements would be greatly appreciated. I’m happy to share the full Python files that contain the logic (document_processor.py, rag_graph.py, vector_store.py).