Hi!
I am quite new to the world of LLMs and AI. Because of the current information overload on the web (contradictory advice, outdated documentation, etc.), I am struggling to separate the signal from the noise and find validated best practices.
I am currently building a RAG-based tool to automate email drafts for learning purposes.
The Stack
- Environment: Node.js
- Language: TypeScript with LangChain
- Vector DB: Qdrant
- Models: mistral-small-2506 / mistral-embed (via API)
As a beginner, I am facing technical bottlenecks regarding my retrieval strategy and conversational context management, and I would love some guidance or reference docs.
Current issues
1. Missing context (Poor chunk diversity)
The retrieved chunks are relevant to the query, but they miss necessary details that sit in other chunks.
- Example: Looking for the cost of a “60x40cm sign”. The retriever fetches chunks talking about prices, but misses the chunk specifying dimensions. As a result, the LLM guesses it’s the price of a “medium sign” instead of a small one.
- Current Setup: I use similaritySearchWithScore (Top-K ranking with a minimum score threshold).
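To make the setup concrete, the filtering step amounts to roughly the following. This is a simplified, self-contained sketch and not my actual LangChain code: the ScoredChunk type, the function name, and the sample scores mentioned afterwards are made up for illustration, and it assumes that a higher score means a closer match.

```typescript
// Hypothetical shape of one retrieval hit (illustration only).
interface ScoredChunk {
  text: string;
  score: number; // assumed: higher = more similar to the query
}

// Keep only hits at or above minScore, then return the k best of those.
function topKWithThreshold(
  hits: ScoredChunk[],
  k: number,
  minScore: number
): ScoredChunk[] {
  return hits
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, k);
}
```

With made-up scores such as 0.82, 0.79 and 0.75 for three price chunks and 0.58 for the dimensions chunk, a threshold of 0.6 drops the dimensions chunk no matter how large k is, which looks like the behaviour I am seeing.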
2. Thread context (Poor conversational deduction)
The model (or my architecture) struggles to look back up the email chain to link questions and answers. Could this be happening because the email input is not correctly parsed?
Example
Incoming Email 1: “I need info to create a sign.”
AI Reply 1: “Sure, I need the size and the location (indoor/outdoor).”
Incoming Email 2: “I want it to be 40x60cm and it’s for both.” (Meaning indoor and outdoor)
AI Reply 2: “Great, but I still need the location of the sign.”
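For illustration, the thread above could be handed to the model as one ordered transcript, so that the question in AI Reply 1 sits directly before the answer in Incoming Email 2. This is only a sketch of the idea under my own assumptions: the ThreadMessage type, its field names, and buildTranscript are hypothetical, and it assumes each email has already been parsed into a role, a timestamp, and a body.

```typescript
// Hypothetical parsed-email shape (illustration only).
type Role = "customer" | "assistant";

interface ThreadMessage {
  role: Role;
  sentAt: string; // ISO 8601 timestamp, e.g. "2025-01-01T10:00:00Z"
  body: string;
}

// Sort messages oldest-first and render them as a numbered transcript.
function buildTranscript(messages: ThreadMessage[]): string {
  return [...messages] // copy so the caller's array is not reordered
    .sort((a, b) => Date.parse(a.sentAt) - Date.parse(b.sentAt))
    .map((m, i) => `[${i + 1}] ${m.role}: ${m.body.trim()}`)
    .join("\n");
}
```

If the model only receives the latest email, or receives the messages out of order, "it's for both" has nothing to refer back to.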
Generic questions
- Framework choice (LangGraph): I keep hearing about LangGraph for complex workflows. Given my email automation use case (handling threads, context history, and multi-step decisions), is vanilla LangChain enough, or should I migrate to LangGraph now, before my codebase grows too large?
- Retrieval & Chunking Strategy: I am currently using 6 chunks of 512 characters with a 50-character overlap. Is this optimal for generic text (users will upload their own PDF, txt, Word documents, etc.)? Does it matter if the chunk size is a power of 2 (e.g., 512, 1024, …)?
- Chunk Size vs. Quantity tradeoff: To fix my missing-context issue, is it generally better to retrieve a large number of small chunks (e.g., Top-K with 10 chunks of 256 chars) or a small number of large chunks (e.g., Top-K with 3 chunks of 1024 chars)? What are the best practices regarding token budget vs. retrieval precision?
- Cross-referencing: Do most models natively manage to cross-reference data spread across multiple different chunks, or should I implement a specific retriever?
- Preprocessing: Should I perform specific cleaning, parsing, or serialization on raw incoming emails before embedding/feeding them to the LLM?
- Testing/Eval: Do you have any recommendations or lightweight toolstacks compatible with TypeScript to efficiently monitor, evaluate, and test a RAG system?
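To make the chunking numbers in the questions above concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a plain illustration of the idea, not the splitter I actually use from LangChain; the function name splitWithOverlap and its defaults (512 and 50, taken from my current settings) are my own.

```typescript
// Split text into chunks of at most chunkSize characters, where each chunk
// starts (chunkSize - overlap) characters after the previous one.
function splitWithOverlap(
  text: string,
  chunkSize = 512,
  overlap = 50
): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

In raw characters, the options I mention come out close to each other: 6 x 512 = 3072, 10 x 256 = 2560, and 3 x 1024 = 3072. So my question is less about the total budget and more about how that budget should be split.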
Please excuse any technical nonsense in my explanations. If you have any reliable documentation, up-to-date tutorials, or references to share, I would be deeply interested and grateful!
Thanks for your help!