Beginner questions & Troubleshooting: RAG Best practices for Mail Automation

Hi!

I am quite new to the world of LLMs and AI. Because of the current information overload on the web (contradictory advice, outdated documentation, etc.), I am struggling to separate the signal from the noise and find validated best practices.

I am currently building a RAG-based tool to automate email drafts for learning purposes.

The Stack

  • Environment: Node.js
  • Language: TypeScript with LangChain
  • Vector DB: Qdrant
  • Models: mistral-small-2506 / mistral-embed (via API)

As a beginner, I am facing technical bottlenecks regarding my retrieval strategy and conversational context management, and I would love some guidance or reference docs.


Current issues

1. Missing context (Poor chunk diversity)

The retrieved chunks are relevant to the query but miss necessary details spread elsewhere.

  • Example: Looking for the cost of a “60x40cm sign”. The retriever fetches chunks talking about prices, but misses the chunk specifying dimensions. As a result, the LLM guesses it’s the price of a “medium sign” instead of a small one.
  • Current Setup: I use similaritySearchWithScore (Top-K ranking with a minimum score threshold).

2. Thread context (Poor conversational deduction)

The model (or my architecture) struggles to look back up the email chain to link questions and answers. Could this be happening because the email input is not correctly parsed?

Example

Incoming Email 1: “I need info to create a sign.”
AI Reply 1: “Sure, I need the size and the location (indoor/outdoor).”
Incoming Email 2: “I want it to be 40x60cm and it’s for both.” (Meaning indoor and outdoor)
AI Reply 2: “Great, but I still need the location of the sign.”


Generic questions

  • Framework choice (LangGraph): I keep hearing about LangGraph for complex workflows. Given my email automation use case (handling threads, context history, and multi-step decisions), is vanilla LangChain enough or should I pivot and migrate to LangGraph right away before my codebase grows too large?
  • Retrieval & Chunking Strategy: I am currently using 6 chunks of 512 characters with a 50-character overlap. Is this optimal for generic text (users will upload their own PDF, txt, Word documents, etc.)? Does it matter if the chunk size is a power of 2 (e.g., 512, 1024, …)?
  • Chunk Size vs. Quantity tradeoff: To fix my missing-context issues, is it generally better to retrieve a large number of small chunks (e.g., Top-K with 10 chunks of 256 chars) or a small number of large chunks (e.g., Top-K with 3 chunks of 1024 chars)? What are the best practices regarding token budget vs. retrieval precision?
  • Cross-referencing: Do most models natively manage to cross-reference data spread across multiple different chunks, or should I implement a specific retriever?
  • Preprocessing: Should I perform specific cleaning, parsing, or serialization on raw incoming emails before embedding/feeding them to the LLM?
  • Testing/Eval: Do you have any recommendations or lightweight toolstacks compatible with TypeScript to efficiently monitor, evaluate, and test a RAG system?

Please excuse any technical nonsense in my explanations. If you have any reliable documentation, up-to-date tutorials, or references to share, I would be deeply interested and grateful!

Thanks for your help!

Mail RAG has some quirks that catch first-timers by surprise. A few patterns are worth setting up correctly from the start, because they are harder to fix retroactively than to build in.

Threading is structural, not text. Email is conversational data with reply chains, forwarded content, and quoted history. Treating each message as a flat document and chunking it the standard way produces chunks where 60-80% of the content is quoted history from previous messages in the thread. Your vector DB ends up with massive duplication — the same quoted text appearing dozens of times across replies. Strip the quoted reply chain at ingestion. Keep only the new content per message. Reference the thread relationship as metadata, not as embedded text.
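To make the "strip the quoted reply chain" step concrete, here is a minimal heuristic sketch for plain-text bodies. The function name and the marker patterns are my own assumptions, not a standard. Real mail clients format quotes in many different ways (and HTML mail needs separate handling), so treat this as a starting point.

```typescript
// Heuristic sketch: keep only the new content of a plain-text email body.
// These marker patterns are assumptions; real clients vary widely.
const QUOTE_HEADER_PATTERNS: RegExp[] = [
  /^On .+ wrote:\s*$/, // e.g. "On Mon, 3 Mar 2025, Alice wrote:"
  /^-{2,}\s*Original Message\s*-{2,}\s*$/i, // Outlook-style separator
  /^-{2,}\s*Forwarded message\s*-{2,}\s*$/i, // Gmail-style forward separator
];

function stripQuotedHistory(body: string): string {
  const kept: string[] = [];
  for (const line of body.split(/\r?\n/)) {
    // Everything after a reply/forward header is treated as quoted history.
    if (QUOTE_HEADER_PATTERNS.some((p) => p.test(line.trim()))) break;
    // Lines starting with ">" are quoted text interleaved with the reply.
    if (line.trimStart().startsWith(">")) continue;
    kept.push(line);
  }
  return kept.join("\n").trim();
}
```

Run this before chunking and embed only the result. It is worth keeping the raw body somewhere outside the vector store so you can re-ingest if the heuristics change.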

Headers carry signal that gets lost in standard chunking. From, To, CC, Subject, Date, In-Reply-To — these aren’t just routing metadata. They’re decision-relevant context. “Was this email between the legal team and an external counterparty” is a structural question that vector similarity won’t answer well, but metadata filtering will. Index headers as structured metadata on every chunk. Don’t bury them in the chunk body.
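As a rough illustration of "headers as structured metadata on every chunk", the sketch below copies the headers onto each chunk record and derives a list of participant domains for filtering. All type and field names are hypothetical, and it assumes addresses are already bare (`user@host`, not `Name <user@host>`).

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface EmailHeaders {
  from: string;
  to: string[];
  cc: string[];
  subject: string;
  date: string; // ISO 8601 string
  messageId: string;
  inReplyTo?: string;
}

interface ChunkRecord {
  text: string; // the only part that gets embedded
  metadata: EmailHeaders & { chunkIndex: number; participantDomains: string[] };
}

// Unique, sorted, lowercased domains of everyone on the message.
function participantDomains(h: EmailHeaders): string[] {
  const all = [h.from, ...h.to, ...h.cc];
  const domains = all.map((addr) => addr.slice(addr.lastIndexOf("@") + 1).toLowerCase());
  return Array.from(new Set(domains)).sort();
}

// Attach the same header metadata to every chunk of one message.
function toChunkRecords(headers: EmailHeaders, chunks: string[]): ChunkRecord[] {
  const domains = participantDomains(headers);
  return chunks.map((text, chunkIndex) => ({
    text,
    metadata: { ...headers, chunkIndex, participantDomains: domains },
  }));
}
```

With something like `participantDomains` in the payload, "between legal and an external counterparty" becomes a metadata filter rather than a similarity guess.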

Personal data is everywhere: signatures, phone numbers, addresses, financial figures, names of people not relevant to the query. If your mail RAG is going to be queried by anyone other than the original recipient, you have a privacy obligation that vector retrieval doesn’t naturally enforce. Decide upfront: do you redact PII at ingestion (cleaner but irreversible), or do you tag chunks with PII flags and filter at query time (more flexible but easier to get wrong)? Either approach is fine. Having no approach at all is what’s dangerous.
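For a feel of what either option looks like in code, here is a deliberately naive sketch that both redacts and reports a flag, so you can use whichever half fits your choice. The two regexes are illustrative assumptions only. They will miss many real formats, and the phone pattern also produces false positives (a date like 2025-03-03 would be caught). A real deployment would want a dedicated PII detection tool.

```typescript
// Deliberately simple patterns; assumptions for illustration, not production-grade.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

interface RedactionResult {
  text: string; // input with matches replaced by placeholders
  hasPii: boolean; // true if anything was replaced (usable as a chunk flag)
}

function redactPii(input: string): RedactionResult {
  let hasPii = false;
  // Returns a replacer that records the hit and emits a placeholder.
  const mark = (label: string) => (): string => {
    hasPii = true;
    return `[${label}]`;
  };
  const text = input.replace(EMAIL_RE, mark("EMAIL")).replace(PHONE_RE, mark("PHONE"));
  return { text, hasPii };
}
```

If you redact at ingestion, embed `text`. If you prefer filtering at query time, embed the original and store `hasPii` in the chunk metadata.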

Date is critical context for mail queries. “What did we discuss about the Q3 contract last month” is a temporal query. Standard vector retrieval will surface every chunk semantically similar to “Q3 contract” across all time, equally weighted. Add date as a primary filter — either by querying with explicit date ranges, or by weighting recency in your reranking step.
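One possible way to weight recency in a reranking step is exponential decay on the similarity score. The sketch below is just that, under assumptions of mine: the `ScoredChunk` shape, the 30-day half-life default, and the idea that higher scores are better (true for cosine similarity, not for distance metrics).

```typescript
// Hypothetical result shape; assumes a higher score means more similar.
interface ScoredChunk {
  id: string;
  score: number;
  date: string; // ISO 8601 date of the source email
}

// Exponential decay: a chunk exactly `halfLifeDays` old keeps half its score.
function rerankByRecency(chunks: ScoredChunk[], now: Date, halfLifeDays = 30): ScoredChunk[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return chunks
    .map((c) => {
      // Clamp at 0 so emails dated in the future are not boosted.
      const ageDays = Math.max(0, (now.getTime() - new Date(c.date).getTime()) / msPerDay);
      return { ...c, score: c.score * Math.pow(0.5, ageDays / halfLifeDays) };
    })
    .sort((a, b) => b.score - a.score);
}
```

For queries with an explicit time window ("last month"), a hard date-range filter is usually the simpler choice. Decay is more useful when the query has no explicit dates but fresher mail should still win ties.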

Attachments are separate documents, not part of the email. PDF attachments, spreadsheets, image attachments — these need their own ingestion pipeline with their own chunking strategies. Trying to extract attachment content into the email chunk produces noise. Treat the email as a pointer to attachments and ingest each attachment as its own indexed document linked back to the email by metadata.
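A small sketch of the "email as pointer" idea: one parsed email becomes one email document plus one document per attachment, linked by metadata. The types, the id scheme, and the assumption that attachment text has already been extracted by a separate pipeline are all hypothetical.

```typescript
// Hypothetical shapes; assumes attachment text was extracted elsewhere.
interface ParsedAttachment {
  filename: string;
  contentType: string;
  text: string;
}

interface ParsedEmail {
  messageId: string;
  threadId: string;
  body: string; // new content only, quoted history already stripped
  attachments: ParsedAttachment[];
}

interface IndexDocument {
  id: string;
  kind: "email" | "attachment";
  text: string;
  metadata: Record<string, string | string[] | boolean>;
}

function toIndexDocuments(email: ParsedEmail): IndexDocument[] {
  // Each attachment is its own document that points back to the email.
  const attachmentDocs = email.attachments.map(
    (a, i): IndexDocument => ({
      id: `${email.messageId}#att-${i}`,
      kind: "attachment",
      text: a.text,
      metadata: {
        parentMessageId: email.messageId,
        threadId: email.threadId,
        filename: a.filename,
        contentType: a.contentType,
      },
    })
  );
  // The email document only lists attachment ids; it never embeds their text.
  const emailDoc: IndexDocument = {
    id: email.messageId,
    kind: "email",
    text: email.body,
    metadata: {
      threadId: email.threadId,
      hasAttachments: attachmentDocs.length > 0,
      attachmentIds: attachmentDocs.map((d) => d.id),
    },
  };
  return [emailDoc, ...attachmentDocs];
}
```

Each attachment document can then go through whatever chunking suits its type (PDF, spreadsheet, and so on) without polluting the email chunks.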

The biggest mistake I see in mail automation RAG: treating an email like a blog post. Email is structured conversational data with privacy implications, temporal weight, and threading relationships. The chunks you create at ingestion need to respect that structure or the retrieval layer fights against the data shape forever after.

If you want a starting framework: ingest each email with structured metadata (sender, recipients, date, thread_id, message_id, in_reply_to, has_attachments). Chunk only the new content (strip quoted history). Index attachments as separate documents linked by metadata. Filter on metadata first, retrieve semantically second.
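"Filter on metadata first, retrieve semantically second" could look roughly like this with Qdrant. The sketch only builds a filter object in Qdrant's `must` / `match` / `range` shape. The payload keys (`thread_id`, `sender`, `date_ts` as Unix seconds) are assumptions about how you stored your metadata. As far as I know, LangChain's Qdrant integration nests document metadata under a `metadata` payload key, so you may need the `metadata.` prefix; check this against the version you use.

```typescript
// Hypothetical constraint shape extracted from the query or the incoming email.
interface MailQueryConstraints {
  threadId?: string;
  sender?: string;
  after?: Date;
  before?: Date;
}

type Condition =
  | { key: string; match: { value: string } }
  | { key: string; range: { gte?: number; lte?: number } };

// Builds a Qdrant-style filter; `prefix` covers payloads nested under "metadata.".
function buildMailFilter(c: MailQueryConstraints, prefix = ""): { must: Condition[] } {
  const must: Condition[] = [];
  if (c.threadId) must.push({ key: `${prefix}thread_id`, match: { value: c.threadId } });
  if (c.sender) must.push({ key: `${prefix}sender`, match: { value: c.sender } });
  if (c.after || c.before) {
    // Assumes dates were stored as Unix seconds in a numeric `date_ts` field.
    const range: { gte?: number; lte?: number } = {};
    if (c.after) range.gte = Math.floor(c.after.getTime() / 1000);
    if (c.before) range.lte = Math.floor(c.before.getTime() / 1000);
    must.push({ key: `${prefix}date_ts`, range });
  }
  return { must };
}
```

The resulting object is meant to be passed as the filter argument of your similarity search, so the vector comparison only runs over points that already match the thread, sender, or date constraints.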