I’m currently working on my company’s RAG system using LangChain and LM Studio.
I found that when I upload a large number of documents to the knowledge base, the AI sometimes cannot retrieve all the relevant content, so its answers may be incomplete.
I asked ChatGPT and Claude, and they both suggested also storing the uploaded documents in a SQL database during the upload process.
If I don’t want to use SQL, is there another way to solve this?
My current architecture is:
- LangChain for the RAG pipeline
- LM Studio as the local LLM provider
- A vector database for document embeddings
- Uploaded documents are split into chunks and stored in the vector database
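
To make the setup and the symptom concrete, here is a minimal, library-free sketch of this kind of pipeline. It is not the actual LangChain code: `ToyVectorStore`, `split_into_chunks`, and the bag-of-words `embed` are hypothetical stand-ins for the text splitter, embedding model, and vector database, and the chunk size, overlap, and `k` values are made up for illustration. It shows how a fixed top-`k` retrieval can leave out relevant chunks once many documents match the same query.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into word-based chunks that overlap by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = []  # (doc_id, chunk_text, vector)

    def add(self, doc_id, text):
        for chunk in split_into_chunks(text):
            self.rows.append((doc_id, chunk, embed(chunk)))

    def search(self, query, k=4):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]


# Six uploaded documents that all mention the same topic.
store = ToyVectorStore()
for i in range(6):
    store.add(f"doc{i}", f"Report {i}: the refund policy allows returns within {10 + i} days")

hits_small_k = store.search("refund policy", k=4)
hits_large_k = store.search("refund policy", k=12)

docs_small_k = {doc_id for doc_id, _ in hits_small_k}
docs_large_k = {doc_id for doc_id, _ in hits_large_k}
print(len(docs_small_k), "of 6 documents reached with k=4")
print(len(docs_large_k), "of 6 documents reached with k=12")
```

In this toy version the documents themselves are all stored correctly; the gap comes only from the retrieval step returning a fixed number of chunks, which is the behaviour I suspect in my real system.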
My questions are:
- Is SQL necessary for handling many uploaded documents in a RAG system?
- If I don’t want to use SQL, what are the common alternatives?
- Should I improve the retriever settings, chunking strategy, metadata filtering, or vector database structure instead?
I would appreciate any suggestions or best practices for this kind of RAG architecture.