Record_Manager vectorStore

I’m trying to use LangChain’s record_manager together with the indexing API to keep my FAISS vector store in sync, instead of doing a full refresh each time.

Here’s my approach:

I set up a SQLRecordManager with SQLite and can see that the record_manager_cache is getting updated correctly.
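
For context, this is roughly how I confirm the record manager side is updating (list_keys() is part of the RecordManager API):

# Inspect what the record manager currently tracks after each index() call.
keys = record_manager.list_keys()
print(f"{len(keys)} keys tracked:", keys)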

I run initial indexing with some documents and persist the FAISS vector store to disk.

Later, I update one document (doc1), insert a new one (doc3), and remove another (doc2) using index(…, cleanup="incremental").

Problem: while the record manager database reflects the updates, the FAISS vector store still contains the old documents (it doesn’t seem to delete/update as expected).

Question: How can I properly keep my FAISS vector store in sync with the record manager so that updates and deletions are reflected?

Here’s a snippet of my setup (simplified):

from langchain.indexes import SQLRecordManager, index

vector_store = load_or_create_vectorstore(persist_dir, embeddings)

record_manager = SQLRecordManager(namespace, db_url=RECORD_MANAGER_DB_URL)
record_manager.create_schema()

# Initial indexing
index(
    docs_source=initial_documents,
    record_manager=record_manager,
    vector_store=vector_store,
    cleanup=None,
    source_id_key="source",
)
vector_store.save_local(FAISS_PERSIST_DIR)

# Later: update + insert + delete
index(
    docs_source=new_documents,
    record_manager=record_manager,
    vector_store=vector_store,
    cleanup="incremental",
    source_id_key="source",
)
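
In case it matters, load_or_create_vectorstore is just a thin helper along these lines (simplified sketch; the empty-store construction follows the usual langchain_community FAISS pattern, and allow_dangerous_deserialization assumes a recent langchain_community):

import os

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

def load_or_create_vectorstore(persist_dir, embeddings):
    # Reload the persisted index if one exists on disk...
    if os.path.exists(os.path.join(persist_dir, "index.faiss")):
        return FAISS.load_local(
            persist_dir, embeddings, allow_dangerous_deserialization=True
        )
    # ...otherwise start from an empty index sized to the embedding dimension.
    dim = len(embeddings.embed_query("probe"))
    return FAISS(
        embedding_function=embeddings,
        index=faiss.IndexFlatL2(dim),
        docstore=InMemoryDocstore(),
        index_to_docstore_id={},
    )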

Has anyone successfully done this with FAISS? Am I missing a step to apply the record_manager changes back into the vector store?

Hi @DivyanshJain0001

IMHO your setup is fine; the behavior is by design. In incremental mode the indexer updates mutated sources and de-duplicates, but it does not remove sources that have disappeared from the input. To hard-delete doc2 from FAISS you have to run a full cleanup pass with the complete set of documents that should remain, and if you persist FAISS to disk, save again after that pass.

Why your FAISS still shows old docs:

  • Incremental cleanup only cleans up mutated versions of sources that are present in the current batch; sources removed from the corpus are never touched. Deleting missing sources is handled only by cleanup="full".
  • FAISS does support delete-by-id, so the indexer can remove entries, but only if you run a mode that asks it to (see the manual sketch after this list).
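
If you ever need to drop a single source without re-running a full pass, a manual route works too. A minimal sketch, assuming the store was populated through index() with source_id_key="source" (so the record manager's group ids are the source values; "doc2_source" below is a placeholder for doc2's actual source):

# Look up the keys the record manager tracked for doc2's source,
# delete those vectors from FAISS, then forget the keys.
keys = record_manager.list_keys(group_ids=["doc2_source"])  # placeholder source id
if keys:
    vector_store.delete(ids=keys)
    record_manager.delete_keys(keys)
    vector_store.save_local(FAISS_PERSIST_DIR)

This works because the indexing API inserts documents into the vector store under the same hashed keys it records, so the record manager's keys double as FAISS ids.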

For your case, though, what about this?

# initial as before...
index(
    docs_source=initial_documents,
    record_manager=record_manager,
    vector_store=vector_store,
    cleanup=None,                  # or "incremental"
    source_id_key="source",
)
vector_store.save_local(FAISS_PERSIST_DIR)

# later: doc1 updated, doc3 new, doc2 removed
# run FULL cleanup with EXACTLY the docs that should remain:
surviving_docs = [doc1_updated, doc3]

index(
    docs_source=surviving_docs,
    record_manager=record_manager,
    vector_store=vector_store,
    cleanup="full",                # this deletes doc2's chunks
    source_id_key="source",
)
vector_store.save_local(FAISS_PERSIST_DIR)  # persist deletions
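
After the full pass, a quick sanity check could look like this (the query string is a placeholder for something only doc2 matched):

# doc2's keys should be gone from the record manager...
print(record_manager.list_keys())
# ...and FAISS should no longer surface doc2 for a query it used to match.
print(vector_store.similarity_search("phrase unique to doc2", k=3))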