I’m trying to use LangChain’s record_manager together with the indexing API to keep my FAISS vector store in sync, instead of doing a full refresh each time.
Here’s my approach:
I set up an SQLRecordManager backed by SQLite and can see that the record_manager_cache database is being updated correctly.
I run initial indexing with some documents and persist the FAISS vector store to disk.
Later, I update one document (doc1), insert a new one (doc3), and remove another (doc2) using index(…, cleanup="incremental").
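To sanity-check that last point, I list what the record manager tracks after each run (using the record_manager instance from the snippet below; list_keys() is part of the RecordManager API):

keys = record_manager.list_keys()
print(len(keys), "records tracked by the record manager")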
Problem: while the record manager database reflects the updates, the FAISS vector store still contains the old documents (it doesn’t seem to delete/update as expected).
Question: How can I properly keep my FAISS vector store in sync with the record manager so that updates and deletions are reflected?
Here’s a snippet of my setup (simplified):
from langchain.indexes import SQLRecordManager, index

# Load the persisted FAISS index from disk, or create a fresh one (helper shown at the end)
vector_store = load_or_create_vectorstore(FAISS_PERSIST_DIR, embeddings)

record_manager = SQLRecordManager(namespace, db_url=RECORD_MANAGER_DB_URL)
record_manager.create_schema()

# Initial indexing: populate both the store and the record manager, no cleanup yet
index(
    docs_source=initial_documents,
    record_manager=record_manager,
    vector_store=vector_store,
    cleanup=None,
    source_id_key="source",
)
vector_store.save_local(FAISS_PERSIST_DIR)
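# Debugging aid: count the vectors currently held by the FAISS index
# (assumes the langchain_community FAISS wrapper, which exposes the raw
# faiss index as .index; ntotal is its vector count)
print("vectors in store:", vector_store.index.ntotal)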
# Later: update doc1, insert doc3, delete doc2
result = index(
    docs_source=new_documents,
    record_manager=record_manager,
    vector_store=vector_store,
    cleanup="incremental",
    source_id_key="source",
)
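Per the LangChain docs, index() returns a dict of counts, so I print it to see what the API believes it did (purely for debugging, not part of the pipeline):

print(result)  # {'num_added': ..., 'num_updated': ..., 'num_skipped': ..., 'num_deleted': ...}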
Has anyone successfully done this with FAISS? Am I missing a step to apply the record_manager changes back into the vector store?
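For completeness, load_or_create_vectorstore is roughly this (a simplified sketch of my helper; the empty-index branch assumes the langchain_community FAISS wrapper and a flat L2 index):

import os

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

def load_or_create_vectorstore(persist_dir, embeddings):
    # save_local() writes a directory, so reload it if one is present
    if os.path.isdir(persist_dir):
        return FAISS.load_local(
            persist_dir, embeddings, allow_dangerous_deserialization=True
        )
    # Otherwise create an empty index sized to the embedding dimension
    dim = len(embeddings.embed_query("probe"))
    return FAISS(
        embedding_function=embeddings,
        index=faiss.IndexFlatL2(dim),
        docstore=InMemoryDocstore(),
        index_to_docstore_id={},
    )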