I am using LangChain's ElasticsearchStore from langchain-elasticsearch and have noticed that when I use vectorstore.search(), the LangChain Documents it returns do not have ids attached to them. When I call vectorstore.add_documents(), the ids are returned as expected. Using the Elasticsearch client, I can confirm that the ids are stored with the documents, and I can use vectorstore.delete_by_ids() to delete them.
When I try to call vectorstore.get_by_ids(), I get this error: "NotImplementedError: ElasticsearchStore does not yet support get_by_ids."
Is this expected given where ElasticsearchStore is in its development, or is there something I should change?
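This is expected for now: the error comes from the base VectorStore.get_by_ids(), which ElasticsearchStore does not override yet, and the store's default document builder only fills page_content and metadata from _source, so the Elasticsearch _id is dropped. You can surface the _id yourself with a custom document builder. similarity_search() (which search() calls for search_type="similarity") accepts a doc_builder callable that turns each raw Elasticsearch hit into a Document, so you can set Document.id and also copy the _id into metadata: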
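from typing import Any, Dict
from langchain_core.documents import Document

def doc_with_id(hit: Dict[str, Any]) -> Document:
    src = hit.get("_source", {})
    es_id = hit.get("_id")
    # "text" is ElasticsearchStore's default query_field; adjust to your mapping (e.g. "content")
    content = src.get("text") or src.get("content") or ""
    # By default the store keeps Document metadata under the "metadata" key of _source
    meta = {**src.get("metadata", {}), "es_id": es_id}
    # Set Document.id as well, so results carry the same ids add_documents() returned
    return Document(id=es_id, page_content=content, metadata=meta)

# search() requires search_type; extra kwargs (k, doc_builder) are forwarded to similarity_search()
results = vectorstore.search("your query", search_type="similarity", k=4, doc_builder=doc_with_id)
# Access: results[i].id or results[i].metadata["es_id"]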
If you need to fetch documents directly by ID today, call the underlying Elasticsearch client's get() or mget() against your index until ElasticsearchStore implements get_by_ids.
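A minimal sketch of that client route, assuming elasticsearch-py 8.x (keyword-only mget(index=..., ids=...)). fetch_by_ids is a hypothetical helper name, not part of langchain-elasticsearch. It takes the document builder as a parameter so fetched documents come out in the same shape as your search results (for example, pass the doc_with_id function above), and it skips ids that are not found rather than raising.

```python
from typing import Any, Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_by_ids(
    client: Any,
    index: str,
    ids: Sequence[str],
    doc_builder: Callable[[Dict[str, Any]], T],
) -> List[T]:
    """Fetch documents by Elasticsearch _id with a single mget request.

    Hypothetical helper (not a langchain-elasticsearch API). `client` is assumed
    to be an elasticsearch.Elasticsearch instance using the 8.x keyword-only API.
    """
    if not ids:
        # Elasticsearch rejects an mget with no documents, so skip the round trip
        return []
    response = client.mget(index=index, ids=list(ids))
    results: List[T] = []
    for hit in response["docs"]:
        # Missing ids come back with found=False and no _source; skip them
        # instead of raising, so callers get only the documents that exist
        if hit.get("found"):
            results.append(doc_builder(hit))
    return results
```

Usage, assuming your store exposes its connection as vectorstore.client (otherwise pass the Elasticsearch client you created) and with "my-index" and the ids as placeholders: docs = fetch_by_ids(vectorstore.client, "my-index", ["id-1", "id-2"], doc_with_id). Note that mget returns the full _source, including the embedding vector; doc_with_id only reads the text and metadata fields, so the vector is not copied into the Document.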