ElasticsearchStore search method does not return Document IDs

I am using Langchain’s ElasticsearchStore from langchain-elasticsearch and have noticed that when I use vectorstore.search(), the Langchain Documents returned do not have IDs attached to them. When I used vectorstore.add_documents(), the IDs were returned as expected. I can use the Elasticsearch client to see that the IDs are stored with the Documents, and I can use vectorstore.delete_by_ids() to delete the documents.

When I tried to run vectorstore.get_by_ids() I got this error “NotImplementedError: ElasticsearchStore does not yet support get_by_ids.”

Is this expected given where ElasticsearchStore is in its development, or is there something I should change?

Thanks!

Hi @sanne

The GitHub issue "Missing `get_by_ids` Implementation in `ElasticsearchStore`" (Issue #49 in langchain-ai/langchain-elastic) is nearly a year old.
I think you will have to find a workaround for now.

What you can do now:

  • Include the ES _id in your results via a custom doc builder. The search API accepts a doc_builder to transform raw ES hits into Documents. You can use it to surface _id in metadata:
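from typing import Any, Dict

from langchain_core.documents import Document

def doc_with_id(hit: Dict[str, Any]) -> Document:
    """Build a Document from a raw ES hit, keeping the hit's _id in metadata."""
    src = hit.get("_source", {})
    es_id = hit.get("_id")
    # Adjust the content field to your mapping (e.g., "content", "text", etc.)
    content = src.get("content") or src.get("text") or ""
    # Preserve your other metadata as needed; skip the content fields so the text is not duplicated
    meta = {k: v for k, v in src.items() if k not in ("content", "text")}
    meta["es_id"] = es_id
    return Document(page_content=content, metadata=meta)

# search() needs an explicit search_type; the remaining keyword arguments are forwarded to the search method
results = vectorstore.search("your query", search_type="similarity", k=4, doc_builder=doc_with_id)
# Access: results[i].metadata["es_id"]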
  • If you need direct fetch-by-ID today, use the underlying Elasticsearch client’s get/mget against your index while get_by_ids is pending.
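
For the second option, here is a minimal sketch of a fetch-by-ID helper built on the client's mget call. The name fetch_by_ids is hypothetical and not part of langchain-elasticsearch. It assumes the elasticsearch-py 8.x keyword form of mget, and it relies on each mget entry carrying "_id" and "_source" like a search hit, so a builder such as doc_with_id above can be reused to produce Documents:

```python
from typing import Any, Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_by_ids(
    client: Any,
    index: str,
    ids: Sequence[str],
    doc_builder: Callable[[Dict[str, Any]], T],
) -> List[T]:
    """Fetch documents by _id with mget and convert each found entry with doc_builder.

    Hypothetical helper (not a langchain-elasticsearch API). `client` is assumed
    to be an elasticsearch-py 8.x client or anything exposing the same mget call.
    """
    if not ids:
        return []
    # Assumption: 8.x keyword form. Older 7.x clients expect body={"ids": [...]} instead.
    response = client.mget(index=index, ids=list(ids))
    # IDs that do not exist come back with found=False and no _source, so skip them.
    return [doc_builder(entry) for entry in response["docs"] if entry.get("found")]
```

You would call it with the Elasticsearch client and index name you already use for the store, for example fetch_by_ids(es_client, "my-index", ["id-1", "id-2"], doc_with_id), where es_client and "my-index" are placeholders. IDs that are not in the index are silently dropped, so compare the result length with the input if you need to detect misses.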