Elasticsearch v7 (deprecated version): integration with tools for big data indexes

Hi guys, I have a question. I’m working on a multi-agent AI system built entirely with LangChain and the Supervisor and Handoff design pattern. The problem with my system is that the knowledge base is hosted on Elasticsearch v7, so MCP-based tooling isn’t available. This means that to answer data queries, I have to pre-build basic DSL queries that cover only a very limited set of questions. Given this, what recommendations can you suggest for building the tools and connections needed to answer a much wider range of questions, beyond the pre-built queries?

Hi @matpg,

Have you heard about ElasticsearchDatabaseChain for natural-language-to-DSL conversion?

This is likely the single biggest upgrade you can make. Instead of pre-building a limited set of DSL queries, let the LLM generate Elasticsearch Query DSL dynamically from natural language questions.

ElasticsearchDatabaseChain:

  1. introspects your index mappings and samples documents for context
  2. uses an LLM to convert natural language questions into Elasticsearch JSON DSL queries
  3. executes the query against your ES cluster
  4. passes the results back through the LLM to generate a human-readable answer

This should be fully compatible with Elasticsearch v7, since it uses the standard elasticsearch-py client and basic search/mapping APIs.

```python
from elasticsearch import Elasticsearch
from langchain_openai import ChatOpenAI
from langchain.chains.elasticsearch_database.base import ElasticsearchDatabaseChain

# Standard elasticsearch-py client pointed at your v7 cluster
es_client = Elasticsearch("http://your-es-v7-host:9200")

# temperature=0 keeps the generated DSL deterministic
llm = ChatOpenAI(model="gpt-4o", temperature=0)

db_chain = ElasticsearchDatabaseChain.from_llm(
    llm=llm,
    database=es_client,
    top_k=10,  # cap the number of results fetched per query
    include_indices=["your_main_index", "your_secondary_index"],
    return_intermediate_steps=True,  # expose the generated DSL for inspection
)

result = db_chain.invoke({"question": "How many orders were placed last month with total > $1000?"})

print(result["result"])              # human-readable answer
print(result["intermediate_steps"])  # the generated DSL and raw search output
```

The built-in prompt instructs the LLM to generate syntactically correct Elasticsearch JSON queries using only fields that actually exist in your mappings. It even samples documents from each index to give the LLM context about the data shape.
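To give a feel for what the chain produces, here is a hand-written illustration of the kind of Query DSL the LLM might generate for the sample question above — not actual chain output, and the field names `order_date` and `total` are assumptions about your mapping:

```python
# Hypothetical DSL for: "How many orders were placed last month with total > $1000?"
# `order_date` and `total` are assumed field names; date math is ES v7 syntax.
generated_query = {
    "size": 0,  # we only need the count, not the documents themselves
    "query": {
        "bool": {
            "filter": [
                {"range": {"order_date": {"gte": "now-1M/M", "lt": "now/M"}}},
                {"range": {"total": {"gt": 1000}}},
            ]
        }
    },
    # hits.total gives the count; an explicit agg also works
    "track_total_hits": True,
}
```

Inspecting `result["intermediate_steps"]` after each call shows you the query the LLM actually wrote, which is invaluable for debugging mapping mismatches.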

Beyond that, a few other options:

- wrap ES queries as custom @tool functions for your agents, or create a dedicated ES agent within your supervisor architecture
- use ElasticSearchBM25Retriever for text search + RAG
- expose the LLM-powered DSL generation itself as a tool your agents can call
- use ElasticsearchTranslator for self-query retrieval