Using a single query parameter that can cover several fields at once, much like a SQL query, seems more viable to me than hardcoding every field individually:
```python
async def query_field(company_name: str, query: str) -> str:
    """
    Second interface: keyword-based field matching.
    """
    fields = match_fields(query)
    if not fields:
        return (
            f"No fields related to '{query}' were matched. "
            "Please try a more specific description."
        )
    result = await api.kyc(company_name)
    data = result["result"]["Data"]
    lines = []
    for field in fields:
        value = deep_get(data, field)
        if value == "unknown":
            lines.append(f"\n【{field}】No relevant data found")
        else:
            lines.append(f"\n【{field}】")
            lines.append(format_value(value))
    return "\n".join(lines)
```
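The snippet above relies on `match_fields` and `deep_get` without showing them. As a rough sketch of what such helpers might look like (the keyword table, the field names and the dotted-path convention are my assumptions, not your actual implementation):

```python
from typing import Any

# Hypothetical keyword table: each field name maps to words that suggest it.
FIELD_KEYWORDS: dict[str, list[str]] = {
    "operating_status": ["active", "status", "dissolved"],
    "registered_capital": ["capital"],
    "litigation_records": ["lawsuit", "litigation", "court"],
}


def match_fields(query: str) -> list[str]:
    """Return every field whose keywords appear in the query (case-insensitive)."""
    text = query.lower()
    return [
        field
        for field, keywords in FIELD_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def deep_get(data: dict[str, Any], path: str, default: Any = "unknown") -> Any:
    """Walk a dotted path such as 'a.b.c' through nested dicts."""
    current: Any = data
    for key in path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

Plain substring matching like this is brittle (for example, "inactive" also matches "active"), so the keyword table needs care as the number of fields grows.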
That said, I would recommend improving the tool description by listing the fields that can be part of a query. Including a few examples of valid queries in the description could also help the model call the tool with queries that map to the right fields. For example (the field names below are only illustrative):
```python
from langchain_core.tools import tool  # assuming LangChain's @tool decorator


@tool
async def query_company_fields(company_name: str, query: str) -> str:
    """
    Query specific fields from a company's KYC profile.

    Available fields:
      Registration:
        registered_name, incorporation_date, registered_address,
        operating_status, business_scope, company_type
      Financials:
        registered_capital, paid_in_capital, annual_revenue,
        credit_rating, tax_id
      Legal:
        licenses, litigation_records, administrative_penalties,
        bankruptcy_status
      Personnel:
        legal_representative, shareholders, beneficial_owners,
        board_members

    Example queries and the fields they map to:
      "Is the company still active?"
        → operating_status
      "Who founded the company?"
        → legal_representative, shareholders
      "Any lawsuits or penalties?"
        → litigation_records, administrative_penalties
      "What is the registered capital?"
        → registered_capital, paid_in_capital
      "Tell me about the beneficial owners"
        → beneficial_owners, shareholders

    Args:
        company_name: The company to look up.
        query: Natural language description of what information is needed.

    Returns:
        Formatted field values from the company's KYC record.
    """
    return await query_field(company_name, query)
```
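To keep the description from drifting out of sync with the real schema, one option is to generate the "Available fields" section from a single source of truth. A minimal sketch, assuming a hypothetical `FIELD_GROUPS` dict (the field names are copied from the example above; the helper `render_available_fields` is my own invention):

```python
import textwrap

# Hypothetical single source of truth for the queryable fields.
FIELD_GROUPS: dict[str, list[str]] = {
    "Registration": ["registered_name", "incorporation_date", "operating_status"],
    "Legal": ["licenses", "litigation_records"],
}


def render_available_fields(groups: dict[str, list[str]], width: int = 72) -> str:
    """Render the field groups as the 'Available fields' part of a tool description."""
    lines = ["Available fields:"]
    for group, fields in groups.items():
        lines.append(f"  {group}:")
        # Wrap long field lists so the description stays readable.
        lines.append(
            textwrap.fill(
                ", ".join(fields),
                width=width,
                initial_indent="    ",
                subsequent_indent="    ",
            )
        )
    return "\n".join(lines)


description = (
    "Query specific fields from a company's KYC profile.\n\n"
    + render_available_fields(FIELD_GROUPS)
)
```

The resulting string could then be supplied as the tool's description instead of a hand-written docstring. If I recall correctly, `StructuredTool.from_function` accepts a `description` argument for this, but please check that against the LangChain version you are on.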
Additional Suggestion for Improving query_field
I have one more suggestion for query_field. Instead of the deterministic keyword matching you use now, you could make a separate LLM call to select the fields. Because this is a single call with no message history, the context window should be able to accommodate even a large schema. The code below is just an example:
```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate

# `llm` (a chat model) and ALL_FIELD_NAMES (the list of valid field names)
# are assumed to be defined elsewhere in your code.

FIELD_SCHEMA = """
field1: Legal registered name of the company
field2: Date of incorporation
field3: Registered address (full)
field4: Operating status (active/inactive/dissolved)
...
field255: ...
"""

FIELD_SELECTOR_PROMPT = """
You are a data field selector. Given a user question, return ONLY the relevant field names from the schema below as a JSON list.

Schema:
{schema}

User question: {query}

Return only a JSON array of field names, e.g. ["field1", "field42"]. No explanation.
"""


async def select_fields_with_llm(query: str) -> list[str]:
    """Use LLM to select relevant fields from schema based on user query."""
    prompt = ChatPromptTemplate.from_template(FIELD_SELECTOR_PROMPT)
    chain = prompt | llm | JsonOutputParser()
    fields = await chain.ainvoke({
        "schema": FIELD_SCHEMA,
        "query": query,
    })
    # The parser can return something other than a list (e.g. a dict).
    if not isinstance(fields, list):
        return []
    # Validate: only return fields that actually exist
    valid = set(ALL_FIELD_NAMES)
    return [f for f in fields if f in valid]


async def query_field(company_name: str, query: str) -> str:
    fields = await select_fields_with_llm(query)
    if not fields:
        return f"No relevant fields found for: '{query}'"
    result = await api.kyc(company_name)
    data = result["result"]["Data"]
    lines = []
    for field in fields:
        value = deep_get(data, field)
        if value == "unknown":
            lines.append(f"【{field}】No data available")
        else:
            lines.append(f"【{field}】")
            lines.append(format_value(value))
    return "\n".join(lines)
```
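Since the model's reply is not guaranteed to be clean JSON, it may be worth parsing it defensively. Here is a rough sketch using only the standard library; the helper name `parse_field_selection` is made up for illustration, and it assumes you have the raw text of the reply and that the expected answer is a flat array of strings:

```python
import json
import re


def parse_field_selection(raw: str, valid_fields: set[str]) -> list[str]:
    """Extract a list of known field names from a raw LLM reply."""
    # Grab the first [...] span so surrounding prose or code fences are ignored.
    match = re.search(r"\[.*?\]", raw, flags=re.DOTALL)
    if match is None:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    selected: list[str] = []
    for item in parsed:
        # Keep only known field names, dropping duplicates but preserving order.
        if isinstance(item, str) and item in valid_fields and item not in selected:
            selected.append(item)
    return selected
```

Returning an empty list on any parse failure lets the caller fall through to its existing "no relevant fields" message instead of raising.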
I hope this helps.