I’m hitting the following error while streaming with OpenAI using the built-in file_search tool:
Error code: 400 - {'error': {'message': "Invalid 'input[3].content[0].annotations[0].file_id': string too long. Expected a string with maximum length 64, but got a string with length 135 instead"}}
What I see:
- The model sometimes returns identical annotation deltas (same annotation payload, same index) on the same content block.
- When LangChain aggregates these identical annotation deltas into an AIMessage, the two dicts are merged (matched by index) and their string fields are concatenated (e.g., file_id, filename), producing invalid values and breaching provider limits (e.g., file_id max 64 chars).
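To illustrate the mechanism as I understand it, here is a simplified, pure-Python model of the merge semantics (a hypothetical sketch, not LangChain's actual merge code): when two annotation dicts are paired by matching index, string fields present in both are concatenated rather than deduplicated.

```python
# Simplified model of the suspected merge behavior (hypothetical sketch,
# NOT LangChain's actual implementation): string values present in both
# dicts are concatenated when the dicts are merged by index.
def merge_annotation_dicts(left: dict, right: dict) -> dict:
    merged = dict(left)
    for key, value in right.items():
        if key not in merged:
            merged[key] = value
        elif key in ("type", "index"):
            pass  # keys used to pair the dicts up are kept as-is
        elif isinstance(merged[key], str) and isinstance(value, str):
            merged[key] = merged[key] + value  # concatenation, not dedupe
    return merged

ann = {
    "type": "file_citation",
    "file_id": "file-CV8",
    "filename": "name_of_file.md",
    "index": 1724,
}
merged = merge_annotation_dicts(ann, dict(ann))
# merged["file_id"] and merged["filename"] are now doubled in length
```

Under this model, each duplicate delta doubles the string fields, which is exactly how a legitimate file_id can blow past the 64-character limit.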
Setup (simplified from my app):
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver

model = ChatOpenAI(model="gpt-4.1-mini")
agent = create_agent(
    model=model,
    tools=[{"type": "file_search", "vector_store_ids": ["vs_..."]}],
    checkpointer=InMemorySaver(),
)
# streaming with .astream(..., stream_mode="messages")
Observed chunks:
content=[{'type': 'text', 'annotations': [{'type': 'file_citation', 'file_id': 'file-CV8…', 'filename': 'name_of_file.md', 'index': 1724}], 'index': 1}] additional_kwargs={} response_metadata={'model_provider': 'openai'} id='lc_run--9e7b058b…'
content=[{'type': 'text', 'annotations': [{'type': 'file_citation', 'file_id': 'file-CV8…', 'filename': 'name_of_file.md', 'index': 1724}], 'index': 1}] additional_kwargs={} response_metadata={'model_provider': 'openai'} id='lc_run--9e7b058b…'
content=[{'type': 'text', 'annotations': [{'type': 'file_citation', 'file_id': 'file-CV8…', 'filename': 'name_of_file.md', 'index': 1724}], 'index': 1}] additional_kwargs={} response_metadata={'model_provider': 'openai'} id='lc_run--9e7b058b…'
Minimal repro (mirrors what I’m seeing when two identical annotation deltas arrive with the same index):
from langchain_core.messages.ai import AIMessageChunk, add_ai_message_chunks

def make_chunk(ann_index: int) -> AIMessageChunk:
    return AIMessageChunk(
        content=[{
            "type": "text",
            "annotations": [{
                "type": "file_citation",
                "file_id": "file-CV8...",
                "filename": "name_of_file.md",
                "index": ann_index,
            }],
            "index": 1,
        }],
        additional_kwargs={},
        response_metadata={"model_provider": "openai"},
        id="lc_run--example",
    )

a = make_chunk(1724)
b = make_chunk(1724)  # identical annotation, same index
ab = add_ai_message_chunks(a, b)

# ab.content[0]["annotations"][0] now has concatenated strings:
#   file_id  == "file-CV8...file-CV8..."
#   filename == "name_of_file.mdname_of_file.md"
# -> depending on the length of the real file_id, this exceeds the
#    64-char limit and produces the 400 error above
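As a client-side stopgap I'm experimenting with filtering out duplicate annotation deltas before aggregation (a hypothetical helper; the function name and shape are mine, and it operates on the plain list-of-dicts content of each chunk):

```python
import json

def drop_duplicate_annotation_deltas(content_deltas):
    """Skip a content delta when it is identical to the previous one and
    carries annotations (hypothetical client-side filter, applied before
    chunks are aggregated with add_ai_message_chunks).

    Identical *text* deltas are kept, since a model can legitimately emit
    the same text fragment twice in a row; only an identical annotation
    payload on the same block index is treated as a duplicate.
    """
    prev_key = None
    for delta in content_deltas:
        key = json.dumps(delta, sort_keys=True)
        has_annotations = any(
            isinstance(block, dict) and block.get("annotations")
            for block in delta
        )
        if has_annotations and key == prev_key:
            continue  # duplicate annotation delta: drop instead of merging
        prev_key = key
        yield delta

ann_delta = [{"type": "text", "annotations": [{"type": "file_citation",
              "file_id": "file-CV8", "filename": "name_of_file.md",
              "index": 1724}], "index": 1}]
text_delta = [{"type": "text", "text": "a", "index": 1}]
filtered = list(drop_duplicate_annotation_deltas(
    [ann_delta, ann_delta, ann_delta, text_delta, text_delta]
))
# filtered keeps one annotation delta and both text deltas
```

This avoids the concatenation at the source, but it feels like a workaround for behavior that arguably should be handled in the merge logic itself, which is why I'm asking where the fix belongs.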
Question:
- Given this occurs specifically when OpenAI returns identical annotation deltas with the same index while using the built-in file_search tool, should clients dedupe these duplicate deltas themselves, or should LangChain avoid concatenating string fields when merging annotation dicts?
Environment:
- langchain>=1.0.0a10
- langchain-core>=1.0.0a5
- langchain-openai>=1.0.0a3