So I tried to update the code like this
The second option works fine for both providers, but it doesn’t trigger any tools: when I use with_structured_output, LangChain presents the output schema to the model as one more tool, so the model ends up ignoring the other tools.
I am trying to resolve the structured output problem by creating a separate node that structures the output. It works functionally, but it doesn’t feel right to me.
Is there another production-grade solution for this?
You then tried it with Gemini 3.0 Flash, and this is what happened:
response_format=… in LangChain is not a cross-provider contract. It works for OpenAI because OpenAI supports response_format natively. Gemini’s native mechanism is response_mime_type="application/json" + response_json_schema=… (JSON Schema / Pydantic).
For Gemini, “tools + native structured output” is still constrained by the provider. The Google docs I linked above only explicitly advertise combining structured outputs with built-in tools (Google Search, URL context, code execution, file search) as a Gemini 3 preview feature; they do not show custom function tools + JSON schema together in one call.
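To make the Gemini-side mechanism concrete, here is a minimal sketch of the native structured-output request configuration. It only builds the config dict (no API call), the `Answer` fields are illustrative placeholders, and the commented-out SDK call is an assumption about how you would pass it to the google-genai client.

```python
import json

# Illustrative JSON Schema for the final answer (field names are assumptions).
answer_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["title", "summary"],
}

# Gemini's native structured-output settings, per the docs quoted above.
generation_config = {
    "response_mime_type": "application/json",
    "response_json_schema": answer_schema,
}

# With the google-genai SDK the request would look roughly like:
#   from google import genai
#   client = genai.Client()
#   resp = client.models.generate_content(
#       model="gemini-...",          # pick your model
#       contents="...",
#       config=generation_config,
#   )
print(json.dumps(generation_config, indent=2))
```

Note that this constrains the response shape, but it is a different request-level switch than binding function tools — which is exactly the conflict described above.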
So my recommendation would be to keep a two-node StateGraph:
Tool node: model bound only with your tools (planning + tool calling).
Formatting node: a separate model call that formats the final answer into your Pydantic schema (Gemini JSON schema mode if you want strictness).
This is actually a very common production pattern when providers can’t reliably do tool calling AND hard JSON-schema constrained final output in one request.
```python
def call_model(state: AgentState):
    """Node that produces the AI tool-call message."""
    return {"messages": [planner_model.invoke(state["messages"])]}


def format_output(state: AgentState):
    """Node that turns the final text into structured output."""
    message = formatter_model.invoke(state["messages"])
    # Parse the model's JSON string into the Pydantic object.
    structured = Answer.model_validate_json(message.content)
    return {"final": structured, "messages": state["messages"]}
```
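If it helps to see the control flow without any framework, here is a self-contained simulation of the same two-phase pattern. `planner_stub`, `formatter_stub`, and the `Answer` dataclass are stand-ins I made up for this sketch, not the LangChain/LangGraph objects above.

```python
import json
from dataclasses import dataclass


@dataclass
class Answer:
    title: str
    summary: str


def planner_stub(messages):
    # Phase 1 stand-in: a tools-only model; ends with plain final text.
    return messages + ["final text: tool loop finished."]


def formatter_stub(messages):
    # Phase 2 stand-in: a JSON-mode model constrained to the Answer schema.
    return json.dumps({"title": "result", "summary": messages[-1]})


def run(messages):
    messages = planner_stub(messages)   # node 1: plan + call tools
    raw = formatter_stub(messages)      # node 2: format only, no tools
    return Answer(**json.loads(raw))    # validate the structured output


result = run(["user question"])
print(result.title)  # prints "result"
```

The key design point is the same as in the graph: the formatting call never sees any tools, so the provider-level conflict between tool binding and schema-constrained output simply cannot occur.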
The flow requires two LLM calls, so be aware of the cost/token impact:
Latency: around 2x slower due to sequential calls
Cost: around 2x more tokens (you pay for both conversations)
If you use a cheap model for the formatting step, this is worth it for production systems where structured-output reliability matters. Since OpenAI supports tools plus structured output in a single call, the two-call approach is only needed as the production-ready solution for Gemini + structured output.
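When reliability is the point, it is also common to guard the formatting call with validation and a bounded retry. A stdlib-only sketch, where `call_formatter` is a placeholder for your second model call (in the real graph you would validate with `Answer.model_validate_json` instead of `json.loads`):

```python
import json


def parse_with_retry(call_formatter, messages, max_attempts=2):
    """Re-invoke the formatter if its output is not valid JSON."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_formatter(messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
            # Feed the error back so the model can correct itself.
            messages = messages + [f"Invalid JSON ({err}); try again."]
    raise ValueError(f"formatter never produced valid JSON: {last_error}")


# Demo with a flaky stub that fails once, then succeeds.
attempts = []

def flaky_formatter(messages):
    attempts.append(1)
    return "not json" if len(attempts) == 1 else '{"title": "ok"}'

print(parse_with_retry(flaky_formatter, ["question"]))  # {'title': 'ok'}
```

This costs at most one extra formatting call in the failure case, which is usually acceptable given the formatting model can be small.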
Just curious - what are the limitations? I’m pretty sure create_agent is flexible enough with its middlewares - you can control almost the entire behaviour of the agent with them.