Clarification on how Pydantic schema descriptions are used in with_structured_output

Hi all,

I’ve been digging into how with_structured_output works with Pydantic models, and I came across this note in the docs:

“Beyond just the structure of the Pydantic class, the name of the Pydantic class, the docstring, and the names and provided descriptions of parameters are very important. Most of the time, with_structured_output is using a model’s function/tool-calling API, and you can effectively think of all of this information as being added to the model prompt.”

From what I can tell in the source code:

  • with_structured_output relies on PydanticOutputParser.

  • That parser’s get_format_instructions calls pydantic_object.model_json_schema() to generate the schema (including field descriptions).

  • These instructions are then embedded into the prompt with _PYDANTIC_FORMAT_INSTRUCTIONS.
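
For concreteness, here is a minimal sketch of the parser path I traced (assuming langchain-core and Pydantic v2; the Person model is just an illustration):

```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class Person(BaseModel):
    """A record describing a person."""
    name: str = Field(description="The person's full name")
    age: int = Field(description="Age in whole years")

parser = PydanticOutputParser(pydantic_object=Person)

# get_format_instructions() serializes Person.model_json_schema() into the
# _PYDANTIC_FORMAT_INSTRUCTIONS template, field descriptions included.
print(parser.get_format_instructions())
```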

What I’m unclear on is the phrase “most of the time”.

  • Is get_format_instructions always injected into the final prompt when using with_structured_output?

  • Or are there specific contexts (e.g. when wrapped in certain tools or chains, or when using the function-calling API differently) where the schema isn’t passed through the prompt?

I’d love to understand in practice when the schema descriptions are guaranteed to be included vs. when they might not be.

Thanks!

Short version: No - get_format_instructions() is not always injected into the prompt. with_structured_output picks the best available strategy per model/provider:

  • Most of the time: If the chat model supports native structured outputs (tool/function-calling or JSON/response-format modes), LangChain passes the Pydantic schema via the model’s API (tools/response_format), not as prompt text. The class name, docstring, and field descriptions become the tool/function definition metadata the model sees (see the sketch after this list).

  • Fallback: If the model lacks native structured output, LangChain falls back to a prompt+parser approach and injects PydanticOutputParser.get_format_instructions() (which uses _PYDANTIC_FORMAT_INSTRUCTIONS) into the prompt.
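
As a sketch of the native path (assuming langchain-openai is installed and OPENAI_API_KEY is set; the Joke model is illustrative):

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Joke(BaseModel):
    """A joke to tell the user."""
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")

llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(Joke)

# The schema (class name, docstring, field descriptions) goes out as a tool
# definition in the API request; no format instructions appear in the prompt.
joke = structured_llm.invoke("Tell me a joke about cats")
```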

So:

  • Is get_format_instructions always injected? - No. It’s only injected in the fallback path.

  • When isn’t it injected? - When the model supports tool/function-calling or JSON/response-format. In those cases the schema is sent via request parameters, not prompt text.

  • When is it guaranteed to be included? - When using text-only models without native structured output (or when you explicitly force the prompt+parser route).
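
And if you want to force the prompt+parser route explicitly, the classic pattern looks like this (a sketch reusing the Joke model and llm from above):

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = PydanticOutputParser(pydantic_object=Joke)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user query.\n{format_instructions}"),
    ("human", "{query}"),
]).partial(format_instructions=parser.get_format_instructions())

# Here the schema really is prompt text: the format instructions are rendered
# into the system message, and the parser validates the raw completion.
chain = prompt | llm | parser
result = chain.invoke({"query": "Tell me a joke about cats"})
```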

How to verify in practice:

  • Check the request in your traces: you’ll either see a tools/response_format payload (native path) or a prompt containing the format instructions (fallback path); see the debug sketch after this list.

  • In LangGraph’s prebuilt agent flow, response_format=YourModel adds a separate final LLM call for structured output; it uses the same logic: native tool/JSON when available, prompt+parser otherwise. See “Configure structured output” in the LangGraph quickstart.
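
Without a tracing backend, a quick way to inspect the outgoing request is LangChain’s debug flag (a sketch; set_debug lives in langchain.globals):

```python
from langchain.globals import set_debug

set_debug(True)  # log full request/response payloads to stdout

structured_llm.invoke("Tell me a joke about cats")
# Native path: the logged request carries a tools/response_format entry with
# the JSON schema, and the prompt contains no format instructions.
# Fallback path: the schema text shows up inside the prompt itself.
```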

If you need to force a specific path, some model wrappers expose a parameter to prefer tool-calling vs JSON mode vs prompt+parser; otherwise it’s auto-selected based on model capabilities.
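
For example, ChatOpenAI’s with_structured_output accepts a method argument (the values below are from the langchain-openai wrapper; other providers may expose different options or none at all):

```python
# Force the tool/function-calling path:
structured_llm = llm.with_structured_output(Joke, method="function_calling")

# Or force OpenAI's JSON mode; note that JSON mode only guarantees valid JSON,
# so the schema then has to be described in the prompt yourself.
json_llm = llm.with_structured_output(Joke, method="json_mode")
```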

Thank you very much for the detailed answer! Makes total sense!

Hi @dds,

Something in addition: “Structured output from LLMs: more than just prompt engineering” by Paweł Twardziak on Medium.

Great! This adds further clarity! Thanks again!
