Pydantic to OpenAI Structured Outputs Compatibility Solution (with implementation)
I’ve developed a comprehensive solution (available as an MIT-licensed gist) to the widespread compatibility issues between Pydantic models and OpenAI’s Structured Outputs, and I believe it should be integrated into LangChain to benefit the entire community.
## The Problem

Many LangChain users hit frustrating errors when using `.with_structured_output()` with OpenAI’s strict JSON schema mode. The root cause is that OpenAI’s Structured Outputs support only a very limited subset of JSON Schema, while Pydantic generates rich, full-featured schemas. This mismatch causes failures for common patterns:

- Optional fields (`Optional[str] = None`) generate `{"type": ["string", "null"]}`, which OpenAI rejects
- Numeric constraints (`Field(ge=0, le=100)`) use the `minimum`/`maximum` keywords, which OpenAI doesn’t support
- Recursive models (tree structures, linked lists) use `$ref`, which OpenAI explicitly forbids
- Union types generate `anyOf`/`oneOf`, which aren’t supported in strict mode
- Missing or empty `additionalProperties` causes validation errors
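To see the mismatch concretely, here is a small demonstration (assuming Pydantic v2; the `Item` model is made up for illustration) of the schema shapes Pydantic emits for an optional field and a constrained field:

```python
from typing import Optional

from pydantic import BaseModel, Field

class Item(BaseModel):
    name: str
    note: Optional[str] = None           # emitted as an anyOf with a null branch
    score: int = Field(0, ge=0, le=100)  # emitted with minimum/maximum keywords

props = Item.model_json_schema()["properties"]
print(props["note"])   # contains {'anyOf': [{'type': 'string'}, {'type': 'null'}], ...}
print(props["score"])  # contains 'minimum': 0, 'maximum': 100
```

Both shapes fall outside the subset that OpenAI's strict mode accepts, so sending this schema unmodified fails.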
Currently, developers must either rewrite their Pydantic models (breaking compatibility with other providers) or manually craft OpenAI-specific schemas (error-prone and tedious).
## The Solution

I’ve created a `sanitize_for_openai_schema()` function that automatically transforms any Pydantic model into an OpenAI-compatible schema. The implementation:

- Converts optionals to OpenAI’s preferred required-plus-nullable pattern
- Detects recursive models early and fails with helpful error messages
- Migrates constraints into field descriptions for app-side validation
- Collapses unions intelligently while preserving nullability
- Fixes `additionalProperties` to always have proper typing
- Preserves field order from the original Pydantic model
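The gist has the full implementation; as a rough, stdlib-only sketch of just two of these transformations (the nullable-union collapse and the constraint-to-description migration, using a hypothetical `sanitize_node` helper, not the gist's actual code), the idea looks like:

```python
def sanitize_node(node: dict) -> dict:
    """Sketch of two sanitizer transformations; real code should deep-copy."""
    node = dict(node)

    # Collapse {"anyOf": [X, {"type": "null"}]} into X with a nullable type.
    any_of = node.pop("anyOf", None)
    if any_of is not None:
        non_null = [s for s in any_of if s.get("type") != "null"]
        if len(non_null) == 1 and len(non_null) < len(any_of):
            node.update(sanitize_node(non_null[0]))
            node["type"] = [node["type"], "null"]
        else:
            node["anyOf"] = [sanitize_node(s) for s in any_of]

    # Move unsupported numeric constraints into the description
    # so the application can still validate them after parsing.
    notes = [f"{key}={node.pop(key)}" for key in ("minimum", "maximum") if key in node]
    if notes:
        desc = node.get("description", "")
        node["description"] = (desc + " " if desc else "") + f"({', '.join(notes)})"

    # Recurse into object properties.
    if "properties" in node:
        node["properties"] = {k: sanitize_node(v) for k, v in node["properties"].items()}
    return node

field = {"anyOf": [{"type": "integer"}, {"type": "null"}], "minimum": 0, "maximum": 120}
print(sanitize_node(field))
# -> {'type': ['integer', 'null'], 'description': '(minimum=0, maximum=120)'}
```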
## Integration with LangChain

This could be integrated into LangChain in several ways:

```python
# Option 1: Automatic in with_structured_output
llm.with_structured_output(
    MyModel,
    method="json_schema",
    strict=True,
    sanitize_schema=True,  # New parameter
)

# Option 2: Standalone utility
from langchain.output_parsers.openai_tools import sanitize_for_openai_schema

schema = sanitize_for_openai_schema(MyModel)

# Option 3: Auto-detect and sanitize when strict=True
llm.with_structured_output(MyModel, method="json_schema", strict=True)
# Automatically applies sanitization when needed
```
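For Option 3, one possible shape for the auto-detection (the `needs_sanitizing` helper and keyword set below are illustrative, not LangChain API) is a recursive scan for keywords that strict mode rejects:

```python
# Hypothetical helper sketching Option 3's auto-detection: walk the
# generated JSON schema and report whether it uses any keyword that
# OpenAI's strict mode rejects (illustrative subset from the list above).

UNSUPPORTED_KEYWORDS = {"minimum", "maximum", "anyOf", "oneOf"}

def needs_sanitizing(schema: object) -> bool:
    if isinstance(schema, dict):
        if UNSUPPORTED_KEYWORDS & schema.keys():
            return True
        return any(needs_sanitizing(v) for v in schema.values())
    if isinstance(schema, list):
        return any(needs_sanitizing(v) for v in schema)
    return False

print(needs_sanitizing({"properties": {"age": {"type": "integer", "minimum": 0}}}))      # True
print(needs_sanitizing({"type": "object", "properties": {"name": {"type": "string"}}}))  # False
```

Sanitization would then be applied only when this check fires, leaving already-compatible schemas untouched.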
## Benefits for the Community

- **Zero model changes required** - existing Pydantic models work immediately
- **Cross-provider compatibility** - the same models work with OpenAI, Anthropic, and local LLMs
- **Better developer experience** - clear errors for unsupported patterns
- **Production-tested** - includes 1000+ lines of tests covering all edge cases
- **MIT licensed** - free to use even outside LangChain
## The Code

The gist includes:

- `json_schema.py` - the complete sanitizer implementation with extensive documentation
- `test_json_schema_edge_cases.py` - a comprehensive test suite
The code is MIT licensed, so anyone can use it in their projects immediately, whether or not it gets integrated into LangChain. However, I believe this would be a valuable addition to LangChain core, saving countless developers from these frustrating compatibility issues.
## Example Usage

```python
from pydantic import BaseModel, Field
from typing import Optional, List

class User(BaseModel):
    name: str
    age: Optional[int] = Field(None, ge=0, le=120)
    tags: List[str] = []
    metadata: dict = {}

# Without sanitizer: OpenAI API errors
# With sanitizer: works
schema = sanitize_for_openai_schema(User)
```
I’m happy to help if this makes it into LangChain. The implementation is production-ready and has been battle-tested with 20+ different Pydantic models. Let me know your thoughts or if you need any clarification!