Any best practices for extracting structured JSON output from a model?

I am learning about structured output in LangChain from the doc "How to return structured data from a model | 🦜️🔗 Langchain".

Are there any best practices or good packages in the JS stack for extracting a JSON response from the LLM without using `.withStructuredOutput`?

The LLM I am using does not handle the `extract` tool well with the `withStructuredOutput` approach. Here is the extract schema:

{
    "name": "extract",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {
                "type": "string",
                "description": "Concise summary of the webpage content"
            },
            "key_excerpts": {
                "type": "string",
                "description": "Important quotes and excerpts from the content"
            }
        },
        "required": [
            "summary",
            "key_excerpts"
        ],
        "additionalProperties": false,
        "$schema": "http://json-schema.org/draft-07/schema#"
    }
}

Most of the time, it fails to extract (maybe the LLM is just weak).

So the only option for me is to prompt the LLM to return a JSON code block and build a custom extractor.

I am looking for help and for packages that can make this more efficient:

  1. a better way or package to extract a code block from markdown

  2. package recommendations, such as `dirty-json` or `jsonrepair`
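For what it's worth, here is a minimal dependency-free sketch of the "prompt for a JSON code block, then extract it" idea. It is only one possible way to do it. All helper names (`extractFencedBlock`, `extractBraceSpan`, `isExtracted`, `parseModelReply`) are made up for illustration, and the `repair` parameter is a hypothetical hook where a string-to-string repair function could be plugged in (I am assuming `jsonrepair` exposes one; check its README).

```typescript
// Shape described by the `extract` schema above.
interface Extracted {
  summary: string;
  key_excerpts: string;
}

// Build the fence marker programmatically so this snippet can itself live in a code block.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "[\\w-]*[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE);

// Return the body of the first fenced code block, or null if the reply has no fence.
function extractFencedBlock(markdown: string): string | null {
  const match = FENCED_BLOCK.exec(markdown);
  return match ? match[1].trim() : null;
}

// Fallback for replies without a fence: take the span from the first "{" to the last "}".
function extractBraceSpan(text: string): string | null {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  return start !== -1 && end > start ? text.slice(start, end + 1) : null;
}

// Check the two required string fields from the schema.
function isExtracted(value: unknown): value is Extracted {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.summary === "string" && typeof record.key_excerpts === "string";
}

// Extract, optionally repair, parse, and validate. Returns null on any failure.
// `repair` defaults to the identity function (hypothetical hook, see above).
function parseModelReply(
  reply: string,
  repair: (s: string) => string = (s) => s
): Extracted | null {
  const candidate = extractFencedBlock(reply) ?? extractBraceSpan(reply);
  if (candidate === null) return null;
  try {
    const parsed: unknown = JSON.parse(repair(candidate));
    return isExtracted(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

Returning `null` instead of throwing makes it easy to retry the LLM call when the reply is unusable. The type guard only checks the required fields, so it does not enforce `additionalProperties: false`.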

I have read the topic https://forum.langchain.com/t/should-withstructuredoutput-throw-an-error-when-get-some-wrong-input/1315

It seems my LLM provider doesn’t support the method either :joy:

Have you tried using a tool to enforce structured output?

There is an example here that doesn’t rely on `withStructuredOutput` but instead passes a schema to a tool.

Hope this helps!

I will try this, thank you!