How to get the stop_reason in `llm.invoke`?

hello, I’m using code like:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm1 = ChatOpenAI(
    model=model_name,
    openai_api_key="vllm-key",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
    }
)

llm2 = ChatOpenAI(
    model=model_name,
    openai_api_key="vllm-key",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "stop": ["\n"]
    }
)
if __name__ == "__main__":
    response1 = llm1.invoke([HumanMessage("give me a short poem by Yeats, separated by line breaks")])
    response2 = llm2.invoke([HumanMessage("give me a short poem by Yeats, separated by line breaks")])
    print(response1.content)
    print("#" * 50)
    print(response2.content)

Then I got a result like this:

D:\Code\langchain\.venv\Scripts\python.exe D:\Code\langchain\agent\character\llm.py 
Sure! Here's a short poem by W.B. Yeats, *"The Lake Isle of Innisfree"*, separated by line breaks as requested:

I will arise and go now, and go to Innisfree,  
And a small cabin build there, of clay and twigs.  
Nine bean-rows will I have there, a hive for the honey-bee,  
And live alone in the bare land there, on the lake island of Innisfree.  

And I shall have some peace there, for the sounds of peace,  
A lake water lapping with low sounds by the shore;  
While peace comes dropping by, dropping from the veils of the morning,  
And evening full of the linnet's wings.  

I will arise and go now, for always night and day  
I hear lake water lapping with low sounds by the shore;  
While Ballylee is a ruin, and the towers crumble slowly,  
Yet peace comes dropping by, dropping from the veils of the morning.
##################################################
Sure! Here's a short poem by W.B. Yeats, *"The Lake Isle of Innisfree"*, separated by line breaks as requested:

Process finished with exit code 0

But it seems I can’t get the stop_reason from the response, because if I print response2, I just get:

AIMessage(
    content='Sure! Here\'s a short poem by W.B. Yeats, *"The Lake Isle of Innisfree"*, separated by line breaks as requested:',
    additional_kwargs={'refusal': None},
    response_metadata={
        'token_usage': {
            'completion_tokens': 31,
            'prompt_tokens': 26,
            'total_tokens': 57,
            'completion_tokens_details': None,
            'prompt_tokens_details': None
        },
        'model_provider': 'openai',
        'model_name': 'qwen3-32b-bnb-4bit',
        'system_fingerprint': None,
        'id': 'chatcmpl-62daf15d21304bb58b242dcaec3fbd81',
        'finish_reason': 'stop',
        'logprobs': None
    },
    id='lc_run--b63d046e-3244-4f87-8312-e949aed7a0bb-0',
    usage_metadata={
        'input_tokens': 26,
        'output_tokens': 31,
        'total_tokens': 57,
        'input_token_details': {},
        'output_token_details': {}}
)

There is no stop_reason inside it, only finish_reason.
But I did find stop_reason in the raw HTTP response, where it appears as "stop_reason": "\n" or "stop_reason": null.
I need it to determine which specific stop token triggered the termination. What should I do?

hi @septemberlemon

afaik, with the current langchain_openai.ChatOpenAI integration, llm.invoke only exposes OpenAI’s standard finish_reason in AIMessage.response_metadata. Provider-specific fields like stop_reason from your vLLM/Qwen server are not propagated, so you can’t read them from AIMessage without customizing the integration or bypassing LangChain.
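
With stock ChatOpenAI you can see this directly on your response2 (matching the dump above):

print(response2.response_metadata.get("finish_reason"))  # 'stop'
print("stop_reason" in response2.response_metadata)      # False -- the field never arrives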

You have two practical options today:

  1. Option A - Call your vLLM endpoint directly, bypassing LangChain for that call (a minimal sketch follows this list)
  2. Option B - Fork / subclass ChatOpenAI to propagate stop_reason
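
For Option A, here is a minimal sketch using the openai client directly against your vLLM server, reusing your model_name and inference_server_url. It assumes the openai client preserves vLLM's non-standard stop_reason field when dumping the response model (recent client versions keep unknown fields; this is also what the Option B subclass below relies on):

from openai import OpenAI

client = OpenAI(api_key="vllm-key", base_url=inference_server_url)

raw = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "give me a short poem by Yeats, separated by line breaks"}],
    max_tokens=1000,
    temperature=0.3,
    stop=["\n"],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

choice = raw.choices[0].model_dump()   # plain dict, including vLLM's extra fields
print(choice["finish_reason"])         # standard OpenAI field, e.g. "stop"
print(choice.get("stop_reason"))       # vLLM-specific: the stop string that fired, or None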

Option B is usually the better fit if you want to stay inside LangChain; here is a sketch of a ChatOpenAI subclass:

from typing import Any, Dict, Union

import openai
from langchain_openai import ChatOpenAI
from langchain_core.outputs import ChatResult
from langchain_core.messages import AIMessage


class ChatOpenAIWithStopReason(ChatOpenAI):
    """ChatOpenAI subclass that propagates provider-specific `stop_reason`
    from the raw OpenAI/vLLM response into `AIMessage.response_metadata`
    and `ChatResult.llm_output`.

    This assumes your backend returns something like:

        {
          "choices": [
            {
              "finish_reason": "stop",
              "stop_reason": "\\n",
              ...
            }
          ],
          ...
        }
    """

    def _create_chat_result(
        self,
        response: Union[dict, openai.BaseModel],
        generation_info: Dict[str, Any] | None = None,
    ) -> ChatResult:
        # First, let the base class do all its normal work
        result = super()._create_chat_result(response, generation_info)

        # Normalize response to a dict so we can inspect provider-specific fields
        try:
            response_dict = (
                response if isinstance(response, dict) else response.model_dump()
            )
        except Exception:
            # If for some reason we can't dump it, just return the base result
            return result

        choices = response_dict.get("choices") or []
        if not choices:
            return result

        # For simplicity, read stop_reason from the first choice.
        # Adjust if your backend encodes it differently or per-choice.
        stop_reason = choices[0].get("stop_reason")
        if stop_reason is None:
            # Backend didn't send stop_reason (or openai client stripped it)
            return result

        # Attach stop_reason to overall llm_output metadata
        if isinstance(result.llm_output, dict):
            result.llm_output["stop_reason"] = stop_reason

        # Attach to each generation's AIMessage.response_metadata
        for gen in result.generations:
            msg = gen.message
            # Chat generations should carry AIMessage; guard just in case
            if isinstance(msg, AIMessage):
                # response_metadata is a plain dict on AIMessage
                metadata = dict(getattr(msg, "response_metadata", {}) or {})
                metadata["stop_reason"] = stop_reason
                msg.response_metadata = metadata

        return result
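
The override hooks _create_chat_result because, for non-streaming calls, that is where ChatOpenAI turns the raw OpenAI/vLLM response into a ChatResult, so both llm_output and each message's response_metadata can be enriched in one place. Usage (with model_name and inference_server_url defined as in your snippet, and the class saved in a module, here called my_openai):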
from langchain_core.messages import HumanMessage
from my_openai import ChatOpenAIWithStopReason

llm = ChatOpenAIWithStopReason(
    model=model_name,
    openai_api_key="vllm-key",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "stop": ["\n"],
    },
)

resp = llm.invoke([HumanMessage("give me a short poem by Yeats, separated by line breaks")])

print(resp.content)
print("finish_reason:", resp.response_metadata.get("finish_reason"))
print("stop_reason:", resp.response_metadata.get("stop_reason"))

I haven’t tested this end to end, so it may need some adjustments.

Let me know if this works for you.