How to get the stop_reason in `llm.invoke`?

hello, I’m using code like:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm1 = ChatOpenAI(
    model=model_name,
    openai_api_key="vllm-key",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
    }
)

llm2 = ChatOpenAI(
    model=model_name,
    openai_api_key="vllm-key",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "stop": ["\n"]
    }
)
if __name__ == "__main__":
    response1 = llm1.invoke([HumanMessage("give me a short poem by Yeats, separated by line breaks")])
    response2 = llm2.invoke([HumanMessage("give me a short poem by Yeats, separated by line breaks")])
    print(response1.content)
    print("#" * 50)
    print(response2.content)

Then I got a result like this:

D:\Code\langchain\.venv\Scripts\python.exe D:\Code\langchain\agent\character\llm.py 
Sure! Here's a short poem by W.B. Yeats, *"The Lake Isle of Innisfree"*, separated by line breaks as requested:

I will arise and go now, and go to Innisfree,  
And a small cabin build there, of clay and twigs.  
Nine bean-rows will I have there, a hive for the honey-bee,  
And live alone in the bare land there, on the lake island of Innisfree.  

And I shall have some peace there, for the sounds of peace,  
A lake water lapping with low sounds by the shore;  
While peace comes dropping by, dropping from the veils of the morning,  
And evening full of the linnet's wings.  

I will arise and go now, for always night and day  
I hear lake water lapping with low sounds by the shore;  
While Ballylee is a ruin, and the towers crumble slowly,  
Yet peace comes dropping by, dropping from the veils of the morning.
##################################################
Sure! Here's a short poem by W.B. Yeats, *"The Lake Isle of Innisfree"*, separated by line breaks as requested:

Process finished with exit code 0

But it seems I can’t get the stop_reason from the response, because if I print response2, I just get:

AIMessage(
    content='Sure! Here\'s a short poem by W.B. Yeats, *"The Lake Isle of Innisfree"*, separated by line breaks as requested:',
    additional_kwargs={'refusal': None},
    response_metadata={
        'token_usage': {
            'completion_tokens': 31,
            'prompt_tokens': 26,
            'total_tokens': 57,
            'completion_tokens_details': None,
            'prompt_tokens_details': None
        },
        'model_provider': 'openai',
        'model_name': 'qwen3-32b-bnb-4bit',
        'system_fingerprint': None,
        'id': 'chatcmpl-62daf15d21304bb58b242dcaec3fbd81',
        'finish_reason': 'stop',
        'logprobs': None
    },
    id='lc_run--b63d046e-3244-4f87-8312-e949aed7a0bb-0',
    usage_metadata={
        'input_tokens': 26,
        'output_tokens': 31,
        'total_tokens': 57,
        'input_token_details': {},
        'output_token_details': {}}
)

There is no stop_reason inside it, only finish_reason.
But I did find stop_reason in the raw HTTP response, where it appears as "stop_reason": "\n" or "stop_reason": null.
I need it to determine which specific stop token triggered the termination. What should I do?

hi @septemberlemon

afaik, with the current langchain_openai.ChatOpenAI integration, llm.invoke only exposes OpenAI’s standard finish_reason in AIMessage.response_metadata. Provider-specific fields like stop_reason from your vLLM/Qwen server are not propagated, so you can’t read them from AIMessage without customizing the integration or bypassing LangChain.
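
With stock ChatOpenAI you can see this directly on your response2 (matching the dump above):

print(response2.response_metadata.get("finish_reason"))  # 'stop'
print("stop_reason" in response2.response_metadata)      # False -- the field never arrives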

You have two practical options today:

  1. Option A - Call your vLLM endpoint directly, bypassing LangChain for that call (a minimal sketch follows this list)
  2. Option B - Fork / subclass ChatOpenAI to propagate stop_reason
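
For Option A, here is a minimal sketch using the openai client directly against your vLLM server, reusing your model_name and inference_server_url. It assumes the openai client preserves vLLM's non-standard stop_reason field when dumping the response model (recent client versions keep unknown fields; this is also what the Option B subclass below relies on):

from openai import OpenAI

client = OpenAI(api_key="vllm-key", base_url=inference_server_url)

raw = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "give me a short poem by Yeats, separated by line breaks"}],
    max_tokens=1000,
    temperature=0.3,
    stop=["\n"],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

choice = raw.choices[0].model_dump()   # plain dict, including vLLM's extra fields
print(choice["finish_reason"])         # standard OpenAI field, e.g. "stop"
print(choice.get("stop_reason"))       # vLLM-specific: the stop string that fired, or None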

Option B is usually the better fit if you want to stay inside LangChain; here is a sketch of a ChatOpenAI subclass:

from typing import Any, Dict, Union

import openai
from langchain_openai import ChatOpenAI
from langchain_core.outputs import ChatResult
from langchain_core.messages import AIMessage


class ChatOpenAIWithStopReason(ChatOpenAI):
    """ChatOpenAI subclass that propagates provider-specific `stop_reason`
    from the raw OpenAI/vLLM response into `AIMessage.response_metadata`
    and `ChatResult.llm_output`.

    This assumes your backend returns something like:

        {
          "choices": [
            {
              "finish_reason": "stop",
              "stop_reason": "\\n",
              ...
            }
          ],
          ...
        }
    """

    def _create_chat_result(
        self,
        response: Union[dict, openai.BaseModel],
        generation_info: Dict[str, Any] | None = None,
    ) -> ChatResult:
        # First, let the base class do all its normal work
        result = super()._create_chat_result(response, generation_info)

        # Normalize response to a dict so we can inspect provider-specific fields
        try:
            response_dict = (
                response if isinstance(response, dict) else response.model_dump()
            )
        except Exception:
            # If for some reason we can't dump it, just return the base result
            return result

        choices = response_dict.get("choices") or []
        if not choices:
            return result

        # For simplicity, read stop_reason from the first choice.
        # Adjust if your backend encodes it differently or per-choice.
        stop_reason = choices[0].get("stop_reason")
        if stop_reason is None:
            # Backend didn't send stop_reason (or openai client stripped it)
            return result

        # Attach stop_reason to overall llm_output metadata
        if isinstance(result.llm_output, dict):
            result.llm_output["stop_reason"] = stop_reason

        # Attach to each generation's AIMessage.response_metadata
        for gen in result.generations:
            msg = gen.message
            # Chat generations should carry AIMessage; guard just in case
            if isinstance(msg, AIMessage):
                # response_metadata is a plain dict on AIMessage
                metadata = dict(getattr(msg, "response_metadata", {}) or {})
                metadata["stop_reason"] = stop_reason
                msg.response_metadata = metadata

        return result
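
The override hooks _create_chat_result because, for non-streaming calls, that is where ChatOpenAI turns the raw OpenAI/vLLM response into a ChatResult, so both llm_output and each message's response_metadata can be enriched in one place. Usage (with model_name and inference_server_url defined as in your snippet, and the class saved in a module, here called my_openai):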
from langchain_core.messages import HumanMessage
from my_openai import ChatOpenAIWithStopReason

llm = ChatOpenAIWithStopReason(
    model=model_name,
    openai_api_key="vllm-key",
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "stop": ["\n"],
    },
)

resp = llm.invoke([HumanMessage("give me a short poem by Yeats, separated by line breaks")])

print(resp.content)
print("finish_reason:", resp.response_metadata.get("finish_reason"))
print("stop_reason:", resp.response_metadata.get("stop_reason"))

I haven’t tested this end to end, so it may need some adjustments.

Let me know if this works for you.