I am trying to log token usage for LLM calls, but when streaming tokens directly via `stream_mode=["messages"]`, the streamed responses do not include usage in the final response the way they do with `stream_mode=["updates"]`.
I've pared everything back to the minimal example below:
```python
import os

from langgraph.prebuilt import create_react_agent
from databricks_langchain import ChatDatabricks
from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_core.callbacks import get_usage_metadata_callback

agent = create_react_agent(
    model=ChatDatabricks(
        endpoint=os.getenv('LLM_ENDPOINT_PRIMARY'),
        temperature=0.0,
        stream_usage=True,
    ),
    # tools=[get_weather],
    tools=[],
    prompt="You are a helpful assistant",
)

callback_handler = UsageMetadataCallbackHandler()

with get_usage_metadata_callback() as cb:
    for event in agent.stream(
        {"messages": [{"role": "user", "content": "hello!"}]},
        config={"callbacks": [callback_handler]},
        stream_mode=["updates"],
    ):
        print(event)
```
yields:
```python
AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'created': 1754669019, 'id': 'chatcmpl-C2Jr93ePJ2iEdqo5K0hbcZNxtg7v2', 'model': 'gpt-4o-2024-08-06', 'object': 'chat.completion', 'system_fingerprint': 'fp_ee1d74bde0', 'usage': {'completion_tokens': 10, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 18, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 28}, 'model_name': 'gpt-4o-2024-08-06'}, id='run--ad5a92fc-fd7c-43b1-a3ff-f88c46da786a-0')
```
However, if I use `stream_mode=["updates", "messages"]`, neither the chunks nor the final update includes the usage information:
```python
AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'finish_reason': 'stop'}, id='run--bb73d7a0-e585-4938-8b18-609f7e50ccbc')
```
I have tried both callback approaches (passing `UsageMetadataCallbackHandler` via `config["callbacks"]`, and the `get_usage_metadata_callback` context manager), together (as above) and individually, but neither is populated after the call to the endpoint.
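As a fallback I've been experimenting with summing the per-chunk usage myself when streaming, in case the individual chunks do carry partial usage dicts. A minimal sketch of that idea (`merge_usage` is my own helper, and the token-count keys follow the shape of LangChain's `usage_metadata` dicts; this runs on fabricated chunk data, not a live endpoint):

```python
def merge_usage(chunk_usages):
    """Sum input/output/total token counts across per-chunk usage dicts.

    Each element is either None (chunk carried no usage metadata) or a
    dict with some subset of the keys below; missing keys count as 0.
    """
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in chunk_usages:
        if not usage:
            continue  # skip chunks that carried no usage metadata
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Fabricated example: only some chunks carry (partial) usage.
print(merge_usage([
    None,
    {"input_tokens": 18, "output_tokens": 4, "total_tokens": 22},
    {"output_tokens": 6, "total_tokens": 6},
]))
# → {'input_tokens': 18, 'output_tokens': 10, 'total_tokens': 28}
```

This only helps if the streamed chunks expose usage at all, which is exactly what doesn't seem to happen here once `"messages"` is in `stream_mode`.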
Tested with
langgraph==0.5.3 and langgraph==0.6.4
databricks-langchain==0.6.0