I am trying to log token usage for LLM calls, but when streaming tokens directly via `stream_mode=["messages"]`, the streamed responses do not include usage in the final response the way they do with `stream_mode=["updates"]`.
I've pared everything back to the minimal example below:
```python
import os

from langgraph.prebuilt import create_react_agent
from databricks_langchain import ChatDatabricks
from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_core.callbacks import get_usage_metadata_callback

agent = create_react_agent(
    model=ChatDatabricks(
        endpoint=os.getenv('LLM_ENDPOINT_PRIMARY'),
        temperature=0.0,
        stream_usage=True,
    ),
    # tools=[get_weather],
    tools=[],
    prompt="You are a helpful assistant",
)

callback_handler = UsageMetadataCallbackHandler()

with get_usage_metadata_callback() as cb:
    for event in agent.stream(
        {"messages": [{"role": "user", "content": "hello!"}]},
        config={"callbacks": [callback_handler]},
        stream_mode=["updates"],
    ):
        print(event)
```
yields:
```python
AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'created': 1754669019, 'id': 'chatcmpl-C2Jr93ePJ2iEdqo5K0hbcZNxtg7v2', 'model': 'gpt-4o-2024-08-06', 'object': 'chat.completion', 'system_fingerprint': 'fp_ee1d74bde0', 'usage': {'completion_tokens': 10, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 18, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 28}, 'model_name': 'gpt-4o-2024-08-06'}, id='run--ad5a92fc-fd7c-43b1-a3ff-f88c46da786a-0')
```
However, if I use `stream_mode=["updates", "messages"]`, neither the chunks nor the final update includes the usage information:
```python
AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, response_metadata={'finish_reason': 'stop'}, id='run--bb73d7a0-e585-4938-8b18-609f7e50ccbc')
```
I have tried both callback approaches (passing `UsageMetadataCallbackHandler` via `config["callbacks"]`, and the `get_usage_metadata_callback` context manager), together (as above) and individually, but neither is populated after the call to the endpoint.
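As a fallback I've been experimenting with summing the per-chunk usage myself when streaming, in case the individual chunks do carry partial usage dicts. A minimal sketch of that idea (`merge_usage` is my own helper, and the token-count keys follow the shape of LangChain's `usage_metadata` dicts; this runs on fabricated chunk data, not a live endpoint):

```python
def merge_usage(chunk_usages):
    """Sum input/output/total token counts across per-chunk usage dicts.

    Each element is either None (chunk carried no usage metadata) or a
    dict with some subset of the keys below; missing keys count as 0.
    """
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in chunk_usages:
        if not usage:
            continue  # skip chunks that carried no usage metadata
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Fabricated example: only some chunks carry (partial) usage.
print(merge_usage([
    None,
    {"input_tokens": 18, "output_tokens": 4, "total_tokens": 22},
    {"output_tokens": 6, "total_tokens": 6},
]))
# → {'input_tokens': 18, 'output_tokens': 10, 'total_tokens': 28}
```

This only helps if the streamed chunks expose usage at all, which is exactly what doesn't seem to happen here once `"messages"` is in `stream_mode`.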
Tested with
langgraph==0.5.3 and langgraph==0.6.4
databricks-langchain==0.6.0