Hey, I’m using ChatLiteLLMRouter with LangGraph. Token count is tracked correctly in LangSmith, but cost is never shown - for any model (OpenAI, Anthropic, Gemini).
I tried adding the LangSmith callback as described here:
import litellm
from langchain_litellm import ChatLiteLLMRouter
litellm.success_callback = ["langsmith"]
llm = ChatLiteLLMRouter( # then use `llm` in my graph
    router=litellm_router,
    model="heavy",
    temperature=0,
    streaming=True,
)
This does show tokens + cost, but creates separate traces for each LLM call - outside my graph trace. And it’s not even consistent - as you can see in the screenshot, the first separate trace (Gemini) shows no cost at all, while the rest (Anthropic) do:
I don’t want that. What I want is the standard LangSmith behaviour:
- Root graph run shows total tokens + total cost
- Drilling into the run shows per-step token/cost breakdown
Do I need to write custom callbacks to achieve this? Any tips on how to do it? Is there a built-in way? Should I do something like what is described here?
Thanks in advance!
hi @VettHor
The docs page you linked (“a. Set a usage_metadata field on the run’s metadata”) is the right mental model, but it’s only step 1 of the cost pipeline: run.set(usage_metadata=...) only delivers tokens (or, optionally, raw input_cost/output_cost); it doesn’t price anything by itself.

LangSmith prices a run only when three things are present together:
- a usage_metadata block,
- the pair ls_provider + ls_model_name on the run, and
- a matching regex row in Settings → Workspace → Models (Cost tracking, Metadata reference).

ChatLiteLLMRouter never emits the ls_* pair (it has no _get_ls_params() override), and its response_metadata.model_name is the router group (e.g. "heavy"), not the model LiteLLM actually dispatched to. So LangSmith sees tokens with nothing to price them against, which is why cost is blank, and why any occasional Anthropic figure is a misleading partial match rather than real tracking.

The fix is to stop using litellm.success_callback = ["langsmith"] (it posts sibling traces outside your LangGraph tree) and instead apply that docs pattern from inside LangChain’s callback lifecycle, in one of three ways:
- (a) Subclass the router and implement _get_ls_params() so that it reads the real model from LiteLLM’s response["model"] in _create_chat_result, then add matching pricing rows in the workspace.
- (a′) Attach a BaseCallbackHandler whose on_llm_end patches the run’s metadata with ls_provider/ls_model_name. This is the direct LangGraph equivalent of the snippet on the page you linked.
- (b) For non-linear pricing, compute the dollar amount with litellm.completion_cost() and write it onto the current run via get_current_run_tree().set(usage_metadata={"input_cost": ..., "output_cost": ...}).

All three keep everything inside one nested LangGraph trace with correct per-node cost aggregation.
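To make the last option a bit more concrete, here is a rough sketch of computing the cost yourself and writing it onto the current run. Treat it as an illustration under assumptions, not a tested integration: the helper names (build_usage_metadata, price_with_litellm, record_on_current_run) are made up for this example, and it uses litellm.cost_per_token() rather than litellm.completion_cost() because that call returns the input and output cost as a pair. Which run get_current_run_tree() resolves to depends on where you call it inside your graph, so check the resulting trace before relying on it.

```python
from typing import Dict, Optional


def build_usage_metadata(
    input_tokens: int,
    output_tokens: int,
    input_cost: Optional[float] = None,
    output_cost: Optional[float] = None,
) -> Dict[str, float]:
    """Build a usage_metadata dict; cost keys are added only if both costs are given."""
    usage: Dict[str, float] = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
    if input_cost is not None and output_cost is not None:
        usage["input_cost"] = input_cost
        usage["output_cost"] = output_cost
        usage["total_cost"] = input_cost + output_cost
    return usage


def price_with_litellm(model: str, input_tokens: int, output_tokens: int) -> Dict[str, float]:
    """Price a call using LiteLLM's cost map (requires `litellm` to be installed)."""
    import litellm  # imported lazily so the pure helper above has no dependencies

    # `model` must be the model LiteLLM actually dispatched to, not the router group.
    input_cost, output_cost = litellm.cost_per_token(
        model=model, prompt_tokens=input_tokens, completion_tokens=output_tokens
    )
    return build_usage_metadata(input_tokens, output_tokens, input_cost, output_cost)


def record_on_current_run(usage: Dict[str, float], provider: str, model_name: str) -> bool:
    """Attach usage and the ls_* pair to the active LangSmith run, if there is one."""
    from langsmith import get_current_run_tree  # requires `langsmith` to be installed

    run = get_current_run_tree()
    if run is None:  # tracing is disabled or we are outside a traced context
        return False
    run.add_metadata({"ls_provider": provider, "ls_model_name": model_name})
    run.set(usage_metadata=usage)
    return True
```

Inside a node you would call something like record_on_current_run(price_with_litellm(real_model, n_in, n_out), provider, real_model), where real_model, n_in, n_out and provider are placeholders for values you read from the LiteLLM response. Keeping build_usage_metadata free of imports makes it easy to unit-test the numbers separately from the tracing side.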