LangSmith cost/token attribution differs between LangGraph v2 streaming and v3 `astream_events` with Anthropic prompt caching

Hi LangSmith team,

I’m seeing confusing LangSmith cost/token reporting when using DeepAgents streaming with Anthropic prompt caching.

Setup:

  • DeepAgents: 0.6.8
  • Model: claude-sonnet-4-6
  • Provider: Anthropic via ChatAnthropic
  • AnthropicPromptCachingMiddleware is enabled
  • cache_control is present in the model run metadata
  • LangSmith SDK: 0.8.11
  • Streaming paths tested:
      • v2: runtime_agent.astream(..., version="v2")
      • v3: runtime_agent.astream_events(..., version="v3")

Observed behavior:

With v3 streaming, the cost on the LangSmith trace appears much higher, and prompt cache reads/writes do not seem to be attributed correctly. It looks like cached prompt tokens may be counted as normal input tokens or otherwise misattributed. Anthropic's usage dashboard, however, shows that prompt caching is working. When I switch the same agent back to v2 streaming locally, LangSmith shows cache reads correctly again.
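For anyone trying to reproduce this, here is a minimal sketch of how the two traces can be compared. It assumes the usage payload on the model run follows LangChain's `usage_metadata` shape (`input_tokens`, plus `input_token_details` with `cache_read` and `cache_creation`). The helper name `summarize_cache_usage` and the sample numbers are hypothetical and are not taken from my traces.

```python
def summarize_cache_usage(usage: dict) -> dict:
    """Pull out the cache-related fields from a usage_metadata-style dict."""
    # input_token_details may be missing entirely or set to None.
    details = usage.get("input_token_details") or {}
    cache_read = details.get("cache_read", 0) or 0
    cache_creation = details.get("cache_creation", 0) or 0
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "cache_read": cache_read,
        "cache_creation": cache_creation,
        # True only when at least one cache field carries a non-zero count.
        "has_cache_details": bool(cache_read or cache_creation),
    }


# Made-up payloads standing in for the model run of each streaming path.
v2_usage = {
    "input_tokens": 20000,
    "output_tokens": 500,
    "input_token_details": {"cache_read": 18000, "cache_creation": 0},
}
v3_usage = {"input_tokens": 20000, "output_tokens": 500}

print("v2:", summarize_cache_usage(v2_usage))
print("v3:", summarize_cache_usage(v3_usage))
```

If the v3 payload has the same total input tokens but no cache details, that would be consistent with the cached tokens being priced as normal input.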

So the confusing part is:

  • Anthropic dashboard confirms caching is working.
  • LangSmith with non-streaming / v2 streaming shows cache reads.
  • LangSmith with v3 astream_events does not appear to show the same cache-read attribution correctly.

This makes production cost monitoring difficult because LangSmith trace cost can look significantly higher than the actual Anthropic billing delta.
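To illustrate how large the gap can get, here is a rough sketch of the arithmetic. The prices and multipliers are assumptions chosen for illustration (base input price, cache reads at 10% of base, cache writes at 125% of base), and the token counts are made up. It is not a description of how LangSmith actually computes cost.

```python
# Illustrative values only (assumptions, not read from LangSmith or my traces).
BASE_INPUT_PER_MTOK = 3.00     # assumed USD per million uncached input tokens
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache reads priced at 10% of base input
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: cache writes priced at 125% of base input


def input_cost(uncached: int, cache_read: int, cache_write: int) -> float:
    """Input-side cost in USD when cache reads/writes are priced separately."""
    per_token = BASE_INPUT_PER_MTOK / 1_000_000
    return per_token * (
        uncached
        + cache_read * CACHE_READ_MULTIPLIER
        + cache_write * CACHE_WRITE_MULTIPLIER
    )


def misattributed_input_cost(uncached: int, cache_read: int, cache_write: int) -> float:
    """Input-side cost in USD if every prompt token is priced as normal input."""
    per_token = BASE_INPUT_PER_MTOK / 1_000_000
    return per_token * (uncached + cache_read + cache_write)


# Hypothetical call: 2,000 uncached tokens and 18,000 tokens read from cache.
correct = input_cost(2_000, 18_000, 0)
inflated = misattributed_input_cost(2_000, 18_000, 0)
print(f"with cache attribution:    ${correct:.4f}")
print(f"without cache attribution: ${inflated:.4f}")
print(f"inflation factor:          {inflated / correct:.2f}x")
```

Under these assumptions, a call that is mostly cache reads shows an input cost several times higher when the cache attribution is lost, which matches the kind of discrepancy I see between the trace cost and the Anthropic billing delta.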

Question: is Anthropic prompt-cache cost attribution currently supported correctly with v3 astream_events? Or is the recommendation to stay on v2 streaming for production observability until v3 usage/cost attribution is fully supported?

Hi! Thanks for this report! I’ve identified a bug and am working on a fix right now.