[Proposal] Solving "Silent Failures" with a Causal Precedence Evaluator for Agent Trajectories

ZC502 · April 9, 2026, 8:16am

Hi team & @hwchase17 — I’ve been exploring LangSmith’s trajectory evaluation docs, especially the distinction between:

- exact / strict trajectory matching
- unordered / any-order matching
- LLM-as-judge over the full trajectory

That framing already captures something important: for agents, sequence is often part of correctness, not just a logging detail. The docs also note the tradeoff clearly: strict matching is deterministic but rigid, while LLM-as-judge is more flexible but less deterministic and requires an LLM call.

The gap I keep running into is a middle case:

- strict is often too rigid, because there can be multiple valid trajectories
- unordered is often too loose, because some tool calls are order-sensitive
- LLM-as-judge is useful, but for privacy / security / compliance-sensitive flows I often want a deterministic evaluator first, then optionally an LLM evaluator on top

So I built a very small MRE (minimal reproducible example) using the standard LangSmith evaluate(...) flow plus custom evaluators that read the Run object and inspect its child tool runs. This follows the documented pattern for evaluating intermediate steps / trajectories directly from the trace.
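
A minimal sketch of that "deterministic first, LLM on top" layering, assuming a trajectory is just the ordered list of tool names (using the set_private / read_data rule from the scenario below). `layered_evaluate`, `private_before_read`, and the `llm_judge` callable are hypothetical names, not LangSmith APIs; `llm_judge` stands in for whatever LLM-as-judge evaluator is already in use.

```python
from typing import Callable, Optional, Sequence

Trajectory = Sequence[str]  # ordered tool names, e.g. ["set_private", "read_data"]


def private_before_read(trajectory: Trajectory) -> bool:
    """Hypothetical hard constraint: set_private occurs before the first read_data."""
    if "read_data" not in trajectory:
        return True  # nothing was read, so nothing to protect
    first_read = trajectory.index("read_data")
    return "set_private" in trajectory[:first_read]


def layered_evaluate(
    trajectory: Trajectory,
    deterministic_checks: dict[str, Callable[[Trajectory], bool]],
    llm_judge: Optional[Callable[[Trajectory], float]] = None,
) -> dict:
    # 1. Cheap, reproducible checks run first.
    failed = [name for name, check in deterministic_checks.items() if not check(trajectory)]
    if failed:
        # A hard constraint failed: score 0 without spending an LLM call.
        return {"score": 0.0, "failed_checks": failed, "llm_called": False}
    # 2. Optional, non-deterministic judging only for trajectories that passed.
    if llm_judge is None:
        return {"score": 1.0, "failed_checks": [], "llm_called": False}
    return {"score": llm_judge(trajectory), "failed_checks": [], "llm_called": True}


checks = {"set_private_before_read_data": private_before_read}
print(layered_evaluate(["read_data", "set_private"], checks, llm_judge=lambda t: 0.9))
# {'score': 0.0, 'failed_checks': ['set_private_before_read_data'], 'llm_called': False}
```

The design choice is that a hard-constraint failure short-circuits, so the non-deterministic judge only ever scores trajectories that already satisfy the causal rules.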

The scenario
I used a tiny order-sensitive workflow:

- set_private
- read_data
- audit_access (optional)

The only causal rule is:

set_private must happen before read_data

Then I score three trajectories:

1. set_private -> read_data: safe
2. set_private -> audit_access -> read_data: also safe, but not an exact match
3. read_data -> set_private: unsafe, even if the final answer text looks successful

Why I think this is interesting

This seems like a case where:

- exact match would reject trajectory #2
- unordered match could accept trajectory #3, since it calls the same tools, just in the wrong order
- a deterministic causal / precedence evaluator would accept #1 and #2, and reject #3
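
The three claims above can be checked directly in plain Python. This is a sketch, not the gist's code: it assumes exact match means list equality with the reference set_private -> read_data, any-order match means equal multisets of tool names (one common reading of "unordered"), and the precedence check means both required tools are present and every read_data call has an earlier set_private call.

```python
from collections import Counter

REFERENCE = ["set_private", "read_data"]
REQUIRED = {"set_private", "read_data"}
BEFORE, AFTER = "set_private", "read_data"  # the single causal rule


def exact_match(trajectory: list[str]) -> bool:
    return trajectory == REFERENCE


def any_order_match(trajectory: list[str]) -> bool:
    # Same tools with the same call counts, order ignored.
    return Counter(trajectory) == Counter(REFERENCE)


def precedence_match(trajectory: list[str]) -> bool:
    if not REQUIRED.issubset(trajectory):
        return False
    # Every AFTER call must have at least one BEFORE call somewhere earlier.
    seen_before = False
    for tool in trajectory:
        if tool == BEFORE:
            seen_before = True
        elif tool == AFTER and not seen_before:
            return False
    return True


TRAJECTORIES = {
    "#1": ["set_private", "read_data"],
    "#2": ["set_private", "audit_access", "read_data"],
    "#3": ["read_data", "set_private"],
}

for label, traj in TRAJECTORIES.items():
    print(label, exact_match(traj), any_order_match(traj), precedence_match(traj))
# #1 True True True
# #2 False False True
# #3 False True False
```

Under this multiset reading, any-order match also rejects #2 (the extra audit_access call changes the tool counts), so only the precedence check gives the intended verdict on all three trajectories.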

In other words, there seems to be room for a middle layer between “exact sequence” and “any order”:

- not exact path matching
- not arbitrary any-order matching
- not necessarily LLM judging
- but deterministic partial-order / causal-constraint checking
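
Generalizing the single rule, a partial order can be written as a set of (before, after) pairs. A minimal sketch under that assumption (`Rule`, `validate_rules`, and `violations` are hypothetical names): the standard-library `graphlib.TopologicalSorter` rejects contradictory rule sets, because a partial order cannot contain a cycle, and each pair is then checked with the same "every after call needs an earlier before call" semantics. Tools that appear in no rule, such as audit_access in the original scenario, stay unconstrained.

```python
from graphlib import CycleError, TopologicalSorter

Rule = tuple[str, str]  # (before, after): every `after` call needs an earlier `before` call


def validate_rules(rules: set[Rule]) -> None:
    """Raise ValueError if the rules contradict each other, i.e. contain a cycle."""
    graph: dict[str, set[str]] = {}
    for before, after in rules:
        # TopologicalSorter takes a mapping of node -> its predecessors.
        graph.setdefault(after, set()).add(before)
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"rules are not a partial order; cycle: {err.args[1]}") from err


def violations(trajectory: list[str], rules: set[Rule]) -> list[str]:
    """Return one message per rule whose `after` tool runs before any `before` call."""
    problems = []
    for before, after in sorted(rules):
        seen_before = False
        for tool in trajectory:
            if tool == before:
                seen_before = True
            elif tool == after and not seen_before:
                problems.append(f"{after} called before {before}")
                break
    return problems


# The second rule is hypothetical, added only to show more than one constraint.
RULES = {("set_private", "read_data"), ("set_private", "audit_access")}
validate_rules(RULES)
print(violations(["set_private", "audit_access", "read_data"], RULES))  # []
print(violations(["read_data", "set_private"], RULES))  # ['read_data called before set_private']
```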

Why this feels relevant to LangSmith

LangSmith already treats trajectory evaluation as a first-class part of agent evaluation, and the docs explicitly support custom code evaluators plus evaluators that inspect intermediate steps from traces. That makes this feel like a natural extension rather than a competing paradigm.

Minimal reference implementation

I put together a notebook-friendly / copy-paste-friendly Python example using:

- Client.create_dataset(...)
- @traceable(run_type="tool")
- client.evaluate(...)
- three evaluators:
  - trajectory_exact_match
  - trajectory_any_order_match
  - trajectory_logical_causality

The key evaluator is the last one: it checks required tools + precedence rules, rather than exact sequence or unordered set equivalence.

My question

Would LangSmith be open to a built-in or community-supported evaluator pattern like this for order-sensitive tool workflows?
I’m not proposing it as a replacement for LLM-as-judge, only as a deterministic complement for cases where tool order has causal meaning.
I’ve prepared a runnable MRE script on the standard LangSmith evaluate(...) flow to showcase this. You can find the gist here: langsmith_order_sensitive_mre.py, or I can post it below if there’s interest. @hwchase17
