Hi all, I'm sharing a third-party package built on top of AgentMiddleware, plus some empirical data. I'm looking for feedback before pushing it further.
The gap I’m trying to close
create_agent lets you specify what tools the agent has, but not what order it must call them in, or what other events must happen between calls (user approval, options presented, etc.). For most agents that’s fine. For agents that touch payments, bookings, or anything irreversible, it isn’t - “agent forgot the approval step” becomes a real incident class.
What I built
llmsessioncontract: a runtime monitor based on session-type theory. The LangChain integration (llmsessioncontract[langchain], v0.3.1) is a drop-in AgentMiddleware subclass that:
- Defines the protocol as an explicit FSM (`ProtocolFSM` + `Transition`), with optional per-edge `guard` and `action` callbacks
- Tool refs are derived from the `@tool` callables (`ref(search)` - no magic strings)
- Mixes tool-call events (fired automatically by `wrap_tool_call`) with non-tool events like `!PresentOptions` or `?UserApproval` (fired explicitly by the orchestrator via `monitor.transition_event(...)`)
- On violation, the user-supplied `on_violation` callback decides whether to log, raise, or surface a `ToolMessage(status="error")` so the agent self-corrects on its next turn
```python
from langchain.agents import create_agent
from llmcontract.langchain import (
    ProtocolFSM, Transition, ProtocolMonitor,
    ProtocolEnforcerMiddleware, ref,
)

# search_ref and book_ref are tool refs derived from the @tool callables,
# e.g. search_ref = ref(search). `monitor` is the ProtocolMonitor built
# around `fsm`; its construction is omitted here.
fsm = (
    ProtocolFSM(initial="idle")
    .add_transition(Transition(source="idle", tool=search_ref, phase="send", target="searching"))
    .add_transition(Transition(source="searching", tool=search_ref, phase="recv", target="results"))
    .add_transition(Transition(source="results", phase="send", target="presented",
                               event_label="PresentOptions"))
    .add_transition(Transition(source="presented", phase="recv", target="approved",
                               event_label="UserApproval"))
    .add_transition(Transition(source="approved", tool=book_ref, phase="send", target="booking"))
    .add_transition(Transition(source="booking", tool=book_ref, phase="recv", target="done"))
    .mark_terminal("done")
)

middleware = ProtocolEnforcerMiddleware(monitor=monitor, tool_refs=[...]).middleware
agent = create_agent(model=..., tools=[...], middleware=[middleware])
```
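For anyone who wants the idea without installing the package, here is a minimal plain-Python sketch of the kind of check such a monitor performs on the booking protocol above. It is not the library's implementation: `ToyMonitor`, `Violation`, `fire`, and the `(state, phase, label)` keying are hypothetical names for illustration, and tool events are labelled with plain strings instead of `ref(...)`.

```python
from dataclasses import dataclass, field
from typing import Callable


class Violation(Exception):
    """Raised when an event is not allowed in the current state."""


def raise_on_violation(state: str, phase: str, label: str) -> None:
    # Default policy: fail loudly. A caller can swap in a logging policy.
    raise Violation(f"{phase} {label!r} not allowed in state {state!r}")


@dataclass
class ToyMonitor:
    # Maps (source state, phase, label) -> target state.
    transitions: dict[tuple[str, str, str], str]
    state: str = "idle"
    terminal: frozenset[str] = frozenset({"done"})
    on_violation: Callable[[str, str, str], None] = raise_on_violation
    trace: list[str] = field(default_factory=list)

    def fire(self, phase: str, label: str) -> bool:
        """Advance on a legal event; otherwise delegate to on_violation."""
        target = self.transitions.get((self.state, phase, label))
        if target is None:
            self.on_violation(self.state, phase, label)
            return False  # reached only if the policy did not raise
        self.trace.append(f"{self.state} -{phase}:{label}-> {target}")
        self.state = target
        return True

    @property
    def finished(self) -> bool:
        return self.state in self.terminal


# Same shape as the booking FSM above, with string labels (an assumption).
BOOKING = {
    ("idle", "send", "search"): "searching",
    ("searching", "recv", "search"): "results",
    ("results", "send", "PresentOptions"): "presented",
    ("presented", "recv", "UserApproval"): "approved",
    ("approved", "send", "book"): "booking",
    ("booking", "recv", "book"): "done",
}


def run_demo() -> list[str]:
    """Agent searches, then tries to book without presenting options."""
    violations: list[str] = []
    monitor = ToyMonitor(
        transitions=BOOKING,
        # Logging policy: record the violation instead of raising.
        on_violation=lambda s, p, l: violations.append(f"{p} {l} in {s}"),
    )
    monitor.fire("send", "search")
    monitor.fire("recv", "search")
    monitor.fire("send", "book")  # skips PresentOptions and UserApproval
    return violations
```

Calling `run_demo()` records one violation for the premature `book` call and leaves the monitor in `results`, which is the "agent forgot the approval step" case in miniature. Swapping the `on_violation` policy is what separates log, raise, and surface-to-agent behaviour in this sketch.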
Empirical data
I ran a small study on a flight-booking protocol (10 tasks × 3 models × 2 trials = 60 trajectories) with create_agent driving real ChatAnthropic against the FSM above:
| Model | Violation rate (20 trajectories per model) |
|---|---|
| Claude Haiku 4.5 | 15% |
| Claude Sonnet 4.6 | 10% |
| Claude Opus 4.7 | 0% |
A separate study on Playwright MCP showed the opposite gradient - larger models took more shortcuts. Different protocols, different “skip” semantics, but the monitor surfaces both. Full breakdown in the repo’s examples/langchain_booking/reports/findings.md.
What I’d like feedback on
- Is AgentMiddleware the right extension point for this? I read it as exactly what you’d want - wrap_tool_call for tool events, orchestrator-side firing for everything else - but I’d love to hear if there’s a better factoring I missed.
- Should non-tool events flow through middleware too? Right now !PresentOptions (agent text replies) and ?UserApproval (projected user replies) are fired by the orchestrator. I considered hooking wrap_model_call to fire !PresentOptions automatically, but that couples the middleware to text-reply semantics that aren’t always what the protocol means. Curious how others have handled this.
- Linking from the LangChain docs - would it be appropriate to add a pointer from the AgentMiddleware page to this and similar third-party middlewares as ecosystem examples? Happy to draft a PR to a docs page if there’s a natural home for it.
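To make the coupling concern in the second question concrete, here is a toy sketch comparing the two factorings. Everything in it is a hypothetical stand-in (`ALLOWED`, `step`, `auto_label`, `orchestrator_label`); none of it is llmsessioncontract or LangChain API. It assumes an automatic hook that treats every agent text reply as `PresentOptions`.

```python
from typing import Optional

# Hypothetical slice of the protocol: (state, phase, label) -> next state.
ALLOWED = {
    ("results", "send", "PresentOptions"): "presented",
    ("presented", "recv", "UserApproval"): "approved",
}


def step(state: str, phase: str, label: str) -> Optional[str]:
    """Return the next state, or None if the event is a violation."""
    return ALLOWED.get((state, phase, label))


def auto_label(reply_text: str) -> str:
    """Naive automatic hook: any text reply counts as PresentOptions."""
    return "PresentOptions"


def orchestrator_label(reply_text: str, options_shown: bool) -> Optional[str]:
    """Explicit firing: the orchestrator knows whether options were shown."""
    return "PresentOptions" if options_shown else None


# The agent asks a clarifying question instead of presenting options.
clarifying = "Which date would you like to fly?"

# Automatic hook: the monitor advances even though nothing was presented.
auto_state = step("results", "send", auto_label(clarifying))

# Explicit firing: no event is fired, so the monitor stays where it was.
label = orchestrator_label(clarifying, options_shown=False)
explicit_state = step("results", "send", label) if label else "results"
```

With the automatic hook the monitor ends up in `presented` and would then accept the user's next reply as approval; with explicit firing it stays in `results`. A smarter classifier inside the hook would narrow the gap, but then the middleware owns the definition of "options were presented", which is the coupling I'm unsure about.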
Not asking for a PR into the main repo - per the contributing guide, third-party integrations stay on PyPI, which is where this lives. Just genuinely want eyes on the design before I scale up the case studies.
Thanks!