I built something that I think is relevant for anyone working with agents, tools, and multi-step LLM workflows.
It’s called Merge or Die:
https://www.trajectly.dev/merge-or-die
The idea is to make a real problem more visible: many agent workflows look fine at the final-answer level while the actual trajectory is broken.
Examples:
- tool calls happen in the wrong order
- a forbidden tool gets used
- the agent skips a required step
- behavior regresses even though the output still sounds reasonable
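To make this concrete, here is a minimal sketch of what trajectory-level checks look like in plain Python. The tool names (`search`, `summarize`, `delete_records`, `cite_sources`) and the rules are hypothetical examples, not part of Trajectly's API; the point is that these assertions run on the sequence of tool calls, not on the final answer.

```python
# Hypothetical example rules for an agent trajectory (ordered list of tool names).
FORBIDDEN = {"delete_records"}             # tools the agent must never call
REQUIRED_ORDER = ("search", "summarize")   # "search" must come before "summarize"
REQUIRED_STEPS = {"cite_sources"}          # steps that must appear at least once

def check_trajectory(tool_calls):
    """Return a list of violations found in an ordered list of tool names."""
    violations = []
    used = set(tool_calls)
    for tool in used & FORBIDDEN:
        violations.append(f"forbidden tool used: {tool}")
    for step in REQUIRED_STEPS - used:
        violations.append(f"required step skipped: {step}")
    first, second = REQUIRED_ORDER
    if second in used and (
        first not in used or tool_calls.index(first) > tool_calls.index(second)
    ):
        violations.append(f"out of order: {second} before {first}")
    return violations
```

A final answer can read perfectly well while `check_trajectory(["summarize", "search", "cite_sources"])` still reports an ordering violation; that gap is exactly what output-only evaluation misses.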
So I built a GitHub-native challenge around Trajectly to show this in a more concrete way.
When an agent fails, it doesn’t just say “failed.” It shows:
- the exact witness step where it broke
- which behavioral contract was violated
- how to reproduce the run
- a minimized failing trace
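For readers unfamiliar with these terms, here is an illustrative sketch (not Trajectly's actual internals) of the two ideas: the witness step is the first point in the trace where a contract fails, and minimization greedily drops steps that aren't needed to reproduce the failure. A `contract` here is just a predicate that returns `True` when a trace is acceptable.

```python
def witness_step(trace, contract):
    """Index of the first step at which the contract is violated, or None."""
    for i in range(1, len(trace) + 1):
        if not contract(trace[:i]):
            return i - 1
    return None

def minimize(trace, contract):
    """Greedily drop steps while the smaller trace still fails the contract."""
    trace = list(trace)
    i = 0
    while i < len(trace):
        candidate = trace[:i] + trace[i + 1:]
        if not contract(candidate):   # still failing without this step?
            trace = candidate         # keep the smaller reproduction
        else:
            i += 1
    return trace
```

With a toy contract like `lambda t: "rm -rf" not in t`, a four-step failing trace shrinks to just the single offending step, which is far easier to file as a bug report than the full run.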
I wanted to make agent testing feel less abstract and more developer-native, especially for people building with frameworks like LangChain, where multi-step behavior matters a lot more than the final text alone.
Would genuinely love feedback from people here building agent systems:
How are you testing trajectory-level behavior today, beyond just checking the final response?