ArkSim: a testing framework for LangChain/LangGraph

We just open-sourced ArkSim, a testing framework for LangChain/LangGraph agents.

ArkSim simulates multi-turn conversations with diverse synthetic users. It is meant to catch issues early, before they hit production. There are currently integration examples for LangChain/LangGraph.
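To make the idea concrete, here is a minimal, framework-free sketch of what "simulate multi-turn conversations with synthetic users" can look like. This is not ArkSim's API: `SyntheticUser`, `simulate`, the checks, and the toy `forgetful_agent` are all hypothetical names made up for illustration, and the users are scripted here rather than LLM-driven. See the docs for the real interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# All names below are hypothetical and for illustration only; not ArkSim's API.
History = List[Tuple[str, str]]             # (role, text) pairs
Agent = Callable[[History], str]            # sees the full history, returns a reply
Check = Callable[[History], Optional[str]]  # returns a problem description or None


@dataclass
class SyntheticUser:
    name: str
    turns: List[str]  # scripted messages standing in for a generated persona


@dataclass
class Transcript:
    user: str
    history: History = field(default_factory=list)
    issues: List[str] = field(default_factory=list)


def simulate(agent: Agent, user: SyntheticUser, checks: List[Check]) -> Transcript:
    """Play the user's turns against the agent and run every check after each reply."""
    transcript = Transcript(user=user.name)
    for message in user.turns:
        transcript.history.append(("user", message))
        transcript.history.append(("agent", agent(transcript.history)))
        turn = len(transcript.history) // 2
        for check in checks:
            problem = check(transcript.history)
            if problem:
                transcript.issues.append(f"turn {turn}: {problem}")
    return transcript


def forgetful_agent(history: History) -> str:
    """Toy agent that only reads the latest message, so it loses earlier context."""
    if "order" in history[-1][1].lower():
        return "Which order number do you mean?"
    return "How can I help?"


def empty_reply_check(history: History) -> Optional[str]:
    role, text = history[-1]
    return "empty reply" if role == "agent" and not text.strip() else None


def repeated_reply_check(history: History) -> Optional[str]:
    """Flag the agent giving the same reply twice in a row."""
    replies = [text for role, text in history if role == "agent"]
    if len(replies) >= 2 and replies[-1] == replies[-2]:
        return "agent repeated itself"
    return None


users = [
    SyntheticUser("impatient", ["Where is my order?", "Order 1234, I told you."]),
    SyntheticUser("chatty", ["Hi there", "Can you check my order?"]),
]
for user in users:
    result = simulate(forgetful_agent, user, [empty_reply_check, repeated_reply_check])
    print(result.user, result.issues)
```

The point of the sketch is that the failure only shows up on the second turn: a single-turn test of `forgetful_agent` would pass, while the "impatient" conversation exposes that it ignores what the user already said.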

repo: arklexai/arksim on GitHub (examples/integrations/langchain, main branch)
docs: https://docs.arklex.ai/

Happy to answer any questions and would love feedback from people currently working on agents!

Update:
We’ve added CI integration (GitHub Actions, GitLab CI, and others), so ArkSim can now run automatically on every push, PR, or deploy.

We wanted to make multi-turn agent evals a natural part of the dev workflow, rather than something you have to run manually. This way, regressions and failures show up early, before they reach production.
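For anyone wondering how a failing simulation turns into a failing pipeline: CI systems generally mark a step as failed when the command exits with a non-zero status. Below is a rough sketch of that gating step, assuming simulation results arrive as a mapping from scenario name to a list of detected issues. The `gate` function and its `max_failed` parameter are hypothetical and not part of ArkSim; the actual CI setup is described in the docs.

```python
from typing import Dict, List


def gate(issues_by_scenario: Dict[str, List[str]], max_failed: int = 0) -> int:
    """Return a process exit code: 0 if few enough scenarios failed, otherwise 1.

    Hypothetical helper for illustration; the input shape is an assumption.
    """
    failed = {name: issues for name, issues in issues_by_scenario.items() if issues}
    for name in sorted(failed):
        print(f"FAIL {name}: {'; '.join(failed[name])}")
    total = len(issues_by_scenario)
    print(f"{total - len(failed)}/{total} scenarios passed")
    return 0 if len(failed) <= max_failed else 1
```

A script run by the CI job would then end with something like `sys.exit(gate(results))`, so any regression blocks the push or PR instead of being noticed later.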

Would really appreciate feedback from people working on agents about what other features would be helpful.