multi-agent systems debugging agent-to-agent

I’m building multi-agent systems with LangGraph/CrewAI and I keep running into pain when debugging agent-to-agent failures: figuring out which agent caused a cascade, understanding why an agent made a specific decision, and tracing MCP tool calls across agents.
I’ve tried Maxim AI and Galileo, but I’m curious: what’s your experience? What’s the #1 thing that frustrates you about debugging multi-agent workflows that no existing tool solves well?

For me, LangSmith solves almost all of these issues, so I don’t rely on any other third-party platform. The granularity of the traces LangSmith provides is amazing. We also actively use those insights to improve our agent through periodic cron jobs, and we attach proper metadata to the traces so they are easier to filter.
So far, I haven’t faced any major issues tracking the trajectories of our agentic system through LangSmith. I can’t speak for the other tools.
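To make the metadata point a bit more concrete, here is a minimal, tool-agnostic sketch of why tagging each traced step pays off when hunting for the agent that started a cascade. It does not use LangSmith's API or schema: the `Span` class, its field names, and the `env`/`workflow` metadata keys are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One traced step. Field names are hypothetical, not LangSmith's schema."""
    span_id: str
    parent_id: Optional[str]
    agent: str
    start: float                      # seconds since the run began
    error: Optional[str] = None       # None means the step succeeded
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(spans, **wanted):
    """Keep spans whose metadata contains every requested key/value pair."""
    return [
        s for s in spans
        if all(s.metadata.get(k) == v for k, v in wanted.items())
    ]


def first_failure(spans):
    """Return the earliest span that errored (a likely cascade origin), or None."""
    failed = [s for s in spans if s.error is not None]
    return min(failed, key=lambda s: s.start) if failed else None


# Hypothetical trace data: two runs of the same workflow in different environments.
prod = {"env": "prod", "workflow": "refund"}
staging = {"env": "staging", "workflow": "refund"}
spans = [
    Span("1", None, "planner", 0.0, metadata=prod),
    Span("2", "1", "researcher", 1.2, error="tool timeout", metadata=prod),
    Span("3", "1", "writer", 2.5, error="missing input", metadata=prod),
    Span("4", None, "planner", 0.3, metadata=staging),
]

prod_refund = filter_by_metadata(spans, env="prod", workflow="refund")
root = first_failure(prod_refund)
print(root.agent, "->", root.error)  # researcher -> tool timeout
```

Here the writer's "missing input" error is only a downstream symptom; narrowing by metadata first and then sorting failures by start time points at the researcher's tool timeout instead. The earliest error is only a heuristic for the root cause, so it is worth confirming against the actual trace.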
I would recommend you go through the following awesome blogs: