I’m new to building AI agents, but after a lot of research I’ve decided to go all-in on the LangChain ecosystem.
My goal is to build a serious AI networking agent focused on troubleshooting and analysis (think real-world network ops, not demos).
Some context:
- Most integrations will be via MCP servers (network devices, APIs, etc.)
- I don't see much need for filesystem-based tools
- The agent should be able to:
  - Run structured troubleshooting workflows
  - Potentially spawn sub-agents for parallel analysis (e.g., per device or per hypothesis)
  - Reuse modular "skills" over time
Where I'm stuck is how to actually structure this within the ecosystem:
- Should I start simple with LangChain and evolve later?
- Jump straight into LangGraph for more control?
- Or think in terms of deeper agent systems from day one?
I also can't tell whether I'm overengineering this or missing something fundamental.
Additional questions:
- Are there recommended architectural patterns for multi-agent troubleshooting systems?
- Is a hybrid approach (structured workflow + agent decision-making) common in practice?
- Are there any real-world examples of similar systems (infra/networking/observability agents)?
Would really appreciate guidance from folks who’ve built non-trivial agent systems. I’m optimizing for something that can scale beyond a prototype.