Guarding tool calls against prompt injection / exfiltration

skylerxu199 · February 28, 2026, 11:54pm

Hi everyone, I’m collecting real-world prompt injection cases against tool-using agents (LangGraph/LangChain). If you’ve seen an agent get steered into sending data to the wrong place (email/Slack/share links) or making an unintended write, I’d love to hear what the pattern looked like.
In return, I can share a tiny test harness + guardrail approach that blocks those tool calls and captures evidence for debugging.

Bitcot_Kaushal · March 1, 2026, 1:37pm

Hi @skylerxu199

This is a cool idea. I’ve seen cases where indirect prompt injection led agents to make tool calls they didn’t mean to (especially when they passed sensitive information to external APIs). It would be great if you could share the test harness and guardrail method. Please also leave the link to the document here so I can read it.

Also @skylerxu199 please do refer to these docs which might be useful– Guardrails - Docs by LangChain

skylerxu199 · March 2, 2026, 7:28am

I put together a short writeup that explains the approach and threat model, plus a way to get access to the guardrail SDK: Agent Time Machine. Feedback welcome, especially on false positives and integration friction.

Thanks for the pointer to LangChain Guardrails docs. High level, TimeMachine acts like middleware around tool execution: it intercepts tool calls, checks policy at execution time (not just in the prompt), and records an evidence trail showing which untrusted output influenced which tool argument (eg recipient/URL/IBAN).

FranTere · March 30, 2026, 8:03pm

Interesting thread.

I’m researching failure patterns in tool-using agents in production (LangGraph / LangChain).

Beyond prompt injection cases, I’m curious what failures people see most often when tools are involved.

For example:

agent selecting the wrong tool

invalid tool arguments

loops where the same tool is called repeatedly

tool responses being misinterpreted by the agent

When something like this happens in production, how do you usually debug it today?

Do logs/traces usually make it obvious, or does it take time to figure out what actually happened?

Topic		Replies	Views
[Feature]: MetacognitiveGate node — source-aware instruction checking before tool execution Observability & Evals	2	31	June 2, 2026
Protecting LangChain agent memory from poisoning attacks — OWASP Agent Memory Guard (open source) Talking Shop	0	71	May 11, 2026
How are you validating LangChain agent output before it executes shell commands? Deployment self-hosted , js-help	1	76	May 4, 2026
How we add runtime security to LangChain agents in production Talking Shop	2	105	April 20, 2026
How to view the actual LLM prompt that decided a tool call in langsmith OSS Product Help intro-to-langsmith , python-help	1	247	November 25, 2025

Guarding tool calls against prompt injection / exfiltration

Related topics