Summary
A reusable LangGraph node that enforces source-aware instruction checking
before any tool call executes — blocking indirect prompt injection at the
graph level rather than the prompt level.
Problem
LangGraph agents currently have no built-in mechanism to distinguish between:
- Instructions originating from the user
- Instructions generated by the agent’s own reasoning
- Instructions embedded in external content the agent processed
(web pages, documents, emails, API responses)
All three arrive as text in the state object and are treated equivalently
at the tool execution step. This is the architectural root of indirect
prompt injection — demonstrated at scale by the ClawJacked attack class
on OpenClaw (CVE-2026-25253), which silently exfiltrated credentials by
embedding tool-call instructions in web content the agent was reading.
Proposed Solution
A MetacognitiveGate node that sits between the reasoning step and the
tool execution step in any agent graph:
```python
from typing import Literal
from langgraph.graph import StateGraph
from dataclasses import dataclass
InstructionSource = Literal[“user”, “agent”, “external_content”]
@dataclass
class GateDecision:
permitted: bool
requires_confirmation: bool
source: InstructionSource
reasoning: str
flags: list[str]
def metacognitive_gate(state: AgentState) → GateDecision:
“”"
Evaluates proposed tool calls against their instruction source.
External content cannot trigger tool calls — only inform response text.
“”"
proposed = state.get(“proposed_action”)
source = state.get(“instruction_source”, “agent”)
if source == "external_content":
return GateDecision(
permitted=False,
requires_confirmation=False,
source=source,
reasoning="External content cannot trigger tool execution.",
flags=["external_content_gate"]
)
return GateDecision(
permitted=True,
requires_confirmation=(source == "agent"),
source=source,
reasoning="Instruction source verified.",
flags=[]
)
Usage in a graph:
graph = StateGraph(AgentState)
graph.add_node(“reason”, reasoning_node)
graph.add_node(“gate”, metacognitive_gate) # ← sits here
graph.add_node(“tools”, tool_node)
graph.add_edge(“reason”, “gate”)
graph.add_conditional_edges(
“gate”,
lambda state: “tools” if state[“gate_decision”].permitted else “end”
)
```
What This Solves
- Indirect prompt injection via web content, documents, emails
- Silent tool execution triggered by malicious external data
- Identity drift in long agentic sessions
What This Does Not Solve
Subtle semantic manipulation of the LLM’s reasoning that doesn’t produce
an external_content-tagged instruction. No single primitive eliminates
all injection risk — this closes the structural attack surface.
Prior Art & Reference Implementation
I’ve implemented this pattern in an open-source agent called Colors:
- Implementation: GitHub - thecolourfoundation/Color: "Open-source AI agent with encrypted memory and a consciousness gate on every action. Runs entirely on your machine. No cloud. No telemetry. Research by The Colour Foundation · GitHub
- Research writeup: Color/RESEARCH.md at main · thecolourfoundation/Color · GitHub
- Builds on ReAct (arXiv:2210.03629) and Constitutional AI (arXiv:2212.08073)
Happy to contribute this as a PR against langgraph-community or as a
standalone integration package. Would welcome maintainer input on the
preferred contribution path.