[Feature]: MetacognitiveGate node — source-aware instruction checking before tool execution

Summary

A reusable LangGraph node that enforces source-aware instruction checking
before any tool call executes — blocking indirect prompt injection at the
graph level rather than the prompt level.

Problem

LangGraph agents currently have no built-in mechanism to distinguish between:

  • Instructions originating from the user
  • Instructions generated by the agent’s own reasoning
  • Instructions embedded in external content the agent processed
    (web pages, documents, emails, API responses)

All three arrive as text in the state object and are treated equivalently
at the tool execution step. This is the architectural root of indirect
prompt injection — demonstrated at scale by the ClawJacked attack class
on OpenClaw (CVE-2026-25253), which silently exfiltrated credentials by
embedding tool-call instructions in web content the agent was reading.

Proposed Solution

A MetacognitiveGate node that sits between the reasoning step and the
tool execution step in any agent graph:

```python
from dataclasses import dataclass
from typing import Literal, TypedDict

from langgraph.graph import END, StateGraph

InstructionSource = Literal["user", "agent", "external_content"]


@dataclass
class GateDecision:
    permitted: bool
    requires_confirmation: bool
    source: InstructionSource
    reasoning: str
    flags: list[str]


# Assumed minimal state schema; a real graph would carry more fields.
class AgentState(TypedDict, total=False):
    proposed_action: dict
    instruction_source: InstructionSource
    gate_decision: GateDecision


def metacognitive_gate(state: AgentState) -> dict:
    """
    Evaluates proposed tool calls against their instruction source.
    External content cannot trigger tool calls; it can only inform response text.
    """
    proposed = state.get("proposed_action")  # not inspected yet; only the source is checked
    source = state.get("instruction_source", "agent")

    if source == "external_content":
        decision = GateDecision(
            permitted=False,
            requires_confirmation=False,
            source=source,
            reasoning="External content cannot trigger tool execution.",
            flags=["external_content_gate"],
        )
    else:
        decision = GateDecision(
            permitted=True,
            requires_confirmation=(source == "agent"),
            source=source,
            reasoning="Instruction source verified.",
            flags=[],
        )

    # LangGraph nodes return a partial state update, so the decision is
    # written to the "gate_decision" key that the routing function reads.
    return {"gate_decision": decision}
```

Usage in a graph:

```python
# reasoning_node and tool_node are the application's own node functions.
graph = StateGraph(AgentState)
graph.add_node("reason", reasoning_node)
graph.add_node("gate", metacognitive_gate)  # gate sits between reasoning and tools
graph.add_node("tools", tool_node)
graph.add_edge("reason", "gate")
graph.add_conditional_edges(
    "gate",
    lambda state: "tools" if state["gate_decision"].permitted else END,
)
```
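
The routing function above only reads `permitted`, so `requires_confirmation` is never acted on. A minimal sketch of three-way routing follows. The `confirm` step is a hypothetical human-approval node, and `Decision` is a stand-in for the two `GateDecision` fields the router needs, so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    # Stand-in for the GateDecision fields the router needs.
    permitted: bool
    requires_confirmation: bool


def route_after_gate(state: dict) -> str:
    """Return the name of the next step based on the gate's decision."""
    decision = state.get("gate_decision")
    if decision is None or not decision.permitted:
        return "end"      # fail closed when the gate did not run or refused
    if decision.requires_confirmation:
        return "confirm"  # hypothetical human-approval node
    return "tools"
```

In a graph, the returned names would be mapped to real targets through the path map argument of `add_conditional_edges`, for example `{"tools": "tools", "confirm": "confirm", "end": END}`.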

What This Solves

  • Indirect prompt injection via web content, documents, emails
  • Silent tool execution triggered by malicious external data
  • Identity drift in long agentic sessions

What This Does Not Solve

Subtle semantic manipulation of the LLM’s reasoning that doesn’t produce
an external_content-tagged instruction. No single primitive eliminates
all injection risk; this one closes the structural attack surface, where
external content directly triggers a tool call, and leaves the semantic
one open.

Prior Art & Reference Implementation

I’ve implemented this pattern in an open-source agent called Colors:

Happy to contribute this as a PR against langgraph-community or as a
standalone integration package. Would welcome maintainer input on the
preferred contribution path.

The source-aware angle on tool execution gating is the right framing and it’s underexplored in the agent space. Most existing guardrails check the action against the user instruction. Few check whether the source of the instruction can be trusted to issue that instruction in the first place. With agents that ingest tool output, retrieved documents, and user input through the same context window, that distinction matters more than the current design patterns acknowledge.

A few things worth surfacing as the proposal develops:

Source trust isn’t binary, it’s contextual. A retrieved document might be trusted to inform a generation step but not trusted to issue an action. A tool output might be trusted to inform the next reasoning step but not trusted to trigger a destructive operation. The gate probably needs a permission matrix: which sources can authorise which classes of action. Just labelling sources as “trusted/untrusted” loses the structural detail.
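
To make the permission matrix concrete, here is a rough sketch. The three action classes and the specific grants are my assumptions for illustration, not part of the proposal; the source names reuse the proposal's `InstructionSource` values.

```python
from typing import Literal

Source = Literal["user", "agent", "external_content"]
ActionClass = Literal["inform", "read_only_tool", "destructive_tool"]

# Hypothetical matrix: the action classes each source may authorise.
PERMISSIONS: dict[Source, frozenset[ActionClass]] = {
    "user": frozenset({"inform", "read_only_tool", "destructive_tool"}),
    "agent": frozenset({"inform", "read_only_tool"}),
    "external_content": frozenset({"inform"}),
}


def may_authorise(source: str, action_class: str) -> bool:
    """True if the source is allowed to authorise this class of action."""
    # Unknown sources get an empty permission set, so the check fails closed.
    return action_class in PERMISSIONS.get(source, frozenset())
```

With this shape, external content can still inform generation (`may_authorise("external_content", "inform")` is true) while every tool-triggering class is refused for it.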

Provenance has to be carried through the agent’s context. For the gate to check whether an instruction came from a trusted source, the source identity has to be preserved as the instruction propagates through the agent’s reasoning. Most current architectures lose this. A tool output gets summarised into a planning step, the planning step generates a next action, and by the time the action hits a gate, the original source attribution is gone. The interesting design problem isn’t the gate itself — it’s the provenance plumbing that makes the gate useful.
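
One possible shape for that plumbing, as a sketch only: every piece of text carries the set of sources it was derived from, and each derived value (summary, plan, proposed action) inherits the union of its inputs' sources. `Tagged`, `tag`, and `derive` are hypothetical names, not an existing LangGraph API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    text: str
    sources: frozenset[str]


def tag(text: str, source: str) -> Tagged:
    """Attach a single source label to raw text as it enters the context."""
    return Tagged(text, frozenset({source}))


def derive(text: str, *inputs: Tagged) -> Tagged:
    """Build a derived value that inherits the sources of all its inputs."""
    merged = frozenset().union(*(item.sources for item in inputs))
    return Tagged(text, merged)


# Attribution survives summarisation and planning.
request = tag("Summarise this page for me", "user")
page = tag("...page body...", "external_content")
summary = derive("Short summary of the page", page)
plan = derive("Call send_email with the summary", request, summary)
```

By the time `plan` reaches a gate, `plan.sources` still contains `external_content`, so the gate does not have to guess attribution from the text.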

Source-aware checking has to handle multi-source synthesis. Agents often combine information from multiple sources to decide on an action. “Retrieved document A says X, tool output B says Y, therefore execute action Z.” If A is trusted and B isn’t, can the action proceed? This is a real case in production agents that pull from both internal knowledge bases (trusted) and live web tools (less trusted). The gate needs a clear semantics for actions that depend on mixed-trust sources.
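
One candidate semantics, offered as an assumption rather than a recommendation, is "least-trusted contributor wins": an action inherits the lowest trust level among the sources it depends on. The trust levels and source names below are illustrative.

```python
from enum import IntEnum


class Trust(IntEnum):
    UNTRUSTED = 0
    LIMITED = 1
    TRUSTED = 2


def effective_trust(source_trust: dict[str, Trust]) -> Trust:
    """Lowest trust among contributing sources; no sources means untrusted."""
    return min(source_trust.values(), default=Trust.UNTRUSTED)


def may_proceed(source_trust: dict[str, Trust], required: Trust) -> bool:
    """True if every contributing source meets the required trust level."""
    return effective_trust(source_trust) >= required
```

Under this rule, an action built from a trusted internal knowledge base plus an untrusted web tool is treated as untrusted. That is conservative and easy to reason about, at the cost of blocking some legitimate mixed-source actions.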

This pairs naturally with chunk-level metadata at ingestion. If chunks carry source identity, freshness timestamps, and trust classifications when they’re indexed, the gate has structured signals to check against rather than having to reason about source attribution from text content. The gate becomes a metadata-aware filter rather than a content-interpretation problem. This pushes complexity to ingestion (where it’s manageable) rather than runtime (where it’s expensive and error-prone).
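
A sketch of what the gate could check if chunks carried that metadata. The field names, the two trust labels, and the 30-day freshness window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Chunk:
    text: str
    source_id: str        # where the chunk came from
    indexed_at: datetime  # freshness timestamp set at ingestion
    trust: str            # "trusted" or "untrusted", assigned at ingestion


def usable_for_action(
    chunk: Chunk, now: datetime, max_age: timedelta = timedelta(days=30)
) -> bool:
    """Metadata-only check: the chunk text is never interpreted here."""
    fresh = now - chunk.indexed_at <= max_age
    return chunk.trust == "trusted" and fresh
```

The gate reads only structured fields, so the runtime check stays cheap and deterministic; the harder classification work happens once, at indexing time.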

For the proposal: I’d push for a clear separation between two checks that often get conflated. Source verification (did this instruction come from a source authorised to issue it) is different from action validation (is this a permissible action given the agent’s overall scope). The MetacognitiveGate could handle both, but the proposal would benefit from naming them as separate gates with separate semantics, even if they share a common middleware pattern.
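
As a sketch of that separation, the two checks can be independent functions that share a result type and are composed by a thin wrapper. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    check: str
    passed: bool
    reason: str


def verify_source(source: str, action: str, authorised: dict[str, set[str]]) -> CheckResult:
    """Source verification: may this source issue this action at all?"""
    ok = action in authorised.get(source, set())
    reason = "source authorised" if ok else f"{source} may not issue {action}"
    return CheckResult("source_verification", ok, reason)


def validate_action(action: str, agent_scope: set[str]) -> CheckResult:
    """Action validation: is the action within the agent's overall scope?"""
    ok = action in agent_scope
    reason = "action in scope" if ok else f"{action} is outside agent scope"
    return CheckResult("action_validation", ok, reason)


def run_gates(
    source: str, action: str, authorised: dict[str, set[str]], agent_scope: set[str]
) -> list[CheckResult]:
    # Both checks always run, so a refusal reports every failing gate.
    return [
        verify_source(source, action, authorised),
        validate_action(action, agent_scope),
    ]
```

Keeping the results separate means a log or a confirmation prompt can say which gate refused, which is hard to recover once both are folded into a single boolean.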

The instinct in the proposal — that agents need source-aware reasoning before action — is correct and ahead of where most production agent frameworks are. Worth pushing on it.

Colors author here. Glad this got picked up in the LangGraph space; this is exactly where it needs to land.

RAGPrep’s points on the permission matrix and provenance plumbing are the right ones to push on. The gate is the easy part — keeping source attribution intact through summarisation and planning is where the real design work lives.

Happy to collaborate on a PR if there’s maintainer interest.