Hi everyone — I’ve been working on a concept I’d love feedback on from the LangChain community.
As LLM and agent systems get embedded in real workflows, most strategies for mitigating unreliable output fall into two buckets:
• Retrieval / RAG to ground answers
• Guardrails to filter unsafe or policy-violating content
These are important, but they leave a different question unanswered:
How do we independently assess whether a model’s output is reliable enough to act on?
I’ve been building an MVP system (called TruCite) that acts as a verification layer, not a generator or retriever. Instead of asking “where did this answer come from?”, it asks:
• Does the structure of the answer make sense?
• Are uncertainty signals appropriate?
• Do numeric or high-liability claims have supporting evidence?
• Are there hallucination risk patterns?
It outputs:
→ A Truth Score (0–100)
→ A Decision Gate action (ALLOW / REVIEW / BLOCK)
→ A structured explanation for audit or human review
The goal isn’t to replace guardrails or RAG, but to sit after generation as a reliability filter before output reaches users, charts, contracts, or clinical/financial workflows.
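To make the Decision Gate concrete, here's a rough sketch of the shape of that post-generation step: combine per-check risk signals into a 0–100 score, map the score to ALLOW / REVIEW / BLOCK, and surface which checks fired. The signal names, weights, and thresholds below are purely illustrative, not the real scoring logic:

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    score: int                                    # 0-100 "Truth Score"
    action: str                                   # ALLOW / REVIEW / BLOCK
    reasons: list = field(default_factory=list)   # structured explanation

def gate(signals: dict, allow_at: int = 80, review_at: int = 50) -> Verdict:
    """Combine per-check risk signals (each 0.0-1.0, higher = riskier)
    into a score, then map the score to a gating action.
    Thresholds are illustrative defaults, not calibrated values."""
    # Checks that contributed significant risk go into the explanation.
    reasons = [name for name, risk in signals.items() if risk > 0.5]
    avg_risk = sum(signals.values()) / len(signals)
    score = round((1.0 - avg_risk) * 100)
    if score >= allow_at:
        action = "ALLOW"
    elif score >= review_at:
        action = "REVIEW"
    else:
        action = "BLOCK"
    return Verdict(score=score, action=action, reasons=reasons)

# Example: a confident numeric claim with no supporting evidence
# trips two checks and gets blocked rather than shown to the user.
verdict = gate({
    "structural_coherence": 0.1,       # answer structure looks sound
    "uncertainty_mismatch": 0.7,       # confident tone, weak grounding
    "unsupported_numeric_claim": 0.9,  # number with no evidence
})
```

In a LangChain pipeline this would just be one more post-processing step after the model call (e.g. wrapped in a `RunnableLambda`), with REVIEW routing to a human queue instead of the user.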
Why I think this matters
As models improve, hallucinations become more subtle and more confidently stated. The failure mode shifts from obvious nonsense to plausible-but-wrong answers. That's where structural and risk-signal evaluation may matter as much as source retrieval.
Curious about the community’s thoughts:
Do you see a need for model-agnostic output reliability scoring?
How are people currently handling decision gating (block vs human review) in LangChain-based systems?
Are there open efforts around automated hallucination risk detection beyond citation checks?
Would love pointers to related projects or approaches others are exploring.
Thanks — looking forward to the discussion.