PAS2 - Hallucination Detector

Advanced AI Response Verification Using Model-as-Judge

This tool detects hallucinations in AI responses by comparing answers to semantically equivalent questions and using a specialized judge model.

How It Works

This tool implements the Paraphrase-based Approach for Scrutinizing Systems (PAS2) with a model-as-judge enhancement:

  1. Paraphrase Generation: Your question is paraphrased multiple ways while preserving its core meaning
  2. Multiple Responses: All questions (original + paraphrases) are sent to a randomly selected generator model
  3. Expert Judgment: A randomly selected judge model analyzes all responses to detect factual inconsistencies
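The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: `call_model` is a stand-in for a real LLM API call, and the paraphrases are passed in rather than generated.

```python
import random

GENERATORS = ["mistral-large", "gpt-4o", "grok-3"]
JUDGES = ["gemini-2.5-pro", "o4-mini", "deepseek-r1"]

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real chat-completion API call."""
    return f"[{model}] answer to: {prompt}"

def pas2(question: str, paraphrases: list[str]) -> dict:
    # Step 1: paraphrases would normally come from an LLM; here they are given.
    questions = [question] + paraphrases
    # Step 2: one randomly selected generator answers every question variant.
    generator = random.choice(GENERATORS)
    responses = {q: call_model(generator, q) for q in questions}
    # Step 3: a randomly selected judge scans all responses for conflicts.
    judge = random.choice(JUDGES)
    verdict = call_model(
        judge,
        "Compare these answers for factual conflicts:\n"
        + "\n".join(responses.values()),
    )
    return {"generator": generator, "judge": judge,
            "responses": responses, "verdict": verdict}
```

Randomly selecting the generator and judge on each run lets different model pairings be compared over time.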

Why This Approach?

When an AI hallucinates, it often gives inconsistent answers when the same question is phrased in different ways. By using a separate judge model to compare those answers, we can identify these inconsistencies more effectively than with metric-based approaches.
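One way to frame the comparison task for the judge is a single prompt that lists every answer side by side. This is a hedged sketch; the exact wording the tool sends to its judge models is not shown here.

```python
def build_judge_prompt(question: str, answers: list[str]) -> str:
    """Assemble a consistency-checking prompt for the judge model (illustrative only)."""
    numbered = "\n".join(f"Answer {i + 1}: {a}" for i, a in enumerate(answers))
    return (
        "The same question was asked in several paraphrased forms.\n"
        f"Question: {question}\n"
        f"{numbered}\n"
        "Do the answers contradict each other on any fact? "
        "Reply with a confidence score (0-1), a list of conflicting facts, "
        "and your reasoning."
    )
```

Because the judge sees all answers at once, it can flag a contradiction (e.g. two different dates for the same event) that no single-answer check would catch.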

Understanding the Results

  • Confidence Score: Indicates the judge's confidence in the hallucination detection
  • Conflicting Facts: Specific inconsistencies found across responses
  • Reasoning: The judge's detailed analysis explaining its decision
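The three result fields above map naturally onto a small record type. The field names here are illustrative, not the tool's actual output schema.

```python
from dataclasses import dataclass, field

@dataclass
class JudgeVerdict:
    """Hypothetical container for one judge decision (names are illustrative)."""
    hallucination_detected: bool
    confidence_score: float                     # judge's confidence, 0.0-1.0
    conflicting_facts: list[str] = field(default_factory=list)
    reasoning: str = ""
```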

Privacy Notice

Your queries and the system's responses are saved to help improve hallucination detection. No personally identifiable information is collected.

Enter Your Question

Or Try an Example

Help Improve the System

Your feedback helps us refine the hallucination detection system.

Was there actually a hallucination in the responses?
Did the judge model correctly identify the situation?

Paraphrase-based Approach for Scrutinizing Systems (PAS2) - Advanced Hallucination Detection

Multiple LLMs are tested as generators and judges to find the most effective pairing for hallucination detection

Models in testing: mistral-large, gpt-4o, Qwen3-235B-A22B, grok-3, o4-mini, gemini-2.5-pro, deepseek-r1