Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Zepeng Bao, Shen Zhou, Qiankun Pi, Jianhao Chen, Mayi Xu, Ming Zhong, Yuanyuan Zhu, Tieyun Qian

Published 2025 in arXiv.org

ABSTRACT

Hallucination in large language models (LLMs) remains a critical barrier to their safe deployment. For hallucination detection to be practical in real-world scenarios, the use of efficient small models is essential to ensure low latency and minimal resource consumption. However, existing methods rely on fixed verification strategies, where simply tuning small models to mimic fixed verification trajectories fails to capture the adaptability required for diverse hallucination patterns, thereby inducing planning instability. To address this limitation, we propose a "Learning to Evaluate and Adaptively Plan" (LEAP) framework, which shifts hallucination detection from fixed execution to dynamic strategy learning. Specifically, LEAP first employs a powerful teacher model to iteratively explore and refine verification strategies through a failure-driven loop. This dynamic planning capability is then distilled into an efficient student model, augmented by a novel proactive correction mechanism that enables the model to evaluate and optimize its verification strategy before execution. Experiments on three benchmarks demonstrate that LEAP outperforms state-of-the-art methods, offering an effective and scalable solution for reliable hallucination detection.
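
The two mechanisms named in the abstract, a failure-driven loop that refines verification strategies and a proactive correction step that evaluates a plan before execution, can be illustrated with a toy Python sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so every name here (`failure_driven_search`, `proactive_correction`, the `toy_*` stand-ins, the candidate strategies, the threshold) is a hypothetical assumption chosen only to show the control flow. In a real system the `propose`, `verify`, `critic`, and `revise` callables would presumably be backed by teacher and student models.

```python
from typing import Callable, List, Optional, Tuple

# A strategy is an ordered plan of verification steps (hypothetical representation).
Strategy = Tuple[str, ...]


def failure_driven_search(
    propose: Callable[[List[Strategy]], Strategy],
    verify: Callable[[Strategy], bool],
    max_rounds: int = 5,
) -> Tuple[Optional[Strategy], List[Strategy]]:
    """Propose strategies until one succeeds, feeding earlier failures back in."""
    failures: List[Strategy] = []
    for _ in range(max_rounds):
        strategy = propose(failures)
        if verify(strategy):
            return strategy, failures
        failures.append(strategy)
    return None, failures


def proactive_correction(
    plan: Strategy,
    critic: Callable[[Strategy], float],
    revise: Callable[[Strategy], Strategy],
    threshold: float = 0.5,
    max_revisions: int = 3,
) -> Strategy:
    """Score a plan before running it; revise while the score is below threshold."""
    for _ in range(max_revisions):
        if critic(plan) >= threshold:
            break
        plan = revise(plan)
    return plan


# Toy stand-ins (assumptions for illustration only).
CANDIDATES: List[Strategy] = [
    ("direct_answer",),
    ("search",),
    ("decompose", "search", "compare"),
]


def toy_propose(failures: List[Strategy]) -> Strategy:
    # Return the first candidate that has not already failed.
    for candidate in CANDIDATES:
        if candidate not in failures:
            return candidate
    return CANDIDATES[-1]


def toy_verify(strategy: Strategy) -> bool:
    # Pretend that only plans containing a comparison step reach the right label.
    return "compare" in strategy


def toy_critic(plan: Strategy) -> float:
    # Pretend that a plan without any retrieval step is judged inadequate.
    return 1.0 if "search" in plan else 0.0


def toy_revise(plan: Strategy) -> Strategy:
    # Repair the plan by appending a retrieval step.
    return plan + ("search",)


found, failed = failure_driven_search(toy_propose, toy_verify)
plan = proactive_correction(("direct_answer",), toy_critic, toy_revise)
```

In this toy run the search discards two candidate strategies before accepting the third, and the correction step appends a retrieval step to the initial plan before anything is executed. The point of the second function is ordering: the plan is judged and repaired up front instead of being corrected after a failed execution.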

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-11-08
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2511.05854; arXiv 2511.05854
