MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models
[Submitted on 25 Dec 2024 (v1), last revised 13 Mar 2025 (this version, v3)]
Kaiwen Zuo and 1 other authors

Abstract: Medical Large Language Models (MLLMs) have demonstrated potential in healthcare applications, yet their propensity for hallucinations, that is, generating medically implausible or inaccurate information, presents substantial risks to patient care. This paper introduces MedHallBench, a comprehensive benchmark framework for evaluating and mitigating hallucinations in MLLMs. Our methodology integrates expert-validated medical case scenarios with established medical databases