maifoundations/HumbleBench


9 stars, 0 forks, 0 views · Language: Python · Paper: arxiv.org/pdf/2509.09658

HumbleBench

Overview

Hallucinations in multimodal large language models (MLLMs), where the model generates content inconsistent with the input image, pose significant risks in real-world applications, ranging from misinformation in visual question answering to unsafe errors in decision-making. Existing benchmarks…

Features

Evaluation Benchmarks - Measures epistemic humility in vision-language models.
