lmms-eval is a benchmarking and performance-analysis suite for measuring the capabilities of large multimodal models. It provides a framework for evaluating models on text, image, audio, and video datasets, acting both as a multimodal dataset orchestrator and as a benchmarking tool that quantifies accuracy and efficiency.
The project distinguishes itself through a unified multimodal message protocol that packages heterogeneous media inputs (text, images, audio, video) into a single structure that every model consumes the same way. It also includes specialized benchmarks for audio, video, visual, document, and spatial reasoning, plus safety evaluations that probe hallucination, bias, and susceptibility to jailbreaks.
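As a rough illustration of what a unified message protocol buys, the sketch below assumes an OpenAI-style chat layout in which each message carries a role and a list of typed content parts. The class names, the `<image>`-style placeholder tokens, and the `to_prompt_and_media` helper are all hypothetical; they are not lmms-eval's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Tuple, Union


# Hypothetical content-part types; lmms-eval's real message schema may differ.
@dataclass
class TextPart:
    text: str
    type: Literal["text"] = "text"


@dataclass
class MediaPart:
    type: Literal["image", "audio", "video"]
    url: str  # local path or URL of the media file


Part = Union[TextPart, MediaPart]


@dataclass
class Message:
    role: Literal["system", "user", "assistant"]
    content: List[Part] = field(default_factory=list)


def to_prompt_and_media(messages: List[Message]) -> Tuple[str, List[Tuple[str, str]]]:
    """Flatten a conversation into one prompt string plus an ordered media list.

    Each media part is replaced by a placeholder token such as "<image>"
    (an assumed convention), so a model adapter can line media up with text.
    """
    lines: List[str] = []
    media: List[Tuple[str, str]] = []
    for msg in messages:
        chunks = []
        for part in msg.content:
            if isinstance(part, TextPart):
                chunks.append(part.text)
            else:
                chunks.append(f"<{part.type}>")
                media.append((part.type, part.url))
        lines.append(f"{msg.role}: {' '.join(chunks)}")
    return "\n".join(lines), media


conversation = [
    Message("user", [MediaPart("image", "cat.jpg"), TextPart("What animal is this?")]),
]
prompt, media = to_prompt_and_media(conversation)
print(prompt)  # user: <image> What animal is this?
print(media)   # [('image', 'cat.jpg')]
```

In this sketch, because every task emits the same structure, a model adapter needs one conversion routine rather than one per dataset.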
Beyond accuracy, the system offers performance analysis (throughput and token usage), statistical validation of results for reproducibility, and inference optimizations such as response caching and multi-threaded media decoding. It also supports agentic loops for multi-round evaluations and provides a browser-based graphical interface for configuring and launching evaluations interactively.
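Two of these capabilities lend themselves to a short illustration. The sketch below assumes (1) a response cache keyed by a SHA-256 hash of the model name and request, and (2) a percentile bootstrap as the statistical check on an accuracy score. Both are common techniques chosen for illustration; `ResponseCache` and `bootstrap_ci` are hypothetical names, and lmms-eval's own cache format and statistics may differ.

```python
import hashlib
import json
import random
import statistics
from typing import Callable, Dict, List, Tuple


class ResponseCache:
    """In-memory response cache (hypothetical sketch, not lmms-eval's cache).

    Keys are SHA-256 hashes of the model name plus the full request, so any
    change to the prompt or generation settings produces a new entry.
    """

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, request: dict) -> str:
        # sort_keys=True makes the key independent of dict insertion order.
        blob = json.dumps({"model": model, "request": request}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get_or_generate(
        self, model: str, request: dict, generate: Callable[[dict], str]
    ) -> str:
        k = self.key(model, request)
        if k in self._store:
            self.hits += 1
        else:
            # Only call the (expensive) model on a cache miss.
            self.misses += 1
            self._store[k] = generate(request)
        return self._store[k]


def bootstrap_ci(
    scores: List[float], n_resamples: int = 1000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) from a percentile bootstrap of per-item scores."""
    rng = random.Random(seed)  # fixed seed: the interval itself is reproducible
    n = len(scores)
    # Resample items with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return statistics.fmean(scores), means[lo_idx], means[hi_idx]


def fake_model(request: dict) -> str:
    return request["prompt"].upper()  # stand-in for real inference


cache = ResponseCache()
for _ in range(3):
    cache.get_or_generate("demo-model", {"prompt": "hi", "temperature": 0}, fake_model)
print(cache.hits, cache.misses)  # 2 1

mean, lo, hi = bootstrap_ci([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
print(mean)  # 0.7
```

Caching like this is only safe for deterministic requests (for example, greedy decoding); with sampling enabled, a cached answer would hide run-to-run variance. Fixing the bootstrap seed makes the reported interval itself reproducible.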
Users can trigger evaluations programmatically through a functional API or an asynchronous HTTP server.
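An asynchronous server typically follows a submit-then-poll workflow: a request starts an evaluation and returns immediately, and the client checks back later for results. The sketch below models only that pattern, in-process with `asyncio`. HTTP handling is omitted, `EvalJobQueue` and its methods are hypothetical, and nothing here reflects lmms-eval's actual server endpoints or functional API.

```python
import asyncio
import itertools
from typing import Dict, List


class EvalJobQueue:
    """Hypothetical in-process stand-in for an asynchronous evaluation server.

    submit() returns a job id immediately; the evaluation runs as a background
    asyncio task, and callers poll status() or await result().
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[int, asyncio.Task] = {}

    async def _run_eval(self, model: str, tasks: List[str]) -> dict:
        # Placeholder for real inference and scoring; the sleep simulates work.
        await asyncio.sleep(0.01)
        return {"model": model, "tasks": list(tasks)}

    def submit(self, model: str, tasks: List[str]) -> int:
        # Must be called while an event loop is running (create_task needs one).
        job_id = next(self._ids)
        self._jobs[job_id] = asyncio.create_task(self._run_eval(model, tasks))
        return job_id

    def status(self, job_id: int) -> str:
        return "finished" if self._jobs[job_id].done() else "running"

    async def result(self, job_id: int) -> dict:
        return await self._jobs[job_id]


async def main() -> None:
    queue = EvalJobQueue()
    job = queue.submit("demo-model", ["task_a"])
    print(queue.status(job))        # running
    print(await queue.result(job))  # {'model': 'demo-model', 'tasks': ['task_a']}
    print(queue.status(job))        # finished


asyncio.run(main())
```

Returning a job id instead of blocking lets one client queue several evaluations and collect them as they finish.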