What are the best open-source alternatives to Lighteval?

30 open-source projects similar to huggingface/lighteval, ranked by shared features. Top picks: openai/simple-evals, giskard-ai/giskard, eleutherai/lm-evaluation-harness, agenta-ai/agenta, packtpublishing/llm-engineers-handbook, lmnr-ai/lmnr, evolvinglmms-lab/lmms-eval, open-compass/opencompass, huggingface/evaluate, openai/evals.

Is openai/simple-evals a good alternative to Lighteval?

This project is a language model evaluation framework and benchmarking tool designed to measure the accuracy and performance of models across diverse datasets. It provides a system for implementing model-based graders, running standardized tests for mathematical reasoning, coding, and factuality, a…

Is giskard-ai/giskard a good alternative to Lighteval?

Giskard is an evaluation framework, testing library, and quality monitoring system for large language models and AI agents. It serves as a toolkit for quantifying model performance and reliability, providing specialized capabilities for validating retrieval-augmented generation pipelines. The proj…

Is eleutherai/lm-evaluation-harness a good alternative to Lighteval?

This project is a standardized framework for benchmarking large language models across a wide range of academic and reasoning datasets. It provides a platform for executing automated evaluation tasks to measure model accuracy and performance, ensuring consistent assessment through a structured conf…

Is agenta-ai/agenta a good alternative to Lighteval?

Agenta is a Prompt Ops lifecycle manager and prompt management platform that decouples prompt engineering from application code. It serves as a centralized system for developing, versioning, and deploying prompt templates and model configurations across different environments. The platform functio…

Is packtpublishing/llm-engineers-handbook a good alternative to Lighteval?

This project is an educational resource and engineering guide for building, deploying, and optimizing large language model applications and production pipelines. It serves as a blueprint for cloud AI infrastructure, providing a framework for orchestrating inference endpoints, data warehouses, and s…

Is lmnr-ai/lmnr a good alternative to Lighteval?

Lmnr is an LLM observability platform and evaluation framework designed for tracing, logging, and monitoring language model executions. It provides the tools necessary to debug agent behavior, analyze performance, and identify failure patterns in AI agents. The platform differentiates itself throu…

Is evolvinglmms-lab/lmms-eval a good alternative to Lighteval?

lmms-eval is a benchmarking system and performance analysis suite designed to measure the capabilities of large multimodal models. It provides a framework for evaluating models across text, image, audio, and video datasets, serving as a multimodal dataset orchestrator and benchmarking tool to quant…

Is open-compass/opencompass a good alternative to Lighteval?

OpenCompass is an open-source framework for standardized benchmarking of large language models. It provides a configurable evaluation pipeline that supports both objective and subjective assessment, using a dual-engine architecture to handle closed-form answer comparison and open-ended response rat…

Is huggingface/evaluate a good alternative to Lighteval?

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Is openai/evals a good alternative to Lighteval?

Evals is a framework designed for automating, managing, and executing repeatable benchmarking suites to analyze the quality and performance of language models. It provides a platform for running standardized tests to measure model accuracy and track behavioral changes over time. The system disting…


Open-source alternatives to Lighteval

30 open-source projects similar to huggingface/lighteval, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Lighteval alternative.

openai/simple-evals
Stars: 4,354 | Language: Python
This project is a language model evaluation framework and benchmarking tool designed to measure the accuracy and performance of models across diverse datasets. It provides a system for implementing model-based graders, running standardized tests for mathematical reasoning, coding, and factuality, and calculating quantified performance metrics such as precision, recall, F1 scores, and pass-at-k. The framework utilizes model-based grading and rubrics to validate response quality against expert-defined criteria. It includes a multi-model benchmarking loop and a model-agnostic API interface to co…
Giskard-AI/giskard
Stars: 5,434 | Language: Python
Giskard is an evaluation framework, testing library, and quality monitoring system for large language models and AI agents. It serves as a toolkit for quantifying model performance and reliability, providing specialized capabilities for validating retrieval-augmented generation pipelines. The project distinguishes itself through an automated red teaming tool and security scanner designed to identify vulnerabilities, prompt injections, and safety risks. It utilizes adversarial probing and synthetic edge case generation to quantify model robustness and detect information disclosure. The platfo…
EleutherAI/lm-evaluation-harness
Stars: 11,460 | Language: Python | Topics: evaluation-framework, language-model, transformer
This project is a standardized framework for benchmarking large language models across a wide range of academic and reasoning datasets. It provides a platform for executing automated evaluation tasks to measure model accuracy and performance, ensuring consistent assessment through a structured configuration schema. The framework distinguishes itself by incorporating a dedicated utility for data decontamination, which identifies and removes overlapping training samples from evaluation sets to prevent data leakage. It also features a flexible task builder that allows users to define custom benc…

Open-source alternatives to Lighteval

openai/simple-evals

Giskard-AI/giskard

EleutherAI/lm-evaluation-harness

Agenta-AI/agenta

PacktPublishing/LLM-Engineers-Handbook

lmnr-ai/lmnr

EvolvingLMMs-Lab/lmms-eval

open-compass/opencompass

huggingface/evaluate

openai/evals

confident-ai/deepeval

explodinggradients/ragas

Helicone/helicone

promptslab/Promptify

willccbb/verifiers

kiln-ai/kiln

Arize-ai/phoenix

open-mmlab/mmdetection3d

sjwhitworth/golearn

open-mmlab/mmsegmentation

OpenGVLab/LLaMA-Adapter

Instruction-Tuning-with-GPT-4/GPT-4-LLM

lm-sys/RouteLLM

InternLM/opencompass

microsoft/vscode-copilot-chat

traceloop/openllmetry

deepchecks/deepchecks

evidentlyai/evidently

comet-ml/opik

evalplus/evalplus