What are the best open-source alternatives to OpenCompass?

30 open-source projects similar to open-compass/opencompass, ranked by shared features. Top picks: internlm/opencompass, openai/simple-evals, arize-ai/phoenix, giskard-ai/giskard, open-compass/vlmevalkit, oumi-ai/oumi, mlflow/mlflow, ibm/mcp-context-forge, huggingface/lighteval, comet-ml/comet-llm.

Is internlm/opencompass a good alternative to OpenCompass?

OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardiz…

Is openai/simple-evals a good alternative to OpenCompass?

This project is a language model evaluation framework and benchmarking tool designed to measure the accuracy and performance of models across diverse datasets. It provides a system for implementing model-based graders, running standardized tests for mathematical reasoning, coding, and factuality, a…

Is arize-ai/phoenix a good alternative to OpenCompass?

Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing te…

Is giskard-ai/giskard a good alternative to OpenCompass?

Giskard is an evaluation framework, testing library, and quality monitoring system for large language models and AI agents. It serves as a toolkit for quantifying model performance and reliability, providing specialized capabilities for validating retrieval-augmented generation pipelines. The proj…

Is open-compass/vlmevalkit a good alternative to OpenCompass?

VLMEvalKit is a vision-language model evaluation framework and inference engine designed to run standardized benchmarks and measure model accuracy across diverse visual datasets. It serves as a multimodal model benchmark and performance toolkit for calculating metrics and comparing model responses.…

Is oumi-ai/oumi a good alternative to OpenCompass?

Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and t…

Is mlflow/mlflow a good alternative to OpenCompass?

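MLflow is an open-source platform for managing the machine learning lifecycle, including experiment tracking and model evaluation. It is listed here as an open-source alternative to OpenCompass.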

Is ibm/mcp-context-forge a good alternative to OpenCompass?

mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified…

Is huggingface/lighteval a good alternative to OpenCompass?

Lighteval is an open-source framework for running standardized benchmarks and custom evaluation tasks against language models. It provides a system for defining new evaluation tasks with custom prompts, metrics, and scoring in YAML configuration files, and integrates with the Hugging Face Hub for s…

Is comet-ml/comet-llm a good alternative to OpenCompass?

Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardr…


Open-source alternatives to OpenCompass

30 open-source projects similar to open-compass/opencompass, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best OpenCompass alternative.

InternLM/opencompass
Python · 7,096 stars
OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardized metrics and reproducible pipelines. The project features an automated judging framework that uses language models as judges to score and verify the quality of generated text. It includes a performance leaderboard system for comparing the relative capabilities of various models across industry-sta…
openai/simple-evals
Python · 4,354 stars
This project is a language model evaluation framework and benchmarking tool designed to measure the accuracy and performance of models across diverse datasets. It provides a system for implementing model-based graders, running standardized tests for mathematical reasoning, coding, and factuality, and calculating quantified performance metrics such as precision, recall, F1 scores, and pass-at-k. The framework utilizes model-based grading and rubrics to validate response quality against expert-defined criteria. It includes a multi-model benchmarking loop and a model-agnostic API interface to co…
Arize-ai/phoenix
Jupyter Notebook · 8,605 stars
Topics: agents, ai-monitoring, ai-observability
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and…

Open-source alternatives to OpenCompass

InternLM/opencompass

openai/simple-evals

Arize-ai/phoenix

Giskard-AI/giskard

open-compass/VLMEvalKit

oumi-ai/oumi

mlflow/mlflow

IBM/mcp-context-forge

huggingface/lighteval

comet-ml/comet-llm

evidentlyai/evidently

openai/evals

EleutherAI/lm-evaluation-harness

thunlp/UltraChat

comet-ml/opik

OpenLMLab/GAOKAO-Bench

llm-attacks/llm-attacks

kiln-ai/kiln

Agenta-AI/agenta

PAIR-code/lit

PacktPublishing/LLM-Engineers-Handbook

EvolvingLMMs-Lab/lmms-eval

huggingface/open-r1

huggingface/alignment-handbook

facebookresearch/map-anything

NVIDIA/Isaac-GR00T

p-e-w/heretic

lmnr-ai/lmnr

verazuo/jailbreak_llms

autogluon/autogluon