What are the best open-source alternatives to Evidently?

30 open-source projects similar to evidentlyai/evidently, ranked by shared features. Top picks: vibrantlabsai/ragas, giskard-ai/giskard, arize-ai/phoenix, mlflow/mlflow, agenta-ai/agenta, comet-ml/comet-llm, typpo/promptfoo, marker-inc-korea/autorag, oumi-ai/oumi, comet-ml/opik.

Is vibrantlabsai/ragas a good alternative to Evidently?

Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against def…

Is giskard-ai/giskard a good alternative to Evidently?

Giskard is an evaluation framework, testing library, and quality monitoring system for large language models and AI agents. It serves as a toolkit for quantifying model performance and reliability, providing specialized capabilities for validating retrieval-augmented generation pipelines. The proj…

Is arize-ai/phoenix a good alternative to Evidently?

Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing te…

Is mlflow/mlflow a good alternative to Evidently?

mlflow/mlflow is an open-source alternative to Evidently.

Is agenta-ai/agenta a good alternative to Evidently?

Agenta is a Prompt Ops lifecycle manager and prompt management platform that decouples prompt engineering from application code. It serves as a centralized system for developing, versioning, and deploying prompt templates and model configurations across different environments. The platform functio…

Is comet-ml/comet-llm a good alternative to Evidently?

Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardr…

Is typpo/promptfoo a good alternative to Evidently?

promptfoo is an evaluation framework for measuring the performance of large language model prompts, agents, and retrieval augmented generation pipelines. It provides a suite of tools for conducting comparative benchmarking and executing automated quality and security regressions. The system featur…

Is marker-inc-korea/autorag a good alternative to Evidently?

AutoRAG is an automation layer and optimization tool for retrieval-augmented generation. It provides a framework for measuring pipeline performance through an evaluation system and an automated search strategy that identifies the most effective combinations of retrieval and generation modules. The…

Is oumi-ai/oumi a good alternative to Evidently?

Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and t…

Is comet-ml/opik a good alternative to Evidently?

Opik is an observability and evaluation platform designed for generative AI applications and agentic workflows. It provides a centralized environment for tracing execution flows, managing prompt templates, and monitoring production performance, allowing teams to gain visibility into complex model i…

Open-source alternatives to Evidently

30 open-source projects similar to evidentlyai/evidently, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Evidently alternative.

vibrantlabsai/ragas
12,659 stars
Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications. The framework distinguishes itself through its ability to generate synthetic test datasets from existin
Python, evaluation, llm, llmops

Giskard-AI/giskard
5,434 stars
Giskard is an evaluation framework, testing library, and quality monitoring system for large language models and AI agents. It serves as a toolkit for quantifying model performance and reliability, providing specialized capabilities for validating retrieval-augmented generation pipelines. The project distinguishes itself through an automated red teaming tool and security scanner designed to identify vulnerabilities, prompt injections, and safety risks. It utilizes adversarial probing and synthetic edge case generation to quantify model robustness and detect information disclosure. The platfo
Python

Arize-ai/phoenix
8,605 stars
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and
Jupyter Notebook, agents, ai-monitoring, ai-observability

Open-source alternatives to Evidently

vibrantlabsai/ragas

Giskard-AI/giskard

Arize-ai/phoenix

mlflow/mlflow

Agenta-AI/agenta

comet-ml/comet-llm

typpo/promptfoo

Marker-Inc-Korea/AutoRAG

oumi-ai/oumi

comet-ml/opik

promptfoo/promptfoo

confident-ai/deepeval

kiln-ai/kiln

langchain-ai/deepagents

Helicone/helicone

explodinggradients/ragas

IBM/mcp-context-forge

ydataai/ydata-profiling

BoundaryML/baml

raga-ai-hub/RagaAI-Catalyst

ydataai/pandas-profiling

openai/evals

PAIR-code/lit

microsoft/vscode-copilot-chat

pydantic/pydantic-ai

stanfordnlp/dspy

microsoft/promptflow

Giskard-AI/giskard-oss

mastra-ai/mastra

lmnr-ai/lmnr