Evidently is an AI observability platform and evaluation framework designed to quantify the performance of machine learning models and large language models. It functions as a monitoring tool for detecting data drift and quality degradation in tabular datasets, while providing a specialized analyzer for the faithfulness and correctness of retrieval augmented generation systems.
The project distinguishes itself through an evaluation framework that utilizes judge models and custom rubrics to score language model outputs. It includes tools for iterative prompt optimization and the generation of synthetic test datasets, including adversarial inputs for risk and brand safety testing.
The platform covers a broad range of capabilities including real-time telemetry tracing for AI workflows, automated quality assurance via CI/CD integration, and performance trend tracking. It provides visual dashboards for reporting and a threshold-based alerting system to notify users when quality metrics cross predefined limits.
Users can deploy a local workspace to manage projects and reports or use a no-code interface to configure evaluation workflows.