What are the best open-source alternatives to Promptfoo?

30 open-source projects similar to promptfoo/promptfoo, ranked by shared features. Top picks: confident-ai/deepeval, typpo/promptfoo, vibrantlabsai/ragas, mlflow/mlflow, IBM/mcp-context-forge, camel-ai/camel, langchain-ai/deepagents, openai/evals, mastra-ai/mastra, Kilo-Org/kilocode.

Is confident-ai/deepeval a good alternative to Promptfoo?

Deepeval is a framework for testing and evaluating large language model applications. It provides a suite of tools for executing automated regression tests, validating model output quality against defined standards, and tracing the execution of complex agent workflows. By integrating these capabili…

Is typpo/promptfoo a good alternative to Promptfoo?

promptfoo is an evaluation framework for measuring the performance of large language model prompts, agents, and retrieval augmented generation pipelines. It provides a suite of tools for conducting comparative benchmarking and executing automated quality and security regressions. The system featur…

Is vibrantlabsai/ragas a good alternative to Promptfoo?

Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against def…

Is mlflow/mlflow a good alternative to Promptfoo?

mlflow/mlflow is an open-source alternative to Promptfoo.

Is IBM/mcp-context-forge a good alternative to Promptfoo?

mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified…

Is camel-ai/camel a good alternative to Promptfoo?

This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models…

Is langchain-ai/deepagents a good alternative to Promptfoo?

Deepagents is an LLM agent orchestration platform and stateful application server designed for deploying and managing AI agents built with computational graphs. It provides a containerized runtime environment that handles agent execution, state persistence, and the versioning of AI assistants. The…

Is openai/evals a good alternative to Promptfoo?

Evals is a framework designed for automating, managing, and executing repeatable benchmarking suites to analyze the quality and performance of language models. It provides a platform for running standardized tests to measure model accuracy and track behavioral changes over time. The system disting…

Is mastra-ai/mastra a good alternative to Promptfoo?

Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic…

Is Kilo-Org/kilocode a good alternative to Promptfoo?

Kilocode is an autonomous engineering platform designed to orchestrate AI agents for complex software development tasks. It functions as a comprehensive system for automating coding, testing, and repository management by integrating directly with your codebase and terminal. The platform provides a…

Open-source alternatives to Promptfoo

30 open-source projects similar to promptfoo/promptfoo, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Promptfoo alternative.

confident-ai/deepeval
Stars: 13,733
Deepeval is a framework for testing and evaluating large language model applications. It provides a suite of tools for executing automated regression tests, validating model output quality against defined standards, and tracing the execution of complex agent workflows. By integrating these capabilities into development pipelines, the platform ensures consistent performance and reliability throughout the software lifecycle. The platform distinguishes itself through its focus on programmatic validation and observability. It utilizes secondary language models to score output quality and employs
Language: Python
Tags: evaluation-framework, evaluation-metrics, llm-evaluation

typpo/promptfoo
Stars: 22,295
promptfoo is an evaluation framework for measuring the performance of large language model prompts, agents, and retrieval augmented generation pipelines. It provides a suite of tools for conducting comparative benchmarking and executing automated quality and security regressions. The system features a benchmarking suite for running identical prompts across different model providers to compare output quality side-by-side. It also includes a dedicated red teaming tool for identifying security vulnerabilities and prompt injection risks through automated penetration testing. The framework suppor
Language: TypeScript

vibrantlabsai/ragas
Stars: 12,659
Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications. The framework distinguishes itself through its ability to generate synthetic test datasets from existin
Language: Python
Tags: evaluation, llm, llmops

Open-source alternatives to Promptfoo

confident-ai/deepeval

typpo/promptfoo

vibrantlabsai/ragas

mlflow/mlflow

IBM/mcp-context-forge

camel-ai/camel

langchain-ai/deepagents

openai/evals

mastra-ai/mastra

Kilo-Org/kilocode

raga-ai-hub/RagaAI-Catalyst

evidentlyai/evidently

mshumer/gpt-prompt-engineer

BoundaryML/baml

Agenta-AI/agenta

NVIDIA/Isaac-GR00T

VoltAgent/voltagent

stanfordnlp/dspy

lm-sys/FastChat

pydantic/pydantic-ai

Giskard-AI/giskard-oss

comet-ml/opik

microsoft/vscode-copilot-chat

keirp/automatic_prompt_engineer

letta-ai/letta

Azure/PyRIT

livekit/agents

nndl/llm-beginner

Helicone/helicone

NirDiamant/agents-towards-production