The visitor wants a tool to manage, version, and iterate on LLM prompts using a workflow similar to software development.

agenta-ai/agenta is the closest match: Agenta is a comprehensive Prompt Ops platform that provides centralized versioning, templating, and evaluation workflows, allowing you to manage and iterate on LLM prompts with the same rigor as software development. Other strong matches: arize-ai/phoenix, typpo/promptfoo, mshumer/gpt-prompt-engineer, microsoft/poml.

Why does agenta-ai/agenta match “a tool for versioning and managing prompts”?

Agenta is a comprehensive Prompt Ops platform that provides centralized versioning, templating, and evaluation workflows, allowing you to manage and iterate on LLM prompts with the same rigor as software development.

Why does arize-ai/phoenix match “a tool for versioning and managing prompts”?

Arize Phoenix is a comprehensive LLMOps platform that provides prompt versioning, templating, and evaluation tools, making it a robust solution for managing the prompt development lifecycle.

Why does typpo/promptfoo match “a tool for versioning and managing prompts”?

This tool provides a robust framework for testing, evaluating, and benchmarking LLM prompts within CI/CD pipelines, though it focuses more on the validation and quality assurance side of prompt engineering than on a full Git-based versioning and templating workflow.

Why does mshumer/gpt-prompt-engineer match “a tool for versioning and managing prompts”?

This tool provides an automated framework for iteratively testing, benchmarking, and refining prompts, though it focuses more on algorithmic optimization than the Git-based versioning and management workflows typical of a prompt engineering platform.

Why does microsoft/poml match “a tool for versioning and managing prompts”?

This framework provides a structured approach to authoring, templating, and versioning prompts, offering the core logic needed for a development-centric prompt management workflow.

Prompt Management and Versioning Systems

Tools for tracking, versioning, and managing LLM prompts using software development workflows and version control.


Agenta-AI/agenta
3,860 · View on GitHub
Agenta is a Prompt Ops lifecycle manager and prompt management platform that decouples prompt engineering from application code. It serves as a centralized system for developing, versioning, and deploying prompt templates and model configurations across different environments. The platform functions as an AI agent orchestrator with a visual interface for building agent workflows and connecting models to external tools. It further acts as an evaluation framework and observability tool, utilizing OpenTelemetry to capture execution traces, monitor latency, and track token costs. The system covers a broad range of capabilities including judge-based evaluation for scoring model outputs, registry-based prompt management for version control, and environment-based deployment to promote configurations through development and production stages. It also provides tools for converting production traces into test datasets and managing role-based access control for multi-tenant organizations. The platform can be installed using Docker Compose with reverse proxy options for traffic management.
Agenta is a comprehensive Prompt Ops platform that provides centralized versioning, templating, and evaluation workflows, allowing you to manage and iterate on LLM prompts with the same rigor as software development.
TypeScript · Prompt Configuration Testing · Prompt Evaluation Tools · Prompt Template Testing
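The registry-based versioning and environment-based promotion described in the Agenta entry above can be pictured with a small sketch. This is not Agenta's API: `PromptRegistry`, `commit`, `promote`, and `fetch` are hypothetical names, the store is in memory only, and the model name `model-a` is a placeholder. It only illustrates the general idea of immutable prompt versions with per-environment pointers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    version: int   # 1-based, increases with every commit
    template: str  # prompt text with {placeholders}
    model: str     # model configuration stored next to the template


@dataclass
class PromptRegistry:
    """Hypothetical in-memory registry; a real platform would persist this."""
    versions: dict = field(default_factory=dict)      # name -> list of PromptVersion
    environments: dict = field(default_factory=dict)  # (name, env) -> version number

    def commit(self, name: str, template: str, model: str) -> PromptVersion:
        # Append a new immutable version; earlier versions are never modified.
        history = self.versions.setdefault(name, [])
        pv = PromptVersion(len(history) + 1, template, model)
        history.append(pv)
        return pv

    def promote(self, name: str, version: int, env: str) -> None:
        # Point an environment at an existing version (also used for rollback).
        if not 1 <= version <= len(self.versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self.environments[(name, env)] = version

    def fetch(self, name: str, env: str) -> PromptVersion:
        # Application code asks for "the prompt in this environment".
        version = self.environments[(name, env)]
        return self.versions[name][version - 1]


registry = PromptRegistry()
registry.commit("summarize", "Summarize: {text}", "model-a")
registry.commit("summarize", "Summarize in one sentence: {text}", "model-a")
registry.promote("summarize", 1, "production")
registry.promote("summarize", 2, "development")

prod_prompt = registry.fetch("summarize", "production").template.format(text="...")
dev_prompt = registry.fetch("summarize", "development").template.format(text="...")
```

Because the application only ever calls `fetch` with an environment name, promoting or rolling back a prompt is a pointer change in the registry rather than a code change, which is the decoupling the entry describes.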
Arize-ai/phoenix
8,605 · View on GitHub
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and includes tools for RAG troubleshooting to inspect retrieval documents. Capabilities cover the entire development lifecycle, including automated output validation, systemic performance benchmarking, and prompt engineering optimization. The system also incorporates security and access controls, such as role-based access and sensitive data masking, alongside collaborative workspaces for sharing observability data. The platform can be deployed locally via a CLI or notebook, or scaled through Docker and Kubernetes.
Arize Phoenix is a comprehensive LLMOps platform that provides prompt versioning, templating, and evaluation tools, making it a robust solution for managing the prompt development lifecycle.
Jupyter Notebook · Prompt Configuration Testing · Prompt Evaluation Tools · Prompt Synchronization APIs
typpo/promptfoo
22,295 · View on GitHub
promptfoo is an evaluation framework for measuring the performance of large language model prompts, agents, and retrieval augmented generation pipelines. It provides a suite of tools for conducting comparative benchmarking and executing automated quality and security regressions. The system features a benchmarking suite for running identical prompts across different model providers to compare output quality side-by-side. It also includes a dedicated red teaming tool for identifying security vulnerabilities and prompt injection risks through automated penetration testing. The framework supports declarative evaluation pipelines and metric-based scoring to quantify model reliability. These capabilities are designed for integration into continuous integration and deployment workflows to prevent regressions in model behavior. Results can be visualized in shared reports to facilitate team reviews of performance data and security findings.
This tool provides a robust framework for testing, evaluating, and benchmarking LLM prompts within CI/CD pipelines, though it focuses more on the validation and quality assurance side of prompt engineering than on a full Git-based versioning and templating workflow.
TypeScript · Prompt Evaluation Tools · Automated Prompt Testing · LLM Evaluation
mshumer/gpt-prompt-engineer
9,659 · View on GitHub
This project is an automated prompt engineering and optimization tool designed to iteratively create, test, and refine prompts using a language model to improve output quality. It functions as a framework for generating candidate prompts and ranking their performance through correctness matching and ELO-based ratings. The system includes capabilities for model distillation, generating high-quality example pairs from frontier models to create training data for smaller models. It also provides tools to condense prompts for smaller models and transform instruction-tuned prompts into completion-based patterns for base language models. The toolkit covers prompt performance benchmarking, classification tuning via ground-truth comparisons, and experiment tracking to record configurations and performance metrics over time.
This tool provides an automated framework for iteratively testing, benchmarking, and refining prompts, though it focuses more on algorithmic optimization than the Git-based versioning and management workflows typical of a prompt engineering platform.
Jupyter Notebook · Prompt Evaluation Tools · Prompt Version Trackers · Automated Prompt Testing
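The ELO-based ranking of candidate prompts mentioned in the gpt-prompt-engineer entry above can be sketched with the standard Elo update. This is not the project's code: the K-factor of 32, the starting rating of 1200, and the `toy_judge` function are assumptions for illustration. In a real setup the judge would compare the outputs that two prompts produce on test cases, typically by asking a language model; here it is a stand-in that simply prefers the longer prompt.

```python
import itertools

K = 32  # assumed update step; larger values make ratings move faster


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def update(r_a: float, r_b: float, score_a: float) -> tuple:
    """Return new ratings; score_a is 1.0 (A wins), 0.5 (draw) or 0.0 (B wins)."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + K * (score_a - e_a)
    new_b = r_b + K * ((1 - score_a) - (1 - e_a))
    return new_a, new_b


def rank_prompts(prompts, judge, start: float = 1200.0):
    """Play every pair of prompts once and return them sorted by rating."""
    ratings = {p: start for p in prompts}
    for a, b in itertools.combinations(prompts, 2):
        ratings[a], ratings[b] = update(ratings[a], ratings[b], judge(a, b))
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


def toy_judge(a: str, b: str) -> float:
    # Stand-in for a model-based comparison: the longer prompt "wins".
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


candidates = ["Be brief.", "Answer step by step.", "Hi"]
ranking = rank_prompts(candidates, toy_judge)
for prompt, rating in ranking:
    print(f"{rating:7.1f}  {prompt}")
```

Each comparison moves rating points from the loser to the winner, so the total stays constant and a prompt that keeps winning against strong candidates rises to the top of the list.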
microsoft/poml
4,853 · View on GitHub
Poml is a prompt management framework and templating engine designed for authoring, versioning, and rendering structured prompts for large language models. It uses a semantic markup language to organize prompts into reusable templates, combining them with dynamic context and data to generate formatted inputs. The system distinguishes itself by decoupling core prompt logic from final presentation through a stylesheet-based approach. It provides a dedicated JSON schema output generator to enforce strict, machine-parsable model responses and a configuration interface for managing function tool schemas and the exchange of requests and responses between prompts and models. The project covers a broad surface of prompt engineering capabilities, including modular composition, conditional rendering, and data iteration. It includes tools for data acquisition from external documents and webpages, as well as observability features for logging execution and capturing prompt snapshots. Developer tooling is provided via an SDK and IDE integrations that support real-time syntax validation and live render previews.
This framework provides a structured approach to authoring, templating, and versioning prompts, offering the core logic needed for a development-centric prompt management workflow.
TypeScript · Prompt Template Testing · Prompt Templates
linshenkx/prompt-optimizer
30,927 · View on GitHub
Prompt Optimizer is a framework designed for the iterative refinement and testing of text-based instructions for large language models. It functions as an automated evaluation pipeline that systematically adjusts prompt structure, constraints, and clarity to improve the accuracy and consistency of model outputs. The system distinguishes itself through a model-agnostic interface that standardizes communication across different artificial intelligence providers. It incorporates a versioned asset management system to track prompt history, enabling developers to maintain consistency and perform rollbacks across various projects. By utilizing a batch-based evaluation approach, the tool measures performance metrics across multiple test cases to verify the reliability of prompt changes. Beyond core optimization, the project supports complex conversational testing, including multi-turn interactions and function call verification. It also provides integration capabilities through the Model Context Protocol, allowing local optimization workflows to connect with external artificial intelligence applications and development environments. The toolset further extends to media generation tasks, applying specific style parameters to produce visual content.
This framework provides a structured environment for prompt versioning, automated testing, and iterative refinement, aligning well with the requirements for managing LLM prompts like software code.
TypeScript · Prompt Evaluation Tools · Automated Prompt Testing
mlflow/mlflow
26,554 · View on GitHub
MLflow is a comprehensive MLOps platform that includes dedicated tools for prompt engineering, versioning, and evaluation, providing a robust workflow for managing LLM lifecycles even if it is broader than a prompt-only system.
Python · Prompt Repositories · LLM Evaluation
danielmiessler/Fabric
42,408 · View on GitHub
Fabric is a command-line orchestrator designed to automate complex data processing and content generation tasks by chaining artificial intelligence models with modular prompt templates. It functions as a terminal-based tool that utilizes standard input and output streams, allowing users to pipe data directly into predefined reasoning strategies. By providing a model-agnostic abstraction layer, the system decouples execution logic from specific artificial intelligence vendors, normalizing requests and responses across different service providers. The platform distinguishes itself through its pattern-based orchestration, which enables the organization, storage, and reuse of custom prompt collections for consistent task execution. It includes a built-in server component that exposes these local prompt workflows as standard web endpoints, allowing external software and graphical interfaces to interact with custom logic as if it were a native model. Users can manage these interactions through a dedicated directory for private templates or via a graphical web dashboard, providing flexibility in how automated workflows are configured and monitored. Beyond its core orchestration capabilities, the tool offers a suite of utilities for development tasks, including document analysis, code context generation, and system interaction. It supports advanced reasoning techniques, such as chain-of-thought processing, and allows for specific model-to-pattern mapping to balance performance and operational costs. The system maintains state and configuration through local filesystem storage, ensuring portability across different operating environments.
Fabric provides a robust system for organizing, storing, and executing modular prompt templates via a CLI and local server, though it focuses more on workflow orchestration and automation than on formal Git-based versioning or automated evaluation suites.
Go · AI Command-Line Interfaces · Model Abstraction Layers · Terminal AI Automation
