30 open-source projects similar to agentops-ai/agentops, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best AgentOps alternative.
Helicone is an AI gateway and observability platform designed to intercept, manage, and monitor interactions with large language models. Acting as a reverse proxy, it provides a centralized layer for routing requests across multiple AI providers, so developers can keep consistent application logic while gaining visibility into model performance, usage, and costs. The platform distinguishes itself through its traffic-management and prompt-engineering tools. It enables policy-driven control, including automatic failover between providers, rate limiting, and edge…
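Automatic failover of the kind mentioned above amounts to trying providers in priority order until one succeeds. The following is a minimal Python sketch under that assumption, not Helicone's implementation or API; the provider names and the `primary`/`backup` functions are hypothetical stand-ins for real model clients.

```python
from typing import Callable, Sequence

# Hypothetical provider type: a name plus a function that sends a prompt
# and returns the completion text, raising an exception on failure.
Provider = tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in priority order and return (provider_name, reply)
    from the first one that succeeds."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers: the primary is down, so the backup answers.
def primary(prompt: str) -> str:
    raise ConnectionError("503 Service Unavailable")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_failover("hello", [("primary", primary), ("backup", backup)]))
# prints ('backup', 'echo: hello')
```

A real gateway would usually also distinguish retryable errors (timeouts, 5xx) from non-retryable ones before failing over; this sketch treats every exception the same way for brevity.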
This project is a self-hosted AI monitoring stack that functions as an LLM observability platform, AI evaluation framework, and OpenTelemetry trace analyzer. It is designed to capture and analyze LLM traces, sessions, and telemetry to monitor AI agent performance. The platform distinguishes itself by acting as a Model Context Protocol server that exposes workspace functions as tools for AI coding agents. It can convert failing production traces into test datasets for regression testing, and it uses semantic session clustering to discover emerging user behavior patterns. The system covers…
Deepagents is an LLM agent orchestration platform and stateful application server designed for deploying and managing AI agents built as computational graphs. It provides a containerized runtime environment that handles agent execution, state persistence, and versioning of AI assistants. The platform distinguishes itself through deep integration with the Model Context Protocol, allowing agents to act as servers that expose tools and capabilities to external clients. It features an observability suite for capturing execution traces, performing LLM-based evaluations against…
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs behind a single, consistent interface for discovery and execution. Acting as a centralized proxy that aggregates these servers and APIs, it lets AI agents access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts between standard I/O, SSE, gRPC, and REST, so that otherwise incompatible tool servers can interoperate. It also includes an LLM evaluation framework for…
OSWorld is an evaluation framework and multimodal agent benchmark designed to test whether large language models can complete complex tasks in virtualized operating system environments. It provides a virtualized desktop sandbox and a virtual machine orchestrator to deploy, snapshot, and reset cloud-based desktops, ensuring reproducible test states for AI agent interactions. The system distinguishes itself by providing an OS-level action space that translates model decisions into mouse clicks, keyboard inputs, and system commands. It uses a standardized interface to integrate various…
Genkit is an open-source framework for building AI-powered applications. It provides a unified interface for connecting to hundreds of generative AI models from multiple providers, enabling text, image, audio, and video generation through a single API. The framework structures multi-step AI interactions (chat, retrieval-augmented generation, tool use, and agentic workflows) as composable, traceable flows with built-in streaming and state management. It distinguishes itself through a developer toolkit that includes a command-line interface and a local developer…
Rivet is a visual LLM workflow designer and AI agent orchestration engine. It serves both as a development environment for building retrieval-augmented generation pipelines and as a TypeScript library for embedding visual AI graphs and prompt logic into JavaScript applications. The system differentiates itself through a node-based editor that maps data flow between language models, vector databases, and external APIs. It provides specialized prompt-engineering tools, including interfaces for iterative prompt refinement and A/B testing of model responses. The platform covers a broad…
This project is an educational curriculum and architectural framework for building autonomous AI agents and multi-agent systems. It provides a structured learning path focused on developing independent software components that can plan, execute tasks, and use external tools to achieve high-level goals. The framework emphasizes multi-agent orchestration through distributed architectures in which specialized agents collaborate over standardized communication protocols. It details specific design patterns such as dual-memory systems for maintaining short-term plans and…
Beehave is a behavior tree AI framework and plugin for the Godot engine. It serves as an agent logic orchestrator for designing adaptive non-player character behaviors using hierarchical behavior trees. The system features a visual debugger that shows the execution flow and internal state of active trees at runtime, making it easier to analyze AI execution and find logic errors. The framework uses a composite-decorator pattern and a tick-based polling cycle to execute tree logic. It integrates directly with the…
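The composite/decorator structure and tick cycle described above can be illustrated with a small Python sketch. It is not Beehave's GDScript API. The `Status` enum and the `Action`, `Sequence`, `Selector`, and `Inverter` classes are hypothetical and are named after common behavior-tree terminology. In a game, the engine loop would call `tick()` once per frame.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Action:
    """Leaf node: wraps a function that inspects or changes the blackboard."""

    def __init__(self, fn: Callable[[dict], Status]):
        self.fn = fn

    def tick(self, blackboard: dict) -> Status:
        return self.fn(blackboard)


class Sequence:
    """Composite: ticks children in order and stops at the first non-SUCCESS."""

    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard: dict) -> Status:
        for child in self.children:
            status = child.tick(blackboard)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Selector:
    """Composite: ticks children in order and stops at the first non-FAILURE."""

    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard: dict) -> Status:
        for child in self.children:
            status = child.tick(blackboard)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


class Inverter:
    """Decorator: swaps SUCCESS and FAILURE of its child; RUNNING passes through."""

    def __init__(self, child):
        self.child = child

    def tick(self, blackboard: dict) -> Status:
        status = self.child.tick(blackboard)
        if status is Status.SUCCESS:
            return Status.FAILURE
        if status is Status.FAILURE:
            return Status.SUCCESS
        return status


def enemy_visible(bb: dict) -> Status:
    return Status.SUCCESS if bb["enemy_visible"] else Status.FAILURE


def attack(bb: dict) -> Status:
    bb["action"] = "attack"
    return Status.SUCCESS


def patrol(bb: dict) -> Status:
    bb["action"] = "patrol"
    return Status.RUNNING  # patrolling is ongoing, so it never finishes in one tick


# NPC logic: attack when an enemy is visible, otherwise keep patrolling.
tree = Selector(Sequence(Action(enemy_visible), Action(attack)), Action(patrol))

# Simulated polling loop: one tick per frame with changing world state.
for visible in (False, True):
    bb = {"enemy_visible": visible}
    print(tree.tick(bb).name, bb["action"])  # RUNNING patrol, then SUCCESS attack
```

Wrapping a condition in `Inverter` flips it without writing a new leaf. For example, `Inverter(Action(enemy_visible))` succeeds only when no enemy is visible.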
Lmnr is an LLM observability platform and evaluation framework designed for tracing, logging, and monitoring language model executions. It provides tools to debug agent behavior, analyze performance, and identify failure patterns in AI agents. The platform differentiates itself through a trace-to-dataset pipeline that converts production logs into labeled test sets for regression testing. It includes a prompt-variant replay engine for comparing prompts or models side by side, and a state-cached debugging system for replaying agent loops without restarting the process. The system…
vibe-vibe is an LLM agent engineering framework and toolchain optimizer designed for orchestrating multi-agent systems. It serves as a guide and methodology for turning conceptual ideas into deployed applications through agentic software engineering. The project focuses on orchestrating specialized AI agent roles with defined collaboration boundaries and iterative feedback loops. It provides frameworks for toolchain optimization, including selecting and evaluating protocols that extend model capabilities and designing standardized tool interfaces. The system…
Auto-GPT is an autonomous agent framework for creating and deploying AI agents that use large language models to plan and carry out complex goals independently. The system provides an environment for managing the entire agent lifecycle, from initial design and testing to live production deployment. The project features a low-code workflow designer that lets users define agent behavior by connecting functional blocks in a visual interface. It includes an agent marketplace for discovering and deploying preconfigured agent templates and a standardized evaluation tool…
CoAI is an enterprise-grade, self-hostable AI gateway platform that unifies access to over 200 AI models from more than 35 providers through a single OpenAI-compatible API endpoint. It functions as a multi-tenant gateway, routing requests across providers with load balancing, automatic failover, and priority-based routing. It also exposes standard OpenAI API endpoints for chat, image generation, model listing, and billing to simplify integration with existing tools and clients. The platform distinguishes itself through a set of operational capabilities built around the gateway…
OpenLLMetry is an OpenTelemetry-based observability framework and instrumentation library for generative AI applications. It provides tools for tracing and monitoring large language model workflows, capturing telemetry from model providers, agent frameworks, and vector databases using standardized semantic conventions. The project distinguishes itself with an evaluation and experimentation suite that associates user feedback and prompt version hashes with specific execution traces. It includes a system for tracking model reasoning paths and enforcing security guardrails…
HyperDX is an OpenTelemetry observability platform that provides centralized log management, distributed tracing, and a self-hosted monitoring stack. It functions as a unified system for collecting, indexing, and visualizing logs, metrics, and traces from cloud and container environments. The platform distinguishes itself with tooling for large language model monitoring and session replay, which links user interactions in the browser to backend telemetry. It uses schema-less JSON parsing to index structured logs dynamically and source maps to resolve minified stack…
53AIHub is a centralized orchestration platform for deploying and managing AI agents and prompts across multiple large language model providers. It functions as a multi-model AI gateway and an operations portal for AI services, providing a unified interface to coordinate agents and prompts from various external platforms. The project distinguishes itself as a white-label AI portal designed for self-hosted infrastructure, giving operators full control over operational data on private servers or containers. It includes an AI SaaS administration layer with a multi-tenant subscription engine…
lmms-eval is a benchmarking system and performance analysis suite for measuring the capabilities of large multimodal models. It provides a framework for evaluating models across text, image, audio, and video datasets, orchestrating multimodal datasets and quantifying accuracy and efficiency. The project distinguishes itself through a unified multimodal message protocol that structures diverse media inputs so that models consume them consistently. It features specialized benchmarks for audio, video, visual, document, and spatial reasoning, alongside tools for model…
This project provides a translation layer and set of adapters that bridge AI agents and the Model Context Protocol. It lets agents operate as protocol-compliant servers and converts protocol-based tools into formats compatible with agent frameworks and logic graphs. The adapters wrap external protocol tools for use within agent workflows and expose internal agent capabilities to any client that implements the Model Context Protocol. This creates a communication bridge that supports…
BetterChatGPT is a cross-platform user interface and OpenAI API client for interacting with large language models. It functions as a prompt engineering workspace and a self-hosted AI frontend that connects to models via API keys or custom proxy endpoints. The project distinguishes itself through conversation management tools, including color-coded folders for organizing chats and a library of reusable prompt templates. It also includes a real-time cost monitor that tracks token consumption and estimates the price of interactions…
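Cost tracking of this kind usually multiplies prompt and completion token counts by per-token prices and then sums the results. The following minimal Python sketch works under that assumption. It is not BetterChatGPT's code, and the model name `example-model` and its per-million-token prices are placeholders, not real provider rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M prompt tokens
    output_per_million: float  # USD per 1M completion tokens


# Placeholder price table: "example-model" and its rates are hypothetical.
PRICES = {"example-model": Pricing(input_per_million=2.50, output_per_million=10.00)}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request from its token counts."""
    p = PRICES[model]
    return (prompt_tokens * p.input_per_million
            + completion_tokens * p.output_per_million) / 1_000_000


# (1200 * 2.50 + 300 * 10.00) / 1,000,000 = 6000 / 1,000,000 = 0.006 USD
print(f"${estimate_cost('example-model', 1200, 300):.4f}")  # prints $0.0060
```

Input and output tokens are priced separately because many providers charge different rates for each.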
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. It is distinguished by autonomous AI agents capable of multi-step task execution, including reading files, modifying code, and running terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employs…
PraisonAI is an autonomous AI agent platform that coordinates multiple LLM-powered agents for research, planning, and execution of complex workflows. It functions as a multi-agent orchestration framework, a workflow builder, and a Model Context Protocol server, and it also provides retrieval-augmented generation through vector knowledge bases. Agents can be used through a CLI, the web, or standardized protocols, and code execution is sandboxed. The platform distinguishes itself with a wide range of agent communication protocols, including A2A, REST, WebSocket, voice and telephony integration, and MCP, allowing…
OpenSquilla is an LLM agent orchestration framework that coordinates multi-step AI workflows and tool execution using directed acyclic graphs. It acts as a centralized system for managing specialized skill packages and executing complex reasoning sequences. The project distinguishes itself through a routing gateway that directs tasks to different AI providers based on complexity, cost, and performance. It uses a multi-tier memory system that organizes working, episodic, and semantic knowledge with local embeddings and SQLite. It also provides a secure execution sandbox that isolates…
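Executing a workflow as a directed acyclic graph means running each step only after all of its dependencies have finished. The following minimal Python sketch uses the standard-library `graphlib` module to show this. It is not OpenSquilla's API, and the step names, the `Step` tuple format, and the lambda "steps" are hypothetical placeholders for real model or tool calls.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical step format: (names of dependency steps, function).
# Each function receives a dict of its dependencies' outputs, keyed by step name.
Step = tuple[list[str], Callable[[dict[str, str]], str]]


def run_dag(steps: dict[str, Step]) -> dict[str, str]:
    """Run every step after all of its dependencies and return each step's output."""
    graph = {name: deps for name, (deps, _) in steps.items()}
    results: dict[str, str] = {}
    # static_order() yields each node only after all of its predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        deps, fn = steps[name]
        results[name] = fn({dep: results[dep] for dep in deps})
    return results


# Two steps fan out from "search" and fan back in at "report".
workflow: dict[str, Step] = {
    "search": ([], lambda _: "raw notes"),
    "summarize": (["search"], lambda r: f"summary of {r['search']}"),
    "critique": (["search"], lambda r: f"critique of {r['search']}"),
    "report": (["summarize", "critique"], lambda r: f"{r['summarize']} + {r['critique']}"),
}
print(run_dag(workflow)["report"])  # prints: summary of raw notes + critique of raw notes
```

This version runs steps one at a time. `TopologicalSorter` also offers `prepare()`, `get_ready()`, and `done()` for dispatching independent steps, such as `summarize` and `critique` here, in parallel.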
Agno is an agent operating system designed to manage the lifecycle, tool execution, and persistent state of autonomous agents across distributed infrastructure. It provides a unified runtime environment that wraps diverse agent frameworks in a consistent, interoperable protocol, letting developers build and deploy multi-agent systems that coordinate tasks and delegate sub-processes. The platform distinguishes itself through a governance and orchestration layer that includes human-in-the-loop approval gates, role-based access control, and a centralized API gateway. It features…
The BeeAI Framework is an LLM agent framework and multi-agent orchestration engine for building autonomous agents that coordinate reasoning, tool execution, and complex workflows. It also serves as a structured-output controller and RAG integration library, providing a unified interface to multiple language model providers. The framework is distinguished by its implementation of the Model Context Protocol, which lets agents, tools, and models be shared across AI platforms and hosted as agentic tooling servers. It enables the design of collaborative agent teams through…
Archgw is a gateway proxy and data plane for agentic applications, providing a centralized layer for routing, safety, and orchestration between application logic and multiple large language model providers. It also acts as an AI agent orchestrator that automates agent workflow execution, removing repetitive plumbing from the core codebase. The system features a provider-agnostic interface layer that standardizes disparate model APIs into a single format, and a transparent proxy data plane that intercepts traffic. It uses rule-based routing to decouple application logic from…
Arize Phoenix is an LLM observability platform and evaluation framework that captures execution traces and monitors large language model applications. It also serves as a prompt management system for versioning and testing templates, and as self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through an embedding visualization tool used to detect data drift and optimize vector search. It provides an evaluation suite that uses judge-based evaluators and ground-truth datasets to score model outputs, and…
Vanna is a Python framework for building conversational interfaces that translate natural language into executable database queries. It functions as an enterprise-grade toolkit that connects language models to relational databases, so users can retrieve information through conversational prompts instead of writing queries by hand. The system maintains context across interactions by storing historical query patterns and schema metadata in vector databases. The framework distinguishes itself through its focus on security and schema-aware generation. It incorporates granular access control,…
This project provides a framework for managing multi-agent systems that automate complex software development, infrastructure, and business workflows. It functions as a multi-agent workflow orchestrator that routes tasks to domain-specific workers while handling state persistence and infrastructure automation. It uses large language models to decompose high-level objectives into actionable plans, with the aim of executing complex operations consistently and reliably. The framework distinguishes itself through its hierarchical agent registry and policy-driven…
Swarms is a multi-agent orchestration framework and autonomous agent toolkit for coordinating large language model agents. It serves as a workflow engine for managing agent relationships, providing the infrastructure to build autonomous agents with integrated memory, tool calling, and reasoning loops. The framework is distinguished by its multi-agent consensus systems, which use voting, adversarial debates, and judge agents to synthesize high-quality responses. It supports a variety of collaboration patterns, including director-worker hierarchies, expert synthesis, and…
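Voting-based consensus, in its simplest form, collects one answer per agent and keeps the most common one. The following minimal Python sketch illustrates that idea. It is not Swarms' API, and the agents are placeholder functions rather than real LLM agents. The normalization step and the first-occurrence tie-break are choices made for this sketch. Debate or judge-agent patterns would replace the counting step with further model calls.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical agent type: any function that maps a question to an answer string.
Agent = Callable[[str], str]


def majority_vote(question: str, agents: Sequence[Agent]) -> tuple[str, int]:
    """Ask every agent the same question and return the most common answer with its vote count."""
    # Light normalization so "Paris" and "paris " count as the same vote.
    answers = [agent(question).strip().lower() for agent in agents]
    if not answers:
        raise ValueError("at least one agent is required")
    # most_common orders equal counts by first occurrence, so ties go to the earliest answer.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes


agents = [lambda q: "Paris", lambda q: "paris ", lambda q: "Lyon"]
print(majority_vote("What is the capital of France?", agents))  # prints ('paris', 2)
```

Exact-match voting works for short, closed-form answers. Free-form responses would need a similarity measure or a judge model to decide which answers count as the same vote.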
Coze-loop is an optimization platform and orchestration management suite for large language model agents. It provides an environment for developing, debugging, evaluating, and monitoring AI agents. The project includes a prompt engineering playground for real-time iteration and validation of model responses. It also includes an evaluation framework that runs automated assessments against datasets to generate performance metrics and verify output accuracy. The system covers observability through real-time execution tracing and historical analysis of agent…