12 repositories
Utilities for observing and refining the reasoning process of AI agents.
Distinguishing note: Focuses on the observability and debugging of agent logic.
Explore 12 GitHub repositories in Artificial Intelligence & ML · Agent Debugging Tools.
Nanoclaw is an LLM agent orchestrator and multi-platform chat gateway designed to deploy and manage isolated AI agents. It provides a containerized runtime that executes agents within sandboxed Linux containers, ensuring filesystem and state isolation through dedicated workspaces and host bind-mounts. The project distinguishes itself through a unified routing pipeline that connects agents to diverse messaging platforms, including WhatsApp, Discord, Slack, Telegram, Signal, and iMessage. It integrates the Model Context Protocol to extend agent capabilities via managed external data and functio
Includes tools for diagnosing communication failures by analyzing host logs and querying session databases.
This framework provides a development toolkit for building autonomous agents that utilize language models to solve complex, non-deterministic tasks. Its core design centers on a code-executing architecture where agents generate and run Python code snippets to perform logic, data manipulation, and tool interactions. By moving beyond structured data formats, the system enables agents to manage program flow and object state through iterative reasoning cycles. The project distinguishes itself through its focus on code-based agent implementation and secure execution environments. Developers can ch
Enables observation and debugging of step-by-step agent decision-making processes.
Rasa is a chatbot development platform and conversational AI framework used to design, deploy, and integrate multi-turn conversational agents. It functions as an LLM orchestration engine and NLU dialogue manager, combining large language model fluency with structured business logic to control agent behavior. The framework enables the development of conversational assistants that automate text and voice interactions. It allows for the definition of conversational flows using flexible sequences and provides tools to inspect agent decisions to debug and validate the internal reasoning process.
Includes utilities for observing and refining the reasoning process of AI agents to fix dialogue logic.
Claude Code Templates is a comprehensive framework for orchestrating specialized AI agents and automating development workflows within local environments. It provides a structured system for defining, configuring, and deploying AI personas that handle specific technical tasks, ranging from backend architecture and frontend implementation to security auditing and infrastructure management. The project distinguishes itself through a configuration-driven approach that allows teams to standardize development environments and share reusable agent definitions across projects. It includes a robust C
Provides diagnostic tools for observing and debugging the reasoning process and system prompts of AI agents.
Coze Studio is a development platform for building intelligent agents and conversational applications. It provides a visual environment where users construct agents by linking workflows, knowledge bases, and custom prompts to automate complex tasks. The system functions as a central hub for managing AI model services, allowing developers to connect various providers to serve as the intelligence layer for their applications. The platform distinguishes itself through a node-based workflow orchestrator that enables the design of automated logic sequences on a visual canvas. It includes a modular
Provides a live runtime environment to monitor and debug agent behavior and workflow performance.
Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardrails. The platform distinguishes itself through a combination of model-based scoring and heuristic metrics to quantify output quality and detect hallucinations. It includes a dedicated prompt and agent optimizer with an interactive playground for refining templates and tool configurations. For retri
Records execution spans and conversation histories to diagnose logic errors in complex autonomous agent behaviors.
LangChain.js is a framework for building, executing, and monitoring stateful agentic applications. It provides an orchestration engine that models workflows as directed graphs, allowing developers to connect language models, data sources, and external tools into modular, multi-step processes. The platform distinguishes itself through its focus on stateful execution and human-in-the-loop control. It manages agent lifecycles by persisting execution state across threads, enabling fault tolerance and the ability to pause workflows at designated breakpoints for manual review or modification. This
Provides an interactive environment to inspect node states, modify execution flow mid-run, and replay specific checkpoints for troubleshooting.
RagaAI-Catalyst is a toolkit comprising an SDK, a dashboard, and a platform for monitoring, debugging, red-teaming, and evaluating agentic AI workflows. It serves as an observability framework for tracing the execution paths of large language models and multi-agent systems. The project distinguishes itself through a security suite for automated red-teaming and vulnerability scanning to detect biases, alongside a centralized prompt registry that decouples templates from application code. It further provides an evaluation platform that combines synthetic data generatio
Provides specialized tools for analyzing interaction timelines and execution graphs to debug agent logic.
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Maintains historical logs of agent interactions to diagnose reasoning issues and improve response accuracy.
SerpentAI is a game AI development kit and computer vision framework designed for building autonomous agents that interact with video games. It serves as a game input automation tool and a machine learning model integration engine, allowing developers to create agents that perceive game states and execute actions. The framework utilizes a plugin-based agent architecture to provide modular extensions for game-specific logic and behaviors. It features a specialized system for training, bundling, and deploying machine learning classifiers to recognize visual contexts and game states in real time
Provides a dedicated debugger to visualize the internal state and decision-making process of AI agents.
This repository contains the comprehensive documentation for a code editor focused on AI-assisted software development and remote development workflows. It covers the implementation of AI agents and language models used for autonomous code generation, large-scale refactoring, and task iteration. The project is distinguished by its deep integration of autonomous AI agents capable of web navigation, application logic validation, and orchestrating multi-step development processes. It provides specialized frameworks for tailoring AI behavior through custom instructions, model context protocols, a
Provides utilities for observing and debugging the reasoning process and execution logs of AI agents.
PraisonAI is an autonomous AI agent platform that coordinates multiple LLM-powered agents for research, planning, and execution of complex workflows. It functions as a multi-agent orchestration framework, a workflow builder, and a Model Context Protocol server, while also providing retrieval-augmented generation through vector knowledge bases. Agents can interact via CLI, web, or standardized protocols with sandboxed code execution. The platform distinguishes itself with a rich set of agent communication protocols, including A2A, REST, WebSocket, voice and telephony integration, and MCP, allo
Provides an interactive debugger to pause, inspect state, set breakpoints, and step through agent execution.