Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications. The framework distinguishes itself through its ability to generate synthetic test datasets from existin
This project is an automated prompt engineering and optimization tool designed to iteratively create, test, and refine prompts using a language model to improve output quality. It functions as a framework for generating candidate prompts and ranking their performance through correctness matching and ELO-based ratings. The system includes capabilities for model distillation, generating high-quality example pairs from frontier models to create training data for smaller models. It also provides tools to condense prompts for smaller models and transform instruction-tuned prompts into completion-b
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
Nanoclaw is an LLM agent orchestrator and multi-platform chat gateway designed to deploy and manage isolated AI agents. It provides a containerized runtime that executes agents within sandboxed Linux containers, ensuring filesystem and state isolation through dedicated workspaces and host bind-mounts. The project distinguishes itself through a unified routing pipeline that connects agents to diverse messaging platforms, including WhatsApp, Discord, Slack, Telegram, Signal, and iMessage. It integrates the Model Context Protocol to extend agent capabilities via managed external data and functio