# LLM Hallucination Detection Tools

> Search results for `detect hallucinations in LLM responses` on awesome-repositories.com. 119 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/detect-hallucinations-in-llm-responses

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/detect-hallucinations-in-llm-responses).**

## Results

- [comet-ml/opik](https://awesome-repositories.com/repository/comet-ml-opik.md) (17,787 ⭐) — Opik is an observability and evaluation platform designed for generative AI applications and agentic workflows. It provides a centralized environment for tracing execution flows, managing prompt templates, and monitoring production performance, allowing teams to gain visibility into complex model interactions and tool usage without requiring manual application code changes.

The platform distinguishes itself through its integrated approach to the AI development lifecycle, combining distributed trace instrumentation with automated evaluation frameworks. It supports model-as-a-judge scoring, synthetic data generation, and the conversion of production traces into structured test cases, enabling developers to iteratively refine prompts and agent behavior. By offering a collaborative debugger and chat-based workspace management, it facilitates direct interaction with execution data to identify errors and implement code remediations.

Beyond core observability, the system includes tools for dataset versioning, custom metric definition, and cost analysis to track resource allocation across teams. It also features a model gateway to standardize logging and security across diverse model providers. The platform is built for flexible deployment, supporting containerized execution and orchestration via Kubernetes to ensure consistency across local and cloud environments.
- [dair-ai/prompt-engineering-guide](https://awesome-repositories.com/repository/dair-ai-prompt-engineering-guide.md) (75,678 ⭐) — This project is a comprehensive educational resource and technical guide focused on the development, optimization, and application of large language models. It provides a structured curriculum for mastering prompt engineering, ranging from foundational principles of instruction design to advanced techniques for improving model reasoning, accuracy, and reliability.

The guide distinguishes itself by offering deep technical insights into agentic workflows and autonomous system design. It covers the implementation of multi-step reasoning chains, tool integration through function calling, and stateful memory management. Beyond basic prompting, it explores sophisticated frameworks that combine reasoning and acting, as well as methodologies for retrieval-augmented generation and the creation of synthetic datasets to address data scarcity in specialized domains.

The documentation also addresses the broader engineering surface of AI development, including defensive strategies for application security and automated evaluation loops for model verification. These resources are designed to support developers in building complex, task-oriented AI systems that can interact with external APIs and maintain continuity across long-running processes.
- [vectara/hallucination-leaderboard](https://awesome-repositories.com/repository/vectara-hallucination-leaderboard.md) (3,279 ⭐) — Leaderboard comparing how often LLMs produce hallucinations when summarizing short documents
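
A leaderboard like this ranks models by the share of their summaries that are not consistent with the source document. The sketch below shows that arithmetic. It assumes that some judge model has already given each summary a factual-consistency score in [0, 1], and it treats anything below a hypothetical 0.5 threshold as a hallucination. The leaderboard's actual scoring pipeline and threshold may be different.

```python
from typing import Sequence

def hallucination_rate(consistency_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the percentage of summaries judged inconsistent with their source.

    Assumption: each score is a factual-consistency probability in [0, 1] from
    some judge model; scores below `threshold` (hypothetical default) count as
    hallucinated summaries.
    """
    if not consistency_scores:
        raise ValueError("need at least one score")
    hallucinated = sum(1 for s in consistency_scores if s < threshold)
    return 100.0 * hallucinated / len(consistency_scores)

scores = [0.91, 0.12, 0.77, 0.45, 0.98]  # made-up judge scores for five summaries
print(f"{hallucination_rate(scores):.1f}%")  # prints 40.0%
```

Two of the five made-up scores fall below the threshold, so the rate is 40%. The complementary "consistency rate" is simply 100 minus this value.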
- [noefabris/opencode-antigravity-auth](https://awesome-repositories.com/repository/noefabris-opencode-antigravity-auth.md) (8,563 ⭐) — This project is an authentication proxy and quota manager designed for accessing large language models via Google credentials. It serves as an orchestrator that handles identity management, session recovery, and the distribution of API requests across multiple authenticated accounts.

The system focuses on maintaining continuous service availability through dynamic account rotation and quota routing to bypass rate limits. It includes a grounding engine that links model responses to real-time web search results to reduce hallucinations and improve factual accuracy.

Additional capabilities cover session orchestration, which automatically detects failures and restarts interrupted operations. The project also provides controls for modulating model reasoning depth by adjusting thinking levels and token budgets.
- [crewaiinc/crewai](https://awesome-repositories.com/repository/crewaiinc-crewai.md) (53,687 ⭐) — CrewAI is a multi-agent orchestration framework designed for building autonomous systems that execute complex, multi-step workflows. It provides a development platform where specialized agents are defined with specific roles, goals, and tool sets to perform tasks collaboratively. By leveraging a declarative workflow engine, the system manages task dependencies, state transitions, and execution logic, allowing for the creation of structured, stateful sequences of operations.

The framework distinguishes itself through its hierarchical management capabilities, which utilize manager agents to coordinate specialist teams, delegate tasks, and oversee project execution. It incorporates a persistent memory architecture that enables agents to retain context and perform semantic searches across long-running operations. Furthermore, the system supports robust production-ready applications by enforcing schema-based output validation and providing execution checkpointing, which allows for mid-flight resumption and the replaying of specific tasks to debug or refine processes.

Beyond its core orchestration, the project offers a comprehensive suite of developer utilities for managing agent performance and workflow reliability. This includes tools for training agents through iterative cycles, monitoring system events via a central execution bus, and visualizing workflow structures. The platform also features a provider-agnostic interface for integrating external APIs and utilities, ensuring that agents can interact with diverse real-world services while maintaining consistent data structures throughout the execution lifecycle.
- [nickjiang2378/vlm-hallucinations](https://awesome-repositories.com/repository/nickjiang2378-vlm-hallucinations.md) (104 ⭐) — [ICLR '25] Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
- [vllm-project/semantic-router](https://awesome-repositories.com/repository/vllm-project-semantic-router.md) (3,205 ⭐)
- [nvidia/nemo-guardrails](https://awesome-repositories.com/repository/nvidia-nemo-guardrails-2.md) (6,453 ⭐) — NeMo-Guardrails is a toolkit for adding programmable safety constraints and dialogue boundaries to large language model conversational systems. It functions as security middleware that intercepts inputs and outputs to block prompt injections, jailbreaks, and sensitive data leaks, while providing a conversational dialogue manager to define structured interaction flows through configuration files.

The framework includes a hallucination filter that screens model outputs for factual accuracy, along with a specialized modeling language for defining conversational flows and constraints. It steers conversations to keep assistants on topic and applies safety moderation to block prohibited content.

It also covers vulnerability testing and safety evaluation tooling that scans for weaknesses. Other features include request tracing for observability, validation of retrieved context to filter out sensitive information, and secure tool execution for agentic workflows.

The project can be deployed as a standalone HTTP server or via containerized microservices to provide protected chat completions to external clients.
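
The entry above mentions a filter that screens outputs for factual accuracy. One common way to build such a filter is a self-consistency check: sample several extra answers to the same prompt and flag the main answer if the samples disagree with it. The sketch below uses a deliberately simple stand-in for agreement, namely Jaccard word overlap. The function names and the 0.5 threshold are hypothetical, and this is not NeMo Guardrails' own implementation or API.

```python
import re

def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric word tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    # Word-set overlap in [0, 1]; a crude stand-in for semantic agreement.
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def flag_inconsistent(answer: str, samples: list[str], threshold: float = 0.5) -> bool:
    """Flag `answer` as a possible hallucination when extra samples disagree with it.

    Hypothetical helper: `samples` are additional completions for the same prompt,
    and `threshold` is an assumed cut-off on mean agreement.
    """
    if not samples:
        raise ValueError("need at least one extra sample")
    agreement = sum(jaccard(answer, s) for s in samples) / len(samples)
    return agreement < threshold

print(flag_inconsistent("The capital is Lyon", ["Paris is the capital of France", "It is Paris"]))  # prints True
```

Production systems usually replace lexical overlap with an LLM or NLI judge. The overall pattern stays the same: sample, compare, threshold.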
- [ditekshen/detection](https://awesome-repositories.com/repository/ditekshen-detection.md) (254 ⭐) — Detection in the form of Yara, Snort and ClamAV signatures.
- [vibrantlabsai/ragas](https://awesome-repositories.com/repository/vibrantlabsai-ragas.md) (12,659 ⭐) — Ragas is an evaluation framework designed to measure the performance of retrieval-augmented generation pipelines and autonomous agent workflows. It provides a comprehensive suite of tools for benchmarking system outputs, utilizing language models as automated judges to score performance against defined rubrics and reference data. By standardizing inputs, retrieved contexts, and generated responses into a unified schema, the project enables consistent analysis across complex AI applications.

The framework distinguishes itself through its ability to generate synthetic test datasets from existing documents, allowing developers to simulate diverse user queries and scenarios for rigorous testing. It supports component-wise metric decomposition, which isolates the performance of individual retrieval and generation modules to identify specific bottlenecks. Additionally, the project incorporates graph-based knowledge extraction to structure document collections, enabling multi-hop query generation and relationship-based testing that goes beyond simple string matching.

Beyond its core evaluation capabilities, the project offers extensive support for workflow automation, observability, and configuration management. It includes asynchronous execution harnesses for high-throughput testing, integration primitives for various language model providers and orchestration frameworks, and advanced monitoring tools for tracking metrics and execution traces. Users can further customize evaluation logic through prompt-driven metric definitions and automated optimization strategies.
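
Component-wise metrics score the generation step separately from retrieval. A faithfulness-style score captures this idea: the fraction of claims in the answer that the retrieved context supports. The sketch below assumes that each sentence is one claim. It also replaces an LLM judge with a hypothetical lexical rule: a claim counts as supported if at least 60% of its words appear in the context. Ragas' own metrics use language-model judges and their own APIs, which this sketch does not reproduce.

```python
import re

def split_claims(answer: str) -> list[str]:
    # Naive stand-in for LLM-based claim extraction: one claim per sentence.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]

def is_supported(claim: str, context: str, min_overlap: float = 0.6) -> bool:
    # Hypothetical stand-in for an LLM judge: most of the claim's words must occur in the context.
    words = re.findall(r"[a-z0-9]+", claim.lower())
    ctx = set(re.findall(r"[a-z0-9]+", context.lower()))
    if not words:
        return False
    return sum(w in ctx for w in words) / len(words) >= min_overlap

def faithfulness(answer: str, context: str) -> float:
    # Supported claims divided by total claims; 0.0 for an empty answer.
    claims = split_claims(answer)
    if not claims:
        return 0.0
    return sum(is_supported(c, context) for c in claims) / len(claims)

context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer = "The Eiffel Tower is in Paris. It was built in 1920 by aliens."
print(faithfulness(answer, context))  # prints 0.5
```

The first claim is fully covered by the context. The second claim shares only 3 of its 7 words with the context, so half of the answer is judged grounded.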
- [googlechrome/lighthouse](https://awesome-repositories.com/repository/googlechrome-lighthouse.md) (30,355 ⭐) — Lighthouse is an automated diagnostic tool that evaluates web pages against industry standards for performance, accessibility, and search engine optimization. It functions as a programmatic analysis engine and a command-line utility, allowing developers to integrate comprehensive web quality checks directly into continuous integration pipelines and local development workflows.

The project distinguishes itself through a modular architecture that utilizes artifact-based data collection to ensure consistent analysis across different environments. It supports a headless execution mode for automated testing and provides a plugin-driven framework, enabling developers to register custom audit logic and specialized reporting categories to meet unique project requirements.

Beyond its core auditing capabilities, the tool detects underlying web frameworks and content management systems to provide tailored optimization recommendations. It generates structured, machine-readable reports and offers multiple interfaces, including a browser-integrated panel and a dedicated extension, to facilitate real-time feedback during the development process.
- [responsivebp/responsive](https://awesome-repositories.com/repository/responsivebp-responsive.md) (870 ⭐) — A super lightweight HTML, Sass, CSS, and JavaScript framework for building responsive websites
- [googlechrome/chrome-extensions-samples](https://awesome-repositories.com/repository/googlechrome-chrome-extensions-samples.md) (17,623 ⭐) — This repository serves as a comprehensive reference library for browser extension development, providing a collection of code samples and implementation patterns. It is designed to help developers understand the requirements for building extensions that follow current manifest standards, with a particular focus on migrating to and implementing Manifest V3.

The project provides functional examples for core extension capabilities, including the use of event-driven background service workers, isolated content script injection, and message-passing for inter-process communication. It demonstrates how to configure extension metadata, manage browser UI customizations like action-triggered popups, and integrate various web APIs to modify browser behavior.

These resources cover the full lifecycle of extension development, from initial manifest configuration and local directory loading for debugging to the final packaging and publication process. The repository is structured to assist with both learning individual API usage and building complex, multi-component extensions using standard web technologies.
- [opengvlab/internvl](https://awesome-repositories.com/repository/opengvlab-internvl.md) (10,061 ⭐) — InternVL is a vision-language model framework that fuses a visual encoder with a large language model to translate image features into textual tokens for reasoning. It provides a system for multimodal inference and dialogue, enabling the processing of images and text to answer questions or generate descriptions.

The project is distinguished by its high-resolution image processing, which uses dynamic tiling to maintain detail for images up to 4K resolution, and its chain-of-thought visual reasoning for solving complex mathematical and spatial problems. It also supports temporal frame sampling for video understanding and provides zero-shot capabilities for image classification and multilingual cross-modal retrieval.

The framework covers a broad range of capabilities including optical character recognition, object localization, and semantic image segmentation. It supports distributed multimodal training and fine-tuning via low-rank adaptation, as well as performance optimizations such as weight quantization and model distillation.

Deployment is supported through an OpenAI-compatible REST interface, a web-based chat interface, and a command-line interface with multi-GPU layer distribution.
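
Dynamic tiling keeps detail in large images. Instead of shrinking the image to one fixed square, it splits the image into a grid of tiles whose shape matches the image's aspect ratio. The sketch below shows one way to choose that grid. It assumes 448-pixel tiles, a cap of 12 tiles, and a tie-break rule that picks the larger of two equally well-shaped grids only when the image has enough pixels to fill it. All three are assumptions for illustration and are not confirmed InternVL internals.

```python
def choose_tile_grid(width: int, height: int, tile_size: int = 448,
                     min_tiles: int = 1, max_tiles: int = 12) -> tuple[int, int]:
    """Pick a (cols, rows) tile grid whose aspect ratio best matches the image.

    Assumed parameters: `tile_size`, `min_tiles`, and `max_tiles` are illustrative defaults.
    """
    aspect = width / height
    # All grids within the tile budget, ordered by tile count (then by shape, for determinism).
    grids = sorted(
        {(c, r) for c in range(1, max_tiles + 1) for r in range(1, max_tiles + 1)
         if min_tiles <= c * r <= max_tiles},
        key=lambda g: (g[0] * g[1], g),
    )
    best, best_diff = grids[0], float("inf")
    for cols, rows in grids:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * tile_size ** 2 * cols * rows:
            # Tie on shape: move to the larger grid only if the image has enough pixels for it.
            best = (cols, rows)
    return best

print(choose_tile_grid(3840, 2160))  # prints (4, 2)
```

A 16:9 4K frame gets an 8-tile grid, while a small 448x224 image with the same 2:1 grid shape stays at 2 tiles. This is how tiling scales detail with resolution.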
- [monzo/response](https://awesome-repositories.com/repository/monzo-response.md) (0 ⭐) — Dealing with incidents can be stressful. On top of dealing with the issue at hand, responders are often responsible for handling comms, coordinating the efforts of other engineers, and reporting what happened after the fact. Monzo built Response to help reduce the pressure and cognitive burden…
- [othmanadi/planning-with-files](https://awesome-repositories.com/repository/othmanadi-planning-with-files.md) (14,139 ⭐) — Planning with files is an enterprise knowledge graph platform designed to transform unstructured organizational data into a searchable, interconnected network. By using a graph-based retrieval-augmented generation engine, the system grounds language model outputs in verified internal data, making responses explainable, traceable, and less prone to hallucination.

The platform distinguishes itself through a focus on data sovereignty and secure, private infrastructure deployment. It enables organizations to maintain full control over sensitive information by processing data locally or within regional cloud environments, preventing the use of internal knowledge for external model training. The architecture supports granular security through attribute-based access control and allows for the isolation of knowledge into distinct, domain-specific workspaces while maintaining a unified semantic logic across the entire organization.

Beyond core retrieval, the system provides a comprehensive suite of tools for managing the data lifecycle, including automated business workflow execution and audit-ready event logging. It facilitates collective intelligence by aggregating expert experience and project documentation into a centralized repository, which can be analyzed to identify infrastructure dependencies and optimize operational efficiency.

The project is implemented in Python and is designed for deployment within customer-managed infrastructure to meet strict regulatory compliance and data governance requirements.
- [emcie-co/parlant](https://awesome-repositories.com/repository/emcie-co-parlant.md) (18,119 ⭐) — Parlant is an agentic workflow engine and orchestration framework designed for building conversational AI that adheres to strict behavioral guidelines. It provides a platform for managing multi-turn interactions through state-machine-based logic, allowing developers to define complex, hierarchical conversational flows that can adapt, skip, or revisit steps based on real-time user input.

The framework distinguishes itself through its focus on behavioral governance and observability. It enables developers to define precise domain terminology and enforce instruction compliance through prioritized guidelines, ensuring that agents remain consistent and brand-aligned. To maintain transparency, the system includes built-in reasoning audits and decision tracing, which log internal decision paths and guideline matches to help developers troubleshoot agent behavior and refine instructions.

Beyond core orchestration, the platform supports a wide range of operational capabilities, including tool execution middleware, dynamic data injection, and event-driven hooks for external integrations. It manages the full interaction lifecycle, from intent disambiguation and session context maintenance to frontend metadata attachment and response streaming. These features allow for the creation of context-aware interfaces that remain grounded in current information while providing a responsive user experience.
- [responsively-org/responsively-app](https://awesome-repositories.com/repository/responsively-org-responsively-app.md) (24,991 ⭐) — This application is a specialized web browser designed to streamline responsive design testing by rendering multiple viewport configurations simultaneously. It functions as a cross-platform testing suite that allows developers to preview and interact with web content across diverse mobile, tablet, and desktop device profiles within a single workspace.

The tool distinguishes itself by synchronizing user interactions and application state across all active browser instances. When a user navigates, scrolls, or clicks in one view, these events are broadcast to every other open viewport to ensure consistent behavior. Furthermore, it maintains shared session data, including cookies and local storage, across all instances, allowing for the testing of authentication and state persistence in real-time.

Beyond basic previewing, the application provides integrated debugging capabilities that allow for simultaneous element inspection and style analysis across different screen sizes. Users can manage complex testing environments through declarative device configurations, enabling the rapid switching of device sets. The tool also supports visual regression documentation by capturing screenshots of entire pages across multiple profiles to track design changes.
- [contra/react-responsive](https://awesome-repositories.com/repository/contra-react-responsive.md) (7,172 ⭐) — react-responsive is a set of utility tools and hooks for evaluating CSS media queries within React components. It functions as a viewport state manager that detects screen dimensions and triggers user interface changes based on defined breakpoints.

The project includes a helper for server-side rendering and automated testing that allows device properties to be overridden via context. This ensures consistent rendering when browser-native detection is unavailable.

The library covers adaptive component rendering, viewport change monitoring, and responsive layout detection. It uses the native matchMedia API to listen for media query transitions and execute corresponding logic within the application.
- [simstudioai/sim](https://awesome-repositories.com/repository/simstudioai-sim.md) (28,796 ⭐) — This project is an AI agent orchestration platform that provides a visual environment for building, testing, and deploying complex automation workflows. It functions as a low-code development interface where users can chain discrete functional blocks into dependency-aware pipelines to integrate artificial intelligence with external data and services. The platform supports the creation of intelligent conversational agents, automated business processes, and multi-service API orchestrations within a unified workspace.

The platform distinguishes itself through its event-driven integration engine, which triggers automated sequences based on real-time webhooks, scheduled events, or changes in third-party platforms. It offers a secure, cloud-native execution sandbox for running custom code, data transformations, and AI model inferences in isolated environments. Users can maintain stateful memory across multi-stage tasks, implement complex branching logic, and utilize human-in-the-loop components to pause and approve workflow execution.

The system covers a broad capability surface, including extensive connectors for cloud storage, communication platforms, CRM systems, and project management tools. It provides utilities for managing infrastructure, observability, and security, alongside specialized tools for meeting intelligence, data enrichment, and web scraping. The platform supports deployment on managed cloud infrastructure or self-hosted container environments, ensuring full control over data and model execution.
- [fincept-corporation/finceptterminal](https://awesome-repositories.com/repository/fincept-corporation-finceptterminal.md) (26,900 ⭐) — FinceptTerminal is a quantitative finance platform and financial engineering library designed for asset valuation, risk management, and fixed-income analytics. It provides a comprehensive suite for algorithmic trading and investment strategy automation, integrating specialized language model agents and node-based workflows to automate market research and alpha generation.

The project distinguishes itself with a dedicated game theory analysis engine for calculating Nash equilibria and simulating strategic interactions in competitive markets. It also features a specialized credit risk modeling tool for estimating default probabilities, building credit scorecards, and calculating expected losses.

The system covers a broad range of capability areas, including derivatives pricing, yield curve construction, and multi-asset portfolio analysis. It incorporates machine learning tools for credit scorecard development and feature engineering, as well as economic analysis frameworks for utility theory and exchange economies.

The platform includes an algorithmic trading suite for real-time trade execution and an LLM investment agent framework for geopolitical and market modeling.
- [huggingface/open-r1](https://awesome-repositories.com/repository/huggingface-open-r1.md) (26,326 ⭐) — Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning.

The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test cases, the framework improves accuracy in mathematical and logical problem-solving. It further supports advanced reasoning capabilities through group relative policy optimization and automated synthetic data pipelines, which curate and filter high-quality reasoning traces for model updates.

The system utilizes modular, configuration-driven recipes to streamline complex workflows, including data decontamination, dataset composition, and multi-node orchestration. It includes standardized benchmarking tools to measure performance across reasoning and coding domains, ensuring that training processes remain reproducible and data-centric. The framework is built to handle the full lifecycle of model improvement, from initial synthetic data generation to final performance evaluation on high-performance computing clusters.
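
Group relative policy optimization scores several completions for the same prompt. It then normalizes each reward against the rest of the group, so the model learns which completions did better than their siblings. The sketch below combines that normalization with a code-execution reward, defined here as the fraction of test cases passed. The helper names, the pass-rate reward, and the use of population standard deviation are assumptions. Implementations differ, for example in using the sample rather than the population standard deviation. This is not open-r1's own code.

```python
from statistics import mean, pstdev

def pass_rate(test_results: list[bool]) -> float:
    # Hypothetical reward: fraction of test cases a generated snippet passed.
    return sum(test_results) / len(test_results) if test_results else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward by the mean and std of its group.

    `eps` keeps the division finite when every completion scored the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, each run against two (made-up) test cases.
rewards = [pass_rate(r) for r in ([True, True], [False, False], [True, False], [False, True])]
print(rewards)  # prints [1.0, 0.0, 0.5, 0.5]
print(group_relative_advantages(rewards))
```

Completions that pass more tests than the group average receive positive advantages and are reinforced. When every completion scores the same, all advantages are zero, so that group provides no learning signal.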
- [hannibal046/awesome-llm](https://awesome-repositories.com/repository/hannibal046-awesome-llm.md) (26,933 ⭐) — This project serves as a comprehensive, static directory of external resources dedicated to the study and application of large language models. It functions as a centralized discovery point for developers and researchers, aggregating foundational academic papers, technical documentation, and specialized tools within a structured, version-controlled knowledge base.

The repository distinguishes itself through a multi-level classification system that organizes diverse technical domains, ranging from model training frameworks and inference optimization to AI safety and hallucination detection. By maintaining a community-driven curation model, the directory ensures that its collection of tutorials, datasets, and prompt engineering techniques remains current with emerging research trends and industry developments.

Beyond its core indexing capabilities, the project covers a broad spectrum of practical resources, including guidance on model alignment, human preference datasets, and domain-specific applications such as healthcare and code generation. The entire knowledge base is structured as a hierarchical collection of links and summaries, providing a collaborative hub for mastering natural language processing.
- [doc-detective/doc-detective](https://awesome-repositories.com/repository/doc-detective-doc-detective.md) (0 ⭐) — Doc Detective is a doc content testing framework that makes it easy to keep your docs accurate and up-to-date. You write tests, and Doc Detective runs them directly against your product to make sure your docs match your user experience. Whether it’s a UI-based process or a series of API calls, Doc…
- [liuziwei7/fashion-detection](https://awesome-repositories.com/repository/liuziwei7-fashion-detection.md) (495 ⭐) — Fashion Detection in the Wild (Deep Clothes Detector)
- [eleutherai/lm-evaluation-harness](https://awesome-repositories.com/repository/eleutherai-lm-evaluation-harness.md) (11,460 ⭐) — This project is a standardized framework for benchmarking large language models across a wide range of academic and reasoning datasets. It provides a platform for executing automated evaluation tasks to measure model accuracy and performance, ensuring consistent assessment through a structured configuration schema.

The framework distinguishes itself by incorporating a dedicated utility for data decontamination, which identifies and removes overlapping training samples from evaluation sets to prevent data leakage. It also features a flexible task builder that allows users to define custom benchmarks by specifying unique data sources, prompt structures, and modular scoring metrics.

The system supports large-scale testing by orchestrating distributed evaluation workloads across multiple compute nodes. It utilizes an abstracted interface to standardize communication with diverse model backends, facilitating systematic validation of model capabilities before deployment.
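
Data decontamination is often done by checking for long word n-grams that appear in both an evaluation example and the training corpus. The sketch below drops any evaluation document that shares at least one n-gram with the training data. Whitespace tokenization, the default n of 13, and the choice to drop rather than merely flag overlapping documents are all assumptions for illustration. The sketch does not reproduce lm-evaluation-harness's own decontamination utility.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    # Lowercased, whitespace-tokenized word n-grams; texts shorter than n yield none.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def decontaminate(eval_docs: list[str], train_docs: list[str], n: int = 13) -> list[str]:
    """Keep only evaluation documents that share no n-gram with the training data."""
    train_grams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return [d for d in eval_docs if not (ngrams(d, n) & train_grams)]

train = ["the quick brown fox jumps"]
evals = ["a quick brown fox appears", "completely unrelated text here"]
print(decontaminate(evals, train, n=3))  # prints ['completely unrelated text here']
```

The first evaluation document is removed because it shares the trigram "quick brown fox" with the training text. Larger n values make the check stricter about what counts as leakage.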
- [datatables/datatables](https://awesome-repositories.com/repository/datatables-datatables.md) (7,408 ⭐) — DataTables is a feature-rich HTML table library that transforms static HTML tables into interactive data grids with sorting, paging, filtering, and server-side processing support. It provides a client-side rendering engine that handles table rows, pagination, and sorting entirely in the browser, while also offering a server-side processing pipeline that offloads sorting, filtering, and paging operations to a backend for efficient handling of large datasets.

The library distinguishes itself through its plugin-based extension system, which allows custom functions and widgets to modify table behavior or rendering, and its CSS framework integration layer that automatically adapts styling to match Bootstrap 3/4/5, Bulma, or other frameworks. It supports inline editing with row injection, responsive layout reflow that adjusts column visibility based on viewport size, and an Ajax data source abstraction for fetching remote data with configurable HTTP methods and parameter mapping.

Additional capabilities include multi-column sorting, text search filtering that narrows rows in real time, dynamic row grouping, table content scrolling, and pagination controls. The library also provides form submission configuration for sending data as JSON or standard HTTP parameters, server-side column filtering, and conditional field validation for dependent form inputs. It offers internationalization for translating UI labels, frontend framework integration for React and Vue, and a custom package builder for selecting only needed components.

The library can be installed via npm, yarn, NuGet, or Composer, and is also available through CDN hosting for fast delivery without local file management.
- [lmeszinc/azurlaneautoscript](https://awesome-repositories.com/repository/lmeszinc-azurlaneautoscript.md) (9,292 ⭐) — AzurLaneAutoScript is a mobile game automation system designed to perform repetitive gameplay tasks unattended. It functions as a screenshot-driven bot that controls Android devices, emulators, and cloud phones via ADB and uiautomator2, using computer vision to make interaction decisions instead of fixed timers.

The project distinguishes itself through an advanced computer vision suite that includes local optical character recognition and perspective-aware grid detection. These tools allow the bot to parse 3D game maps, compute vanishing points, and normalize grid-centered objects for precise entity identification.

The system covers a broad range of operational capabilities, including the automation of combat missions, daily routines, resource harvesting, and fleet management. It features a centralized task scheduler to coordinate independent jobs and can be deployed as a cross-platform Electron desktop application, a web-based remote controller, or a headless Docker container.

The software supports a variety of environments, including ARM-based hardware, single-board computers, and multiple Android emulator distributions.
- [koajs/koa](https://awesome-repositories.com/repository/koajs-koa.md) (35,713 ⭐) — Koa is a lightweight web framework for Node.js designed for building HTTP applications and servers. It functions as an asynchronous middleware engine that processes network requests through a sequence of functions sharing a common context.

The framework distinguishes itself by using an onion-model middleware stack and promise-based flow control. This architecture allows requests to flow downstream and responses to flow back upstream through the same chain, enabling non-blocking request cycles and a modular approach to handling network traffic.

The system provides high-level capabilities for managing HTTP context, including the encapsulation of request and response streams into a unified object. It covers wide-ranging areas such as content negotiation, response header management, cookie handling, and centralized server-side error handling.

The framework is designed to be extended via middleware libraries and plugins to add specialized functionality.
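
In the onion model, each middleware runs its "before" logic, passes control inward by calling `next`, and runs its "after" logic once the inner layers return. The sketch below reproduces that ordering with a synchronous `compose` function in Python. It includes a guard against calling `next` twice. The names `compose`, `next_`, and the dict-based context are illustrative. Koa itself is promise-based JavaScript with its own API, which this sketch does not reproduce.

```python
from typing import Callable

Context = dict[str, object]
Next = Callable[[], None]
Middleware = Callable[[Context, Next], None]

def compose(middleware: list[Middleware]) -> Callable[[Context], None]:
    """Chain middleware so control flows inward via next_() and back out on return."""
    def run(ctx: Context) -> None:
        last = -1  # index of the most recently dispatched middleware

        def dispatch(i: int) -> None:
            nonlocal last
            if i <= last:
                raise RuntimeError("next() called multiple times")
            last = i
            if i < len(middleware):
                middleware[i](ctx, lambda: dispatch(i + 1))

        dispatch(0)
    return run

log: list[str] = []

def outer(ctx: Context, next_: Next) -> None:
    log.append("outer in")
    next_()                 # downstream: hand control to the inner layer
    log.append("outer out")  # upstream: runs after the inner layer finishes

def inner(ctx: Context, next_: Next) -> None:
    log.append("inner in")
    ctx["body"] = "hello"
    next_()
    log.append("inner out")

app = compose([outer, inner])
ctx: Context = {}
app(ctx)
print(log)  # prints ['outer in', 'inner in', 'inner out', 'outer out']
```

The nested ordering in the log shows the downstream/upstream flow described above. The guard mirrors the common safeguard against a middleware re-entering the chain.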
- [yocontra/react-responsive](https://awesome-repositories.com/repository/yocontra-react-responsive.md) (7,171 ⭐) — react-responsive is a media query library for React used to implement responsive design by conditionally rendering components based on viewport dimensions. It provides hooks and utilities for viewport detection and monitoring screen characteristics and orientation changes.

The library includes a server-side rendering mock and device simulation capabilities. These allow for the manual override of device settings via properties or context to ensure consistent rendering during server-side operations and to facilitate automated UI testing.

The project covers device characteristic detection and viewport monitoring. It employs a callback system and listeners that execute functions whenever a monitored media query state transitions.
- [elevenlabs/elevenlabs-python](https://awesome-repositories.com/repository/elevenlabs-elevenlabs-python.md) (2,873 ⭐) — This Python SDK provides a comprehensive toolkit for synthetic audio generation, voice cloning, and the development of conversational AI agents. It enables the creation of lifelike spoken audio from text, the replication of human voices through custom cloning, and the deployment of real-time voice agents capable of interacting with external large language models.

The library distinguishes itself through deep integration of conversational AI capabilities, including the design of agent personas and the execution of real-time actions via APIs. It supports professional-grade audio production through a variety of specialized tools for multilingual dubbing, studio-quality music generation, and high-fidelity sound effects.

The SDK covers a broad surface of speech and media processing, including real-time audio streaming via WebSockets, speech-to-text transcription with speaker diarization, and the synchronization of audio with visual elements. It also provides utilities for monitoring generation costs and managing agent security through response guardrails and access controls.
- [simonihmig/responsive-image](https://awesome-repositories.com/repository/simonihmig-responsive-image.md) (206 ⭐) — The multi-framework JavaScript library for responsive images.
- [openhands/openhands](https://awesome-repositories.com/repository/openhands-openhands.md) (77,330 ⭐) — OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system enables agents to autonomously navigate codebases, implement features, and resolve issues through iterative reasoning and tool execution.

The platform distinguishes itself through a model-agnostic orchestrator that connects diverse language models to a unified tool registry. It supports complex, multi-agent collaboration via hierarchical task delegation, allowing parent agents to spawn and manage independent sub-agents for parallelized workflows. Security is managed through configurable action approval policies and real-time risk evaluation, ensuring that autonomous operations remain within defined safety boundaries.

The system covers a broad capability surface including persistent conversation state management, automated code review, and web research automation. It features an event-driven architecture that serializes interactions into immutable logs, facilitating observability and time-travel debugging. Developers can extend agent functionality through custom skill definitions, plugin packages, and integration with external services via standardized protocols.

The project provides a command-line interface for managing agent sessions, remote server deployments, and containerized workspace lifecycles. It is designed for extensibility, allowing users to configure agent behavior through structured objects, markdown-based definitions, and environment-specific settings.
- [open-mmlab/mmdetection](https://awesome-repositories.com/repository/open-mmlab-mmdetection.md) (32,756 ⭐) — This project is a modular research toolkit designed for developing, training, and evaluating deep learning models for object detection, segmentation, and video instance tracking. It provides a flexible training engine that manages complex neural network execution, including distributed training, custom lifecycle hooks, and weight optimization. The framework is built around a hierarchical configuration system that allows users to define architectures, data pipelines, and training hyperparameters through composable, inheritable files.

The project distinguishes itself through its highly modular architecture, which utilizes a registry-based component injection system to allow users to swap model components or implement custom modules without modifying core source code. It supports advanced workflows such as semi-supervised learning, where models are trained by combining labeled and unlabeled data through multi-branch pipelines and teacher-student weight synchronization. Additionally, the framework includes specialized utilities for video-based tracking, enabling the evaluation of algorithms that maintain object identities across frames.

Beyond its core training capabilities, the project offers a comprehensive suite for data management, model evaluation, and production deployment. It features a standardized data pipeline architecture that handles loading, augmentation, and annotation conversion for diverse computer vision datasets. The toolkit also includes diagnostic utilities for benchmarking performance, visualizing predictions, and exporting trained models into optimized formats for production inference.

The project is distributed as a Python package with comprehensive installation utilities that support environment setup and hardware-specific configuration. Documentation and verification scripts are provided to assist users in validating installations and executing inference demos.
- [reinerba/vue-responsive](https://awesome-repositories.com/repository/reinerba-vue-responsive.md) (97 ⭐) — A plugin for responsive handling with Vue.js.
- [fastapi/fastapi](https://awesome-repositories.com/repository/fastapi-fastapi.md) (99,260 ⭐) — FastAPI is a web framework for building APIs with Python. It leverages standard language type hints to provide automatic data validation, request parsing, and interactive API documentation generation. The framework supports asynchronous request handling and manages execution contexts to prevent blocking the main event loop.

The project includes a dependency injection system that allows for the resolution and injection of reusable components into request handlers. This system supports request-scoped caching, lifecycle management, and integration with security mechanisms like OAuth2 and JSON Web Tokens. Developers can organize applications into modular routers and mount sub-applications to manage complex routing logic.

Infrastructure features include middleware support for cross-origin resource sharing, background task management, and static file serving. The framework automatically generates OpenAPI specifications for defined endpoints, which can be customized through metadata and schema extensions. Testing utilities are provided to simulate HTTP and WebSocket connections, allowing for isolated verification of application behavior.
- [elysiajs/elysia](https://awesome-repositories.com/repository/elysiajs-elysia.md) (18,531 ⭐) — Elysia is a high-performance TypeScript web framework designed for building type-safe backend services. It provides a modular, plugin-based architecture that allows developers to compose server logic, middleware, and validation schemas into scalable application instances. By leveraging native web standards, the framework ensures portability across diverse JavaScript runtimes, including Node.js, Deno, and various edge computing environments.

The framework distinguishes itself through its focus on end-to-end type safety, automatically synchronizing request and response definitions between the server and client. It features a sophisticated plugin system that enables granular control over the request lifecycle, allowing for scoped validation, dependency injection, and shared state management. Additionally, it includes built-in support for real-time communication via WebSockets and provides automated generation of interactive API documentation directly from server routes.

Beyond its core routing and validation capabilities, the framework offers a comprehensive suite of tools for managing the request-response lifecycle, including custom payload parsing, reactive cookie management, and streaming responses. It also integrates observability features such as request tracing and performance monitoring, alongside testing utilities that allow for in-memory request simulation without requiring a live network connection.

The project is designed for flexibility in deployment, supporting everything from standard server environments to serverless and edge platforms, with options for bundling applications into portable binaries.
- [dtc7w3pq/response-attack](https://awesome-repositories.com/repository/dtc7w3pq-response-attack.md) (0 ⭐) — Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
- [meirtz/babyblue-llm](https://awesome-repositories.com/repository/meirtz-babyblue-llm.md) (0 ⭐) — The BabyBLUE (Benchmark for Reliability and JailBreak halLUcination Evaluation) is a novel benchmark designed to assess the susceptibility of large language models (LLMs) to hallucinations and jailbreak attempts. Unlike traditional benchmarks that may misinterpret hallucinated outputs as genuine…
- [oumi-ai/oumi](https://awesome-repositories.com/repository/oumi-ai-oumi.md) (8,858 ⭐) — Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and tools for synthetic data generation and model distillation.

The platform is distinguished by its iterative, failure-driven synthesis approach, which analyzes model weaknesses during evaluation to generate targeted training data. It utilizes an LLM-based judge framework to programmatically score response quality and factual accuracy, and supports on-policy model distillation to transfer knowledge from teacher models to student models.

The system covers a broad range of capabilities including automated dataset preparation, parameter-efficient fine-tuning via LoRA, and cloud-agnostic job orchestration across multiple GPU providers. It also provides tools for model artifact export and local or cloud-based inference serving through an OpenAI-compatible API.

Administrative features include multi-tenant workspace isolation, role-based access control, and the use of JSON-based workflow recipes to standardize and repeat development steps.
- [fingerprintjs/fingerprintjs](https://awesome-repositories.com/repository/fingerprintjs-fingerprintjs.md) (27,334 ⭐) — Fingerprint is a visitor identification and fraud detection platform that generates persistent, unique identifiers by analyzing browser and device attributes. By extracting technical signals from the client environment, it enables reliable user tracking across sessions without relying on traditional cookies.

The platform distinguishes itself through its focus on high-accuracy identification and security-first architecture. It employs edge-side proxying to bypass ad-blockers and privacy restrictions, ensuring consistent data collection. To maintain data integrity, it uses cryptographic payload sealing and server-side verification flows, which prevent tampering by ensuring that identification data is processed securely on the backend rather than solely on the client.

Beyond core identification, the project provides a comprehensive suite for bot detection and security. It analyzes network metadata, device reputation, and behavioral patterns to identify malicious traffic, AI agents, and automated scrapers. These capabilities are supported by granular risk assessment tools, including confidence scoring and protection rulesets that allow for automated blocking of suspicious interactions.

The platform offers extensive administrative and integration features, including multi-environment resource isolation, regional data residency controls, and programmatic API management. It supports diverse deployment environments through framework-specific SDKs, mobile integration, and automated proxy infrastructure deployment.
- [swisscom/detections](https://awesome-repositories.com/repository/swisscom-detections.md) (0 ⭐) — This repo contains threat intelligence information and threat detection indicators (IOC, IOA) shared by Swisscom CSIRT.
- [promptfoo/promptfoo](https://awesome-repositories.com/repository/promptfoo-promptfoo.md) (10,529 ⭐) — Promptfoo is an evaluation framework designed for testing, benchmarking, and red-teaming language models and agentic workflows. It provides a unified environment to run prompts against multiple providers, allowing developers to systematically validate model outputs against objective assertions, semantic similarity metrics, and custom grading rubrics.

The platform distinguishes itself through a provider-agnostic execution layer and a stateful orchestrator capable of simulating multi-turn conversations and complex tool-use trajectories. It includes a dedicated adversarial mutation pipeline that automates security vulnerability scanning, enabling teams to probe for jailbreaks, prompt injections, and safety policy violations using systematic attack strategies.

Beyond core testing, the project supports comprehensive quality assurance through retrieval-augmented generation assessment, synthetic dataset generation, and prompt performance optimization. It offers extensive extensibility through a plugin-based architecture, allowing for custom logic, Python-based testing extensions, and integration with external version control and observability platforms.

The system utilizes a declarative configuration schema to manage test cases and environment settings, supporting both self-hosted and managed infrastructure deployments. Results are consolidated into structured reports with interactive visualizations to facilitate collaborative review and integration into continuous integration pipelines.
- [formbricks/formbricks](https://awesome-repositories.com/repository/formbricks-formbricks.md) (12,391 ⭐) — Formbricks is an open-source survey and feedback platform designed to help teams capture and analyze user insights through targeted, in-app, and website-based interactions. It functions as a comprehensive customer experience analytics system that allows organizations to maintain full control over their data, user attributes, and survey workflows.

The platform distinguishes itself through its event-driven architecture, which enables precise behavioral targeting by triggering surveys based on specific user actions or application events. It supports deep integration with external ecosystems by automatically synchronizing response data to CRMs, databases, and communication tools, while providing programmatic interfaces for managing resources and automating feedback loops.

Beyond core collection, the system includes advanced logic for conditional branching, scoring, and personalized routing to create adaptive survey experiences. It offers extensive customization options, including white-labeling, CSS overrides, and multi-channel distribution across web, mobile, and email environments.

The platform is built for self-hosting, supporting containerized deployments with built-in multi-tenant data isolation and enterprise-grade security features like single sign-on and role-based access control.
- [labstack/echo](https://awesome-repositories.com/repository/labstack-echo.md) (32,451 ⭐) — Echo is a high-performance, lightweight web framework for Go designed for building scalable RESTful APIs and web services. It provides a centralized environment for mapping network requests to handler functions, utilizing a fast radix-tree routing engine to ensure efficient request dispatching. The framework is built around a modular, middleware-centric pipeline that allows developers to execute reusable logic for cross-cutting concerns like authentication, logging, and security across the entire application.

What distinguishes Echo is its focus on developer productivity through structured data binding and a unified response interface. It automatically maps incoming request payloads into typed objects while validating content against defined schemas, significantly reducing manual parsing boilerplate. The framework also includes built-in support for real-time communication via WebSockets and server-sent events, alongside advanced traffic management capabilities such as rate limiting, load balancing, and reverse proxying.

The framework covers a broad surface of operational and security requirements, including automated TLS certificate management, CSRF protection, and CORS policy enforcement. It provides comprehensive utilities for request and response management, including support for streaming large data, template rendering, and graceful server shutdowns to ensure reliable service termination. Observability is integrated through distributed tracing, performance metrics export, and detailed request logging.
- [raga-ai-hub/ragaai-catalyst](https://awesome-repositories.com/repository/raga-ai-hub-ragaai-catalyst.md) (16,150 ⭐) — RagaAI-Catalyst is a toolkit comprising an SDK, a dashboard, and a platform for monitoring, debugging, red-teaming, and evaluating agentic AI workflows. It serves as an observability framework for tracing the execution paths of large language models and multi-agent systems.

The project distinguishes itself through a security suite for automated red-teaming and vulnerability scanning to detect biases, alongside a centralized prompt registry that decouples templates from application code. It further provides an evaluation platform that combines synthetic data generation with custom metric frameworks to quantify model accuracy and reliability.

The system covers broad operational domains including agent behavioral observability, prompt lifecycle management, and the application of output guardrails to block undesirable content. Its monitoring capabilities include trace-based execution graphing, timeline-based event sequencing, and diagnostic tools for analyzing multi-agent interaction flows.

The core functionality is delivered via a Python library for recording tool calls and decision-making processes.
- [pair-code/llm-comparator](https://awesome-repositories.com/repository/pair-code-llm-comparator.md) (528 ⭐) — LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
- [run-llama/llama_index](https://awesome-repositories.com/repository/run-llama-llama-index.md) (50,306 ⭐) — LlamaIndex is a comprehensive development framework designed to connect private or external data sources to large language models. It functions as a data-centric toolkit that enables the construction of retrieval-augmented generation systems, allowing developers to build applications that provide context-aware answers based on specific organizational information.

The project distinguishes itself through a robust agentic orchestration engine that supports the creation of autonomous agents capable of multi-step reasoning, memory management, and complex tool execution. Beyond simple retrieval, it provides a flexible, event-driven architecture for composing modular pipelines, enabling developers to chain data ingestion, transformation, and retrieval steps into sophisticated, multi-agent systems that can coordinate tasks and hand off control between individual agents.

The platform covers the entire lifecycle of language model applications, including advanced document processing for parsing and structuring complex file formats, and a diagnostic layer for observability that tracks execution traces and performance metrics. It also includes a suite of evaluation tools for measuring retrieval effectiveness and response quality, alongside mechanisms for query routing and custom post-processing to ensure high-precision information delivery.
- [flutter/flutter](https://awesome-repositories.com/repository/flutter-flutter.md) (177,056 ⭐) — This project is a multi-platform UI framework designed for building applications that target mobile, web, and desktop environments from a single codebase. It utilizes a declarative paradigm where the user interface is defined as a function of application state, supported by a layered architecture that includes a high-performance rendering engine and a multi-platform compilation model.

The framework provides a comprehensive suite of developer tools, including hot reloading for real-time code injection and diagnostic utilities for monitoring application state and performance. It features a modular component system, a constraint-based layout engine, and built-in support for navigation, localization, and accessibility. Developers can extend functionality through a native integration model that supports platform-specific APIs, foreign function interfaces, and a package management system for dependency distribution.

Beyond core UI development, the project includes infrastructure for application packaging and distribution across various app stores and web environments. It also incorporates concurrency models for background task management, security utilities for code obfuscation, and tools for integrating generative AI into the development workflow.
- [facebookresearch/llm-qat](https://awesome-repositories.com/repository/facebookresearch-llm-qat.md) (0 ⭐) — This repository contains the training code of LLM-QAT introduced in our work: "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"