30 open-source projects similar to anthropics/defending-code-reference-harness, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Defending Code Reference Harness alternative.
Nanoclaw is an LLM agent orchestrator and multi-platform chat gateway designed to deploy and manage isolated AI agents. It provides a containerized runtime that executes agents within sandboxed Linux containers, ensuring filesystem and state isolation through dedicated workspaces and host bind-mounts. The project distinguishes itself through a unified routing pipeline that connects agents to diverse messaging platforms, including WhatsApp, Discord, Slack, Telegram, Signal, and iMessage. It integrates the Model Context Protocol to extend agent capabilities via managed external data and functio
mini-swe-agent is an autonomous software engineering system designed to develop features and fix bugs by combining large language models with a bash interface. It operates as an agentic framework that executes coding tasks and documentation updates through a continuous cycle of model reasoning and tool execution. The project differentiates itself with a strong focus on safety and evaluation, utilizing container-based sandbox execution via Docker or Singularity to isolate command execution. It includes a batch-parallel evaluation harness to measure code-fixing accuracy against standardized sof
This project provides a secure, containerized execution engine designed to run untrusted code within isolated environments. It functions as a library for integrating code interpretation into autonomous agents and intelligent assistant workflows, ensuring that host systems remain protected while enabling dynamic data processing and file manipulation. The platform distinguishes itself through a multi-backend architecture that abstracts diverse container runtimes, allowing for flexible deployment and automated backend failover. It supports interactive, multi-turn workflows by maintaining persist
osv-scanner is a software composition analysis tool and vulnerability scanner that checks project dependencies and container images against the Open Source Vulnerabilities database. It functions as a dependency remediation tool and can be integrated into custom Go applications as a programmable security library. The project distinguishes itself through a remediation workflow that includes an interactive terminal user interface and automated scripting for upgrading vulnerable packages in lockfiles and manifests. It employs call-graph reachability analysis to determine if vulnerable code is act
w3af is a web penetration testing suite and security audit framework designed to identify and exploit vulnerabilities in web applications. It functions as a vulnerability scanner that crawls targets to find injection points and a fuzzer used to discover hidden endpoints and test input validation. The project distinguishes itself by providing an intercepting HTTP proxy for capturing and modifying traffic, combined with a knowledge-base driven exploitation system. It enables the execution of security exploits to gain remote shell access and supports post-exploitation activities, such as routing
SkillSpector is a security scanner designed to detect vulnerabilities and malicious patterns in AI agent plugins and extensions before they are installed. It functions as a runtime guardrail that calculates numeric risk scores and assigns severity labels to provide installation recommendations or block risky external extensions. The project distinguishes itself by using language models to perform semantic code analysis, evaluating code intent and context to reduce false positives. It also employs fingerprint-based issue suppression to track and ignore previously accepted risks across repeated
Dangerzone is a security tool and content sanitizer that converts untrusted files into safe PDFs. It removes malicious content by rendering documents as raw pixels within a sandboxed environment and rebuilding them as new PDF files to strip executable scripts and hidden threats. The project utilizes container-based sandboxing to isolate file processing from the host operating system. It is designed for air-gapped execution, allowing the sanitization process to operate on hardware without network connectivity to prevent malware from communicating with external servers. To maintain document ut
NemoClaw is an LLM agent orchestrator and sandboxed execution environment designed to deploy and manage the lifecycles of large language model agents. It provides a secure runtime that isolates persistent agents from the underlying host system to ensure operational security. The system includes a secure LLM inference gateway that acts as a managed routing layer, securing communication between AI agents and inference engines to prevent unauthorized access. It also integrates with NVIDIA OpenShell to run specialized agents within a secure shell environment. Operational control is provided thro
Acontext is an LLM orchestration backend and agent memory framework designed to manage session state and knowledge for AI agents. It functions as a context manager and orchestration layer that integrates model providers with a secure code sandbox and a zero-knowledge data store. The project is distinguished by its approach to knowledge distillation, capturing agent learnings as reusable Markdown skills and structured memory files. It provides a secure execution environment where shell commands and scripts run in isolated containers with the ability to mount these persistent skill files direct
container-use is a containerized AI execution environment and code sandbox designed to provide a secure space for AI coding agents to execute commands and build applications. It functions as a workspace orchestrator that provisions isolated containers mapped to git branches, allowing multiple agents to operate in parallel without state conflicts or affecting the host system. The project serves as a Model Context Protocol server, bridging AI agents to containerized environments for standardized tool access. It enables a workflow for reviewing and merging changes made by agents within these iso
OpenShell is a security framework and sandboxed execution runtime for autonomous AI agents. It provides isolated environments using containers and virtual machines to protect host infrastructure and sensitive data from unauthorized access during agent execution. The system distinguishes itself by combining hardware-accelerated passthrough for host GPU access with a security gateway that intercepts model API calls. This gateway manages credentials by stripping caller information and injecting backend secrets, ensuring sensitive API keys remain off the local filesystem. The platform covers bro
Open Interpreter is an autonomous agent runtime that translates natural language instructions into executable code to interact with local software and operating systems. It functions as an orchestration framework that connects language models to a secure execution environment, enabling the development of agents capable of managing system resources and performing complex tasks. To ensure safety, the system mandates explicit user verification before executing any generated code and provides robust isolation through containerized sandboxing. The project distinguishes itself through its deep inte
This project is a secure container runtime that provides strong isolation for application workloads by implementing a userspace kernel. By intercepting system calls and executing them within a memory-safe, restricted environment, it minimizes the attack surface exposed to the host kernel. It functions as a drop-in engine for standard container orchestration platforms, ensuring compatibility with industry-standard runtime specifications while maintaining a hardened execution boundary. The runtime distinguishes itself through its ability to virtualize core system resources, including an indepen
OpenDevin is an autonomous software engineering agent and orchestrator designed to execute coding tasks and manage development workflows using large language models. It functions as a centralized control center for managing and switching between various local and cloud artificial intelligence backends. The system utilizes a Docker sandbox environment to isolate autonomous agents in containers, protecting the host filesystem during code execution. It includes an automated engineering workflow tool that integrates with version control and chat services to trigger tasks via webhooks or scheduled
vibesdk is an agentic software development platform and framework designed to coordinate autonomous agents that write, debug, and refine full-stack applications from natural language. It serves as a cloud-native application orchestrator and an LLM-powered code generation framework that converts prompts into functional code through iterative conversations and multi-phase agent behaviors. The project distinguishes itself by providing a complete toolchain for building AI development platforms. This includes the ability to integrate various model providers, construct custom LLM toolkits, and mana
This project provides secure, containerized infrastructure designed for autonomous agents, remote code execution, and cloud development. It functions as a sandboxed environment where AI agents and external processes can execute code, run shell commands, and manage files while remaining isolated from the host system. The system distinguishes itself by implementing the Model Context Protocol, allowing it to act as a standardized tool server that exposes browser and filesystem capabilities to compatible clients. It further integrates headless browser automation, enabling programmatic web navigat
XAgent is an autonomous agent system that decomposes complex goals into sequential subtasks for execution via a planner and actor model. It functions as a collaboration framework that integrates human-in-the-loop workflows, allowing users to provide real-time guidance and missing information during the automation process. The system features a containerized tool sandbox to isolate the execution of shells and browsers, ensuring system safety and consistency. It includes a state-based execution recorder that captures snapshots of agent runs to enable the exact reproduction of specific task sequ
OSV is a distributed database and aggregator of open-source security advisories that uses a standardized vulnerability schema to track security flaws. It functions as a system for collecting and normalizing security data from diverse ecosystems into a single unified format, providing a web API for querying package vulnerabilities and submitting standardized records. The project distinguishes itself through a security advisory distribution service that supports bulk dataset exports via cloud storage buckets and incremental synchronization of security record updates. It also employs sandbox-bas
DeepAnalyze is an autonomous data science agent and research pipeline designed to transform raw datasets into comprehensive analysis reports. It operates by generating and executing Python code to perform data preparation, modeling, and visualization. The system utilizes a secure, containerized execution environment to run generated scripts in isolation from the host system. It includes a benchmarking tool to evaluate the accuracy and performance of large language models against standardized data science tasks and a standardized API gateway for managing model completions and file uploads. Th
Klavis is a platform for managing Model Context Protocol (MCP) servers and providing sandboxed environments where AI agents can safely interact with external tools and services. It functions as an integration framework that orchestrates MCP server instances, exposes tools and resources for AI agents, and isolates agent interactions from production data through horizontally scalable sandbox environments. The platform distinguishes itself through its ability to generate long-horizon agentic tasks that simulate realistic tool-use workflows with live SaaS applications and production MCP servers.
Paperclip is an LLM agent orchestration platform and governance suite designed to coordinate teams of autonomous AI agents. It provides a management plane for defining organizational hierarchies, assigning roles, and aligning individual agent tasks with a structured mission tree to ensure work maps to business objectives. The project distinguishes itself through a specialized agent skill registry and workspace manager. It allows for the discovery and injection of reusable workflows into agent runtimes without retraining and provides isolated, sandboxed execution environments with persistent s
Rikkahub is an AI model aggregator and frontend interface that provides a unified platform for interacting with multiple large language model providers. It serves as a retrieval-augmented generation chat client with a provider-agnostic gateway, allowing users to switch between different models and endpoints. The platform features a character persona manager for importing structured character cards and behavior settings to define specific interaction styles. It includes a sandboxed code execution environment with a portable Linux agent for running technical scripts and commands within the chat
Dependency-Track is a software composition analysis tool and vulnerability management system designed to track dependencies and supply chain risk. It functions as a platform for ingesting and analyzing CycloneDX software bills of materials to identify known vulnerabilities and license compliance issues within third-party software components. The system distinguishes itself by mirroring external vulnerability databases locally to enable fast offline analysis and using VEX documents to differentiate between technical vulnerabilities and actual contextual risks. It also integrates with identity
ClawWork is a suite of tools designed to monitor agent finances, provide isolated execution environments, simulate economic behaviors, and benchmark performance. It functions as an autonomous agent sandbox where AI agents can run code and generate professional business deliverables. The project focuses on the financial sustainability of AI assistants through an economic simulation environment. This includes tools for tracking token expenditures and income generation, as well as simulations that analyze the trade-offs between immediate earnings and long-term skill acquisition. The system incl
CodeWhale is an AI coding agent orchestrator and development harness designed to coordinate autonomous agents that read, edit, and verify code. It provides a secure environment for AI agents to perform multi-step software engineering tasks, utilizing a sandboxed execution model to isolate shell commands and protect the host system. The system distinguishes itself by spawning multiple independent agents in parallel to handle separate investigation or implementation slices simultaneously. It employs a multi-model gateway to route requests across various cloud APIs and local servers, and utilize
dependabot-core is the automated dependency management engine that powers multi-ecosystem package updates and vulnerability remediation. It parses package manifests and lockfiles, polls package registries for newer versions, resolves version constraints across entire dependency trees, and generates pull requests with changelogs and structured descriptions. The system integrates vulnerability database matching to detect known security flaws and can automatically create remediation pull requests. What distinguishes this project is its handling of complex multi-ecosystem resolution across dozens
ClusterFuzz is an automated platform that runs coverage-guided fuzzers at scale to find security and stability bugs in software. It orchestrates libFuzzer and AFL++ across distributed clusters of worker bots, collecting coverage feedback to guide input mutation and discover crashes. The platform provides a web-based dashboard for configuring fuzzing jobs, monitoring progress, and inspecting crash reports, with role-based access control to restrict sensitive features. The system automates the full fuzzing lifecycle, from build pipeline integration and corpus management to crash triage and bug
OpenHands is an autonomous AI software engineer and coding assistant designed to execute software engineering tasks by interacting directly with codebases and development environments. It functions as a platform for running AI agents that can write code and manage files to automate complex development workflows. The system distinguishes itself through a container-based execution environment that isolates agent actions within a sandboxed Linux environment. It employs an autonomous agent loop of observation, planning, and action, supported by a standardized communication protocol that allows it
Sandbox Agent is a platform designed to manage, secure, and orchestrate autonomous coding assistants. It provides a standardized infrastructure for executing untrusted code and managing agent lifecycles within isolated, containerized environments. By decoupling agent execution from client connections, the platform ensures that session states remain persistent across process restarts and network interruptions. The project distinguishes itself through a capability-based security model that enforces granular permission checks on tool usage, ensuring that autonomous processes operate within defin