30 open-source projects similar to zjunlp/easyedit, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best EasyEdit alternative.
ds4 is a local inference engine for DeepSeek models that includes a distributed runtime for splitting transformer layers across networked computers. It functions as a reasoning controller with a local weight streamer and an API server that streams chat completions via industry-standard endpoints. The system employs a memory management model that loads model experts from disk on demand, so it can run models that exceed available system RAM. It provides controls for reasoning effort and model behavior steering, allowing response characteristics to be modified through activation directions. Th
Obliteratus is a weight ablation framework and refusal removal tool designed to identify and delete the internal representations responsible for content refusals in large language models without retraining. It functions as a circuit analysis suite that maps the geometric structure of model guardrails to isolate the specific layers and attention heads that enforce refusals. The project enables the removal of these behaviors through geometric projection, rank-1 adapter ablation for reversible modifications, and the application of steering vectors to alter behavior during inference. It includes
BERTopic is a topic modeling library used to extract interpretable themes from collections of text documents and images. It functions as a document clustering framework that transforms unstructured data into numerical vectors to group semantically similar content. The project distinguishes itself through a multimodal embedding tool that allows for joint clustering of text and images in a shared vector space. It also features a class-based TF-IDF representation engine to identify representative words for clusters and an integrated system for using large language models to generate natural lang
Chinese-CLIP is a multimodal framework and vision-language model designed for cross-modal retrieval and representation generation using Chinese text and images. It employs a contrastive learning architecture to map visual and textual data into a shared vector space for similarity calculations. The system enables bidirectional search, allowing for text-to-image and image-to-text retrieval. It also provides zero-shot image classification, which identifies objects within images without requiring task-specific training. The project includes tools for fine-tuning pre-trained models on specialized
This repository provides tools and methodologies for studying adversarial attacks on large language models. It focuses on understanding how carefully crafted inputs can manipulate or bypass the safety mechanisms of LLMs, enabling researchers to probe model vulnerabilities and improve their robustness. The project covers techniques for generating adversarial prompts, evaluating model responses under attack conditions, and analyzing the effectiveness of different attack strategies.
This project is a standardized set of abstraction and reasoning problems designed for benchmarking the ability of artificial intelligence models to learn new rules. It functions as a fluid intelligence test and a reasoning benchmark, utilizing a collection of grid-based puzzles and a program synthesis dataset to evaluate how agents generate algorithms from examples. The project focuses on measuring general fluid intelligence and the capacity for zero-shot generalization, testing whether a system can apply learned logic to unseen problems without relying on task-specific training. It provides
OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardized metrics and reproducible pipelines. The project features an automated judging framework that uses language models as judges to score and verify the quality of generated text. It includes a performance leaderboard system for comparing the relative capabilities of various models across industry-sta
This project is a large language model persona simulation framework designed to distill individuals into AI personas using professional data and interpersonal context. It functions as a personal knowledge base ingestor and agent configuration manager, allowing for the creation of digital twins that reproduce a specific person's mental models, speaking styles, and professional workflows. The system utilizes a dual-model persona architecture that separates professional work skills from interpersonal personality traits. It distinguishes itself through a multimodal persona generator capable of pr
Giskard is an AI quality assurance suite and evaluation framework designed to measure the performance, bias, and security risks of large language models and AI agents. It functions as a vulnerability scanner to detect security flaws and performance regressions. The project provides automated red-teaming and adversarial testing workflows that generate prompt-injection probes and adversarial attacks from system descriptions to identify security gaps. The platform covers AI agent auditing and RAG quality validation, using knowledge-base grounding and synthetic da
OpenCompass is an open-source framework for standardized benchmarking of large language models. It provides a configurable evaluation pipeline that supports both objective and subjective assessment, using a dual-engine architecture to handle closed-form answer comparison and open-ended response rating. The framework is designed as a modular platform where datasets, models, and metrics are composed through declarative YAML configuration files. The framework distinguishes itself through its extensible model integration layer, which supports custom models, HuggingFace models, and third-party API
OpenViking is a multi-tenant context server and knowledge base administration system designed to provide AI agents with persistent long-term memory. It enables the indexing of diverse documents and codebases to support retrieval-augmented generation, allowing agents to recall past interactions, user preferences, and learned experiences across sessions. The project is distinguished by its use of a URI-based virtual filesystem to organize memories, resources, and skills. It implements a tiered context loading system that balances retrieval precision with token budgets by structuring data into a
Open CLIP is an open-source framework for training and deploying Contrastive Language-Image Pre-training models. It serves as a vision-language training framework and multimodal embedding engine that maps images and text into a shared vector space for similarity searches and zero-shot classification. The project provides a toolkit for distributed training of contrastive models and includes an image-to-text generative model for producing natural language descriptions. It supports custom text encoder integration and utilizes teacher-student model distillation to transfer knowledge from large pr
DeepAnalyze is an autonomous data science agent and research pipeline designed to transform raw datasets into comprehensive analysis reports. It operates by generating and executing Python code to perform data preparation, modeling, and visualization. The system utilizes a secure, containerized execution environment to run generated scripts in isolation from the host system. It includes a benchmarking tool to evaluate the accuracy and performance of large language models against standardized data science tasks and a standardized API gateway for managing model completions and file uploads. Th
Poml is a prompt management framework and templating engine designed for authoring, versioning, and rendering structured prompts for large language models. It uses a semantic markup language to organize prompts into reusable templates, combining them with dynamic context and data to generate formatted inputs. The system distinguishes itself by decoupling core prompt logic from final presentation through a stylesheet-based approach. It provides a dedicated JSON schema output generator to enforce strict, machine-parsable model responses and a configuration interface for managing function tool s
This project provides a Chinese large language model based on the LLaMA architecture. It is an instruction-tuned model optimized for natural language processing and multi-turn conversations in Chinese. The system includes a framework for parameter-efficient fine-tuning using low-rank adaptation and quantization to reduce memory requirements. It also implements retrieval-augmented generation for local document question answering and supports long-context processing for sequences up to 64K tokens. The project covers a broad set of capabilities including supervised instruction tuning, reinforce
The Kaggle API command-line interface is a suite of utilities for managing datasets, machine learning models, and competition entries from a terminal. It functions as a command-line wrapper that translates user input into API calls to control remote cloud resources. The project differentiates itself by providing specialized tools for automating the execution of notebook kernels and managing the lifecycle of machine learning models, including version iteration and performance tracking. It also includes a utility for executing evaluation tasks against large language models and downloading the r
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
This project is an alignment framework and suite of pipelines for training language models using supervised fine-tuning and preference optimization. It provides tools for executing large-scale distributed training across multiple GPUs and compute nodes, alongside a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization to refine model behavior using paired data without a separate reward model. It also supports constitutional AI alignment and the training of reward mo
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for
Open Llama is an open-source large language model and pre-trained transformer designed as a permissively licensed alternative to proprietary weights. It serves as a base model reproduction of the Llama architecture, providing a set of weights for a decoder-only transformer. The project provides a transparently trained model based on the RedPajama dataset, supporting unrestricted commercial and research use. It includes systems for serving pre-trained weights in various sizes. The project covers natural language processing research and performance benchmarking through text quality evaluation
NeMo is a comprehensive framework designed for the development, training, and deployment of large-scale conversational and generative artificial intelligence models. It provides an integrated platform for building multimodal systems, encompassing speech processing, language modeling, and reinforcement learning alignment. The framework is built to handle the entire lifecycle of AI development, from data curation and model pretraining to production-ready service deployment. The platform distinguishes itself through advanced distributed training capabilities, including tensor and pipeline parall
This project is a development platform for managing the lifecycle of generative artificial intelligence models. It provides a unified environment for accessing, fine-tuning, and deploying large language models, serving as an orchestrator that handles the integration of diverse models into custom applications. The platform distinguishes itself by offering a managed infrastructure for hosting and scaling models, which removes the requirement for manual server maintenance or configuration. It includes integrated tools for supervised fine-tuning and vector embedding optimization, allowing for the
L1B3RT4S is an adversarial machine learning toolkit designed for red teaming and evaluating the robustness of large language models. It provides a research framework for investigating how safety alignment mechanisms and content moderation systems respond to sophisticated input strategies. The project focuses on identifying vulnerabilities in model guardrails by employing techniques such as adversarial narrative framing, dynamic context injection, and latent space steering. It utilizes multi-agent prompt decomposition and recursive text transformation to analyze how structural changes to input
This repository is a collection of specialized toolsets and libraries for large language model prompt engineering and security testing. It provides a library of advanced templates and frameworks designed to optimize the quality and specificity of model responses. The project includes resources for red teaming and security research, featuring a repository of prompts designed to bypass safety filters and operational constraints. It also provides techniques for system prompt extraction to reveal the internal instructions and configurations of AI personas. The collection covers a broader surface
Llama 3 is a collection of pretrained, autoregressive transformer-based models designed for natural language generation, reasoning, and complex instruction following. It functions as a generative AI framework that provides the infrastructure for managing model weights, executing neural network inference, and handling computational workloads across diverse knowledge domains. The project distinguishes itself through an integrated AI safety toolkit that employs secondary classification filtering to inspect inputs and outputs, ensuring adherence to usage compliance and safety standards. It suppor
This project is a comprehensive library of structured system prompts and configuration templates designed to define the behavior, persona, and operational boundaries of autonomous artificial intelligence agents. It serves as a framework for prompt engineering, providing modular instructions that help models parse complex tasks, maintain consistent interaction tones, and adhere to specific domain constraints. The repository distinguishes itself by offering specialized configurations for agent safety and security, including protocols to prevent prompt injection and unauthorized data access. It
Wandb is a centralized platform for machine learning experiment tracking, model registry management, and workflow orchestration. It provides a comprehensive suite of tools for logging, visualizing, and versioning training metrics, model artifacts, and hyperparameter sweeps to ensure reproducibility across development cycles. The platform also functions as an observability tool for large language model applications, enabling the tracing of execution steps, token usage, and reasoning processes. The project distinguishes itself through its event-driven automation capabilities, which allow users
This repository provides a collection of reference implementations and code examples for training and deploying machine learning models using the MLX framework. It serves as a practical guide for executing distributed training, fine-tuning large language models, converting model weights, and implementing multimodal generative workflows. The project distinguishes itself through specialized examples for local hardware execution, featuring weight quantization to reduce memory usage and low-rank adaptation for parameter-efficient fine-tuning. It also includes scripts for transforming external mod