30 open-source projects similar to openai/openai-python, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best OpenAI Python alternative.
LangChain is an orchestration framework for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model-provider APIs into a consistent set of primitives, so developers can build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing…
This project is a Go library that provides a programmatic interface to generative AI services. It serves as a software development kit for integrating large language models into applications, supporting tasks such as text and chat completion, image generation, and audio transcription. The library distinguishes itself through a unified infrastructure for robust network communication and service management. It features structured request mapping and error normalization to ensure type-safe interactions and simplify debugging. Further…
LiteLLM is a unified gateway and proxy server that centralizes access to over one hundred language model providers. It provides a standardized API that abstracts vendor-specific schemas, so developers can interact with diverse models through a single, consistent format. As a central traffic-management layer, it lets organizations route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through a policy-driven architecture that uses configuration-based routing to manage traffic distribution, load balancing…
Ollama provides a framework for running and managing machine learning models locally. It includes a command-line interface for model lifecycle management, such as creation, embedding generation, and configuration, alongside a stable API for programmatic access from multiple programming languages. The platform supports importing models and adapters in several formats, including GGUF and Safetensors. Users can define custom model behaviors, prompt templates, and system messages in a configuration file. It also offers tools for fine-tuning models with LoRA adapters and applying…
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code in secure, isolated sandbox environments. The framework is designed for complex task delegation: agents can invoke other agents as tools while maintaining context across multi-turn interactions. It distinguishes itself through deep integration with the Model Context Protocol, which lets agents connect to external data sources and remote services…
Qwen is a comprehensive framework for large language model development, serving, and deployment. It provides an ecosystem for transformer-based sequence modeling, offering base models alongside specialized tools for instruction-tuned alignment, fine-tuning, and long-context inference. The project supports both research and production environments, letting users train, optimize, and host generative models locally or across distributed hardware. The framework distinguishes itself through its focus on high-performance serving and extensibility. It features a high-performance…
This repository is a sample library of reference implementations for automating tasks and extending functionality across Google Workspace applications. It collects code examples and templates for building Workspace automation scripts, custom add-ons, and integrated productivity tools. The project distinguishes itself by providing examples of integrating large language models into productivity tools for content generation and data analysis. It also includes reference implementations for creating conversational chat apps, interactive cards, and…
This project is an AI agent orchestration platform that provides a visual environment for building, testing, and deploying complex automation workflows. It is a low-code development interface in which users chain discrete functional blocks into dependency-aware pipelines that connect AI with external data and services. The platform supports intelligent conversational agents, automated business processes, and multi-service API orchestration within a single workspace. It distinguishes itself through its event-driven integration engine…
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory needed to train large models. Combining four-bit quantization with low-rank adapters makes training feasible on consumer-grade hardware. Double quantization and a paged optimizer that offloads optimizer states to system RAM shrink the memory footprint further. The system supports distributed training across multiple GPUs for larger parameter scales and includes utilities for custom dataset…
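To make the idea of a four-bit quantizer concrete, here is a minimal sketch of blockwise symmetric "absmax" quantization to signed 4-bit codes with one floating-point scale per block. This is an assumption-laden illustration, not the framework's quantizer: the data type it uses may differ (plain integers here), and double quantization (compressing the per-block scales themselves) is not shown. The function names and the default block size are hypothetical.

```python
from typing import List, Tuple

def quantize_int4_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: symmetric absmax quantization to 4-bit codes in [-7, 7], one scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 7 if absmax > 0 else 1.0  # all-zero block: any scale works, avoid division by zero
        scales.append(scale)
        for v in block:
            q = round(v / scale)
            codes.append(max(-7, min(7, q)))  # clamp into the signed 4-bit range
    return codes, scales

def dequantize_int4_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Hypothetical helper: multiply each code by the scale of the block it came from."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]
```

Each weight is stored as a 4-bit code plus a shared per-block scale, so the reconstruction error of any element is at most half of its block's scale; smaller blocks trade extra scale storage for lower error.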
Haystack is an orchestration framework for building complex search and generative AI pipelines. It functions as an agentic workflow engine for constructing automated sequences in which AI agents perform multi-step reasoning and data analysis. The framework uses a modular, component-based architecture that connects processing steps into directed acyclic graphs. A provider-agnostic integration layer decouples core logic from specific external AI services and vector databases, so the underlying technologies can be swapped flexibly. This design…
PyTorch extensions for high-performance and large-scale training.
ColossalAI is a distributed deep learning framework for training and deploying very large AI models across clusters of hardware accelerators. It acts as a parallel computing engine that partitions model workloads and data across multiple processors to maximize memory efficiency and throughput. The platform distinguishes itself through a comprehensive suite of parallelization strategies, including multi-dimensional tensor parallelism and pipeline model parallelism, which split neural network layers and stages across devices. To support large-scale…
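The core arithmetic behind tensor parallelism can be shown in a few lines. The sketch below splits a weight matrix column-wise into shards, lets each simulated "device" multiply the input by its own shard, and then concatenates the partial outputs, as an all-gather would. It is a single-process illustration under stated assumptions: real frameworks place shards on separate accelerators and use collective communication, the multi-dimensional variants mentioned above partition along more than one axis, and none of these helper names belong to ColossalAI's API.

```python
from typing import List

Matrix = List[List[float]]

def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (rows of x) @ (w) with w given as a list of rows."""
    return [[sum(x[i][k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for i in range(len(x))]

def split_columns(w: Matrix, n_devices: int) -> List[Matrix]:
    """Hypothetical helper: give each device a contiguous slice of w's output columns."""
    cols = len(w[0])
    assert cols % n_devices == 0, "columns must divide evenly across devices"
    step = cols // n_devices
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(n_devices)]

def column_parallel_matmul(x: Matrix, w: Matrix, n_devices: int) -> Matrix:
    shards = split_columns(w, n_devices)
    partials = [matmul(x, shard) for shard in shards]  # each "device" computes only its slice
    # Simulated all-gather: concatenate each output row's slices in device order.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```

Because every output column depends only on its own column of the weights, each device needs to hold just `1 / n_devices` of the matrix, yet the gathered result matches the unsharded product exactly.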
Chinese-Vicuna is a Chinese instruction-following large language model based on the LLaMA architecture. It is designed for natural language understanding and generation in Chinese, using an instruction-tuned model to follow complex user prompts across conversations. The project provides a LoRA fine-tuning framework and quantization tools that enable model adaptation and inference on consumer hardware. Quantized inference reduces memory usage on both CPUs and GPUs, supported by a low-level C++ implementation that minimizes system resource requirements…
BigDL is a PyTorch acceleration framework and distributed inference engine for large language models. It provides a toolkit for running models on Intel hardware, integrating quantization tools and libraries for parameter-efficient fine-tuning. The project distinguishes itself by using pipeline parallelism to distribute model workloads across multiple hardware accelerators. It uses low-bit integer quantization and speculative decoding to shrink memory footprints and reduce text-generation latency. The system covers broad capabilities in model optimization, including…
This library provides a framework for parameter-efficient fine-tuning, adapting large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit that reduces the compute and memory requirements of full-model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports integrating multiple task-specific adapter modules, allowing…
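Low-rank matrix decomposition, the LoRA-style method named above, can be sketched as follows: the pretrained weight `W` stays frozen and only two small matrices `A` (rank × d_in) and `B` (d_out × rank) are trained, giving `y = W x + (alpha / rank) * B (A x)`. `B` starts at zero, so the adapted layer initially reproduces the base model exactly. This is a dependency-free illustration, not the library's API; the class and helper names are hypothetical.

```python
import random
from typing import List

def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Hypothetical LoRA-style layer: frozen weight plus a trainable low-rank update."""

    def __init__(self, weight: List[List[float]], rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, (rank, d_in)
        self.B = [[0.0] * rank for _ in range(d_out)]  # trainable, zero-initialized, (d_out, rank)
        self.scaling = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: project down to rank, then back up
        return [b + self.scaling * u for b, u in zip(base, update)]

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in A and B, versus d_in * d_out for full fine-tuning."""
    return rank * (d_in + d_out)
```

The savings come from the parameter count: for a 4096 × 4096 projection with rank 8, the adapter trains 8 × (4096 + 4096) = 65,536 values instead of 16,777,216, roughly 0.4% of the full matrix.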
LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for supervised and reward-based fine-tuning of large language models and vision-language models. The framework includes a toolkit for training vision-language models and a serving interface that deploys trained models behind high-performance APIs. It uses precision tuning and quantization to lower the hardware requirements and memory footprint of large models. The system covers data pipelines…
Accelerate is a PyTorch distributed training library that abstracts away the boilerplate required to run models across multiple GPUs, TPUs, and CPUs. It acts as a model scaler and distributed hardware orchestrator, so the same training script runs on different hardware backends without changes to its core logic. The project provides a command-line interface for configuring compute environments and launching jobs on single- or multi-node clusters. It includes mixed-precision training support for FP16 and BF16, reducing memory…
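BF16 reduces memory because each value occupies 2 bytes instead of the 4 used by FP32. It keeps FP32's 8-bit exponent (and therefore its range) but only 7 explicit mantissa bits. The sketch below shows this by keeping the upper 16 bits of a value's FP32 encoding. It is only an illustration of the number format, not how Accelerate or PyTorch convert tensors: it rounds toward zero by truncation, whereas hardware and libraries typically round to nearest even. The function name is hypothetical.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Hypothetical helper: emulate BF16 by keeping the top 16 bits of the FP32 encoding.

    Uses truncation (round toward zero) for simplicity. struct raises OverflowError
    for finite values outside the FP32 range.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern: sign, 8 exponent, 23 mantissa bits
    truncated = bits & 0xFFFF0000  # drop the low 16 mantissa bits, leaving 7
    return struct.unpack(">f", struct.pack(">I", truncated))[0]
```

For example, `to_bfloat16(3.14159265)` yields `3.140625`, while a value as large as `1e38` still survives with a relative error below 2^-7. FP16 would overflow at that magnitude, since its largest finite value is 65504. This range/precision trade-off is why BF16 is often preferred for training.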
TinyLlama is a compact 1.1B-parameter language model pretrained on 3 trillion tokens. It is an edge AI model designed for fast text generation on memory-constrained devices. The project provides a distributed pretraining framework for training small language models across multiple GPUs and nodes, as well as a fine-tuning toolkit for full-parameter training that adapts the base model for chat and specific tasks. The system supports distributed large language model training and on-device text generation. Its architectural components include rotary positional…
Chroma is a vector database that indexes and retrieves high-dimensional data representations for semantic similarity search. It serves as a platform for information retrieval, storing and managing unstructured documents alongside structured metadata. By mapping data into numerical representations, the system enables fast similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular-expression matching to balance semantic…
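One common way to combine a dense (embedding) ranking with a sparse (keyword) ranking is reciprocal rank fusion (RRF). The sketch below uses it to illustrate the general idea of hybrid search. It assumes RRF with `k = 60` and a naive term-overlap keyword score. Both are swapped-in techniques, since how Chroma actually fuses the two signals is not stated here, and none of these function names are Chroma APIs.

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query: str, text: str) -> int:
    # Sparse signal: number of distinct query terms that appear verbatim in the document.
    return len(set(query.lower().split()) & set(text.lower().split()))

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Each ranking adds 1 / (k + rank) to a document's score; documents ranked well by both signals win.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query: str, query_vec: List[float],
                  docs: Dict[str, Tuple[str, List[float]]]) -> List[str]:
    """docs maps a hypothetical document id to (text, embedding)."""
    dense = sorted(docs, key=lambda d: cosine(query_vec, docs[d][1]), reverse=True)
    sparse = sorted(docs, key=lambda d: keyword_score(query, docs[d][0]), reverse=True)
    return reciprocal_rank_fusion([dense, sparse])
```

Fusing ranks rather than raw scores avoids having to calibrate cosine similarities against keyword counts. A document that is second-best on both signals can therefore outrank one that is first on one signal and last on the other.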
MemGPT is a memory management framework and external memory layer for large language models. It is a platform for building stateful AI agents that keep a persistent identity and continuous context across sessions. Agents work around fixed context-window limits through a virtual context windowing approach: the model manages its own memory with internal commands that search, update, and delete stored information in a hierarchy of short-term working context and long-term archival storage. The framework provides a local…
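The two-tier memory hierarchy described above can be pictured as a bounded working context that pages its oldest entries out to an unbounded archival store, with search, update, and delete operations over that store. The sketch below is a hypothetical illustration, not MemGPT's API. Every class and method name is invented, and matching is a plain case-insensitive substring test. In the framework described, the model itself decides when to issue such operations.

```python
from collections import deque
from typing import Deque, List

class HierarchicalMemory:
    """Hypothetical two-tier memory: bounded working context plus archival storage."""

    def __init__(self, working_capacity: int):
        self.capacity = working_capacity
        self.working: Deque[str] = deque()  # short-term context that fits in the prompt
        self.archival: List[str] = []  # long-term store outside the context window

    def append(self, message: str) -> None:
        self.working.append(message)
        while len(self.working) > self.capacity:
            self.archival.append(self.working.popleft())  # page the oldest entry out

    def context_window(self) -> List[str]:
        return list(self.working)

    def archival_search(self, query: str) -> List[str]:
        return [m for m in self.archival if query.lower() in m.lower()]

    def archival_update(self, query: str, new_text: str) -> int:
        """Replace every archived entry that matches query; return how many changed."""
        changed = 0
        for i, m in enumerate(self.archival):
            if query.lower() in m.lower():
                self.archival[i] = new_text
                changed += 1
        return changed

    def archival_delete(self, query: str) -> int:
        """Remove every archived entry that matches query; return how many were removed."""
        before = len(self.archival)
        self.archival = [m for m in self.archival if query.lower() not in m.lower()]
        return before - len(self.archival)
```

The prompt only ever contains `context_window()`, so the amount of stored information can grow without limit while each model call stays within a fixed context size.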
StarCoder2 is a family of code generation models (3B, 7B, and 15B parameters) trained on 600+ programming languages from The Stack v2 plus some natural language text such as Wikipedia, arXiv, and GitHub issues. The models use Grouped Query Attention and a context window of 16,384 tokens, with sliding window…
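The two attention features named above can be sketched independently of any model code. A sliding-window causal mask lets each position attend only to itself and the preceding `window - 1` positions. Grouped Query Attention lets consecutive groups of query heads share one key/value head, which shrinks the key/value cache. This is a generic illustration: the window size and head counts in the usage below are small hypothetical values, not StarCoder2's configuration, and the function names are invented.

```python
from typing import List

def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    # mask[i][j] is True when query position i may attend to key position j:
    # j must not be in the future (j <= i) and must be one of the last `window` positions.
    return [[j <= i and i - j < window for j in range(seq_len)] for i in range(seq_len)]

def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    # Grouped Query Attention: each block of n_q_heads // n_kv_heads consecutive
    # query heads reads the same key/value head.
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size
```

With `seq_len=5` and `window=3`, the last row of the mask is `[False, False, True, True, True]`, so attention cost per token is bounded by the window rather than by the full context length. With 8 query heads and 2 key/value heads, query heads 0 to 3 share KV head 0 and heads 4 to 7 share KV head 1.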
Dolly is an instruction-tuned large language model designed to follow complex natural language instructions. As a causal language model, it predicts the next token in a sequence to generate coherent conversational responses and perform tasks such as brainstorming, classification, and question answering. The project focuses on developing models from open datasets suitable for commercial use. It builds instruction-following models from curated collections of human-generated instruction-response pairs. The repository provides capabilities for…
CrewAI is a multi-agent orchestration framework and autonomous agent workflow engine. It coordinates autonomous AI agents with specific roles and goals to solve complex tasks collaboratively. The framework distinguishes itself through a collaborative agent system in which multiple language model instances share information and execute multi-step objectives through role-playing. It incorporates human-in-the-loop mechanisms, with manual review checkpoints to validate decisions and refine outcomes within autonomous execution paths. The platform…
Run Mixtral-8x7B models in Colab or on consumer desktops
Llama is a large language model runtime and inference engine that loads and executes autoregressive transformer models. It generates natural language text completions from prompts using pretrained weights. The system supports multi-GPU model parallelism, distributing model weights and workloads across multiple GPUs to accommodate larger parameter counts. It also incorporates a content safety filter that uses classifiers to intercept and block unsafe inputs or outputs during inference. The project covers broad capabilities in distributed model…
Fault-tolerant, highly scalable GPU orchestration and a machine learning framework designed for training models with billions to trillions of parameters
This repository is a collection of frameworks and guides for Llama models, serving as a fine-tuning framework, an inference pipeline, and an AI workflow orchestrator. It provides tools for adapting large language models to specific datasets and domains, including a parameter-efficient fine-tuning toolkit that uses techniques such as low-rank adaptation to reduce memory and compute requirements. It also serves as an implementation guide for retrieval-augmented generation, which combines model inference with external data retrieval to improve response accuracy. The capability surface…
MetaGPT is an agentic workflow orchestrator and multi-agent framework that turns natural language requirements into complete software deliverables. It functions as an AI software engineering suite that automates the creation of technical documentation, data structures, and source code, treating natural language as a programming environment. The system distinguishes itself by assigning professional roles to large language models, forming specialized agent teams that collaborate through a shared communication structure. It uses standard operating procedures to convert…