Mixtral Offloading

Open-source alternatives to Mixtral Offloading

Similar open-source projects, ranked by how many features they share with Mixtral Offloading.

BerriAI/litellm
50,579 stars
LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acting as a central traffic management layer, it enables organizations to route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through its policy-driven architecture, which uses configuration-based routing to manage traffic distribution, load balanc…
Python, ai-gateway, anthropic, azure-openai
bigcode-project/starcoder2
2,075 stars
StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, Arxiv, and GitHub issues. The models use Grouped Query Attention, a context window of 16,384 tokens, with sliding window…
Python
artidoro/qlora
10,929 stars
This project is a quantized fine-tuning framework for large language models. It combines a four-bit quantizer with low-rank adapters to reduce the GPU memory needed to train large models, enabling training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads optimizer states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset…
Jupyter Notebook
bigscience-workshop/petals
10,208 stars
Petals is a decentralized framework and inference engine for running large language models across a peer-to-peer network. It enables the execution of models that exceed the memory of any single machine by splitting computations and model layers across a collaborative swarm of GPUs. The system functions as a collaborative compute network where participants share local GPU resources and host model weights. It supports distributed prompt-tuning to adapt massive models to specific tasks and allows for the establishment of private compute swarms to process sensitive data within restricted, trusted…
Python


dvmazur/mixtral-offloading
