MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
This project is a quantized fine-tuning framework for large language models. It combines a low-rank adaptation library with a four-bit quantizer to reduce the GPU memory needed to train large models, which enables model training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset
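The combination of a frozen four-bit base weight and a trainable low-rank adapter can be illustrated with a minimal sketch. It is not this project's implementation: it assumes plain uniform absmax quantization to signed four-bit codes (frameworks of this kind often use other four-bit data types), treats each weight row as one quantization block, and uses hypothetical names (`quantize_block`, `lora_forward`, `alpha`, `r`) chosen for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_block(weights: List[float]) -> Tuple[List[int], float]:
    """Absmax-quantize one block of weights to signed 4-bit codes in [-7, 7]."""
    # Fall back to a scale of 1.0 when the block is all zeros.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_block(codes: List[int], scale: float) -> List[float]:
    """Recover approximate weights from 4-bit codes and the block scale."""
    return [c * scale for c in codes]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(
    w_codes: List[List[int]],
    w_scales: List[float],
    A: Matrix,  # r x d_in, trainable
    B: Matrix,  # d_out x r, trainable, typically initialized to zero
    x: List[float],
    alpha: float,
    r: int,
) -> List[float]:
    """Compute y = dequant(W) x + (alpha / r) * B (A x)."""
    # The base weights stay frozen in 4-bit form and are dequantized on the fly.
    W = [dequantize_block(c, s) for c, s in zip(w_codes, w_scales)]
    base = matvec(W, x)
    # Only the small matrices A and B would receive gradient updates.
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]
```

In this sketch only `A` and `B` would be trained, so optimizer state scales with the adapter rank `r` rather than with the full weight matrix. With `B` set to zero, the adapter contributes nothing and the output equals that of the quantized base layer.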
AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It integrates a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system can generate audio from diverse inputs, including text and images, and can produce synchronized talking head videos. It also operates as a neural speech translator, converting speech from one language to another while pre
LiteLLM is a unified gateway and proxy server designed to centralize access to over one hundred language model providers. It provides a standardized API interface that abstracts vendor-specific schemas, allowing developers to interact with diverse models through a single, consistent format. By acting as a central traffic management layer, it enables organizations to route, secure, and govern model interactions across multiple deployments. The platform distinguishes itself through its policy-driven architecture, which uses configuration-based routing to manage traffic distribution, load balanc
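The idea of a single interface that hides vendor-specific schemas can be sketched as a small adapter layer. The code below is an illustrative assumption, not LiteLLM's actual API: `vendor_a`, `vendor_b`, the `provider/model` naming convention, and the response shape are all hypothetical stand-ins, and the backends are local stubs rather than real network calls.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


# Two hypothetical vendor backends with incompatible request/response schemas.
def vendor_a(payload: dict) -> dict:
    """Stub backend that takes a flat prompt and returns a 'choices' list."""
    return {"choices": [{"text": "A:" + payload["prompt"]}]}


def vendor_b(payload: dict) -> dict:
    """Stub backend that takes a message list and returns a nested 'output'."""
    return {"output": {"content": "B:" + payload["input"][-1]["content"]}}


def call_a(messages: List[Message]) -> str:
    # Adapter: flatten chat messages into the prompt string vendor A expects.
    prompt = "\n".join(m["content"] for m in messages)
    return vendor_a({"prompt": prompt})["choices"][0]["text"]


def call_b(messages: List[Message]) -> str:
    # Adapter: vendor B accepts the message list directly.
    return vendor_b({"input": messages})["output"]["content"]


# Routing table mapping a provider prefix to its adapter.
ROUTES: Dict[str, Callable[[List[Message]], str]] = {
    "vendor_a": call_a,
    "vendor_b": call_b,
}


def completion(model: str, messages: List[Message]) -> dict:
    """Route a request by its 'provider/model' name and normalize the reply."""
    provider, _, model_name = model.partition("/")
    if provider not in ROUTES:
        raise ValueError(f"unknown provider: {provider!r}")
    return {"model": model_name, "content": ROUTES[provider](messages)}
```

Callers in this sketch see one request format and one response format regardless of backend, so switching providers only changes the `model` string. A real gateway would layer authentication, retries, and traffic policies on top of this dispatch step.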