Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It functions as a multimodal audio generator and processing system, integrating a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system is distinguished by its ability to generate audio from diverse inputs, including text and images, and its capacity to produce synchronized talking head videos. It also operates as a neural speech translator, converting spoken language between different tongues while pre
This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory requirements needed to train large models. The framework utilizes four-bit quantization and low-rank adapters to enable model training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset