Multimodal GPT

alisawuffles/ambient

Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)

Jupyter Notebook

GitHub stars: 66

allenai/mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Python

GitHub stars: 953

AIGC-Audio/AudioGPT

AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It functions as a multimodal audio generator and processing system, integrating a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system is distinguished by its ability to generate audio from diverse inputs, including text and images, and its capacity to produce synchronized talking head videos. It also operates as a neural speech translator, converting spoken language between different tongues while pre

Python, audiogpt, music

GitHub stars: 10,174

artidoro/qlora

This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory requirements needed to train large models. The framework utilizes four-bit quantization and low-rank adapters to enable model training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset

Jupyter Notebook

GitHub stars: 10,929
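
The artidoro/qlora description above mentions four-bit quantization and low-rank adapters. As a rough illustration of why those two ideas save memory, here is a minimal pure-Python sketch. It is not the qlora repository's code or API: the function names are hypothetical, and the quantizer is a plain uniform absmax scheme rather than the four-bit data type the project actually uses.

```python
def quantize_4bit(block):
    """Map floats to signed integer codes in [-7, 7] using one absmax scale.

    Assumption: uniform absmax quantization, chosen for simplicity.
    """
    # One scale per block; fall back to 1.0 when the block is all zeros.
    scale = max(abs(x) for x in block) / 7 or 1.0
    codes = [round(x / scale) for x in block]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in a rank-r adapter.

    The adapter is two matrices, A (rank x d_in) and B (d_out x rank),
    trained in place of the full d_out x d_in weight matrix.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    codes, scale = quantize_4bit([7.0, -3.0, 0.0, 1.0])
    print(codes, scale)
    print(dequantize_4bit(codes, scale))

    # Hypothetical layer size, picked only for illustration.
    full = 4096 * 4096
    adapter = lora_param_count(4096, 4096, rank=16)
    print(full, adapter, full // adapter)
```

With the hypothetical 4096 x 4096 layer and rank 16, the adapter trains 131,072 parameters against 16,777,216 for the full matrix, a 128-fold reduction, while the frozen base weights are stored as four-bit codes plus one scale per block.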

open-mmlab/Multimodal-GPT

Open-source alternatives to Multimodal GPT

alisawuffles/ambient

allenai/mmc4

AIGC-Audio/AudioGPT

artidoro/qlora
