1 repo
Automated workflows for refining models using instruction-based datasets.
Distinguishing note: Focuses on multi-modal command following rather than generic language fine-tuning.
LLaVA is a multimodal large language model architecture designed to process and interpret both image and text inputs to generate natural language responses. It functions as a research-oriented platform for visual instruction tuning, providing a framework to align language models with human intent through training on diverse datasets of paired images and text queries. The system distinguishes itself through a specialized vision-language training pipeline that connects visual data to language models using projection layers and instruction-based fine-tuning. It supports distributed inference by
Refines language models using curated datasets of image-text pairs to improve multi-modal command following.
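The description above mentions projection layers that connect visual data to a language model. As a rough illustration only, the sketch below assumes a single linear map from image patch features into the language model's embedding space, followed by prepending the projected image tokens to the text token embeddings. The function names, dimensions, and random weights are all hypothetical and are not taken from the LLaVA codebase; a real projector would be learned during training.

```python
import random


def make_projection(vision_dim, text_dim, seed=0):
    """Build a random (vision_dim x text_dim) matrix; a stand-in for learned weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(text_dim)] for _ in range(vision_dim)]


def project(patch_features, weights):
    """Map each visual patch feature vector into the text embedding space."""
    text_dim = len(weights[0])
    return [
        [sum(f * weights[i][j] for i, f in enumerate(patch)) for j in range(text_dim)]
        for patch in patch_features
    ]


def build_input_sequence(patch_features, text_embeddings, weights):
    """Prepend projected image tokens to the text token embeddings."""
    return project(patch_features, weights) + text_embeddings


# Hypothetical sizes: 4 image patches of dim 8, 3 text tokens of dim 6.
patches = [[float(p + k) for k in range(8)] for p in range(4)]
text = [[0.0] * 6 for _ in range(3)]
W = make_projection(vision_dim=8, text_dim=6)

sequence = build_input_sequence(patches, text, W)
print(len(sequence), len(sequence[0]))  # prints "7 6"
```

The point of the sketch is only the shape bookkeeping: after projection, image patches and text tokens share one embedding width, so a language model can consume them as a single sequence.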