Minimind V | Awesome Repository

Features

Training Frameworks - Provides an open-source framework for building and fine-tuning small vision-language models.
Image-Text Prompt Inferences - Replaces image placeholder tokens in a text prompt with projected visual features, then generates descriptive or conversational responses from the combined image-text input.
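
The placeholder replacement named in the features above can be illustrated with a minimal sketch. It is not taken from the Minimind V code base: the placeholder id `IMAGE_TOKEN_ID`, the helpers `project` and `splice_image_features`, the single linear projection, and the tiny dimensions are all assumptions chosen for illustration. The sketch uses plain Python lists so that it runs without dependencies; a real model would use tensors and a learned projection.

```python
from typing import List

Vector = List[float]

# Hypothetical id reserved for the image placeholder token (an assumption,
# not the value used by any particular tokenizer).
IMAGE_TOKEN_ID = 0


def project(features: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Linearly map each vision feature (length d_vision) to length d_model.

    `weight` is a d_vision x d_model matrix stored as a list of rows.
    """
    d_model = len(weight[0])
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(d_model)]
        for f in features
    ]


def splice_image_features(
    token_ids: List[int],
    embedding_table: List[Vector],
    image_features: List[Vector],
    weight: List[Vector],
) -> List[Vector]:
    """Return the input embedding sequence with placeholders replaced.

    Text tokens are looked up in `embedding_table`; each placeholder
    position receives the next projected visual feature, in order.
    """
    projected = project(image_features, weight)
    slots = [pos for pos, tok in enumerate(token_ids) if tok == IMAGE_TOKEN_ID]
    if len(slots) != len(projected):
        raise ValueError(
            f"{len(slots)} placeholder tokens but {len(projected)} visual features"
        )
    merged = [embedding_table[tok] for tok in token_ids]
    for pos, vec in zip(slots, projected):
        merged[pos] = vec
    return merged


# Toy example: d_vision = 3, d_model = 2, two image patches.
embedding_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # rows for ids 0, 1, 2
token_ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]
image_features = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

merged = splice_image_features(token_ids, embedding_table, image_features, weight)
print(merged)  # [[1.0, 0.0], [4.0, 5.0], [0.0, 1.0], [0.0, 1.0]]
```

The merged sequence has the same length as the token sequence, so the language model can consume it like any other embedded prompt. The count check matters because a mismatch between placeholders and visual features would otherwise silently misalign text and image positions.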