Interfaces for processing mixed-modality inputs like images and audio.
Explore 3 GitHub repositories matching Artificial Intelligence & ML · Multimodal Input Handlers.
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering
Handles diverse input modalities including audio, video, and images within a unified content processing interface.
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task
Facilitates advanced multimodal prompting techniques to enable reasoning and analysis of video content.
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade
Accepts documents, images, and audio files within chat conversations to provide multimodal context for prompts.