awesome-repositories.com
Multimodal Processing Tools · Awesome GitHub Repositories

7 repos

Systems for ingesting and synthesizing non-textual data types, including vision, audio, and speech, within AI pipelines.

Explore 7 awesome GitHub repositories matching Artificial Intelligence & ML · Multimodal Processing Tools.

Awesome Multimodal Processing Tools GitHub Repositories

  • josephmisiti/awesome-machine-learning

    71,702 · GitHub

    This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco

    Python
  • OpenHands/OpenHands

    67,974 · GitHub

    OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system

    Python · agent · artificial-intelligence · chatgpt
  • xtekky/gpt4free

    65,720 · GitHub

    This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co

    Python · chatbot · chatbots · chatgpt
  • CorentinJ/Real-Time-Voice-Cloning

    59,355 · GitHub

    This project is a neural text-to-speech engine and voice cloning toolkit designed to generate synthetic speech that mimics the vocal characteristics of a target speaker. It functions as a real-time audio synthesizer, utilizing a deep learning pipeline to convert written text into high-fidelity speech output with minima

    Python · deep-learning · python · pytorch
  • AntonOsika/gpt-engineer

    55,201 · GitHub

    GPT-Engineer is an autonomous agent and framework designed for AI-assisted software development. It functions as a generative codebase architect that translates natural language requirements into complete, functional software projects by reading and writing files directly to the local file system. The platform disting

    Python · ai · autonomous-agent · code-generation
  • RVC-Boss/GPT-SoVITS

    55,111 · GitHub

    GPT-SoVITS is a text-to-speech synthesis engine and voice cloning toolkit designed for generating natural-sounding human speech. It functions as a neural audio processing pipeline that maps input text to high-fidelity audio waveforms, utilizing conditional variational autoencoders and flow-based decoders to ensure expr

    Python · text-to-speech · tts · vits
  • appwrite/appwrite

    54,884 · GitHub

    Appwrite is a backend-as-a-service platform that provides a unified development environment for building full-stack applications. It integrates essential infrastructure components—including authentication, databases, storage, and serverless functions—into a single, centralized interface to simplify application developm

    TypeScript · android · appwrite · backend

Explore sub-tags

  • Multi-Modal Input Processors: Systems that ingest and normalize diverse data types, such as text, images, and audio, for model processing.
  • Multimodal AI Applications: Applications that integrate multiple sensory inputs to perform complex tasks like image captioning or video analysis.
  • Multimodal Vision Inputs: Tools that process and interpret visual data, such as photos or video streams, for AI-driven insights.
  • Speech Recognition: Tools and toolkits designed to process and convert spoken audio input into machine-readable text.
  • Synthetic Speech Generation: Systems that generate natural-sounding synthetic speech by replicating vocal characteristics and cadence from text input.