awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Multimodal Perception Models · Awesome GitHub Repositories

2 repos

Models designed to interpret and analyze visual data, charts, or cross-modal inputs alongside text.

Explore 2 awesome GitHub repositories matching Artificial Intelligence & ML · Multimodal Perception Models.

Awesome Multimodal Perception Models GitHub Repositories

  • abi/screenshot-to-code

    71,707 · GitHub

    This project is an artificial intelligence-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool…

    TypeScript
  • dair-ai/Prompt-Engineering-Guide

    70,526 · GitHub

    This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task…

    MDX · agent · agents · ai-agents

Explore sub-tags

  • Chart Understanding Models: Vision-language models designed to interpret, extract data from, and analyze visual information presented in charts and graphs.
  • Multimodal AI Models: Machine learning models capable of processing and synthesizing information across multiple data types, including text, images, and audio.
  • Multimodal Vision Models: Neural networks capable of processing and interpreting visual inputs alongside other data modalities.