© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Multimodal Interaction Engines · Awesome GitHub Repositories

1 repo


Processing layers that bridge visual perception with language models for coordinate-based action.

Distinguishing note: Focuses on the engine layer that integrates vision and language for spatial grounding.

Explore 1 awesome GitHub repository in Artificial Intelligence & ML · Multimodal Interaction Engines. Refine with filters or upvote what's useful.


Awesome Multimodal Interaction Engines GitHub Repositories

  • microsoft/OmniParser


    24,377 · View on GitHub↗

    OmniParser is a multimodal interaction engine designed to function as a desktop automation agent. It interprets visual screen information to execute complex, multi-step tasks across operating system environments by bridging visual interface perception with language models. Through a continuous cycle of observation and command execution, the system grounds high-level natural language instructions into precise, coordinate-based actions. The project distinguishes itself by utilizing vision-based parsing to interact with software interfaces without requiring access to underlying application progr…

    Bridges visual interface perception with language models to ground high-level instructions into precise coordinate-based actions.

    Jupyter Notebook
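
    To make "grounding instructions into coordinate-based actions" concrete, here is a minimal sketch in Python. It is not OmniParser's API: the `UIElement` type, the `ground_instruction` function, the normalized bounding-box format, and the action dictionary are all hypothetical, chosen only to illustrate the idea. It assumes a vision parser has already produced labelled elements with boxes normalized to the range 0 to 1, and it turns a target label into a pixel-level click at the centre of the matching box.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical types and names for illustration only; not OmniParser's API.


@dataclass
class UIElement:
    label: str
    # Assumed format: normalized (x_min, y_min, x_max, y_max), each in [0, 1].
    bbox: Tuple[float, float, float, float]


def bbox_center_px(
    bbox: Tuple[float, float, float, float], width: int, height: int
) -> Tuple[int, int]:
    """Convert a normalized bounding box to the pixel coordinates of its centre."""
    x_min, y_min, x_max, y_max = bbox
    return (
        round((x_min + x_max) / 2 * width),
        round((y_min + y_max) / 2 * height),
    )


def ground_instruction(
    target_label: str, elements: List[UIElement], width: int, height: int
) -> Optional[Dict[str, object]]:
    """Return a click action on the first element whose label matches, else None."""
    for element in elements:
        if element.label.lower() == target_label.lower():
            x, y = bbox_center_px(element.bbox, width, height)
            return {"action": "click", "x": x, "y": y}
    return None


# Stand-in for the output of a screen parser on a 1920x1080 screenshot.
elements = [
    UIElement("Save", (0.25, 0.5, 0.75, 0.75)),
    UIElement("Cancel", (0.0, 0.5, 0.25, 0.75)),
]
action = ground_instruction("save", elements, 1920, 1080)
print(action)
```

    In a full agent, a language model would choose the target element from the parsed screen and the loop would repeat after each action with a fresh screenshot; the exact-label match here is a simplification of that step.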