1 repo
Systems that maintain continuous cycles of observation and action for autonomous task execution.
Distinguishing note: Focuses on the control loop mechanism rather than the underlying vision or parsing models.
Explore 1 GitHub repository matching Artificial Intelligence & ML · Agentic Orchestration Loops.
OmniParser is a multimodal interaction engine designed to function as a desktop automation agent. It interprets visual screen information to execute complex, multi-step tasks across operating system environments by bridging visual interface perception with language models. Through a continuous cycle of observation and command execution, the system grounds high-level natural language instructions into precise, coordinate-based actions. The project distinguishes itself by utilizing vision-based parsing to interact with software interfaces without requiring access to underlying application progr
Maintains a continuous cycle of screen observation and command execution to navigate through multi-step tasks.
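The observe-then-act cycle described above can be illustrated with a minimal sketch. This is not OmniParser's code or API: the names `Action`, `run_loop`, `observe`, `decide`, and `act` are hypothetical, and the "screen" is a toy dictionary standing in for a real screenshot and a real vision or language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A coordinate-based or text action (hypothetical action vocabulary)."""
    kind: str       # "click" or "type" in this toy example
    x: int = 0
    y: int = 0
    text: str = ""


def run_loop(
    observe: Callable[[], dict],
    decide: Callable[[dict, List[Action]], Optional[Action]],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat observe -> decide -> act until decide returns None or the step cap is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = observe()                 # capture the current state
        action = decide(observation, history)   # choose the next action, or None when done
        if action is None:
            break
        act(action)                             # execute, which changes the next observation
        history.append(action)
    return history


# Toy stand-in for a desktop: a dialog to dismiss, then a text field to fill.
screen = {"dialog_open": True, "field": ""}


def observe() -> dict:
    return dict(screen)  # a copy, like a screenshot taken at this instant


def decide(obs: dict, history: List[Action]) -> Optional[Action]:
    if obs["dialog_open"]:
        return Action("click", x=100, y=200)  # made-up coordinates
    if obs["field"] != "hello":
        return Action("type", text="hello")
    return None  # task complete


def act(action: Action) -> None:
    if action.kind == "click":
        screen["dialog_open"] = False
    elif action.kind == "type":
        screen["field"] = action.text


history = run_loop(observe, decide, act)
print([a.kind for a in history])
```

Keeping `observe`, `decide`, and `act` as separate callables mirrors the distinguishing note: the loop itself stays independent of whichever vision or parsing model produces the observation. The `max_steps` cap is an assumption added here so the sketch always terminates.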