30 open-source projects similar to basedhardware/openglass, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best OpenGlass alternative.
Claude-hud is a heads-up display for monitoring AI agent activity, LLM session metrics, and project environment states. It provides a real-time visual interface to track token usage, context window health, API rate limits, and the activity of active sub-agents and tools. The project distinguishes itself by parsing session transcripts to extract task progress and tool execution status, converting todo lists into visual completion counters. It also includes a project configuration auditor that visualizes the number of active rules, hooks, and servers defined in the environment. Beyond agent ob
Omi is an open-source wearable AI platform that captures audio and screen data to provide real-time conversational assistance and memory. It integrates a wearable hardware development kit with a vector memory database and large language model capabilities to create a persistent digital record of user interactions. The platform is distinguished by its BLE audio streaming pipeline, which transmits raw audio from wearable hardware for real-time transcription and speaker identification. It utilizes a plugin-based agent tool framework that allows AI assistants to autonomously invoke custom functio
Mimiclaw is a framework for integrating large language models with microcontroller hardware to create interactive AI agents. It provides an embedded AI agent runtime and a tool-calling engine that allows language model loops to execute on embedded devices. The system acts as a bridge between language model APIs and physical hardware peripherals, enabling the control of sensors and actuators through natural language. The project features a dedicated manager for over-the-air firmware updates, allowing system images to be installed via web browsers or wireless networks to remove local toolchain
TaskMatrix is a visual language model orchestration framework and modular visual pipeline designed to coordinate disparate foundation models. It functions as a multi-model workflow coordinator that sequences visual and textual models through logic paths to handle image processing tasks without requiring additional training. The system integrates large language models with visual foundation models to enable the exchange of image data during interactive chat sessions. It utilizes template-based orchestration to chain specialized models together for complex visual tasks. The framework supports
LLaMA-Adapter is a parameter-efficient fine-tuning framework designed to adapt large language models using a minimal set of trainable parameters. It functions as an instruction tuning tool and a multimodal adapter, allowing pre-trained models to follow human instructions and process non-textual data. The project specializes in the integration of image, video, audio, and sensor data into language models for cross-modal understanding. It enables the customization of LLaMA models through the use of lightweight adapters, which allows for the extraction and storage of learned weights independently
Geekai is a multi-model AI platform and SaaS framework designed to deploy and manage AI agents and multimodal models through a unified interface. It serves as a multimodal AI gateway, providing centralized access to large language models and generative tools for text, image, audio, and video production. The project functions as an AI agent orchestrator, allowing for the definition of specialized personas and the import of external workflows and knowledge bases. It distinguishes itself by providing a complete commercial service layer, including credit-based billing, subscription management, an
RT-DETR is a real-time object detection model based on the detection transformer architecture. It is implemented as a computer vision model for both the PyTorch and PaddlePaddle deep learning platforms, designed to identify and locate multiple objects in images and video streams. The model eliminates the need for anchor generation and non-maximum suppression by utilizing a transformer-based approach. It focuses on high-performance detection, balancing precision and low latency for live environment deployment. The system employs a hybrid encoder and multi-scale feature fusion to extract globa
This project is an automated image translation system and pipeline specifically optimized for manga and comics. It provides a sequence of text detection, machine translation, and typesetting, and is available as an image translation API, a command-line tool for batch processing, and an LLM-powered translator. The system utilizes OCR to detect text regions and an inpainter to remove original content by synthesizing background pixels. Translated text is then overlaid using an automated typesetter that manages font sizes, colors, and reading directions based on the original coordinates. The wor
This project is a PyTorch implementation of the YOLOv4 object detection framework. It provides a system for training and deploying neural networks that identify and locate multiple objects within images and video streams. The framework includes tools for converting trained weights into universal formats and hardware-specific optimized engines, specifically supporting ONNX and TensorRT. It features a TensorRT inference optimizer to reduce latency and increase throughput, as well as a model architecture compatible with NVIDIA DeepStream streaming analytics pipelines. The system covers model tr
This project is a PyTorch implementation of the EfficientDet architecture designed for real-time object detection. It provides a neural network and inference engine capable of identifying and locating multiple objects within images or video streams. The implementation includes pretrained computer vision models with optimized weights, enabling immediate inference and fine-tuning without the need for training from scratch. The project covers the full pipeline for computer vision model optimization, including custom object detection training and model weight optimization. It incorporates struct
This project is a PyTorch implementation of the YOLOv3 object detection architecture. It functions as a real-time object detector and computer vision framework designed to identify and locate multiple objects within images using bounding boxes and class labels. The system allows for both the use of pretrained weights for immediate image analysis and the training of custom models using datasets with bounding box annotations. It provides a programmatic interface to integrate detection capabilities directly into other software applications. The framework includes tools for model evaluation to m
YOLOv9 is a real-time computer vision framework and deep learning model designed for image classification, object detection, and instance segmentation. It functions as both a vision model and a trainer, allowing for the optimization of neural network weights on custom datasets using single or multiple GPUs. The framework utilizes programmable gradient information to perform high-speed identification and location of multiple objects within images and video streams. It extends beyond bounding box detection to provide instance segmentation and panoptic segmentation, which labels every pixel in a
YOLOX is a high-performance anchor-free YOLO detector that outperforms YOLOv3 through YOLOv5, with support for MegEngine, ONNX, TensorRT, ncnn, and OpenVINO. Documentation: https://yolox.readthedocs.io/
YOLOv7 is a PyTorch vision library and real-time inference engine designed for object detection, human pose estimation, and instance segmentation. It provides a framework for detecting and locating multiple objects within images or video streams using neural networks. The system includes tools for custom model training and fine-tuning, allowing pre-trained weights to be adapted to specialized datasets via transfer learning. It also supports model weight export and format conversion to facilitate deployment on production servers and embedded edge devices.
Darknet is a high-performance C-based inference engine and computer vision library designed for real-time object identification and localization. It serves as a neural network framework for training and deploying detection models using the YOLO architecture, providing a toolset for deep learning training and deployment. The project differentiates itself through a C and CUDA implementation that enables hardware acceleration for matrix multiplication and inference speed optimization. It provides a shared library interface for embedding detection capabilities into external applications and suppo
Dango-Translator is an OCR translation system and multi-engine translation client designed to extract text from images or screens and replace it with translated content. It functions as an image text translator and real-time screen translator, utilizing optical character recognition to convert text between different languages automatically. The software distinguishes itself through coordinate-based image typesetting and a glossary manager. These tools allow for the replacement of original image content with translated text in the same area and the use of specialized dictionaries to ensure con
YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning to high-speed inference and deployment. The framework utilizes a modular neural architecture, allowing users to swap backbone and head components to tailor models for specific visual tasks. What distinguishes this project is its focus on production-ready deployment and model ef
Betaflight is open-source flight controller firmware designed to stabilize aircraft and manage sensor data through PID loops and motor control. It serves as a multirotor flight stack that integrates low-level drivers and control algorithms to manage electronic speed controllers, radio receivers, and telemetry hardware. The system focuses on real-time flight stabilization and telemetry routing. It includes a PID stabilization system to calculate motor outputs for aircraft stability and a flight telemetry manager to route real-time information to ground stations and on-screen displays. The fir
This project is an AI-powered screenshot manager and visual assistant designed for capturing screen content and processing it through large language models. It functions as an OCR translation application and screen annotation tool, allowing users to extract text from images and perform intelligent analysis of visual data. The software differentiates itself through an AI-driven OCR pipeline and the ability to convert screenshots into structured Markdown or HTML via layout-aware document transformation. It features a visual AI assistant capable of analyzing screen content and a prompt-engineere
MIRIX is an AI agent state orchestrator and long-term memory system designed to provide persistent context for large language models. It functions as a multi-modal AI memory pipeline that processes text, voice, and screen captures into structured knowledge stores, including a dedicated screen activity knowledge base. The project distinguishes itself by integrating a multi-modal observation pipeline that monitors desktop activity in real-time to build a searchable history of user actions. It utilizes a multi-tiered memory hierarchy—separating episodic, semantic, procedural, and core stores—and
This project is a TensorFlow object detection framework designed for training and deploying Single Shot MultiBox Detector models. It provides a neural network training toolkit for implementing the SSD architecture to achieve real-time image and video object localization. The framework includes a dedicated data pipeline for transforming object detection datasets into binary record formats to increase training speed and performance. It also features utilities for converting model weights between different checkpoint formats to facilitate the reuse of pre-trained networks. The system covers a b
This project is a multi-object tracking framework designed to assign persistent identities to detected bounding boxes across consecutive video frames. It functions as a computer vision tracking algorithm that monitors multiple moving targets in real time by associating detections with consistent labels. The system utilizes a state estimation approach centered on a Kalman filter to predict future object positions and maintain identity during detection gaps. It employs the Hungarian algorithm for optimal data association and calculates intersection over union to match predicted track locations
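The intersection-over-union measure mentioned in the tracking description above can be sketched as follows. This is a minimal illustration, not code from the project: the function name `iou` and the `(x1, y1, x2, y2)` corner format for boxes are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples with x1 <= x2 and y1 <= y2.
    This corner format is an assumption for illustration only.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate zero-area boxes.
    return inter / union if union > 0 else 0.0
```

A tracker of this kind would typically compute this score for every pairing of a predicted track box with a new detection, then pass the resulting matrix to an assignment step such as the Hungarian algorithm.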
Scrypted is a video integration platform that connects IP cameras and NVRs into smart home ecosystems such as HomeKit, Google Home, Alexa, and Home Assistant. It functions both as NVR software for recording and playing back continuous video footage and as an object detection engine that analyzes live camera feeds to identify motion, people, vehicles, and other objects. The platform distinguishes itself through its ability to transcode and forward live camera streams to multiple smart home platforms simultaneously, enabling unified viewing and control across all connected devices. It includes
ms-agent is an LLM agent framework and multi-agent orchestration system designed to build autonomous entities that combine large language models with tool calling and structured workflows. It serves as a tool integration platform and workflow engine for executing complex tasks through the coordination of specialized agents. The project distinguishes itself through a multimodal agent workflow engine capable of automating the production of text, images, and video. It features a sandboxed code execution environment for running generated code and quantitative data analysis in isolated containers,
tracking.js is a browser computer vision library written in JavaScript for performing real-time image analysis and object tracking directly within a web browser. It functions as a real-time object tracker, a color tracking tool, and a face detection utility. The library enables the detection and monitoring of specific color ranges, human faces, and known visual patterns across consecutive video frames. It extracts visual features and descriptors from images to identify distinct landmarks for matching and tracking. The project covers broad computer vision capabilities, including the ability t
This project is a cross-platform mobile camera framework and real-time computer vision library. It provides a high-performance interface for mobile applications to handle hardware control, media capture, and live camera frame processing. The framework includes a dedicated system for running AI models and custom analysis on live camera streams using high-performance worklets. It also functions as a real-time detection and decoding system for QR codes and barcodes. Broad capabilities cover the capture of high-resolution photos and videos with controls for zoom, HDR, and frame rates. The projec
chat2api is a web-to-API bridge and proxy that converts web-based chat sessions into a standardized API format. This allows web accounts to be used programmatically within third-party client applications. The system includes a multi-account rotator that distributes requests across a pool of authentication tokens using random or sequential polling to bypass rate limits. It also functions as a multimodal API proxy, translating base64 or URL-encoded images and files into formats compatible with web-based chat interfaces. The project manages the full lifecycle of session tokens, featuring statef
This is a backend-as-a-service SDK that connects web and mobile applications to a suite of cloud services. It provides a unified interface for managing user identity, executing serverless logic, and handling cloud object storage. The toolkit is characterized by its real-time data synchronization, which allows NoSQL document data to stay consistent across multiple clients with built-in offline persistence. It facilitates secure user access through a variety of identity providers and manages serverless function invocation to execute backend logic in response to HTTPS requests or database events
BallonsTranslator is a software suite designed for extracting, translating, and replacing text within comic panels while preserving the original visual layout. It functions as an image translation tool that combines text region detection, optical character recognition, and deep learning inpainting to automate the localization of comics. The tool features a deep learning image inpainter that removes original text and restores backgrounds using generative neural networks and patch-matching algorithms. It also includes a rich-text translation editor for modifying translated dialogue with support
FairMOT is a multi-object tracking framework and deep learning model designed to identify and track multiple entities across video frames. It implements a unified pipeline that integrates object detection and identity re-identification into a single-stage joint network. The system utilizes an anchor-free detection method to predict object centers and bounding box dimensions. It maintains identity consistency across consecutive frames by generating high-dimensional embedding vectors for re-identification and employing a Kalman filter for motion state prediction. The framework covers a broad r