6 repositories
Systems that extract information from combined visual and temporal data sources.
Explore 6 GitHub repositories matching Data & Databases · Multi-Modal Data Processors.
This project is an artificial intelligence-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool distinguishes itself by supporting both static design mockups and dynamic video recordings. It processes temporal and spatial information from screen captures to reconstruct interaction flows and state transitions, enabling the creation of functional software prototypes from vis
Extracts temporal and spatial information from video recordings to reconstruct interaction flows and dynamic UI states in generated code.
The Model Context Protocol SDK is a framework for building clients and servers that connect AI models to external data, tools, and resources using a standardized communication protocol. It provides the foundational libraries and interfaces necessary to establish reliable, transport-agnostic connections between AI agents and external systems, enabling seamless information retrieval and task automation. The SDK distinguishes itself through a robust capability negotiation handshake that ensures compatibility between connected parties before exchanging messages. It supports a pluggable transport
Automatically handles binary data, file paths, and format detection for multi-modal tool communication.
MMDetection3D is an open-source toolbox for 3D perception, providing a unified framework for detecting and segmenting objects in three-dimensional environments. It supports a range of core tasks including monocular 3D object detection from single camera images, LiDAR-based 3D object detection from raw point clouds, and multi-modal fusion that combines camera images with LiDAR data. The toolbox also covers point cloud semantic segmentation, assigning class labels to every point in a scan for scene understanding. The project distinguishes itself through a config-driven pipeline that orchestrate
Loads and synchronizes point clouds, camera images, and calibration data from multiple sweeps into a unified input.
SpatialLM is a spatial modeling framework that uses language models (LLMs) to transform monocular videos and sensor data into structured indoor semantic maps. It functions as an indoor layout estimation system and a point cloud semantic parser, converting raw geometric data into representations of architectural elements and object categories. The project aligns multi-modal sensor inputs with language tokens, allowing a language model to serve as the reasoning engine for inferring room topology. It uses mechanisms that convert 3D point clouds and 2D image sequences into discrete tokens and structured spatial encodings, which are then decoded into architectural floor plans. The framework covers 3D scene analysis and object detection to identify furniture via bounding boxes and semantic labels. It also provides tools for robot environmental understanding, processing sensor data to create semantic maps for autonomous navigation.
Integrates sensor-derived geometric data with linguistic tokens into a unified spatial representation.
This project is a technical reference guide and sensor-based robotics manual focused on the theoretical foundations and practical implementation of Simultaneous Localization and Mapping. It serves as a knowledge base for spatial AI, covering the integration of deep learning and semantic rendering to create intelligent systems for open world environments. The resource provides guidance on integrating multi-modal sensor data from cameras, LiDAR, radar, and inertial sensors for localization and mapping. It also establishes a bibliographic standard for robotics research by providing systems for m
Combines data streams from cameras, LiDAR, radar, and inertial sensors into a single spatial representation.
FAST-LIVO2 is a LiDAR-inertial odometry framework and factor-graph SLAM implementation designed for real-time robot localization and 3D mapping. It functions as a multi-sensor fusion pipeline and state estimator that integrates LiDAR, inertial, and camera inputs to track a robot's position and orientation. The system employs a tightly-coupled sensor fusion approach to maintain stable navigation, particularly in degraded environments. It utilizes a voxel-based 3D mapping tool to organize point clouds into volumetric grids, which optimizes memory usage and search speed during spatial reconstruc
Synchronizes timestamps from LiDAR, inertial, and camera sources to ensure correct temporal processing order.
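As a rough illustration of what timestamp-ordered processing of several sensor streams can look like, here is a minimal Python sketch. It is not taken from FAST-LIVO2 or any repository listed here; the `Measurement` type, the `merge_by_timestamp` helper, and the sample values are all hypothetical, and the sketch assumes every sensor reports timestamps on a shared clock and that each stream is already sorted.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Measurement(NamedTuple):
    """Hypothetical container for one sensor reading."""
    timestamp: float  # seconds; assumed to share one clock across sensors
    sensor: str
    payload: object


def merge_by_timestamp(*streams: Iterable[Measurement]) -> Iterator[Measurement]:
    """Yield readings from all streams in non-decreasing timestamp order.

    Assumes each individual stream is already sorted by timestamp.
    """
    return heapq.merge(*streams, key=lambda m: m.timestamp)


# Made-up sample data, for illustration only.
lidar = [Measurement(0.10, "lidar", "scan0"), Measurement(0.20, "lidar", "scan1")]
imu = [
    Measurement(0.05, "imu", "a0"),
    Measurement(0.15, "imu", "a1"),
    Measurement(0.25, "imu", "a2"),
]
camera = [Measurement(0.12, "camera", "img0")]

ordered = list(merge_by_timestamp(lidar, imu, camera))
for m in ordered:
    print(f"{m.timestamp:.2f} {m.sensor} {m.payload}")
```

A lazy k-way merge keeps memory use proportional to the number of streams instead of the number of readings. A real pipeline would also have to handle clock offsets between sensors and late-arriving data, which this sketch ignores.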