6 repositories
Systems that extract information from combined visual and temporal data sources.
Explore 6 GitHub repositories matching Data & Databases · Multi-Modal Data Processors.
This project is an artificial intelligence-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool distinguishes itself by supporting both static design mockups and dynamic video recordings. It processes temporal and spatial information from screen captures to reconstruct interaction flows and state transitions, enabling the creation of functional software prototypes from vis
Extracts temporal and spatial information from video recordings to reconstruct interaction flows and dynamic UI states in generated code.
The Model Context Protocol SDK is a framework for building clients and servers that connect AI models to external data, tools, and resources using a standardized communication protocol. It provides the foundational libraries and interfaces necessary to establish reliable, transport-agnostic connections between AI agents and external systems, enabling seamless information retrieval and task automation. The SDK distinguishes itself through a robust capability negotiation handshake that ensures compatibility between connected parties before exchanging messages. It supports a pluggable transport
Automatically handles binary data, file paths, and format detection for multi-modal tool communication.
MMDetection3D is an open-source toolbox for 3D perception, providing a unified framework for detecting and segmenting objects in three-dimensional environments. It supports a range of core tasks including monocular 3D object detection from single camera images, LiDAR-based 3D object detection from raw point clouds, and multi-modal fusion that combines camera images with LiDAR data. The toolbox also covers point cloud semantic segmentation, assigning class labels to every point in a scan for scene understanding. The project distinguishes itself through a config-driven pipeline that orchestrate
Loads and synchronizes point clouds, camera images, and calibration data from multiple sweeps into a unified input.
SpatialLM is a spatial modeling framework that uses large language models (LLMs) to transform monocular video and sensor data into structured indoor semantic maps. It functions as an indoor layout estimation system and a semantic point cloud parser, converting raw geometric data into representations of architectural elements and object categories. The project aligns multi-modal sensor inputs with linguistic tokens, allowing a language model to act as a reasoning engine that infers room topology. It converts 3D point clouds and 2D image sequences into discrete tokens and structured spatial encodings, which are then decoded into architectural layouts. The framework covers 3D scene analysis and object detection, identifying furniture through bounding boxes and semantic labels. It also provides tools for robot environment understanding, processing sensor data to build semantic maps for autonomous navigation.
Integrates sensor-derived geometric data with linguistic tokens into a unified spatial representation.
This project is a technical reference guide and sensor-based robotics manual focused on the theoretical foundations and practical implementation of Simultaneous Localization and Mapping. It serves as a knowledge base for spatial AI, covering the integration of deep learning and semantic rendering to create intelligent systems for open world environments. The resource provides guidance on integrating multi-modal sensor data from cameras, LiDAR, radar, and inertial sensors for localization and mapping. It also establishes a bibliographic standard for robotics research by providing systems for m
Combines data streams from cameras, LiDAR, radar, and inertial sensors into a single spatial representation.
FAST-LIVO2 is a LiDAR-inertial odometry framework and factor-graph SLAM implementation designed for real-time robot localization and 3D mapping. It functions as a multi-sensor fusion pipeline and state estimator that integrates LiDAR, inertial, and camera inputs to track a robot's position and orientation. The system employs a tightly-coupled sensor fusion approach to maintain stable navigation, particularly in degraded environments. It utilizes a voxel-based 3D mapping tool to organize point clouds into volumetric grids, which optimizes memory usage and search speed during spatial reconstruc
Synchronizes timestamps from LiDAR, inertial, and camera sources to ensure correct temporal processing order.
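As a rough illustration of what timestamp-ordered processing across LiDAR, inertial, and camera sources can look like, here is a minimal Python sketch. It is not taken from FAST-LIVO2 or any repository listed here: the `Sample` type, the `merge_by_timestamp` helper, and the sample values are all hypothetical, and it assumes each stream is already sorted by its own timestamps and that all sources share one clock.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    """One measurement from one sensor (hypothetical type for this sketch)."""
    stamp: float   # acquisition time in seconds, on a shared clock (assumption)
    source: str    # "lidar", "imu", or "camera"
    payload: str   # stand-in for the actual scan, IMU reading, or frame


def merge_by_timestamp(*streams: Iterable[Sample]) -> Iterator[Sample]:
    """Lazily interleave per-sensor streams into one time-ordered stream.

    Each input stream must already be sorted by `stamp`; heapq.merge then
    yields samples in global timestamp order without loading everything.
    """
    return heapq.merge(*streams, key=lambda s: s.stamp)


# Made-up sample data, only to show the interleaving.
lidar = [Sample(0.10, "lidar", "scan0"), Sample(0.20, "lidar", "scan1")]
imu = [
    Sample(0.05, "imu", "imu0"),
    Sample(0.15, "imu", "imu1"),
    Sample(0.25, "imu", "imu2"),
]
camera = [Sample(0.12, "camera", "frame0")]

ordered = list(merge_by_timestamp(lidar, imu, camera))
for sample in ordered:
    print(f"{sample.stamp:.2f} {sample.source} {sample.payload}")
```

A real pipeline would also need to handle clock offsets between sensors and late-arriving messages, which this sketch leaves out.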