6 repositories
Systems that extract information from combined visual and temporal data sources.
Explore 6 awesome GitHub repositories matching data & databases · Multi-Modal Data Processors.
This project is an AI-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool distinguishes itself by supporting both static design mockups and dynamic video recordings. It processes temporal and spatial information from screen captures to reconstruct interaction flows and state transitions, enabling the creation of functional software prototypes from vis…
Extracts temporal and spatial information from video recordings to reconstruct interaction flows and dynamic UI states in generated code.
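The listing does not say how the tool reads state changes out of a recording. Purely as an illustration, a common baseline for spotting UI state transitions in a screen capture is frame differencing: compare consecutive frames and report where, and in which frame, the screen changed. The sketch below assumes grayscale frames stored as nested lists; `changed_region`, `detect_transitions`, `pixel_tol`, and `min_area` are hypothetical names, not the project's API or method.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]           # grayscale frame: rows of 0-255 intensities
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def changed_region(prev: Frame, curr: Frame, pixel_tol: int = 10) -> Optional[BBox]:
    """Bounding box of pixels whose intensity changed by more than pixel_tol, or None."""
    height, width = len(curr), len(curr[0])
    diff = [[abs(prev[r][c] - curr[r][c]) > pixel_tol for c in range(width)]
            for r in range(height)]
    rows = [r for r in range(height) if any(diff[r])]
    if not rows:
        return None
    cols = [c for c in range(width) if any(diff[r][c] for r in range(height))]
    return rows[0], cols[0], rows[-1], cols[-1]


def detect_transitions(frames: List[Frame], min_area: int = 4,
                       pixel_tol: int = 10) -> List[Tuple[int, BBox]]:
    """(frame_index, bbox) for every frame whose changed bounding box spans >= min_area pixels."""
    events = []
    for i in range(1, len(frames)):
        box = changed_region(frames[i - 1], frames[i], pixel_tol)
        if box is None:
            continue
        top, left, bottom, right = box
        if (bottom - top + 1) * (right - left + 1) >= min_area:
            events.append((i, box))
    return events


# Usage: a 3x3 "modal" appears in the third frame of a 6x6 recording.
blank = [[0] * 6 for _ in range(6)]
modal = [row[:] for row in blank]
for r in range(1, 4):
    for c in range(2, 5):
        modal[r][c] = 255
print(detect_transitions([blank, blank, modal, modal]))  # [(2, (1, 2, 3, 4))]
```

Each detected event marks a candidate state boundary; a generator would still need to classify what changed (a dialog opening, a tab switch) before emitting code for it.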
The Model Context Protocol SDK is a framework for building clients and servers that connect AI models to external data, tools, and resources using a standardized communication protocol. It provides the foundational libraries and interfaces needed to establish reliable, transport-agnostic connections between AI agents and external systems for information retrieval and task automation. The SDK distinguishes itself through a capability negotiation handshake that ensures compatibility between connected parties before they exchange messages. It supports a pluggable transport…
Automatically handles binary data, file paths, and format detection for multi-modal tool communication.
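The SDK's own helpers are not shown in this listing. As a hedged illustration of the general idea, format detection for binary tool payloads is often done by sniffing the leading "magic bytes" rather than trusting a file extension, then base64-encoding the bytes so they fit in a JSON message. The `type`/`mimeType`/`data` field names below loosely mirror MCP image content; `sniff_mime`, `to_content_block`, and the `"binary"` fallback type are hypothetical and not SDK functions.

```python
import base64
from pathlib import Path
from typing import Dict, Union

# Well-known file signatures ("magic bytes") and the MIME types they indicate.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from leading bytes; fall back to a generic binary type."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"


def to_content_block(source: Union[bytes, str, Path]) -> Dict[str, str]:
    """Accept raw bytes or a file path and return a JSON-serialisable content dict."""
    data = source if isinstance(source, bytes) else Path(source).read_bytes()
    mime = sniff_mime(data)
    return {
        "type": "image" if mime.startswith("image/") else "binary",  # "binary" is hypothetical
        "mimeType": mime,
        "data": base64.b64encode(data).decode("ascii"),  # JSON cannot carry raw bytes
    }


print(to_content_block(b"%PDF-1.7")["mimeType"])  # application/pdf
```

Accepting either bytes or a path keeps the calling tool indifferent to where the payload came from, which is the convenience the feature line describes.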
MMDetection3D is an open-source toolbox for 3D perception, providing a unified framework for detecting and segmenting objects in three-dimensional environments. It supports a range of core tasks including monocular 3D object detection from single camera images, LiDAR-based 3D object detection from raw point clouds, and multi-modal fusion that combines camera images with LiDAR data. The toolbox also covers point cloud semantic segmentation, assigning class labels to every point in a scan for scene understanding. The project distinguishes itself through a config-driven pipeline that orchestrate…
Loads and synchronizes point clouds, camera images, and calibration data from multiple sweeps into a unified input.
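MMDetection3D provides its own pipeline transforms for this (such as `LoadPointsFromMultiSweeps`); the stdlib sketch below only illustrates the general idea of multi-sweep aggregation. It assumes each past sweep arrives with its timestamp and a 4x4 homogeneous transform into the current sweep's frame, and it appends the time lag as a fourth channel so a detector can tell fresh points from old ones. `transform` and `merge_sweeps` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]  # 4x4 homogeneous transform, row-major


def transform(p: Point, T: Matrix4) -> Point:
    """Apply a rigid-body transform (rotation + translation) to one point."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))


def merge_sweeps(current: List[Point], current_ts: float,
                 sweeps: List[Tuple[List[Point], float, Matrix4]]
                 ) -> List[Tuple[float, float, float, float]]:
    """Merge past sweeps into the current frame; each output point is (x, y, z, time_lag)."""
    merged = [(x, y, z, 0.0) for x, y, z in current]
    for points, ts, sweep_to_current in sweeps:
        lag = current_ts - ts  # seconds between this sweep and the current one
        for p in points:
            x, y, z = transform(p, sweep_to_current)
            merged.append((x, y, z, lag))
    return merged


# Usage: the previous sweep (0.5 s older) sits 1 m behind along x.
T = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(merge_sweeps([(0.0, 0.0, 0.0)], 10.0, [([(1.0, 2.0, 3.0)], 9.5, T)]))
# [(0.0, 0.0, 0.0, 0.0), (2.0, 2.0, 3.0, 0.5)]
```

Camera images and calibration would be attached alongside these points in a real pipeline; the sketch covers only the point-cloud half.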
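SpatialLM is a spatial modeling framework that uses large language models to convert monocular video and sensor data into structured indoor semantic maps. It functions as an indoor layout estimation system and point cloud semantic parser, converting raw geometric data into representations of architectural elements and object categories. The project aligns multi-modal sensor inputs with language tokens, enabling the language model to act as a reasoning engine that infers room topology. It uses several mechanisms to convert 3D point clouds and 2D image sequences into discrete tokens and structured spatial encodings, which are then decoded into architectural layouts. The framework covers 3D scene analysis and object detection, identifying furniture with bounding boxes and semantic labels. It also provides tools for robotic environment understanding, processing sensor data into semantic maps for autonomous navigation.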
Integrates sensor-derived geometric data with linguistic tokens into a unified spatial representation.
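The entry does not spell out SpatialLM's encoding. One widely used way to let a language model read and emit geometry is to quantize continuous coordinates into a fixed vocabulary of discrete tokens and map tokens back to bin centres when decoding. The sketch below assumes a cubic scene extent of ±10 m and 128 bins; the token format (`<x_64>`, `<obj:chair>`) and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map value in [lo, hi] to an integer bin in [0, bins - 1], clamping outliers."""
    t = (value - lo) / (hi - lo)
    return min(bins - 1, max(0, int(t * bins)))


def dequantize(index: int, lo: float, hi: float, bins: int) -> float:
    """Return the centre of a bin (a lossy inverse of quantize)."""
    return lo + (index + 0.5) * (hi - lo) / bins


def box_to_tokens(label: str, center: Sequence[float], size: Sequence[float],
                  extent: Tuple[float, float] = (-10.0, 10.0), bins: int = 128) -> List[str]:
    """Encode a labelled 3D box as discrete tokens: label, quantized centre, quantized size."""
    lo, hi = extent
    tokens = [f"<obj:{label}>"]
    for axis, v in zip("xyz", center):
        tokens.append(f"<{axis}_{quantize(v, lo, hi, bins)}>")
    for axis, v in zip("whd", size):  # width, height, depth in [0, hi - lo]
        tokens.append(f"<{axis}_{quantize(v, 0.0, hi - lo, bins)}>")
    return tokens


print(box_to_tokens("chair", (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
# ['<obj:chair>', '<x_64>', '<y_64>', '<z_64>', '<w_6>', '<h_6>', '<d_6>']
```

The bin count trades vocabulary size against precision: with 128 bins over 20 m, each bin spans about 16 cm, so a decoded coordinate is off by at most half that.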
This project is a technical reference guide and sensor-based robotics manual focused on the theoretical foundations and practical implementation of Simultaneous Localization and Mapping (SLAM). It serves as a knowledge base for spatial AI, covering the integration of deep learning and semantic rendering to build intelligent systems for open-world environments. The resource provides guidance on integrating multi-modal sensor data from cameras, LiDAR, radar, and inertial sensors for localization and mapping. It also establishes a bibliographic standard for robotics research by providing systems for m…
Combines data streams from cameras, LiDAR, radar, and inertial sensors into a single spatial representation.
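As a minimal illustration of combining several sensors into one estimate (not something this guide is confirmed to prescribe), assume each sensor reports an independent, unbiased scalar measurement with known variance, for example one position axis. Inverse-variance weighting then gives the minimum-variance combination. Full SLAM systems typically use Kalman filters or factor graphs instead; `fuse` and `Estimate` are hypothetical names.

```python
from typing import List, Tuple

Estimate = Tuple[float, float]  # (value, variance) reported by one sensor


def fuse(estimates: List[Estimate]) -> Estimate:
    """Inverse-variance weighted fusion of independent scalar estimates."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / var for _, var in estimates]  # more certain sensors weigh more
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total  # fused variance is never larger than any input's


# Usage: two equally noisy sensors (variance 4) disagree by 2 m.
print(fuse([(10.0, 4.0), (12.0, 4.0)]))  # (11.0, 2.0)
```

The fused variance halving in the example reflects the benefit of redundant sensors; it only holds if the sensor errors really are independent.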
FAST-LIVO2 is a LiDAR-inertial-visual odometry framework designed for real-time robot localization and 3D mapping. It functions as a multi-sensor fusion pipeline and state estimator that integrates LiDAR, inertial, and camera inputs to track a robot's position and orientation. The system uses a tightly coupled sensor fusion approach to maintain stable navigation, particularly in degraded environments. It uses a voxel-based 3D map to organize point clouds into volumetric grids, which optimizes memory usage and search speed during spatial reconstruction…
Synchronizes timestamps from LiDAR, inertial, and camera sources to ensure correct temporal processing order.
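FAST-LIVO2's own synchronization logic is not described in this listing. As a generic illustration only, assume each sensor driver delivers messages already sorted by timestamp: the streams can then be interleaved lazily into one time-ordered sequence with the standard library's `heapq.merge`, with a guard that rejects out-of-order input. `Measurement`, `checked`, and `merge_streams` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass(order=True)
class Measurement:
    stamp: float                                       # seconds; the only field used for ordering
    source: str = field(compare=False)                 # e.g. "lidar", "imu", "camera"
    payload: object = field(compare=False, default=None)


def checked(stream: Iterable[Measurement]) -> Iterator[Measurement]:
    """Pass measurements through, raising if a stream goes backwards in time."""
    last = float("-inf")
    for m in stream:
        if m.stamp < last:
            raise ValueError(f"{m.source} stream out of order at t={m.stamp}")
        last = m.stamp
        yield m


def merge_streams(*streams: Iterable[Measurement]) -> Iterator[Measurement]:
    """Lazily interleave per-sensor streams into one stream ordered by timestamp."""
    return heapq.merge(*(checked(s) for s in streams))


lidar = [Measurement(0.00, "lidar"), Measurement(0.10, "lidar")]
imu = [Measurement(0.005 * i, "imu") for i in range(3)]
cam = [Measurement(0.05, "camera")]
print([m.source for m in merge_streams(lidar, imu, cam)])
# ['lidar', 'imu', 'imu', 'imu', 'camera', 'lidar']
```

`heapq.merge` breaks timestamp ties by argument order, so passing LiDAR first makes it win ties deterministically. Real pipelines must also handle clock offsets between sensors and buffering of late messages, which this sketch ignores.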