6 repositories
Systems that extract information from combined visual and temporal data sources.
Explore 6 awesome GitHub repositories matching data & databases · Multi-Modal Data Processors.
This project is an artificial intelligence-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool distinguishes itself by supporting both static design mockups and dynamic video recordings. It processes temporal and spatial information from screen captures to reconstruct interaction flows and state transitions, enabling the creation of functional software prototypes from vis
Extracts temporal and spatial information from video recordings to reconstruct interaction flows and dynamic UI states in generated code.
The Model Context Protocol SDK is a framework for building clients and servers that connect AI models to external data, tools, and resources using a standardized communication protocol. It provides the foundational libraries and interfaces necessary to establish reliable, transport-agnostic connections between AI agents and external systems, enabling seamless information retrieval and task automation. The SDK distinguishes itself through a robust capability negotiation handshake that ensures compatibility between connected parties before exchanging messages. It supports a pluggable transport
Automatically handles binary data, file paths, and format detection for multi-modal tool communication.
MMDetection3D is an open-source toolbox for 3D perception, providing a unified framework for detecting and segmenting objects in three-dimensional environments. It supports a range of core tasks including monocular 3D object detection from single camera images, LiDAR-based 3D object detection from raw point clouds, and multi-modal fusion that combines camera images with LiDAR data. The toolbox also covers point cloud semantic segmentation, assigning class labels to every point in a scan for scene understanding. The project distinguishes itself through a config-driven pipeline that orchestrate
Loads and synchronizes point clouds, camera images, and calibration data from multiple sweeps into a unified input.
SpatialLM is a spatial modeling framework that uses large language models to convert monocular video and sensor data into structured semantic indoor maps. The system functions as an interior layout estimator and a semantic point cloud parser, converting raw geometric data into representations of architectural elements and object categories. The project coordinates multi-modal sensor inputs with linguistic tokens, allowing the language model to act as a reasoning engine for inferring room topology. It uses mechanisms to convert 3D point clouds and 2D image sequences into discrete tokens and structured spatial encodings, which are later decoded into architectural layouts. The framework covers 3D scene parsing and object detection to identify furniture through bounding boxes and semantic labels. It also provides environment understanding tools for robotics, processing sensor data to create semantic maps for autonomous navigation.
Integrates sensor-derived geometric data with linguistic tokens into a unified spatial representation.
This project is a technical reference guide and sensor-based robotics manual focused on the theoretical foundations and practical implementation of Simultaneous Localization and Mapping. It serves as a knowledge base for spatial AI, covering the integration of deep learning and semantic rendering to create intelligent systems for open world environments. The resource provides guidance on integrating multi-modal sensor data from cameras, LiDAR, radar, and inertial sensors for localization and mapping. It also establishes a bibliographic standard for robotics research by providing systems for m
Combines data streams from cameras, LiDAR, radar, and inertial sensors into a single spatial representation.
FAST-LIVO2 is a LiDAR-inertial odometry framework and factor-graph SLAM implementation designed for real-time robot localization and 3D mapping. It functions as a multi-sensor fusion pipeline and state estimator that integrates LiDAR, inertial, and camera inputs to track a robot's position and orientation. The system employs a tightly-coupled sensor fusion approach to maintain stable navigation, particularly in degraded environments. It utilizes a voxel-based 3D mapping tool to organize point clouds into volumetric grids, which optimizes memory usage and search speed during spatial reconstruc
Synchronizes timestamps from LiDAR, inertial, and camera sources to ensure correct temporal processing order.
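As a rough illustration of what ordering multi-sensor data by timestamp can look like, here is a minimal Python sketch. It is not taken from FAST-LIVO2. The `Measurement` type, the `merge_by_timestamp` helper, and the sample values are all hypothetical, and the sketch assumes every sensor stamps its data against a shared clock and delivers it already sorted by time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Measurement(NamedTuple):
    """Hypothetical container for one sensor reading."""
    timestamp: float  # seconds; assumed to come from a clock shared by all sensors
    sensor: str       # e.g. "lidar", "imu", "camera"
    payload: object   # placeholder for the actual scan, IMU sample, or image


def merge_by_timestamp(*streams: Iterable[Measurement]) -> Iterator[Measurement]:
    """Lazily merge per-sensor streams into one time-ordered stream.

    Each input stream must already be sorted by timestamp.
    """
    return heapq.merge(*streams, key=lambda m: m.timestamp)


# Made-up sample data, for illustration only.
lidar = [Measurement(0.00, "lidar", "scan0"), Measurement(0.10, "lidar", "scan1")]
imu = [
    Measurement(0.01, "imu", "a0"),
    Measurement(0.05, "imu", "a1"),
    Measurement(0.09, "imu", "a2"),
]
camera = [Measurement(0.03, "camera", "img0")]

ordered = list(merge_by_timestamp(lidar, imu, camera))
for m in ordered:
    print(f"{m.timestamp:.2f} {m.sensor} {m.payload}")
```

A real pipeline would also have to handle clock offsets between sensors and late-arriving packets, which this sketch leaves out.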