18 repositories
Zero-copy communication mechanisms for efficient data access across multiple processes.
Distinguishing note: Focuses on memory-mapped data sharing to avoid expensive data duplication.
Explore 18 awesome GitHub repositories matching data & databases · Shared Memory Transports.
Apollo is a comprehensive software stack for autonomous vehicle development, providing the components needed for perception, planning, and control. It functions as a high-performance robotics middleware, using a publish-subscribe data bus to enable low-latency communication between distributed modules and hardware sensors. The platform integrates camera, lidar, and radar data through a sensor fusion framework to build a real-time environmental model for navigation. The system features a component-based runtime framework that manages task scheduling and resource allocation, backed by a hardware abstraction layer that decouples driving logic from vehicle-specific configurations. To ensure consistent behavior during testing, it includes a deterministic replay engine for sensor data streams and supports hardware-in-the-loop simulation. The platform also uses directed acyclic graph (DAG) scheduling and zero-copy shared memory transport to optimize data flow and computational efficiency in complex robotic systems. The software provides a standardized vehicle control interface that translates navigation decisions into mechanical commands. Extensive documentation is available, including installation instructions, hardware integration guides, and a series of quick-start manuals for various platform versions.
Allows multiple processes to access large sensor data buffers without expensive memory duplication.
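A minimal sketch of the pattern described for Apollo, using Python's standard `multiprocessing.shared_memory` module rather than Apollo's own transport (this is not Apollo's API). A producer copies one sensor frame into a named segment once; a consumer maps the same segment by name and reads it in place. The frame size and function names are hypothetical.

```python
from multiprocessing import shared_memory

# Hypothetical frame size for illustration: one 640x480 8-bit grayscale image.
FRAME_BYTES = 640 * 480


def publish_frame(frame: bytes) -> shared_memory.SharedMemory:
    """Producer side: copy one sensor frame into a new named shared segment."""
    shm = shared_memory.SharedMemory(create=True, size=len(frame))
    shm.buf[: len(frame)] = frame  # the only copy: sensor data into shared memory
    return shm


def read_checksum(segment_name: str, nbytes: int) -> int:
    """Consumer side: map the same segment by name and read it in place."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        view = shm.buf[:nbytes]  # a view into the mapping, not a copy
        try:
            return sum(view) % 256
        finally:
            view.release()  # views must be released before close()
    finally:
        shm.close()
```

In a real deployment the consumer runs in another process and learns the segment name over a control channel; the producer calls `close()` and `unlink()` once the frame is retired.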
Arrow is a cross-language development platform for in-memory data. It provides a standardized, language-independent columnar memory format designed to accelerate analytical operations and improve memory efficiency on modern computing hardware. By utilizing a schema-driven approach, the framework enables the efficient organization of both flat and nested data structures. The project functions as an analytical data processing engine that facilitates high-performance computation directly on memory-resident datasets. It distinguishes itself through a zero-copy architecture, which allows multiple
Provides zero-copy communication mechanisms for efficient data access across multiple processes.
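To illustrate why a columnar layout makes slicing zero-copy, here is a simplified, hypothetical Int64 column in plain Python (not the pyarrow API): a slice is only a new offset and length over the same data buffer and validity bitmap. The bitmap uses least-significant-bit-first numbering, as the Arrow columnar format does for validity bitmaps.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Int64Column:
    """Simplified Arrow-style column: data buffer + validity bitmap + (offset, length) window."""
    data: memoryview      # little-endian int64 values
    validity: memoryview  # bit j set => slot j holds a value (LSB-first within each byte)
    offset: int
    length: int

    def is_valid(self, i: int) -> bool:
        j = self.offset + i
        return bool(self.validity[j // 8] & (1 << (j % 8)))

    def get(self, i: int) -> Optional[int]:
        if not 0 <= i < self.length:
            raise IndexError(i)
        if not self.is_valid(i):
            return None
        return struct.unpack_from("<q", self.data, (self.offset + i) * 8)[0]

    def slice(self, start: int, length: int) -> "Int64Column":
        """Zero-copy: reuse both buffers and only shift the window."""
        if start < 0 or length < 0 or start + length > self.length:
            raise IndexError((start, length))
        return Int64Column(self.data, self.validity, self.offset + start, length)


def build_column(values: List[Optional[int]]) -> Int64Column:
    """Encode Python values (None = null) into the two buffers."""
    data = bytearray(8 * len(values))
    validity = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            struct.pack_into("<q", data, i * 8, v)
            validity[i // 8] |= 1 << (i % 8)
    return Int64Column(memoryview(data), memoryview(validity), 0, len(values))
```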
This project is an NGINX module that embeds the Lua scripting language directly into the server environment. It functions as a request processor and response filter, enabling the execution of scripts to handle HTTP requests, generate dynamic content, and manage server behavior without external application calls. The module provides a shared memory dictionary and cache manager, allowing data to be stored and retrieved across all active worker processes. This capability supports the collection of high-performance server metrics and the synchronization of information across concurrent processes.
Provides a shared memory dictionary for synchronizing state and configuration across all active worker processes.
MacType is a system-level utility that replaces the default Windows font rasterization engine. It functions as a background service that intercepts and modifies font rendering calls to provide custom anti-aliasing, weight, and contrast adjustments for desktop applications. The software operates by injecting custom libraries into running processes to override standard text layout and graphics routines. It utilizes a shared memory space to apply configuration updates across multiple processes instantly, allowing for granular control over visual parameters such as gamma, hinting, and font substi
Uses shared memory to apply configuration updates across multiple processes instantly.
Crossplane is a Kubernetes-based control plane framework that functions as a cloud resource orchestrator and infrastructure-as-code platform. It enables the management of heterogeneous infrastructure by extending the Kubernetes API to provision and maintain external cloud services through declarative configuration. By utilizing custom resource controllers, it continuously reconciles the state of external infrastructure with defined desired states, ensuring consistent deployment and lifecycle management across multiple cloud providers. The platform distinguishes itself through its composition-
Crossplane stores and retrieves shared configuration data in an isolated environment to facilitate patching and state synchronization between composite and composed resources.
CuPy is a CUDA array computing library that implements a NumPy-compatible interface for running array operations and numerical computing on NVIDIA GPUs. It serves as a GPU-accelerated numerical library and a CUDA-based SciPy implementation, offloading heavy computation to graphics hardware to speed up scientific and engineering workloads. The library supports tensor exchange across frameworks, allowing data buffers to be shared between different deep learning frameworks through standardized memory layouts that avoid memory copies. It also supports integrating custom GPU kernels, letting array data be passed to low-level APIs for precise control over hardware execution. Broadly, the project covers array processing and high-performance scientific computing workflows. Its capabilities include accelerating array computations and providing tools for large-scale numerical calculations.
Utilizes memory-mapped buffer sharing to enable zero-copy data exchange between different libraries.
OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models. The project distinguishes itself through a plugin-based hardware acceleration layer that maps neural network operations to vendor-specific drivers. It features advanced execution mechanisms such as continuous batching, speculative decoding, and
Provides zero-copy memory buffers between the inference engine and native APIs to eliminate data copy overhead.
Napajs is an embeddable JavaScript engine and multi-threaded runtime designed to be integrated directly into other software applications as a component. It serves as a parallel computation framework that allows JavaScript code to execute across multiple threads, bypassing the standard single-threaded event loop limitation to handle CPU-intensive tasks. The runtime is distinguished by its ability to load and execute modules from the NPM ecosystem and its pluggable execution environment. This architecture allows for custom implementations of memory allocation, system logging, and performance me
Implements zero-copy communication by transferring typed arrays via shared memory buffers across multiple threads.
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Implements zero-copy memory transport to share data buffers between libraries without expensive CPU-to-GPU transfers.
Metalsmith is a Node.js static site generator and static content processor that transforms source files into websites, eBooks, or technical documentation. It functions as a file-to-object transformer, converting directory trees into plain JavaScript objects that can be programmatically manipulated in memory. The project is built around a pluggable build pipeline where files are passed through a sequence of custom functions to transform content and metadata incrementally. This architecture allows users to extend functionality by writing their own plugins or using third-party modules to define
Maintains a globally accessible memory space for synchronizing site-wide configuration and shared variables across all plugins.
LMCache is a distributed key-value cache manager and tiering system designed to accelerate large language model inference. It functions as a tiered storage layer that offloads tensors from GPU memory to CPU RAM, local disks, or remote object stores, enabling the reuse of cached prefixes across different inference sessions and serving engines. The system differentiates itself through a disaggregated prefill-decode model, which separates prompt processing from token generation by transferring caches between distributed compute nodes. It utilizes peer-to-peer orchestration to share and retrieve
Achieves zero-copy transfers by sharing tensors between the cache server and inference engine using shared memory.
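A sketch of handing a tensor across processes through shared memory without copying it on the reader side, assuming a made-up layout: two native unsigned ints (rows, cols) followed by row-major float32 values. The reader casts the mapped bytes to a 2-D `memoryview` instead of deserializing them. This is plain standard-library Python, not LMCache's interface; a real engine would hand the same buffer to a tensor library.

```python
import struct
from multiprocessing import shared_memory
from typing import List

SHAPE = struct.Struct("II")       # hypothetical header: rows, cols (native byte order)
FLOAT = struct.calcsize("f")      # bytes per float32 element


def export_matrix(rows: int, cols: int, values: List[float]) -> shared_memory.SharedMemory:
    """'Cache server' side: place a row-major float32 matrix in a shared segment."""
    if len(values) != rows * cols:
        raise ValueError("values must contain rows * cols elements")
    shm = shared_memory.SharedMemory(create=True, size=SHAPE.size + FLOAT * rows * cols)
    SHAPE.pack_into(shm.buf, 0, rows, cols)
    struct.pack_into(f"{rows * cols}f", shm.buf, SHAPE.size, *values)
    return shm


def row_sums(segment_name: str) -> List[float]:
    """'Inference engine' side: attach and read the matrix through a 2-D view, no copy."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        rows, cols = SHAPE.unpack_from(shm.buf, 0)
        raw = shm.buf[SHAPE.size : SHAPE.size + FLOAT * rows * cols]
        matrix = raw.cast("f", [rows, cols])  # reinterpret bytes as a rows x cols float view
        try:
            return [sum(matrix[r, c] for c in range(cols)) for r in range(rows)]
        finally:
            matrix.release()  # release every view before closing the mapping
            raw.release()
    finally:
        shm.close()
```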
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Defines connection classes for offline store backends in the feature store configuration.
AFL++ is a coverage-guided fuzzing framework that discovers crashes and hangs in software by mutating inputs while tracking which code paths are exercised. It functions as both a fuzzing engine and a campaign manager, supporting targets with or without source code through compile-time instrumentation, dynamic binary instrumentation, and emulation. The framework includes tools for crash triage and analysis, test case minimization, and campaign deployment across local or distributed environments. The framework distinguishes itself through its breadth of instrumentation backends, allowing users
Passes input data between fuzzer and target through shared memory to reduce per-execution overhead.
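A minimal sketch of shared-memory test-case delivery, assuming a hypothetical layout of a 4-byte little-endian length followed by the payload (AFL++'s actual layout and handshake are not reproduced here). The fuzzer overwrites one long-lived buffer for each execution and the toy target parses it in place, so no per-input file or pipe write is needed. Telling the target when to run is left to a separate control channel and omitted.

```python
import struct
from multiprocessing import shared_memory

MAX_INPUT = 1 << 16        # hypothetical cap on a single test case, in bytes
LEN = struct.Struct("<I")  # hypothetical header: payload length, 4-byte little-endian


def make_input_buffer() -> shared_memory.SharedMemory:
    """Created once per campaign and reused for every execution."""
    return shared_memory.SharedMemory(create=True, size=LEN.size + MAX_INPUT)


def fuzzer_write(shm: shared_memory.SharedMemory, test_case: bytes) -> None:
    """Fuzzer side: overwrite the shared buffer in place for the next execution."""
    if len(test_case) > MAX_INPUT:
        raise ValueError("test case exceeds MAX_INPUT")
    shm.buf[LEN.size : LEN.size + len(test_case)] = test_case
    LEN.pack_into(shm.buf, 0, len(test_case))  # length is written after the payload


def toy_target(shm: shared_memory.SharedMemory) -> bool:
    """Target side: parse the input in place; True means the toy 'crash' path was hit."""
    (n,) = LEN.unpack_from(shm.buf, 0)
    data = shm.buf[LEN.size : LEN.size + n]
    try:
        return n >= 3 and data[:3] == b"BUG"
    finally:
        data.release()
```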
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
Implements shared memory transports to optimize communication efficiency by separating control and data layers.
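One reading of "separating control and data layers", sketched with the standard library under our own assumptions: bulk bytes go into shared memory (the data plane), while only a small descriptor carrying the segment name, offset, and length travels over a `multiprocessing` pipe (the control plane). The descriptor fields and function names are hypothetical.

```python
from multiprocessing import Pipe, shared_memory
from multiprocessing.connection import Connection
from typing import Tuple


def make_control_channel() -> Tuple[Connection, Connection]:
    """A duplex pipe standing in for the control plane."""
    return Pipe()


def send_payload(ctrl: Connection, shm: shared_memory.SharedMemory, offset: int, payload: bytes) -> None:
    """Data plane: write the bytes into shared memory. Control plane: send a small descriptor."""
    shm.buf[offset : offset + len(payload)] = payload
    ctrl.send({"segment": shm.name, "offset": offset, "length": len(payload)})


def receive_payload(ctrl: Connection) -> bytes:
    """Read the descriptor from the pipe, then fetch the payload from shared memory."""
    desc = ctrl.recv()  # only the descriptor crosses the pipe, never the payload
    shm = shared_memory.SharedMemory(name=desc["segment"])
    try:
        start, end = desc["offset"], desc["offset"] + desc["length"]
        # Copied out here only so the result outlives the mapping; a real consumer
        # would process the view in place.
        return bytes(shm.buf[start:end])
    finally:
        shm.close()
```

Attaching to the segment on every message is wasteful; a real transport would keep the mapping open and reuse it across messages.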
This project is an educational resource that provides a comprehensive development tutorial for writing eBPF programs in C, Go, and Rust and loading them into the Linux kernel. It serves as a technical guide to developing custom logic that runs directly in the kernel. The materials cover specialized domains including kernel observability and tracing, security enforcement for intrusion detection, and high-performance network engineering for packet filtering and load balancing. It also includes dedicated manuals for Linux kernel tracing and the use of kprobes, uprobes, and tracepoints. The project spans a wide range of capability areas, such as kernel instrumentation, system monitoring and observability, network analysis, and security enforcement. It further extends to hardware-level debugging for GPUs and drivers, as well as low-level system manipulation and resource management.
Creates sparse memory regions shared between kernel and userspace to avoid expensive system calls.
This project is a macOS camera driver and software plugin that exposes software video streams as camera inputs that the system treats as hardware devices. It works as a virtual camera plugin for OBS, allowing OBS's live output to be used as a webcam device in other applications. The tool routes composited video from a production suite into video-conferencing applications such as Zoom or Google Meet, so that processed scenes are transmitted instead of a raw webcam feed. The system integrates with macOS through a kernel-level device driver and shared memory buffer transfers that move video frames from the application process to the operating system. It uses the CoreMedia framework to handle video stream timing and metadata.
Uses a high-speed shared memory region to transfer raw video frames between user-space and the kernel driver.
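A sketch of a fixed-slot frame mailbox in shared memory, assuming a hypothetical layout: an 8-byte sequence number of the newest complete frame, followed by three equally sized frame slots. The producer fills a slot before publishing its sequence number, and the reader always takes the newest one. This does not reproduce the plugin's actual mechanism; Python gives no cross-process memory-ordering guarantees, so a real driver needs proper synchronization and must handle a writer lapping a slow reader.

```python
import struct
from multiprocessing import shared_memory
from typing import Optional, Tuple

SLOTS = 3                  # hypothetical number of frame slots
SEQ = struct.Struct("<Q")  # header: sequence number of newest complete frame (0 = none yet)


def segment_size(frame_bytes: int) -> int:
    return SEQ.size + SLOTS * frame_bytes


def write_frame(shm: shared_memory.SharedMemory, frame_bytes: int, seq: int, frame: bytes) -> None:
    """Producer: fill slot (seq % SLOTS), then publish seq in the header."""
    if seq < 1 or len(frame) != frame_bytes:
        raise ValueError("seq must be >= 1 and frame must be exactly frame_bytes long")
    start = SEQ.size + (seq % SLOTS) * frame_bytes
    shm.buf[start : start + frame_bytes] = frame
    SEQ.pack_into(shm.buf, 0, seq)  # published only after the slot has been written


def read_latest(shm: shared_memory.SharedMemory, frame_bytes: int) -> Optional[Tuple[int, bytes]]:
    """Consumer: return (seq, frame) for the newest published frame, or None."""
    (seq,) = SEQ.unpack_from(shm.buf, 0)
    if seq == 0:
        return None
    start = SEQ.size + (seq % SLOTS) * frame_bytes
    return seq, bytes(shm.buf[start : start + frame_bytes])
```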
pyslam is a framework for Simultaneous Localization and Mapping that combines Python flexibility with C++ performance. It is a sparse SLAM implementation designed to map environment geometry and track device location by processing image frames into 3D points. The project features a bridge for exposing high-performance C++ classes to Python scripts using zero-copy memory sharing. This integration allows for switching between a scripting interface for rapid prototyping and a compiled core for execution speed. The system includes a spatial map optimizer to refine 3D point and camera pose estima
Uses zero-copy memory sharing to move large spatial data structures between language runtimes without duplicating memory.
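pyslam's bridge is a C++ binding layer; as a standard-library stand-in (not pyslam's mechanism), `ctypes` can overlay a C array on a Python-owned buffer with `from_buffer`, so native-style code and Python see the same bytes at the same address. The point layout and helper names are hypothetical.

```python
import ctypes
import struct
from typing import List

Point3 = ctypes.c_double * 3  # one (x, y, z) point as three C doubles


def pack_points(coords: List[float]) -> bytearray:
    """Python side: serialize flat x, y, z, ... floats into a mutable buffer."""
    return bytearray(struct.pack(f"{len(coords)}d", *coords))


def native_view(storage: bytearray, n_points: int):
    """Overlay a C array of points on the Python-owned buffer without copying it."""
    return (Point3 * n_points).from_buffer(storage)


def translate_x(points, dx: float) -> None:
    """Stand-in for a native routine that mutates the shared array in place."""
    for p in points:
        p[0] += dx
```

While the ctypes view exists, the `bytearray` cannot be resized, because its buffer is exported.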
Dora is a robotics dataflow framework and distributed orchestrator used to build and manage processing pipelines. It enables the deployment of robotics workloads across clusters with remote node execution and provides a real-time data pipeline for predictable performance. The system is distinguished by its support for multi-language nodes written in Rust, Python, C, or C++ that interoperate within a single dataflow. It utilizes a zero-copy shared-memory transport and columnar formats to minimize latency for large payloads, and it includes bidirectional bridges to integrate with external ecosy
Automatically switches between shared memory for local nodes and network sockets for remote nodes.
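A hypothetical sketch of the per-receiver decision described above: given a placement map from node to machine, receivers on the sender's machine get shared memory and the rest get a network transport. The node names, machine labels, and transport strings are illustrative, not Dora's configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    receiver: str
    transport: str  # "shmem" or "tcp" (hypothetical labels)


def plan_routes(sender: str, receivers: List[str], placement: Dict[str, str]) -> List[Route]:
    """Choose shared memory for receivers co-located with the sender, a socket otherwise."""
    home = placement[sender]
    return [Route(r, "shmem" if placement[r] == home else "tcp") for r in receivers]
```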