18 Repos
Zero-copy communication mechanisms for efficient data access across multiple processes.
Distinguishing note: Focuses on memory-mapped data sharing to avoid expensive data duplication.
18 GitHub repositories in data & databases · Shared Memory Transports.
Apollo is a comprehensive software stack for autonomous vehicle development that provides the components needed for perception, planning, and control. It functions as a high-performance robotics middleware and uses a publish-subscribe data bus to enable low-latency communication between distributed modules and hardware sensors. The platform integrates data from cameras, lidar, and radar through a sensor fusion framework to generate a real-time environment model for navigation. The system features a component-based runtime framework that manages task scheduling and resource allocation, supported by a hardware abstraction layer that decouples driving logic from specific vehicle configurations. To ensure consistent behavior during testing, it includes a deterministic replay engine for sensor data streams and supports hardware-in-the-loop simulation. The platform also uses directed acyclic graph scheduling and zero-copy shared-memory transport to optimize data flow and computational efficiency in complex robotic systems. The software provides a standardized vehicle control interface to translate navigation decisions into mechanical commands. Extensive documentation is available, including installation instructions, hardware integration guides, and a series of quick-start guides for different versions of the platform.
Allows multiple processes to access large sensor data buffers without expensive memory duplication.
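As a rough illustration of the idea (not Apollo's actual transport), the sketch below uses Python's standard `multiprocessing.shared_memory` module. The helper names `publish_frame` and `checksum_frame` are hypothetical, and the consumer attaches from the same process only to keep the example short; a separate process would attach by name in the same way.

```python
from multiprocessing import shared_memory


def publish_frame(frame: bytes) -> shared_memory.SharedMemory:
    """Producer side: place one frame into a newly created shared block."""
    shm = shared_memory.SharedMemory(create=True, size=len(frame))
    shm.buf[:len(frame)] = frame  # the only copy: into shared memory
    return shm


def checksum_frame(name: str, length: int) -> int:
    """Consumer side: attach by name and read the bytes in place."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The block may be rounded up to a page size, so the real length
        # is passed explicitly. Slicing a memoryview copies nothing.
        view = shm.buf[:length]
        total = sum(view)
        view.release()  # release the view before close()
        return total
    finally:
        shm.close()


frame = bytes(range(256)) * 4            # stand-in for a sensor buffer
producer = publish_frame(frame)
result = checksum_frame(producer.name, len(frame))
producer.close()
producer.unlink()                        # free the block once everyone is done
```

Only the block name and the payload length cross the process boundary; the payload itself stays where the producer wrote it.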
Arrow is a cross-language development platform for in-memory data. It provides a standardized, language-independent columnar memory format designed to accelerate analytical operations and improve memory efficiency on modern computing hardware. By utilizing a schema-driven approach, the framework enables the efficient organization of both flat and nested data structures. The project functions as an analytical data processing engine that facilitates high-performance computation directly on memory-resident datasets. It distinguishes itself through a zero-copy architecture, which allows multiple
Provides zero-copy communication mechanisms for efficient data access across multiple processes.
This project is an NGINX module that embeds the Lua scripting language directly into the server environment. It functions as a request processor and response filter, enabling the execution of scripts to handle HTTP requests, generate dynamic content, and manage server behavior without external application calls. The module provides a shared memory dictionary and cache manager, allowing data to be stored and retrieved across all active worker processes. This capability supports the collection of high-performance server metrics and the synchronization of information across concurrent processes.
Provides a shared memory dictionary for synchronizing state and configuration across all active worker processes.
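The module's own dictionary is used from Lua inside NGINX. As a loose Python analogue only, the sketch below keeps a fixed set of named counters in a shared array guarded by a lock. The class `SharedCounters` and its methods are hypothetical and do not mirror the module's API.

```python
import multiprocessing as mp


class SharedCounters:
    """Fixed set of named integer counters stored in shared memory."""

    def __init__(self, keys):
        # Keys are mapped to slots up front, so only the values are shared.
        self._slots = {key: i for i, key in enumerate(keys)}
        # Zero-initialised array of signed 64-bit integers with its own lock.
        self._values = mp.Array("q", len(self._slots))

    def incr(self, key, amount=1):
        with self._values.get_lock():    # serialise concurrent workers
            slot = self._slots[key]
            self._values[slot] += amount
            return self._values[slot]

    def get(self, key):
        with self._values.get_lock():
            return self._values[self._slots[key]]


counters = SharedCounters(["requests", "errors"])
counters.incr("requests")
counters.incr("requests", 2)
```

A `SharedCounters` instance passed to `multiprocessing.Process` workers at start-up would let each worker update the same values; a real shared dictionary additionally handles arbitrary keys, eviction, and expiry, which this sketch leaves out.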
MacType is a system-level utility that replaces the default Windows font rasterization engine. It functions as a background service that intercepts and modifies font rendering calls to provide custom anti-aliasing, weight, and contrast adjustments for desktop applications. The software operates by injecting custom libraries into running processes to override standard text layout and graphics routines. It utilizes a shared memory space to apply configuration updates across multiple processes instantly, allowing for granular control over visual parameters such as gamma, hinting, and font substi
Uses shared memory to apply configuration updates across multiple processes instantly.
Crossplane is a Kubernetes-based control plane framework that functions as a cloud resource orchestrator and infrastructure-as-code platform. It enables the management of heterogeneous infrastructure by extending the Kubernetes API to provision and maintain external cloud services through declarative configuration. By utilizing custom resource controllers, it continuously reconciles the state of external infrastructure with defined desired states, ensuring consistent deployment and lifecycle management across multiple cloud providers. The platform distinguishes itself through its composition-
Crossplane stores and retrieves shared configuration data in an isolated environment to facilitate patching and state synchronization between composite and composed resources.
CuPy is a CUDA array computing library that implements a NumPy-compatible interface for running array operations and numerical computations on NVIDIA GPUs. It serves as a GPU-accelerated numerical library and CUDA-based SciPy implementation that offloads compute-intensive tasks to graphics hardware to increase processing speed for scientific and engineering workloads. The library enables tensor exchange between frameworks, allowing data buffers to be shared between different deep learning frameworks using standardized memory layouts to avoid memory copies. It also supports the integration of custom GPU kernels, connecting array data to low-level APIs for precise control over hardware execution. The project primarily covers workflows for high-performance array processing and scientific computing. Its capabilities include accelerating array computations and providing tools for large-scale numerical computation.
Shares data buffers through standardized memory layouts to enable zero-copy data exchange between different libraries.
OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models. The project distinguishes itself through a plugin-based hardware acceleration layer that maps neural network operations to vendor-specific drivers. It features advanced execution mechanisms such as continuous batching, speculative decoding, and
Provides zero-copy memory buffers between the inference engine and native APIs to eliminate data copy overhead.
Napajs is an embeddable JavaScript engine and multi-threaded runtime designed to be integrated directly into other software applications as a component. It serves as a parallel computation framework that allows JavaScript code to execute across multiple threads, bypassing the standard single-threaded event loop limitation to handle CPU-intensive tasks. The runtime is distinguished by its ability to load and execute modules from the NPM ecosystem and its pluggable execution environment. This architecture allows for custom implementations of memory allocation, system logging, and performance me
Implements zero-copy communication by transferring typed arrays via shared memory buffers across multiple threads.
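Napajs does this in JavaScript. The Python sketch below only illustrates the general pattern of handing threads views onto one typed buffer instead of copies, and the function name `square_in_place` is made up for the example.

```python
import threading
from array import array


def square_in_place(chunk: memoryview) -> None:
    """Square every element of the view; writes land in the parent array."""
    for i in range(len(chunk)):
        chunk[i] = chunk[i] * chunk[i]


data = array("d", range(8))      # typed array of doubles
view = memoryview(data)          # exposes the array's buffer without copying
half = len(view) // 2

# Each thread receives a slice of the view, which aliases the same memory.
workers = [
    threading.Thread(target=square_in_place, args=(view[:half],)),
    threading.Thread(target=square_in_place, args=(view[half:],)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Because the two slices do not overlap, the threads need no lock; overlapping regions would require synchronisation.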
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Implements zero-copy memory transport to share data buffers between libraries without expensive CPU-to-GPU transfers.
Metalsmith is a Node.js static site generator and static content processor that transforms source files into websites, eBooks, or technical documentation. It functions as a file-to-object transformer, converting directory trees into plain JavaScript objects that can be programmatically manipulated in memory. The project is built around a pluggable build pipeline where files are passed through a sequence of custom functions to transform content and metadata incrementally. This architecture allows users to extend functionality by writing their own plugins or using third-party modules to define
Maintains a globally accessible memory space for synchronizing site-wide configuration and shared variables across all plugins.
LMCache is a distributed key-value cache manager and tiering system designed to accelerate large language model inference. It functions as a tiered storage layer that offloads tensors from GPU memory to CPU RAM, local disks, or remote object stores, enabling the reuse of cached prefixes across different inference sessions and serving engines. The system differentiates itself through a disaggregated prefill-decode model, which separates prompt processing from token generation by transferring caches between distributed compute nodes. It utilizes peer-to-peer orchestration to share and retrieve
Achieves zero-copy transfers by sharing tensors between the cache server and inference engine using shared memory.
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Defines connection classes for offline store backends in the feature store configuration.
AFL++ is a coverage-guided fuzzing framework that discovers crashes and hangs in software by mutating inputs while tracking which code paths are exercised. It functions as both a fuzzing engine and a campaign manager, supporting targets with or without source code through compile-time instrumentation, dynamic binary instrumentation, and emulation. The framework includes tools for crash triage and analysis, test case minimization, and campaign deployment across local or distributed environments. The framework distinguishes itself through its breadth of instrumentation backends, allowing users
Passes input data between fuzzer and target through shared memory to reduce per-execution overhead.
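A minimal sketch of the general technique, assuming a simple layout of a 4-byte length followed by the payload. The layout, the `MAX_INPUT` bound, and the helper names are illustrative assumptions, not AFL++'s documented interface.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")   # assumed layout: 4-byte little-endian length
MAX_INPUT = 1024               # hypothetical upper bound on test-case size


def write_testcase(buf, data: bytes) -> None:
    """Fuzzer side: store the payload, then publish its length."""
    if len(data) > MAX_INPUT:
        raise ValueError("test case too large for the shared buffer")
    buf[HEADER.size:HEADER.size + len(data)] = data
    HEADER.pack_into(buf, 0, len(data))


def read_testcase(buf) -> bytes:
    """Target side: read the length, then the payload it describes."""
    (length,) = HEADER.unpack_from(buf, 0)
    # bytes() copies the payload out for clarity; a real target could
    # parse the buffer in place instead.
    return bytes(buf[HEADER.size:HEADER.size + length])


shm = shared_memory.SharedMemory(create=True, size=HEADER.size + MAX_INPUT)
try:
    seen = []
    for case in (b"AAAA", b"\x00\xff", b""):
        write_testcase(shm.buf, case)
        seen.append(read_testcase(shm.buf))
finally:
    shm.close()
    shm.unlink()
```

Reusing one region for every execution avoids creating a file or pipe per test case, which is where the per-execution saving comes from.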
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
Implements shared memory transports to optimize communication efficiency by separating control and data layers.
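One way to picture the control/data split: bulk bytes live in a shared region, and only a small descriptor travels over the control channel. The `Descriptor` type and the `send`/`receive` helpers below are hypothetical, and `queue.Queue` stands in for whatever control channel a real system would use between processes.

```python
import queue
from dataclasses import dataclass
from multiprocessing import shared_memory


@dataclass(frozen=True)
class Descriptor:
    """Control-plane message: where the payload lives, not the payload."""
    shm_name: str
    offset: int
    length: int


def send(control: queue.Queue, shm, offset: int, payload: bytes) -> None:
    shm.buf[offset:offset + len(payload)] = payload            # data plane
    control.put(Descriptor(shm.name, offset, len(payload)))    # control plane


def receive(control: queue.Queue) -> bytes:
    desc = control.get()
    shm = shared_memory.SharedMemory(name=desc.shm_name)       # attach by name
    try:
        # Copied out here only so the result outlives the mapping.
        return bytes(shm.buf[desc.offset:desc.offset + desc.length])
    finally:
        shm.close()


data_plane = shared_memory.SharedMemory(create=True, size=4096)
control_plane = queue.Queue()
try:
    send(control_plane, data_plane, 128, b"gradient shard 0")
    received = receive(control_plane)
finally:
    data_plane.close()
    data_plane.unlink()
```

The control message stays a few dozen bytes regardless of payload size, so the control channel can be slow or serialised without limiting bulk throughput.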
This project is an educational resource that offers a comprehensive development tutorial on writing and loading eBPF programs in the Linux kernel using C, Go, and Rust. It serves as a technical guide for developing custom logic that executes directly in the kernel. The materials cover specialized areas including kernel observability and tracing, security implementation for intrusion detection, and high-performance network engineering for packet filtering and load balancing. It also includes dedicated guides for Linux kernel tracing and the use of kprobes, uprobes, and tracepoints. The project spans a broad range of functional areas, such as kernel instrumentation, system monitoring and observability, network analysis, and security enforcement. It also extends to hardware-level debugging for GPUs and drivers, as well as low-level system manipulation and resource management.
Creates sparse memory regions shared between kernel and userspace to avoid expensive system calls.
This project is a macOS system camera driver and software plugin that exposes software video streams as camera inputs recognized at the hardware level. It functions as an OBS virtual camera plugin, making the live output of OBS usable as a webcam device in other applications. The tool enables routing composited video from a production suite into video conferencing applications such as Zoom or Google Meet, so edited scenes can be streamed instead of a raw webcam feed. The system integrates with macOS through a kernel-level device driver and shared-memory buffer transfers to pass video frames from the application process to the operating system. It uses the CoreMedia framework to handle video stream timing and metadata.
Uses a high-speed shared memory region to transfer raw video frames between user-space and the kernel driver.
pyslam is a framework for Simultaneous Localization and Mapping that combines Python flexibility with C++ performance. It is a sparse SLAM implementation designed to map environment geometry and track device location by processing image frames into 3D points. The project features a bridge for exposing high-performance C++ classes to Python scripts using zero-copy memory sharing. This integration allows for switching between a scripting interface for rapid prototyping and a compiled core for execution speed. The system includes a spatial map optimizer to refine 3D point and camera pose estima
Uses zero-copy memory sharing to move large spatial data structures between language runtimes without duplicating memory.
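The project's own bridge is not shown here. As a stand-in, the sketch below uses the standard `ctypes` module to give C-typed code a view onto memory owned by a Python `array`, which is the same idea at small scale.

```python
import ctypes
from array import array

# Six doubles, e.g. two 3D points, owned by the Python side.
points = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# from_buffer() wraps the existing memory in a C double[6]; nothing is copied.
c_points = (ctypes.c_double * len(points)).from_buffer(points)

# A write through the C-typed view is visible through the Python array.
c_points[0] = 10.0

# Both objects report the same underlying address.
same_address = ctypes.addressof(c_points) == points.buffer_info()[0]
```

While `c_points` is alive the array cannot be resized, which reflects the usual cost of zero-copy sharing: both sides must agree on the buffer's lifetime and layout.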
Dora is a robotics dataflow framework and distributed orchestrator used to build and manage processing pipelines. It enables the deployment of robotics workloads across clusters with remote node execution and provides a real-time data pipeline for predictable performance. The system is distinguished by its support for multi-language nodes written in Rust, Python, C, or C++ that interoperate within a single dataflow. It utilizes a zero-copy shared-memory transport and columnar formats to minimize latency for large payloads, and it includes bidirectional bridges to integrate with external ecosy
Automatically switches between shared memory for local nodes and network sockets for remote nodes.
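The selection rule can be pictured as a check on where the two endpoints run. The `Node` type, the `pick_transport` function, and the transport labels below are hypothetical and are not taken from Dora's codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    machine: str   # identifier of the host the node runs on


def pick_transport(sender: Node, receiver: Node) -> str:
    """Same machine: shared memory. Different machines: network socket."""
    if sender.machine == receiver.machine:
        return "shared_memory"
    return "socket"


camera = Node("camera", "robot-a")
detector = Node("detector", "robot-a")
planner = Node("planner", "cloud-1")

local_choice = pick_transport(camera, detector)
remote_choice = pick_transport(detector, planner)
```

Keeping the decision inside the framework means node code publishes and subscribes the same way regardless of where its peers are deployed.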