15 Repos
Techniques for pinning processes or threads to specific CPU cores to optimize cache usage and reduce context switching.
Distinct from CPU Optimizations: that category covers hardware architectures and AI-specific optimizations, not general systems programming such as binding server processes to cores.
Explore 15 awesome GitHub repositories matching operating systems & systems programming · CPU Affinity Binding.
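As a rough illustration of the pinning technique described above, the following minimal sketch restricts the calling thread to a single CPU on Linux using Python's `os.sched_setaffinity`. The helper name `pin_to_cpu` is an assumption made for this example and is not taken from any of the listed repositories.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling thread to one CPU and return the resulting mask.

    Hypothetical helper for illustration; Linux only.
    """
    allowed = os.sched_getaffinity(0)  # CPUs this thread may currently use
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})     # pid 0 means "the calling thread"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = min(original)             # pick any CPU we are allowed to use
    print("pinned to:", pin_to_cpu(target))
    os.sched_setaffinity(0, original)  # restore the previous mask
```

Choosing the target from the currently allowed set, rather than assuming CPU 0 exists, keeps the sketch usable inside containers with a restricted cpuset.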
h2o is a high-performance content delivery server and HTTP/3 web server. It functions as a network gateway and reverse proxy that forwards client requests to upstream servers to manage traffic flow and load. The project distinguishes itself as a protocol fuzzing tool, utilizing a testing framework to execute automated stress tests against network protocols to identify memory leaks and crashes. The server provides capabilities for secure web traffic management through encrypted data transmission and high-performance web serving across HTTP/1, HTTP/2, and HTTP/3. It includes tools for server r
Implements CPU affinity binding to pin server threads to specific physical cores for reduced cache misses.
PowerInfer is a high-performance local large language model inference engine and sparse inference framework. It provides a runtime for executing models on consumer-grade hardware, utilizing a GPU acceleration backend to optimize tensor operations for graphics processors. The system distinguishes itself through a sparse inference framework that increases generation speed by skipping computations based on activation sparsity in model weights. It includes a GGUF model converter for transforming weights and metadata into a unified binary format, as well as an OpenAI API compatible server for inte
Binds execution threads to high-performance CPU cores to minimize scheduling latency and maximize generation speed.
iperf is IP network measurement software designed to quantify data transfer rates and network stability. It functions as a network performance benchmarking tool that tests capacity and throughput between two hosts to identify bottlenecks and performance limits. The tool specifically measures maximum bandwidth and packet loss over IP networks using the TCP and UDP protocols. It also serves as a network data exporter, emitting performance results in JSON format for programmatic analysis and integration. The software covers a range of functions, including network throughput analysis and traffic testing. It enables measurement of total data capacity and evaluation of network hardware and configurations through standardized tests.
Binds network processing threads to specific CPU cores to reduce cache misses and context switching.
htop is a terminal-based system resource monitor and interactive process viewer. It functions as a text-user interface dashboard for overseeing hardware temperatures, load averages, and battery status while providing a comprehensive tool for monitoring and managing system processes. The application distinguishes itself through detailed process lifecycle management, allowing users to kill processes, adjust priorities via renicing, and assign CPU affinity to specific cores. It provides high-level visibility into system behavior through process hierarchy visualization and the ability to inspect
Enables pinning processes to specific CPU cores to optimize performance and isolate workloads.
OpenBLAS is a high-performance implementation of the Basic Linear Algebra Subprograms standard designed for numerical computing and matrix operations. It serves as a hardware-accelerated numerical library and optimized math kernel library, providing a computational engine for large-scale matrix multiplication and vector operations. The library distinguishes itself through the use of hand-tuned assembly kernels and SIMD instruction mapping, such as AVX and SVE, to maximize floating-point performance on specific CPU architectures. It features a multi-threaded framework that manages parallel exe
Binds specific threads to CPU cores to optimize cache usage and maximize processing efficiency.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Reduces latency by mapping internal thread categories to designated CPU cores.
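Mapping thread categories to designated cores, as described above, usually starts from a configuration that names a CPU range per category. The sketch below parses Linux-style CPU list strings such as `0-2,5` into sets. The category names and the `THREAD_AFFINITY` table are hypothetical and do not reflect Hazelcast's actual configuration keys.

```python
def parse_cpu_list(spec):
    """Parse a CPU list such as "0-2,5" into the set {0, 1, 2, 5}."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return cpus


# Hypothetical thread categories and their designated cores.
THREAD_AFFINITY = {"io": "0-1", "worker": "2-3,6"}


def resolve_affinity(config):
    """Turn {category: cpu-list string} into {category: set of CPU ids}."""
    return {category: parse_cpu_list(spec) for category, spec in config.items()}


if __name__ == "__main__":
    print(resolve_affinity(THREAD_AFFINITY))
```

Each thread of a given category could then pass its resolved set to an affinity call when it starts.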
CppGuide is a curated collection of educational resources and practical guides focused on C++ server development, Linux kernel internals, concurrent programming, network protocols, and security exploitation. It provides structured learning paths for backend developers, covering everything from interview preparation to building high-performance network servers and understanding operating system fundamentals. The guide distinguishes itself by offering in-depth, hands-on tutorials that walk through real-world implementations, including building a Redis-like server from scratch, designing custom
Restricts where a task, interrupt, or memory allocation may run by respecting affinity masks and memory policies.
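Affinity masks of the kind mentioned above are commonly represented as bitmasks in which bit N stands for CPU N; Linux, for example, exposes interrupt affinity as a hexadecimal mask. The following minimal sketch converts between a set of CPU ids and such a mask. The function names are assumptions chosen for this example.

```python
def cpus_to_mask(cpus):
    """Build an integer bitmask with bit N set for every CPU id N."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """Return the set of CPU ids whose bits are set in the mask."""
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


if __name__ == "__main__":
    mask = cpus_to_mask({0, 1, 3})
    print(format(mask, "x"))   # hexadecimal form of the mask
    print(mask_to_cpus(mask))
```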
htop is a system monitor with a terminal user interface for Unix systems. It functions as an interactive process viewer and real-time resource visualizer, providing a dashboard for tracking CPU, memory, and load average metrics. The tool supports sorting, filtering, and killing active system processes and threads. It is distinguished by a text-mode interface that can display processes in a hierarchical tree to visualize parent-child relationships, and it allows CPU affinity to be assigned to specific processor cores. The monitoring interface covers CPU utilization, memory usage, battery status, and system load averages. Users can customize the visual experience by configuring resource meters, changing process columns, and applying specific color schemes. The interface includes a persistent function bar that shows available keyboard shortcuts and commands.
Allows users to pin specific processes to designated CPU cores to optimize performance.
CRI-O is an open-source container runtime that implements the Kubernetes Container Runtime Interface (CRI) to manage container images, pods, and containers on cluster nodes using OCI-compatible runtimes. It serves as a node-level container manager that handles image pulling, container lifecycle, and resource monitoring for Kubernetes clusters, running containers according to the Open Container Initiative specifications. The runtime distinguishes itself through live configuration reloading that applies changes to runtime definitions, registry mirrors, and TLS certificates without restarting th
Assigns system-level commands and the container monitor to a dedicated CPU set for workload isolation.
seL4 is a formally verified microkernel whose C implementation is backed by machine-checked mathematical proofs of correctness, confidentiality, integrity, and availability. It enforces strict isolation between processes through hardware-enforced address space separation and a capability-based access control system, where each process holds explicit rights only to the resources it has been granted. The kernel exposes hardware resources through a minimal API of system calls that manage threads, address spaces, and inter-process communication, with synchronous IPC supporting sender-identifying b
Binds threads to specific processor cores to optimize cache usage and control execution placement.
monoio is a high-performance asynchronous runtime and executor for Rust. It implements a thread-per-core concurrency model that pins tasks to specific CPU cores to eliminate synchronization overhead and data migration. The runtime leverages the io_uring interface to perform non-blocking system calls and reduce kernel-user mode memory copying. It utilizes a high-performance I/O driver and zero-copy TCP stream wrapping to manage data transfer via shared-memory buffers. The project provides capabilities for CPU core affinity management, low-latency system programming, and high-performance netwo
Optimizes performance by pinning asynchronous tasks to specific CPU cores.
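The thread-per-core idea described above can be approximated outside Rust as well. The sketch below is not monoio's implementation; it is a Python analogue that starts one worker thread per allowed CPU and pins each to its own core on Linux. The name `run_thread_per_core` and the `work` callback signature are assumptions for this example.

```python
import os
import threading


def run_thread_per_core(work):
    """Start one pinned thread per allowed CPU and collect work(cpu, mask).

    Hypothetical helper for illustration; Linux only.
    """
    cpus = sorted(os.sched_getaffinity(0))
    results = {}

    def worker(cpu):
        # pid 0 refers to the calling thread, so only this worker is pinned.
        os.sched_setaffinity(0, {cpu})
        results[cpu] = work(cpu, os.sched_getaffinity(0))

    threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in cpus]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Each worker reports the affinity mask it ended up with.
    print(run_thread_per_core(lambda cpu, mask: sorted(mask)))
```

Because each worker owns one core and its own state, no task needs to migrate between cores, which is the property the thread-per-core model relies on.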
NCCL is a high-performance communication library and framework for distributed GPU computing, designed to perform collective and point-to-point data exchanges across multiple GPUs in single-node or multi-node systems. It serves as an RDMA GPU transport layer and memory orchestrator that facilitates high-bandwidth synchronization of data and model gradients for distributed GPU training and inference. The library is distinguished by its ability to execute communication primitives directly from GPU kernels, removing the host CPU from the critical path. It uses topology-aware path selection to optimize data movement and employs RDMA-based network transport, including InfiniBand and NVLink, to enable zero-copy memory access between devices across different physical nodes. The project covers a broad range of collective communication patterns, including reductions, broadcasts, gathers, and all-to-all exchanges, alongside point-to-point remote memory access. It offers comprehensive communicator management for initializing, partitioning, and resizing GPU groups, as well as specialized memory management for registering buffers and coordinating shared device memory. The system includes a suite of monitoring and observability tools for health tracking, diagnostic logging, and real-time event monitoring, as well as integration interfaces for machine learning frameworks, CUDA Graphs, MPI, and Python.
Binds internal threads to specific processor cores based on hardware proximity to minimize latency.
Asterinas is a memory-safe operating system kernel designed to prevent data races and memory corruption. It functions as a Linux-ABI compatible kernel, enabling the execution of existing Linux binaries and container workloads while providing a declarative operating system distribution model. The project distinguishes itself by acting as a virtual machine container host and a confidential computing guest OS, allowing it to run within hardware-isolated Trusted Execution Environments such as Intel TDX. It implements a minimal trusted computing base by isolating unsafe low-level operations and se
Prevents tasks from executing on multiple CPUs simultaneously using atomic flags during context switches.
Iggy is a distributed message streaming platform and multi-protocol message broker that functions as a persistent distributed log store. It provides infrastructure for publishing and consuming binary messages using an append-only log, ensuring high availability and data consistency across nodes through Viewstamped Replication. The platform is distinguished by its specialized LLM streaming infrastructure, which uses a server protocol to connect large language models to streaming data and system controls. This includes standardized protocols for context management and data bridging via HTTP or
Binds shards to specific CPU cores and detects hardware topology to maximize processing efficiency.
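Binding shards to cores, as described above, requires some policy for choosing which core each shard gets. The sketch below shows one simple policy, round-robin over the CPUs the process is allowed to use. It is not Iggy's algorithm, it uses the allowed CPU set as a stand-in for real topology detection, and the function names are hypothetical.

```python
import os


def available_cpus():
    """Return the CPUs this process may run on (falls back to a plain count)."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return set(range(os.cpu_count() or 1))


def assign_shards(shard_count, cpus):
    """Map shard index -> CPU id, cycling through the CPUs in sorted order."""
    ordered = sorted(cpus)
    if not ordered:
        raise ValueError("no CPUs available")
    return {shard: ordered[shard % len(ordered)] for shard in range(shard_count)}


if __name__ == "__main__":
    print(assign_shards(4, available_cpus()))
```

A topology-aware variant would order the CPUs by socket, NUMA node, or shared cache before assigning shards, rather than by numeric id.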
uperf is an Android performance tuning tool and Linux kernel parameter manager designed to optimize device responsiveness and battery life. It functions as a CPU affinity and scheduling manager, a hardware power profile controller, and a real-time system monitor that adjusts kernel parameters and CPU frequencies. The project distinguishes itself through real-time system monitoring of touchscreen input and frame rendering to trigger immediate performance boosts. It utilizes hardware performance profiling to apply pre-tuned configuration files tailored to specific hardware platforms, balancing
Implements CPU affinity binding to pin UI threads to high-performance clusters and reduce latency.