30 open-source projects similar to google/highway, ranked by how many features they have in common. Compare stars, activity, and what each project does to find the best Highway alternative.
c3c is the compiler for the C3 programming language. It transforms source code into executable binaries, static libraries, or dynamic libraries using an LLVM backend. The language combines result-based error handling, scoped memory pooling, and a semantic macro system. The compiler provides first-class support for hardware-backed SIMD vectors that map directly to processor instructions and enables runtime polymorphism through interface-based dynamic dispatch. The project covers a broad set of low-level capabilities, including manual and pooled memory management, inline assembly inte
ISPC is a vectorizing compiler and SIMD parallel programming language that implements a single program, multiple data (SPMD) model. It serves as a toolchain for translating C-based code with parallel extensions into optimized machine code for various CPU and GPU architectures using an LLVM backend. The compiler supports multiple SIMD targets, generating specialized code for x86 SSE/AVX, ARM NEON, and Intel GPUs from a single source. It features a runtime dispatch mechanism that selects the most efficient hardware-specific implementation for the current system during
OpenBLAS is a high-performance implementation of the Basic Linear Algebra Subprograms standard designed for numerical computing and matrix operations. It serves as a hardware-accelerated numerical library and optimized math kernel library, providing a computational engine for large-scale matrix multiplication and vector operations. The library distinguishes itself through the use of hand-tuned assembly kernels and SIMD instruction mapping, such as AVX and SVE, to maximize floating-point performance on specific CPU architectures. It features a multi-threaded framework that manages parallel exe
ZLUDA is a middleware and translation engine designed to enable the execution of unmodified proprietary compute binaries on non-native graphics hardware. It functions as a compatibility layer that bridges vendor-specific compute interfaces with open standards, allowing software originally restricted to a single hardware ecosystem to operate on alternative graphics processing units. The project achieves this through a combination of dynamic library interception and runtime instruction translation. By replacing standard system libraries and mapping proprietary compute calls to open standards, t
Crystal is a statically typed, compiled programming language designed for high performance and memory safety. It leverages an LLVM-based compiler to translate source code into optimized machine-executable binaries, while its type-inference-based static analysis enforces strict safety rules during the build process. The language distinguishes itself through a fiber-based concurrent runtime that manages lightweight execution units for asynchronous input and output without blocking the main process. It also features a powerful compile-time macro system that allows for the inspection and transfor
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
gfx is a hardware-agnostic graphics API abstraction that translates a unified set of graphics and compute commands into native instructions for multiple GPU drivers. It provides a common interface for cross-platform rendering and general-purpose GPU compute programming. The project features an intermediate-representation shader translation system that converts source code and SPIR-V into target-specific languages. It employs a data-driven reference test framework to verify that graphics output remains consistent across different hardware platforms. Capabilities include parallel command buffe
This project is a cross-platform graphics and compute framework that provides a unified, hardware-agnostic abstraction layer for rendering and parallel processing. It enables developers to build high-performance applications that execute consistently across diverse operating systems and hardware backends, including Vulkan, Metal, and DirectX. By mapping high-level graphics commands to native APIs, it serves as a portable foundation for both real-time 3D rendering and general-purpose GPU computing. The framework distinguishes itself through a robust architecture that supports both native deskt
TinyGo is a specialized compiler and development toolkit designed to bring the Go programming language to resource-constrained microcontrollers and WebAssembly environments. It provides a bare-metal runtime environment that enables high-level code execution without the need for a traditional operating system, utilizing an LLVM-based backend to generate efficient machine instructions. The project distinguishes itself through aggressive optimization techniques tailored for small hardware, including a static memory allocation strategy and whole-program dead code elimination that significantly re
simdjson is a high-performance, header-only C++ library designed for parsing, querying, and serializing JSON data with minimal memory overhead. It functions as a hardware-aware data processing engine that leverages vector instructions to achieve gigabyte-per-second parsing speeds. By detecting host processor capabilities at runtime, the library automatically selects the most efficient instruction sets to accelerate structural analysis and validation. The library distinguishes itself through a focus on extreme efficiency and resource management. It utilizes memory mapping and padded buffer ali
This project is a first-person shooter game engine and a pseudo-3D rendering engine written in C. It serves as a software framework for rendering three-dimensional environments and managing entity physics. The engine includes a networked multiplayer system designed to synchronize game state and player actions across a client-server network. It utilizes a portable codebase that allows game logic to be adapted across different operating systems and hardware architectures. Core capabilities cover 3D game engine architecture, including spatial partitioning and depth-based rendering. The system a
This is an Android barcode scanning library designed to detect and decode barcode data from a live camera feed. It provides the core infrastructure for translating visual patterns into text on mobile devices. The library includes a camera preview manager that adjusts aspect ratios and feed sizes to fit various screen dimensions, as well as a hardware controller for managing flash, autofocus, and sensor selection. It also features a barcode format filter to restrict scanning to specific barcode types to increase detection speed and accuracy. The project covers camera hardware control, live ba
CppGuide is a curated collection of educational resources and practical guides focused on C++ server development, Linux kernel internals, concurrent programming, network protocols, and security exploitation. It provides structured learning paths for backend developers, covering everything from interview preparation to building high-performance network servers and understanding operating system fundamentals. The guide distinguishes itself by offering in-depth, hands-on tutorials that walk through real-world implementations, including building a Redis-like server from scratch, designing custom
MediaPipe is a cross-platform machine learning framework designed for building and deploying pipelines that process live and streaming media. It provides a system for connecting processing components into custom machine learning chains to analyze real-time audio and video streams. The framework includes a suite of pre-trained models for tasks such as hand, face, and pose tracking, along with tools for retraining and customizing these models with specific datasets. It also features a dedicated benchmarker for measuring the execution speed and accuracy of machine learning models directly within
Pebble is a reference implementation of wearable firmware intended for embedded system research and firmware analysis. It serves as a technical blueprint for understanding how smartwatch operating systems are structured and executed on constrained hardware. The codebase focuses on low-power hardware rendering and the management of pixel data and display buffers to minimize energy consumption. It provides a historical guide for watch application development and the study of wearable software architecture. The system incorporates a hardware abstraction layer, a monolithic firmware image, and a
This project is a cross-platform game engine framework and build pipeline designed to produce playable executable binaries for desktop and Android devices. It provides a collection of core libraries for game development, including a native Android build system and a C++ build pipeline. The framework features a specialized bitmap font rendering system that displays stylized typography by mapping character indices to image atlases using coordinate and spacing metadata. It also implements a hardware abstraction layer to decouple application logic from graphics and audio drivers, ensuring portabl
WiringPi is a GPIO hardware control library and embedded hardware interface designed for managing general-purpose input/output pins on single-board computers. It provides a standardized software layer for interacting with hardware registers, facilitating low-latency pin manipulation. The project includes a GPIO command-line interface that allows users to inspect pin states and manage hardware input and output levels directly from the terminal. It supports embedded hardware prototyping and hardware state monitoring, specifically targeting Raspberry Pi boards to interact with external electron
stable-diffusion.cpp is a high-performance C++ inference engine designed for generating images and video from text prompts using Stable Diffusion models. It functions as a latent diffusion model runtime and a lightweight machine learning framework that enables local diffusion model execution on consumer hardware. The project distinguishes itself as a CPU-based image generator capable of running without a dedicated GPU. It employs a specialized C++ tensor backend and cross-backend hardware abstraction to dispatch compute tasks across different processor instruction sets and graphics APIs. The
OpenRGB is a centralized software suite for controlling colors and lighting effects across various brands of RGB hardware. It functions as a cross-platform controller and hardware control system that provides a unified interface for managing lighting profiles and effects. The project features an extensible plugin framework and a dedicated plugin interface that allow for the addition of new hardware support and integration features. It includes a network gateway that exposes an API for third-party applications to send lighting commands to connected devices. The system supports multi-computer
MaaFramework is a GPU-accelerated automation framework and image recognition tool. It identifies and interacts with visual screen elements by offloading compute-intensive pixel-level processing and image analysis to graphics hardware. The framework employs a hardware-abstracted execution layer to maintain consistent performance across different hardware configurations. This architecture supports a hardware-accelerated recognition pipeline that reduces latency during visual automation tasks. The project includes a community-driven resource directory and searchable asset registry. These tools
The Espressif SoC Development Framework is a comprehensive toolset for developing, compiling, and flashing applications targeting Espressif system-on-chips. It serves as an embedded toolchain orchestrator and a hardware abstraction layer that simplifies the control of low-level peripherals, memory mapping, and chip-specific registers. The framework provides a dedicated IoT connectivity stack for implementing Wi-Fi, Bluetooth, Zigbee, and Ethernet communication on microcontrollers. It also includes an embedded real-time operating system integration layer to manage multitasking and scheduling o
Pygame is a framework for building interactive 2D applications and games using the Python programming language. It functions as a 2D graphics rendering engine, a game input and event manager, and a multimedia audio toolkit. The project serves as a wrapper for the Simple DirectMedia Layer, providing a Python interface for low-level access to graphics, audio, keyboard, mouse, and joystick hardware. Its capabilities cover 2D graphics rendering and image manipulation, digital audio playback and streaming, and the management of game entities and physics simulations. It also includes tools for rea
This project is a Java GUI framework used to build cross-platform desktop, mobile, and embedded applications. It centers on a hardware-accelerated graphics engine that provides 2D and 3D visualizations and visual effects, complemented by a reactive UI binding system for synchronizing data and interface updates. The framework distinguishes itself through the FXML markup language, which separates the visual structure of an interface from its procedural logic. It also includes a dedicated CSS styling engine that allows for the customization of component appearances using external stylesheets and
MonoGame is a cross-platform game engine and framework built for the C# language and .NET ecosystem. It provides a development environment for creating 2D and 3D games that run across multiple operating systems and hardware platforms. The project serves as an open-source implementation of the XNA Game Studio API. It includes a dedicated game asset pipeline to convert raw art and sound files into optimized formats for efficient engine loading. The framework provides a hardware-abstraction layer to maintain a consistent interface for graphics and input across different devices. Its capabilitie
OSHI is a Java system information library and cross-platform hardware monitor used to extract real-time performance data and specifications from processors, memory, disks, network interfaces, and system firmware. It serves as an operating system metadata provider, querying system boot times, uptime, and detailed version information across various desktop and server distributions. The library integrates with observability pipelines by exporting system and process metrics to external monitoring backends using the Micrometer standard. It also supports connecting to vendor libraries to extract ad
Autoware is a modular autonomous driving stack and open-source platform for advanced driver assistance systems. It functions as an integrated operating environment that manages the full pipeline from sensor data processing to vehicle actuation, utilizing the ROS 2 robotics framework for distributed communication and hardware abstraction. The system provides a comprehensive software architecture to enable autonomous driving across various vehicle platforms. It coordinates perception, planning, and control systems to operate vehicles without human intervention. The platform covers several core
Sokol is a C hardware abstraction layer and cross-platform graphics library designed for managing windowing, input, and audio across different operating systems. It functions as a GPU resource manager and multimedia application framework, providing a unified API for rendering 2D and 3D graphics across WebGL, Metal, Direct3D, and OpenGL. The project is distinguished by its single-header implementation, which simplifies integration and portability. It utilizes a stateless render pass definition and a one-update-per-frame model to synchronize CPU data to GPU memory and manage resource lifecycles
OpenBLAS is a high-performance library for basic linear algebra subprograms that provides optimized matrix and vector operations. It serves as a multi-architecture math backend and numerical computing framework designed to execute complex mathematical calculations and high-speed numerical analysis. The library functions as an optimized CPU math library that detects hardware at runtime to apply the most efficient operation kernels for the specific processor. It supports multiple CPU targets through a combination of optimized assembly and C implementations. The project covers high-performance
XMRig is a multi-algorithm hashing tool and cryptocurrency miner that utilizes CPU, CUDA, and OpenCL hardware to execute hashing algorithms across multiple operating systems. It functions as a computational engine for mining cryptocurrency and benchmarking hardware efficiency. The project includes a Stratum proxy server that routes mining traffic between worker clients and pools to optimize connectivity and balance load. It also provides a secured HTTP management API for monitoring hashrates and modifying miner configuration in real time without restarting the process. The software covers a
PX4-Autopilot is a professional-grade flight control software stack designed for autonomous unmanned vehicles, including multicopters, fixed-wing aircraft, and vertical takeoff and landing platforms. It operates as a modular, real-time framework that decouples flight control logic from hardware drivers through a publish-subscribe middleware architecture. The system utilizes a deterministic microkernel runtime to execute time-critical flight control loops and sensor fusion tasks, ensuring stable navigation and vehicle operation. The platform distinguishes itself through a parameter-driven conf