awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
MCPSitemapPrivacyTerms
Hardware Acceleration Kernels · Awesome GitHub Repositories

2 repos

Awesome GitHub RepositoriesHardware Acceleration Kernels

Optimized computational kernels designed to offload intensive mathematical operations to specialized hardware.

Distinguishing note: Focuses on low-level kernel optimization for tensor operations rather than high-level machine learning model management.

Explore 2 awesome GitHub repositories matching artificial intelligence & ml · Hardware Acceleration Kernels. Refine with filters or upvote what's useful.

  1. Home
  2. Artificial Intelligence & ML
  3. Hardware Acceleration Kernels

Awesome Hardware Acceleration Kernels GitHub Repositories

Describe the repository you're looking for…
Find the best repos with AI.We'll search the best matching repositories with AI.
  • ggml-org/whisper.cpp

    ggml-org/whisper.cpp

    46,843View on GitHub↗

    Whisper.cpp is a high-performance, local-first speech recognition engine designed to run large-scale machine learning models on consumer hardware. It functions as a portable library that converts audio into text, supporting both static file transcription and real-time stream processing. By utilizing a lightweight inference engine and weight quantization, the project minimizes memory and compute overhead, allowing for efficient execution without reliance on external cloud APIs or internet connectivity. The project distinguishes itself through a hardware-agnostic compute abstraction that offloa

    A collection of optimized kernels that offload intensive tensor operations to specialized graphics and neural processing units for maximum throughput.

    C++inferenceopenaispeech-recognition
    46,843View on GitHub↗
  • deepspeedai/DeepSpeed

    deepspeedai/DeepSpeed

    41,638View on GitHub↗

    DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading. The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization

    Custom-compiled kernels optimize mathematical operations for specific hardware architectures to maximize throughput and reduce computational latency.

    Pythonbillion-parameterscompressiondata-parallelism
    41,638View on GitHub↗