# ggerganov/ggml

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/ggerganov-ggml).**

14,831 stars · 1,676 forks · C++ · MIT

## Links

- GitHub: https://github.com/ggerganov/ggml
- awesome-repositories: https://awesome-repositories.com/repository/ggerganov-ggml.md

## Description

ggml is a low-level tensor library and machine learning inference engine, exposed through a C API, for performing mathematical operations on multi-dimensional arrays across diverse hardware platforms. It provides the building blocks for executing machine learning models and computes gradients through built-in automatic differentiation.

The project includes a quantized tensor framework that converts floating-point weights into low-bit integer representations to reduce memory usage and increase inference speed. It uses a custom binary format (GGUF) for model serialization, which stores tensors and metadata together for rapid loading and consistent versioning across platforms.
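As a rough illustration of the quantization idea described above, the following sketch maps floats to 8-bit integers in blocks of 32, with one scale per block. The names `QuantBlock`, `quantize_blocks`, and `dequantize_blocks` are hypothetical, and the layout is simplified; ggml's actual quantized types differ in detail, for example in how scales are stored and in the bit widths offered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; these are not ggml identifiers.
constexpr std::size_t kBlockSize = 32;

struct QuantBlock {
    float scale;                // per-block scale factor
    std::int8_t q[kBlockSize];  // quantized values in [-127, 127]
};

// Quantize a float array whose length is a multiple of kBlockSize.
std::vector<QuantBlock> quantize_blocks(const std::vector<float>& src) {
    assert(src.size() % kBlockSize == 0);
    std::vector<QuantBlock> out(src.size() / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* x = src.data() + b * kBlockSize;

        // The largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(x[i]));
        }
        const float scale = amax / 127.0f;
        const float inv = scale > 0.0f ? 1.0f / scale : 0.0f;

        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out[b].q[i] = static_cast<std::int8_t>(std::lround(x[i] * inv));
        }
    }
    return out;
}

// Reconstruct approximate floats from the quantized blocks.
std::vector<float> dequantize_blocks(const std::vector<QuantBlock>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const QuantBlock& blk : blocks) {
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out.push_back(static_cast<float>(blk.q[i]) * blk.scale);
        }
    }
    return out;
}
```

In this sketch each block of 32 weights stores 32 one-byte integers plus a single float scale rather than 32 four-byte floats, at the cost of a rounding error of roughly half the block's scale per value.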

ggml covers a broad range of machine learning primitives, including computation graphs that allow execution to be planned and optimized, and optimizers that update model weights. Its capabilities extend to neural network training and the deployment of large language models on consumer hardware.
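To make the combination of graph-based computation and gradient calculation concrete, here is a minimal sketch of reverse-mode automatic differentiation over a small scalar graph. The `Graph` and `Node` types and their methods are hypothetical and written only for this example; ggml's real graph structures operate on tensors and are considerably more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; not ggml's API.
enum class Op { Leaf, Add, Mul };

struct Node {
    Op op;
    int lhs;       // index of first input, -1 for leaves
    int rhs;       // index of second input, -1 for leaves
    double value;  // filled in by forward()
    double grad;   // filled in by backward()
};

struct Graph {
    // Nodes are appended in topological order: inputs precede their users.
    std::vector<Node> nodes;

    bool valid(int i) const {
        return i >= 0 && static_cast<std::size_t>(i) < nodes.size();
    }
    int push(Op op, int a, int b, double v) {
        nodes.push_back(Node{op, a, b, v, 0.0});
        return static_cast<int>(nodes.size()) - 1;
    }
    int leaf(double v) { return push(Op::Leaf, -1, -1, v); }
    int add(int a, int b) { assert(valid(a) && valid(b)); return push(Op::Add, a, b, 0.0); }
    int mul(int a, int b) { assert(valid(a) && valid(b)); return push(Op::Mul, a, b, 0.0); }

    // Evaluate every node once, in construction order.
    void forward() {
        for (Node& n : nodes) {
            if (n.op == Op::Add) n.value = nodes[n.lhs].value + nodes[n.rhs].value;
            if (n.op == Op::Mul) n.value = nodes[n.lhs].value * nodes[n.rhs].value;
        }
    }

    // Propagate d(out)/d(node) from the output back to the leaves.
    void backward(int out) {
        assert(valid(out));
        for (Node& n : nodes) n.grad = 0.0;
        nodes[out].grad = 1.0;
        for (int i = out; i >= 0; --i) {
            const Node n = nodes[i];  // copy, since inputs are mutated below
            if (n.op == Op::Add) {
                nodes[n.lhs].grad += n.grad;
                nodes[n.rhs].grad += n.grad;
            } else if (n.op == Op::Mul) {
                nodes[n.lhs].grad += n.grad * nodes[n.rhs].value;
                nodes[n.rhs].grad += n.grad * nodes[n.lhs].value;
            }
        }
    }
};
```

Building the graph first and evaluating it afterwards means the whole computation is known before any arithmetic runs, which is what makes scheduling and memory planning possible. For `f = x * y + x`, calling `forward()` and then `backward(f)` leaves the partial derivatives in the `grad` fields of the `x` and `y` leaves.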

## Tags

### Artificial Intelligence & ML

- [C Tensor Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/c-tensor-libraries.md) — Provides a low-level C-based library for performing mathematical operations on multi-dimensional arrays.
- [Automatic Differentiation Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/automatic-differentiation-engines.md) — Implements an engine that tracks operation dependencies to calculate gradients for model parameter updates.
- [Computational Graphs](https://awesome-repositories.com/f/artificial-intelligence-ml/computational-graphs.md) — Represents machine learning models as directed graphs of tensor operations to optimize execution and memory.
- [Hardware Acceleration Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends.md) — Maps generic tensor operations to optimized machine instructions for various CPU and GPU architectures.
- [Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-engines.md) — Acts as a runtime environment for executing pre-trained neural network models with optimized performance.
- [Inference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization.md) — Optimizes model execution speed and reduces memory usage for running large neural networks on consumer hardware.
- [Tensor Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/tensor-computing-libraries/tensor-libraries.md) — Provides fundamental data structures and mathematical functions required for building custom high-dimensional array computations.
- [Tensor Operations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/tensor-computing-libraries/tensor-operations.md) — Performs low-level manipulations and transformations of multi-dimensional data structures across hardware platforms. ([source](https://github.com/ggerganov/ggml#readme))
- [Model Quantization Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/quantization/model-quantization-frameworks.md) — Implements a framework that converts high-precision weights into lower-precision formats to reduce model size.
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Converts high-precision floating point weights into low-bit integer formats to reduce memory usage.
- [Model Serialization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serialization.md) — Implements binary formats for saving and loading model states and metadata to ensure consistency across hardware. ([source](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md))
- [Neural Network Training](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-training.md) — Supports updating model weights and optimizing performance via automatic gradient calculations.
- [Optimization Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/optimization-algorithms.md) — Provides mathematical methods for updating model parameters to minimize loss functions during training. ([source](https://github.com/ggerganov/ggml#readme))

### Scientific & Mathematical Computing

- [Low-Level Tensor Libraries](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/low-level-tensor-libraries.md) — Ships a low-level library for performing mathematical operations on multi-dimensional arrays without high-level abstractions.

### Part of an Awesome List

- [Local Model Deployment](https://awesome-repositories.com/f/awesome-lists/ai/local-model-deployment.md) — Enables the deployment of large language models on local hardware using memory-efficient quantization.
- [Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/machine-learning.md) — Tensor library with quantization support.

### Data & Databases

- [Binary Serialization Formats](https://awesome-repositories.com/f/data-databases/binary-serialization-formats.md) — Provides a custom binary serialization format for tensors and metadata to ensure rapid loading across platforms.

### DevOps & Infrastructure

- [Cross-Platform Runtimes](https://awesome-repositories.com/f/devops-infrastructure/execution-environments/code-execution-runtimes/cross-platform-runtimes.md) — Ensures consistent model execution across diverse hardware architectures through portable tensor operations.