fxmeng/TransMLA


0 stars, 0 forks, 1 view

TransMLA

Features

Attention Optimization - Implementation of multi-head latent attention mechanisms.
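To make the feature above concrete, here is a minimal single-head sketch of the general idea behind latent attention: cache one small latent vector per token and derive keys and values from it. This is an illustration under stated assumptions, not TransMLA's actual code. All matrix names and sizes are hypothetical, the weights are random, and the sketch leaves out causal masking, multiple heads, and positional embeddings.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random stand-in for a trained weight matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, d_latent, d_head = 4, 8, 2, 4  # hypothetical sizes

x = rand_matrix(seq_len, d_model, rng)        # token hidden states
w_q = rand_matrix(d_model, d_head, rng)       # query projection
w_down = rand_matrix(d_model, d_latent, rng)  # shared down-projection
w_up_k = rand_matrix(d_latent, d_head, rng)   # latent -> keys
w_up_v = rand_matrix(d_latent, d_head, rng)   # latent -> values

# Only the latent is cached: d_latent numbers per token.
latent_cache = matmul(x, w_down)
q = matmul(x, w_q)

# Path A: rebuild full keys from the latent, then score.
k = matmul(latent_cache, w_up_k)
scores_a = matmul(q, transpose(k))

# Path B: fold the key up-projection into the query and score
# directly against the cached latent; keys are never materialized.
q_absorbed = matmul(q, transpose(w_up_k))
scores_b = matmul(q_absorbed, transpose(latent_cache))

# Scaled softmax attention (no causal mask in this sketch).
scale = 1.0 / math.sqrt(d_head)
weights = [softmax([s * scale for s in row]) for row in scores_b]
output = matmul(weights, matmul(latent_cache, w_up_v))

cached_per_token = d_latent      # latent cache
standard_per_token = 2 * d_head  # separate key and value cache
```

Both paths produce the same attention scores, because `q @ (c @ W)^T` equals `(q @ W^T) @ c^T`. In this toy setup the cache holds 2 numbers per token instead of the 8 that separate keys and values would need.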

Open-source alternatives to TransMLA

Similar open-source projects, ranked by how many features they share with TransMLA.

Stability-AI/StableLM
15,699
StableLM is a pre-trained transformer-based large language model designed for natural language generation and zero-shot inference. It functions as a causal language model that predicts the next token in a sequence to produce human-like text for conversational and creative writing tasks. The model is built as a fine-tunable base, allowing the adaptation of pre-trained weights to specific tasks or styles through custom dataset training and weight regularization. It utilizes rotary positional embeddings and flash-attention to optimize memory usage and processing efficiency during deployment on G
Jupyter Notebook
Dao-AILab/flash-attention
24,220
FlashAttention is an attention mechanism optimization library and machine learning acceleration framework designed to increase training speed and reduce memory footprint for large-scale neural network models. It functions as a collection of low-level CUDA kernels that optimize memory-bound operations to improve hardware utilization on graphics processing units. The library distinguishes itself through an input-output-aware algorithm design that minimizes data movement between different levels of memory. By employing kernel fusion and tiled matrix multiplication, it combines sequential operati
Python
deepseek-ai/3FS
9,970
3FS is a distributed file system and RDMA storage cluster designed for high-performance AI training and inference workloads. It functions as a strongly consistent storage layer that utilizes a disaggregated architecture to pool SSDs and memory resources across multiple nodes. The system provides specialized storage implementations including an AI training checkpoint store for parallel state preservation and a distributed key-value cache store for decoder layer vectors to optimize inference processing. It ensures data integrity through chain replication and apportioned query distribution. The
C++
bytedance/ShadowKV
306
ICML 2025 Spotlight ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Python
