OpenMythos

OpenMythos is a framework for building recurrent large language model architectures. It uses recurrent transformer blocks that make multiple iterative passes over the same weights, which allows the processing depth, and therefore the compute spent on each input, to vary. This enables compute-adaptive reasoning.
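
Reusing one block's weights across several passes can be illustrated with a minimal pure-Python sketch. `SharedBlock` and `recurrent_forward` are hypothetical names, not OpenMythos APIs, and the block is a toy residual update `h + tanh(W h)` standing in for a full transformer block. The point is only that the loop count sets the depth while the parameter count stays fixed.

```python
import math
import random
from typing import List

Vector = List[float]


class SharedBlock:
    """Toy stand-in for a transformer block (hypothetical): one weight matrix W,
    applied as a residual update h + tanh(W h). Created once, reused every loop."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        self.weights = [[rng.uniform(-scale, scale) for _ in range(dim)]
                        for _ in range(dim)]

    def num_params(self) -> int:
        return sum(len(row) for row in self.weights)

    def __call__(self, h: Vector) -> Vector:
        # Same weights on every call: extra loops add compute, not parameters.
        return [hi + math.tanh(sum(w * hj for w, hj in zip(row, h)))
                for hi, row in zip(h, self.weights)]


def recurrent_forward(block: SharedBlock, x: Vector, n_loops: int) -> Vector:
    """Apply the same block n_loops times; n_loops sets the processing depth."""
    h = list(x)
    for _ in range(n_loops):
        h = block(h)
    return h


block = SharedBlock(dim=4)
x = [0.1, -0.2, 0.3, 0.0]
shallow = recurrent_forward(block, x, n_loops=2)  # cheap pass, e.g. for an easy input
deep = recurrent_forward(block, x, n_loops=8)     # more iterations for a harder one
print(block.num_params())  # 16, for either depth
```

In this sketch the caller chooses `n_loops`; a compute-adaptive model would choose it per input.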

The framework includes a mixture-of-experts (MoE) layer that routes tokens between shared and specialized expert layers to make efficient use of its parameters. It also provides parameter-efficient fine-tuning tools built on low-rank adaptation (LoRA) modules, which change model behavior with minimal weight updates.
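
The routing between shared and specialized experts can be sketched as follows, assuming a common MoE layout that is not confirmed for OpenMythos: shared experts process every token, while a linear gate scores the specialized (routed) experts and only the `top_k` highest-scoring ones run, with their gate probabilities renormalized as mixing weights. `moe_forward`, `scaled`, and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, shared: Sequence[Expert], routed: Sequence[Expert],
                gate: Sequence[Vector], top_k: int = 2) -> Tuple[Vector, List[int]]:
    """Shared experts see every token; a linear gate picks top_k routed experts."""
    out = [0.0] * len(x)
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]

    # Router: one logit per routed expert, from a dot product with its gate row.
    probs = softmax([sum(g * xi for g, xi in zip(row, x)) for row in gate])
    chosen = sorted(range(len(routed)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize the selected probabilities and mix the chosen experts' outputs.
    norm = sum(probs[i] for i in chosen)
    for i in chosen:
        out = [o + (probs[i] / norm) * y for o, y in zip(out, routed[i](x))]
    return out, chosen


def scaled(s: float) -> Expert:
    return lambda v: [s * a for a in v]  # toy expert: multiplies its input by s


shared = [scaled(1.0)]
routed = [scaled(1.0), scaled(2.0), scaled(3.0), scaled(4.0)]
gate = [[3.0, 0.0], [1.0, 0.0], [-2.0, 0.0], [0.0, 5.0]]
out, chosen = moe_forward([1.0, 0.0], shared, routed, gate, top_k=2)
print(chosen)  # [0, 1]: the two highest gate scores for this token
```

Only two of the four routed experts execute for this token, which is how an MoE layer can hold many parameters while spending compute on a few of them.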

For training, the framework provides distributed pipelines for multi-GPU hardware that combine data parallelism with mixed precision. It also offers configurable attention mechanisms, including grouped-query attention (GQA) and multi-latent attention (MLA), and uses depth-wise batching so that simple inputs can exit early, which increases inference throughput.
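
Depth-wise batching with early exit can be sketched as a loop over an active set that shrinks as samples finish. The halting rule here (stop once a sample's update falls below `tol`) is a hypothetical stand-in, since the criterion OpenMythos uses to identify simple inputs is not described; `depthwise_early_exit` and the toy `halve` step are likewise illustrative.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def depthwise_early_exit(batch: List[Vector], step: Callable[[Vector], Vector],
                         max_loops: int, tol: float) -> Tuple[List[Vector], List[int], int]:
    """Loop `step` over a batch; a sample leaves the active set once its update
    is smaller than `tol` (a hypothetical halting rule)."""
    states = [list(x) for x in batch]
    loops_used = [0] * len(batch)
    active = list(range(len(batch)))  # indices of samples still iterating
    work = 0                          # total step applications performed
    for _ in range(max_loops):
        if not active:
            break
        still_active = []
        for i in active:  # a real model would run one batched call on the active subset
            new = step(states[i])
            delta = max(abs(a - b) for a, b in zip(new, states[i]))
            states[i] = new
            loops_used[i] += 1
            work += 1
            if delta >= tol:
                still_active.append(i)
        active = still_active  # finished samples drop out of later iterations
    return states, loops_used, work


def halve(h: Vector) -> Vector:
    return [0.5 * a for a in h]  # toy contracting step


states, loops_used, work = depthwise_early_exit([[0.1], [1.0], [8.0]], halve,
                                                max_loops=6, tol=0.1)
print(loops_used, work)  # [1, 4, 6] 11
```

The saved work is the throughput gain: this toy batch needs 11 step applications instead of the 18 that a fixed six-loop pass over all three samples would perform.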

Features

Recurrent Transformer Blocks - Core architecture that runs recurrent blocks for a variable number of loop iterations, enabling compute-adaptive, depth-variable reasoning.

Distributed Training - Provides a multi-GPU training pipeline that uses data parallelism to split large datasets across GPUs and mixed precision for large-scale model optimization.

Mixture of Experts - Uses expert routing gates to distribute tokens across shared and specialized expert layers.

Low-Rank Adaptation - Provides parameter-efficient fine-tuning by integrating low-rank adaptation modules that modify model behavior with minimal parameter updates.

Dynamic Depth Batching - Implements depth-wise batching to allow early exit for simple inputs, increasing inference throughput.

Inference Optimizations - Increases inference throughput and reduces memory overhead through depth-wise batching and memory-efficient attention mechanisms.

Parameter-Efficient Adapters - Integrates low-rank adaptation modules into the shared recurrent block so that its behavior can differ at each loop depth.

Configurable Attention Mechanisms - Provides a configuration option for switching between grouped-query attention and multi-latent attention.

Latent Attention Mechanisms - Implements multi-latent attention, which compresses keys and values into latent representations to reduce memory use.
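
The adapter feature, LoRA modules attached to a shared recurrent block so that behavior can differ by loop depth, might look like the sketch below. It assumes standard LoRA (a frozen weight `W` plus a trainable rank-`r` update `B A` scaled by `alpha / rank`) with one adapter pair per loop iteration; `LoopLoRALinear` and its layout are hypothetical, not the OpenMythos implementation.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoopLoRALinear:
    """Frozen shared weight W plus one low-rank adapter (B, A) per loop depth.
    Effective weight at depth d: W + (alpha / rank) * B_d @ A_d."""

    def __init__(self, base: Matrix, rank: int, n_loops: int,
                 alpha: float = 1.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        d_out, d_in = len(base), len(base[0])
        self.base = base  # frozen; shared by every loop iteration
        self.scale = alpha / rank
        # LoRA-style init: A small random, B zero, so each adapter starts as a no-op.
        self.A = [[[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
                  for _ in range(n_loops)]
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_loops)]

    def forward(self, x: Vector, depth: int) -> Vector:
        base_out = matvec(self.base, x)
        delta = matvec(self.B[depth], matvec(self.A[depth], x))  # rank-r path
        return [b + self.scale * d for b, d in zip(base_out, delta)]

    def trainable_params(self) -> int:
        # Only the adapters are trained: n_loops * rank * (d_in + d_out).
        return sum(len(a) * len(a[0]) + len(b) * len(b[0])
                   for a, b in zip(self.A, self.B))


dim = 128
W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]  # toy frozen weight
layer = LoopLoRALinear(W, rank=4, n_loops=4)
print(layer.trainable_params(), dim * dim)  # 4096 16384
```

Because each depth has its own `(B, A)` pair, updating the adapter for one loop iteration leaves the other iterations, and the shared weight, untouched.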
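
The memory argument behind the two attention options can be made concrete with KV-cache arithmetic. Under grouped-query attention, consecutive groups of query heads share one key/value head, so the cache shrinks in proportion to the number of KV heads; under multi-latent attention, each token caches one compressed latent vector from which keys and values are re-expanded. The optional `rope_dim` term assumes a variant that also caches a small decoupled positional key, as DeepSeek-V2 does; the function names and configuration numbers are hypothetical, not OpenMythos defaults.

```python
def kv_head_for_query(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """GQA: each run of n_heads // n_kv_heads consecutive query heads shares one KV head."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    return q_head // (n_heads // n_kv_heads)


def gqa_cache_elems(n_kv_heads: int, head_dim: int) -> int:
    # Per token per layer: one key and one value vector per KV head.
    # Standard multi-head attention is the special case n_kv_heads == n_heads.
    return 2 * n_kv_heads * head_dim


def mla_cache_elems(latent_dim: int, rope_dim: int = 0) -> int:
    # Per token per layer: one compressed latent (keys and values are re-expanded
    # from it), plus an optional small positional key if the variant uses one.
    return latent_dim + rope_dim


def cache_mib(elems_per_token: int, n_tokens: int, n_layers: int,
              bytes_per_elem: int = 2) -> float:
    return elems_per_token * n_tokens * n_layers * bytes_per_elem / 2**20


# Hypothetical configuration: 32 layers, 32 query heads of size 128,
# 4096 cached tokens, 16-bit values.
for name, elems in [("MHA", gqa_cache_elems(32, 128)),
                    ("GQA, 8 KV heads", gqa_cache_elems(8, 128)),
                    ("MLA, latent 512 + rope 64", mla_cache_elems(512, 64))]:
    print(f"{name}: {cache_mib(elems, 4096, 32):.0f} MiB")
# MHA: 2048 MiB
# GQA, 8 KV heads: 512 MiB
# MLA, latent 512 + rope 64: 144 MiB
```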

AI Tools - PyTorch implementation of transformer architectures.

kyegomez/OpenMythos
