OpenMythos is a framework for implementing recurrent large language model architectures. It uses recurrent transformer blocks that make multiple iterative passes over the same weights, so processing depth, and with it the compute spent, can vary at run time instead of being fixed by the number of stacked layers.
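The idea of reusing one set of weights at variable depth can be illustrated with a minimal sketch. This is not OpenMythos code: the names `make_block`, `apply_block`, and `recurrent_forward` are hypothetical, and a single weight matrix with a residual `tanh` update stands in for a full transformer block.

```python
import math
import random


def make_block(dim, seed=0):
    """Build one toy block: a single dim x dim weight matrix (hypothetical stand-in)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, h):
    """One pass over the block: h <- h + tanh(W h)."""
    return [
        h_i + math.tanh(sum(w * x for w, x in zip(row, h)))
        for h_i, row in zip(h, weights)
    ]


def recurrent_forward(weights, h, n_loops):
    """Apply the same block n_loops times; depth is a run-time argument."""
    for _ in range(n_loops):
        h = apply_block(weights, h)
    return h


weights = make_block(dim=4)
x = [1.0, -0.5, 0.25, 0.0]

shallow = recurrent_forward(weights, x, n_loops=2)  # cheap pass
deep = recurrent_forward(weights, x, n_loops=8)     # more compute, same weights

# The parameter count depends only on the block, not on how often it is looped.
n_params = sum(len(row) for row in weights)
```

In this sketch, going from 2 to 8 loops changes the amount of computation but not `n_params`, which is the property the paragraph above describes.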
The system includes a mixture-of-experts (MoE) component that routes tokens between shared and specialized layers to make better use of its parameter budget. It also provides parameter-efficient fine-tuning tools based on low-rank adaptation (LoRA) modules, which change model behavior while updating only a small fraction of the weights.
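To make the low-rank adaptation idea concrete, here is a minimal sketch of the general LoRA formulation, y = W x + (alpha / r) B A x, with W frozen and only A and B trainable. It is not taken from OpenMythos: the class `LoRALinear`, the helper `matvec`, and the parameters `rank` and `alpha` are assumptions chosen for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base weights, d_out x d_in
        self.scale = alpha / rank     # usual LoRA scaling factor
        # A gets a small random init; B starts at zero so the adapter is a no-op.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


d_out, d_in, rank = 8, 8, 2
frozen = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]
layer = LoRALinear(frozen, rank=rank, alpha=4.0)
x = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0, 0.25, -0.5]

full_params = d_out * d_in              # weights a full fine-tune would update
lora_params = layer.trainable_params()  # rank * (d_in + d_out)
```

At this toy size the adapter trains 32 values instead of 64. Because the adapter grows with `rank * (d_in + d_out)` while the full matrix grows with `d_in * d_out`, the saving becomes much larger at realistic layer widths.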
The framework also covers distributed training pipelines that use data parallelism and mixed precision on multi-GPU hardware. Attention is configurable, with grouped-query attention and multi-latent attention among the available mechanisms, and depth-wise batching lets simple inputs exit early, which increases inference throughput.
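The early-exit behavior can be sketched as follows. The source does not state the halting rule, so this example assumes one: an input leaves the batch once a further loop changes its state by no more than a tolerance. The function `early_exit_batch` and its parameters `step_fn`, `max_loops`, and `tol` are hypothetical, and scalar states stand in for hidden-state tensors.

```python
def early_exit_batch(step_fn, batch, max_loops, tol):
    """Loop a shared step over a batch, dropping items once they stop changing.

    Returns the final states and the number of loops each item used.
    The convergence-based halting rule is an assumption for illustration.
    """
    states = list(batch)
    depth_used = [0] * len(states)
    active = list(range(len(states)))  # indices still being processed

    for depth in range(1, max_loops + 1):
        if not active:
            break  # every item has exited; skip the remaining loops
        still_active = []
        for i in active:
            new_state = step_fn(states[i])
            change = abs(new_state - states[i])
            states[i] = new_state
            depth_used[i] = depth
            if change > tol:
                still_active.append(i)
        active = still_active  # later loops run on a smaller batch

    return states, depth_used


def halve(h):
    """Toy shared step: a contraction, so every input eventually converges."""
    return 0.5 * h


# One "simple" input that converges at once and one that needs more loops.
states, depth_used = early_exit_batch(halve, [0.01, 8.0], max_loops=16, tol=0.01)
```

Here the first input exits after a single loop while the second keeps iterating, so later loops run on a smaller active set. That shrinking set is where the throughput gain would come from in a real batched implementation.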