Neural Networks - Sparse inference engine for transformer models.
ANEE is an experimental dynamic inference wrapper for pretrained Transformer language models (currently only GPT-2). Instead of always running every layer, ANEE exposes an energy_budget parameter and exits early inside the model’s forward pass, so only part of the layer stack runs.
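
To make the idea of a budgeted early exit concrete, here is a minimal sketch in plain Python. It is not ANEE's implementation. It assumes, purely for illustration, that energy_budget is a fraction in [0, 1] of the full depth, that every layer costs the same, and that at least one layer always runs. The names make_toy_layer and forward_with_budget are hypothetical, and the toy layers stand in for real Transformer blocks.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Stand-in for a Transformer block: a residual update h + scale * h."""
    def layer(h: Vector) -> Vector:
        return [x + scale * x for x in h]
    return layer


def forward_with_budget(
    layers: Sequence[Layer],
    hidden: Vector,
    energy_budget: float,
) -> Tuple[Vector, int]:
    """Run layers in order and stop once the budget is spent.

    Assumption (not taken from ANEE): energy_budget is the fraction of
    the layer stack that may run, with uniform cost per layer.
    """
    if not 0.0 <= energy_budget <= 1.0:
        raise ValueError("energy_budget must be in [0, 1]")
    # Convert the fractional budget into a layer count; always run at least one.
    max_layers = max(1, int(energy_budget * len(layers)))
    layers_run = 0
    for layer in layers[:max_layers]:
        hidden = layer(hidden)
        layers_run += 1
    # The caller receives the last computed hidden state and the exit depth.
    return hidden, layers_run


if __name__ == "__main__":
    # 12 layers, matching the depth of the smallest GPT-2 checkpoint.
    stack = [make_toy_layer(1.0) for _ in range(12)]
    state, depth = forward_with_budget(stack, [1.0], energy_budget=0.5)
    print(f"exited after {depth} of {len(stack)} layers, state={state}")
```

In a real model the hidden state at the exit layer would still need the final layer norm and language-model head before it yields token logits, and the budget could just as well be driven by a per-token confidence signal rather than a fixed depth.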