PocketFlow

PocketFlow is an integrated toolkit for deep learning model compression, distributed training, and mobile format optimization. It provides a system for reducing the size and complexity of neural networks to improve inference efficiency, featuring a dedicated engine for knowledge distillation and a mobile model optimizer.

The framework differentiates itself through an automated hyperparameter tuning system that uses reinforcement learning and statistical models to determine compression ratios and layer-wise bit allocation. It also includes a distributed training system that uses multi-GPU acceleration to speed up fine-tuning and compression of large networks.
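To make the idea of layer-wise bit allocation under a budget concrete, here is a minimal sketch in plain Python. It does not reproduce the reinforcement learning or statistical search described above; it swaps in a simple greedy heuristic so that the budget constraint itself is easy to see. The function names, the per-layer parameter counts, and the bit-width limits are all hypothetical and are not part of PocketFlow's API.

```python
from typing import List


def model_size_bits(param_counts: List[int], bit_widths: List[int]) -> int:
    """Total storage in bits when layer i keeps param_counts[i] weights
    at bit_widths[i] bits each."""
    return sum(n * b for n, b in zip(param_counts, bit_widths))


def greedy_bit_allocation(
    param_counts: List[int],
    budget_bits: int,
    max_bits: int = 8,
    min_bits: int = 2,
) -> List[int]:
    """Greedy stand-in for a learned search (an assumption, not the RL agent):
    start every layer at max_bits, then repeatedly remove one bit from the
    largest layer that is still above min_bits until the budget is met."""
    bits = [max_bits] * len(param_counts)
    while model_size_bits(param_counts, bits) > budget_bits:
        candidates = [i for i, b in enumerate(bits) if b > min_bits]
        if not candidates:
            raise ValueError("budget cannot be met even at min_bits everywhere")
        # One fewer bit on the largest layer saves the most storage.
        largest = max(candidates, key=lambda i: param_counts[i])
        bits[largest] -= 1
    return bits


if __name__ == "__main__":
    layers = [1_000, 50_000, 200_000]   # hypothetical per-layer weight counts
    budget = 4 * sum(layers)            # budget equal to 4 bits per weight on average
    allocation = greedy_bit_allocation(layers, budget)
    print(allocation, model_size_bits(layers, allocation))
```

A learned search would typically also weigh how sensitive each layer's accuracy is to lower precision, which this heuristic ignores; it only shows how per-layer bit-widths interact with a global size budget.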

The toolkit covers several core compression methodologies, including weight sparsification, convolutional channel pruning, and both uniform and non-uniform quantization. It provides workflows for recovering precision via knowledge distillation and includes utilities for exporting optimized checkpoints into formats compatible with mobile interpreters.
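As an illustration of the uniform quantization mentioned above, the following minimal sketch maps a list of weights onto evenly spaced levels between the smallest and largest weight. It is plain Python written for this overview, not PocketFlow's API; the function name `uniform_quantize` and the choice of the min/max range as quantization bounds are assumptions.

```python
from typing import List


def uniform_quantize(weights: List[float], num_bits: int) -> List[float]:
    """Snap each weight to the nearest of 2**num_bits evenly spaced levels
    spanning [min(weights), max(weights)] and return the de-quantized values."""
    if num_bits < 1:
        raise ValueError("num_bits must be at least 1")
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        # All weights are identical, so there is nothing to quantize.
        return list(weights)
    steps = 2 ** num_bits - 1            # number of intervals between levels
    scale = (w_max - w_min) / steps      # distance between adjacent levels
    return [w_min + round((w - w_min) / scale) * scale for w in weights]


if __name__ == "__main__":
    original = [-1.0, -0.2, 0.3, 1.0]
    quantized = uniform_quantize(original, num_bits=2)
    worst_error = max(abs(a - b) for a, b in zip(original, quantized))
    print(quantized, worst_error)
```

With this scheme the rounding error per weight is at most half the level spacing. A non-uniform quantizer would instead place the levels unevenly, for example by learning their positions.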

The project supports the import of pre-trained weights to initialize the compression process and allows for the integration of custom data pipelines and loss functions.

Features

Model Compression - Provides a comprehensive system for reducing neural network size and complexity through pruning, sparsification, and quantization.

Distributed Training Accelerators - Splits training workloads across multiple GPUs to accelerate the fine-tuning and compression of large networks.

Distributed Training - Features a multi-GPU acceleration layer for scaling the fine-tuning and compression of large neural networks.

Knowledge Distillation - Trains student models to mimic teacher models, recovering accuracy lost during compression.

Model Compression Suites - Provides an integrated suite for reducing model size through pruning, quantization, and distillation.

Weight Quantization - Lowers the precision of model weights and activations to reduce memory footprint and accelerate execution.

Weight Pruning - Provides weight sparsification through a dynamic pruning schedule during training to reduce model size.

Mobile Model Deployment - Optimizes and converts trained model checkpoints into lightweight formats for efficient on-device inference.

Model Format Optimizers - Implements a pipeline to transform large checkpoints into optimized formats for mobile device deployment.

Layer-Wise Optimization Strategies - Determines optimal pruning ratios or quantization widths for individual layers to maintain accuracy within resource budgets.

Teacher-Student Distillation - Employs teacher-student distillation to train compact models that mimic the performance of larger teacher models.

Convolutional Channel Pruning - Reduces input channels in convolutional layers to decrease model size while minimizing reconstruction loss.

Pruning Ratio Optimization - Uses reinforcement learning to find optimal compression ratios that meet specific computation or FLOPs budgets.

Model Sparsification - Implements a dynamic pruning schedule to reduce the number of non-zero weights and decrease inference cost.

Non-Uniform Quantization - Optimizes the distribution of quantization points through back-propagation to approximate full-precision network behavior.

Automated Bit Allocation - Employs reinforcement learning to automatically determine the most efficient bit distribution across model layers.

Layer-Wise Bit Allocation - Assigns different quantization bit-widths to individual network layers to balance inference speed and model accuracy.

Quantization-Aware Training - Fine-tunes models during the quantization process to recover accuracy lost during weight and activation compression.

Hyperparameter Optimizers - Automatically searches for optimal compression ratios and bit-widths using reinforcement learning and statistical models.

Non-Uniform Quantization - Implements back-propagation-based non-uniform quantization to approximate full-precision network behavior.

Automated Hyperparameter Search - Uses a reinforcement learning agent to automatically determine optimal pruning ratios and bit-widths across model layers.

RL-Based Parameter Tuning - Iteratively searches for the best configuration settings using reinforcement learning to maximize model performance.

TFLite Exports - Transforms trained model checkpoints into the TFLite format for deployment on mobile devices.

Mobile Model Format Converters - Transforms optimized checkpoints into a single file format compatible with mobile interpreters for deployment.

Filter Pruning - AutoML for model compression and mobile acceleration.

Inference Optimization - AutoML-based tool for model compression.
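The dynamic pruning schedule referred to under Weight Pruning and Model Sparsification can be sketched as follows. The sketch assumes a cubic ramp from an initial to a final sparsity combined with magnitude-based masking, a common choice for gradual pruning; whether PocketFlow uses exactly this form is an assumption, and the names `sparsity_at_step` and `magnitude_mask` are hypothetical.

```python
from typing import List


def sparsity_at_step(
    step: int,
    begin_step: int,
    end_step: int,
    final_sparsity: float,
    initial_sparsity: float = 0.0,
) -> float:
    """Target fraction of zeroed weights at a training step, rising quickly
    at first and flattening out as it approaches final_sparsity."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that zeroes the smallest-magnitude weights so that
    roughly `sparsity` of them are pruned."""
    num_pruned = int(round(sparsity * len(weights)))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(by_magnitude[:num_pruned])
    return [0 if i in pruned else 1 for i in range(len(weights))]


if __name__ == "__main__":
    weights = [0.5, -0.1, 0.05, -2.0]
    for step in (0, 25, 50, 100):
        target = sparsity_at_step(step, begin_step=0, end_step=100, final_sparsity=0.5)
        print(step, target, magnitude_mask(weights, target))
```

In a training loop the mask would be recomputed periodically and applied to the weights, so that the network can adapt between pruning steps.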

TencentPocketFlow
