# microsoft/Swin-Transformer

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/microsoft-swin-transformer).**

15,715 stars · 2,212 forks · Python · MIT

## Links

- GitHub: https://github.com/microsoft/Swin-Transformer
- Homepage: https://arxiv.org/abs/2103.14030
- awesome-repositories: https://awesome-repositories.com/repository/microsoft-swin-transformer.md

## Topics

`ade20k` `image-classification` `imagenet` `mask-rcnn` `mscoco` `object-detection` `semantic-segmentation` `swin-transformer`

## Description

Swin-Transformer is a deep learning framework for training and deploying hierarchical vision transformer models. It serves as a research library and toolkit for computer vision tasks, providing the infrastructure to build models that replace standard convolution operations with self-attention computed inside shifted local windows. Its multi-scale feature hierarchy lets the models process visual data at varying resolutions and spatial scales.
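
The multi-scale hierarchy can be illustrated with a little arithmetic. The sketch below is not taken from the repository; it assumes a 224-pixel input, a patch size of 4 (both suggested by config names such as `swin_small_patch4_window7_224`), an embedding width of 96, and the usual patch-merging rule of halving resolution and doubling channels between stages. The function name `stage_shapes` is hypothetical.

```python
def stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (height, width, channels) of the token grid at each stage."""
    side = image_size // patch_size   # tokens per side after patch embedding
    channels = embed_dim
    shapes = []
    for _ in range(num_stages):
        shapes.append((side, side, channels))
        # Patch merging between stages: each 2x2 group of neighbouring
        # tokens is merged, halving the resolution and doubling the width.
        side //= 2
        channels *= 2
    return shapes


for i, (h, w, c) in enumerate(stage_shapes(), start=1):
    print(f"stage {i}: {h}x{w} tokens, {c} channels")
```

Under these assumptions the four stages yield 56x56, 28x28, 14x14, and 7x7 token grids, which is the pyramid that detection and segmentation heads consume.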

The project distinguishes itself through shifted window partitioning: self-attention is computed within local windows, and the window grid is shifted between successive layers so that information flows across window boundaries while computational complexity stays linear in image size. It supports advanced scaling techniques, including mixture-of-experts architectures, to increase model capacity without a proportional rise in inference costs. These capabilities are complemented by tools for self-supervised representation learning, which extract visual features from unlabeled data.
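
A minimal sketch of the partitioning idea, in plain Python rather than the tensor operations the repository uses. The helper `window_ids` is hypothetical: it labels each token of a small square grid with the window it falls in, optionally after a cyclic shift of half a window. The attention mask that a real implementation needs for tokens wrapped around the border is omitted here.

```python
from collections import Counter


def window_ids(size, window, shift=0):
    """Label every (row, col) of a size x size token grid with its window index.

    With shift > 0 the grid is cyclically rolled by `shift` tokens first, so
    the same regular partition groups different neighbours together.
    """
    per_side = size // window
    ids = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            rr = (r - shift) % size   # position of the token after the roll
            cc = (c - shift) % size
            ids[r][c] = (rr // window) * per_side + (cc // window)
    return ids


regular = window_ids(8, 4)            # 8x8 tokens, 4x4 windows
shifted = window_ids(8, 4, shift=2)   # shift by half a window

print(regular[3][3], regular[4][4])   # 0 3: neighbours split across windows
print(shifted[3][3], shifted[4][4])   # 0 0: the shift puts them in one window

# Every window still holds window * window tokens, so the attention cost per
# window is fixed and the total grows with the number of windows.
print(Counter(i for row in shifted for i in row))
```

Because each window contains a fixed number of tokens, doubling the image area doubles the number of windows rather than squaring the attention cost, which is where the linear scaling comes from.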

The framework provides comprehensive support for distributed deep learning, enabling the parallelization of training across multiple graphics cards and compute nodes. It includes built-in optimizations such as mixed precision training and gradient checkpointing to manage memory consumption and accelerate throughput during large-scale experiments. Users can also perform fine-tuning on pre-trained models, apply feature distillation, and manage complex training schedules through configurable hyperparameters.
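
The memory trade-off behind gradient checkpointing can be shown without a deep learning library. The sketch below is a toy, not the repository's code: it chains scalar `tanh` layers (an arbitrary choice), computes the input gradient once while storing every layer input, and once while storing only segment boundaries and recomputing the rest during the backward pass. All function names are hypothetical.

```python
import math


def layer(x, w):
    return math.tanh(w * x)


def layer_grad(x, w):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_store_all(x, weights):
    """Backpropagate d(output)/d(input), keeping every layer input."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(x, w)
    grad = 1.0
    for inp, w in zip(reversed(inputs), reversed(weights)):
        grad *= layer_grad(inp, w)
    return grad, len(inputs)


def grad_checkpointed(x, weights, segment):
    """Same gradient, but only segment-boundary inputs survive the forward pass."""
    checkpoints = []
    for start in range(0, len(weights), segment):
        checkpoints.append(x)
        for w in weights[start:start + segment]:
            x = layer(x, w)
    grad = 1.0
    peak = len(checkpoints)
    for idx in reversed(range(len(checkpoints))):
        seg = weights[idx * segment:(idx + 1) * segment]
        # Recompute this segment's inputs from its checkpoint.
        inputs, h = [], checkpoints[idx]
        for w in seg:
            inputs.append(h)
            h = layer(h, w)
        peak = max(peak, len(checkpoints) + len(inputs))
        for inp, w in zip(reversed(inputs), reversed(seg)):
            grad *= layer_grad(inp, w)
    return grad, peak


weights = [0.9, -1.1, 0.7, 1.3] * 3   # 12 toy layers
full_grad, full_stored = grad_store_all(0.5, weights)
ckpt_grad, ckpt_stored = grad_checkpointed(0.5, weights, segment=4)
print(math.isclose(full_grad, ckpt_grad), full_stored, ckpt_stored)
```

The gradients match, while the number of values held at once drops from 12 to 7 in this toy setting; the price is one extra forward pass per segment.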

The repository includes scripts and configuration utilities to support image classification, object detection, and semantic segmentation workflows. It is designed to be installed as a Python-based library, offering a modular approach to defining model architectures and executing distributed training routines.

## Tags

### Artificial Intelligence & ML

- [Object Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-detection-tracking/object-detection.md) — Identifies and outlines specific items within images or video frames to support precise instance segmentation and localization tasks. ([source](https://github.com/microsoft/Swin-Transformer#readme))
- [Image Segmentation](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/image-segmentation.md) — Assigns a specific class to every pixel in an image to provide detailed scene understanding and environmental mapping. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/README.md))
- [Computer Vision Training](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-training.md) — Provides standardized training routines for transformer-based computer vision models on large-scale infrastructure. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md))
- [Distributed Deep Learning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-deep-learning-frameworks.md) — Scales the training of complex vision models across multiple graphics cards and compute nodes to shorten training time and handle massive workloads.
- [Transformer-Based Image Classifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/image-classification/transformer-based-image-classifiers.md) — Categorizes images into predefined groups using hierarchical models trained on large datasets to ensure accurate identification of objects and scenes. ([source](https://github.com/microsoft/Swin-Transformer#readme))
- [Vision Transformer Pre-training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training/vision-transformer-pre-training.md) — Trains models on large datasets using masked image modeling to learn visual representations before applying them to specific downstream tasks. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_simmim_pt.py))
- [Computer Vision](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/computer-vision.md) — Provides a comprehensive library for training and deploying hierarchical vision transformer models for classification, detection, and segmentation tasks.
- [Attention-Based Replacements](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/convolution-layers/attention-based-replacements.md) — Processes image patches within local windows that shift across layers to capture multi-scale features while maintaining linear computational complexity. ([source](https://github.com/microsoft/Swin-Transformer/tree/LR-Net))
- [Vision Transformers](https://awesome-repositories.com/f/artificial-intelligence-ml/vision-transformers.md) — Implements hierarchical transformer models with shifted window attention mechanisms for advanced computer vision tasks. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/configs/swin/swin_small_patch4_window7_224.yaml))
- [Data-Parallel Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/data-parallel-training.md) — Synchronizes model gradients and parameters across multiple compute nodes to accelerate training throughput on massive image datasets.
- [Hierarchical Feature Pyramids](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-scaling/resolution-scaling/hierarchical-feature-pyramids.md) — Constructs a pyramid of visual representations by progressively merging image patches to model objects at varying spatial resolutions and scales.
- [Mixed Precision Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training.md) — Utilizes lower-bit floating point arithmetic during forward and backward passes to reduce memory consumption and increase computational speed.
- [Mixture of Experts](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-customization/mixture-of-experts.md) — Scales model capacity by routing input tokens to specialized sub-networks, allowing for high performance without increasing the cost of every inference.
- [Shifted Window Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/shifted-window-attention-mechanisms.md) — Alternates between different window configurations in successive layers to enable cross-window connections and facilitate global information flow across the image.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training.md) — Implements parallel processing techniques to scale model training across multiple devices and compute nodes. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_simmim_ft.py))
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Provides infrastructure for parallelizing model training across multiple compute nodes and graphics cards. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_simmim_pt.py))
- [Gradient Checkpointing](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-checkpointing.md) — Reduces memory consumption by recomputing intermediate activations during the backward pass instead of storing them.
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Adapts pre-trained vision backbones to specific datasets by adjusting weights through additional training cycles on target data. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/MODELHUB.md))
- [Self-Supervised Embedding Trainers](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/self-supervised-embedding-trainers.md) — Extracts visual representations from raw data using self-supervised techniques to build robust models without requiring manually annotated datasets. ([source](https://github.com/microsoft/Swin-Transformer#readme))
- [Self-Supervised Vision Representation Trainers](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/self-supervised-embedding-trainers/self-supervised-vision-representation-trainers.md) — Extracts robust visual features from raw, unlabeled image data to build foundational models without requiring extensive manual annotation for every task.
- [Training Checkpointing](https://awesome-repositories.com/f/artificial-intelligence-ml/training-checkpointing.md) — Writes model weights, optimizer states, and training metadata to disk at regular intervals to ensure progress is preserved. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/utils_simmim.py))
- [Training Memory Management](https://awesome-repositories.com/f/artificial-intelligence-ml/training-memory-management.md) — Lowers memory consumption during the training of large-scale vision models by applying gradient checkpointing and mixed precision techniques to optimize hardware resource utilization. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_moe.py))
- [Vision Model Loaders](https://awesome-repositories.com/f/artificial-intelligence-ml/vision-model-loaders.md) — Provides access to a variety of pre-trained transformer-based vision models for tasks like image classification, supporting multiple architectures and scaling configurations. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/MODELHUB.md))
- [Knowledge Distillation](https://awesome-repositories.com/f/artificial-intelligence-ml/knowledge-distillation.md) — Transfers knowledge from large, high-capacity models into smaller, efficient architectures to maintain high performance while reducing computational resources. ([source](https://github.com/microsoft/Swin-Transformer#readme))
- [Learning Rate Schedulers](https://awesome-repositories.com/f/artificial-intelligence-ml/learning-rate-schedulers.md) — Specifies optimization parameters such as epoch counts and learning rate warmups to control the convergence behavior of models during the learning process. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/configs/swin/swin_base_patch4_window12_384_22kto1k_finetune.yaml))
- [Vision Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/fine-tuning-and-alignment/fine-tuning-frameworks/vision-model-fine-tuning.md) — Enables fine-tuning of pre-trained vision transformer models for specific classification tasks. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_simmim_ft.py))
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/model-fine-tuning.md) — Supports adapting pre-trained vision models to new tasks or resolutions through continued training on target data. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md))
- [Model Parameter Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-parameter-configurations.md) — Supports adjustment of architectural settings like embedding dimensions and layer depths. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/configs/swin/swin_large_patch4_window12_384_22kto1k_finetune.yaml))
- [Transformer Architecture Configurators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-parameter-configurations/transformer-architecture-configurators.md) — Allows definition of hierarchical transformer parameters to optimize feature extraction. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/configs/swin/swin_base_patch4_window12_384_22kto1k_finetune.yaml))
- [Model Training Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-optimizers.md) — Decreases memory consumption and training time by using automatic mixed precision and gradient accumulation to improve the efficiency of the model fine-tuning process. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_simmim_ft.py))
- [Training Checkpointers](https://awesome-repositories.com/f/artificial-intelligence-ml/training-checkpointers.md) — Detects and loads the latest checkpoint from a storage directory to continue training automatically after an unexpected interruption. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_simmim_pt.py))
- [Training Loop Schedulers](https://awesome-repositories.com/f/artificial-intelligence-ml/training-loop-schedulers.md) — Reloads model weights, optimizer settings, and learning rate schedules from a saved file to resume interrupted training sessions. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/utils_simmim.py))
- [Video Object Tracking](https://awesome-repositories.com/f/artificial-intelligence-ml/video-object-tracking.md) — Analyzes temporal sequences in video data to identify and classify human movements or specific events occurring over time. ([source](https://github.com/microsoft/Swin-Transformer#readme))
- [Model Evaluation Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-evaluation-frameworks.md) — Supports distributed inference and validation across multiple devices to measure model performance. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main_moe.py))
- [Optimizer Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/optimizer-configurations.md) — Selects and initializes training optimizers while automatically excluding specific parameters like biases or normalization layers from weight decay. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/optimizer.py))
- [Resource-Efficient Model Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/resource-efficient-model-inference.md) — Optimizes transformer-based vision architectures to balance high predictive performance with reduced computational resource requirements for inference and production environments.
- [Training Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/training-optimizations.md) — Improves training efficiency and reduces memory consumption by using gradient accumulation, gradient checkpointing, and memory-based dataset caching to handle larger workloads on limited hardware. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md))
- [Vision Model Evaluation](https://awesome-repositories.com/f/artificial-intelligence-ml/vision-model-evaluation.md) — Measures accuracy and throughput of vision models on validation datasets to verify predictive capabilities and computational efficiency. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md))
- [Training Hyperparameter Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/training-configuration-management/training-hyperparameter-configurations.md) — Sets hyperparameters including learning rate schedules and regularization techniques to refine the training process for transformer-based vision models. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/configs/swinv2/swinv2_base_patch4_window12to16_192to256_22kto1k_ft.yaml))
- [Training Hyperparameters](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-hyperparameters.md) — Allows configuration of training hyperparameters like learning rate schedules and epoch counts. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/configs/swin/swin_small_patch4_window7_224_22kto1k_finetune.yaml))
- [Training Progress Monitors](https://awesome-repositories.com/f/artificial-intelligence-ml/training-progress-monitors.md) — Records execution events and status updates to the console and persistent files to track model training progress across distributed processes. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/logger.py))
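
The Optimizer Configurations entry above describes excluding biases and normalization parameters from weight decay. A minimal sketch of that grouping rule follows, working on plain `(name, shape)` pairs so it runs without a deep learning library. The function `split_weight_decay`, the rule "one-dimensional or ends in `.bias`", and the parameter names are assumptions made for illustration, not a copy of `optimizer.py`.

```python
def split_weight_decay(named_shapes, skip_names=(), skip_keywords=()):
    """Split parameter names into (decay, no_decay) groups.

    A parameter skips weight decay if it is one-dimensional (biases,
    normalization scales), if its name ends in ".bias", or if it matches
    an explicit skip name or keyword.
    """
    decay, no_decay = [], []
    for name, shape in named_shapes:
        excluded = (
            len(shape) == 1
            or name.endswith(".bias")
            or name in skip_names
            or any(keyword in name for keyword in skip_keywords)
        )
        (no_decay if excluded else decay).append(name)
    return decay, no_decay


# Illustrative parameter names and shapes, chosen for this sketch only.
params = [
    ("layers.0.attn.qkv.weight", (288, 96)),
    ("layers.0.attn.qkv.bias", (288,)),
    ("layers.0.norm1.weight", (96,)),
    ("layers.0.attn.relative_position_bias_table", (169, 3)),
]

decay, no_decay = split_weight_decay(
    params, skip_keywords=("relative_position_bias_table",)
)
print("decay:", decay)
print("no decay:", no_decay)
```

In a real training script the two groups would typically be passed to the optimizer as separate parameter groups, with the weight decay coefficient set to zero for the second one.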

### Data & Databases

- [Training Memory Optimizers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers.md) — Reduces memory footprint during model training by applying gradient checkpointing, fused operations, and efficient data caching strategies to keep resource consumption within hardware limits. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/main.py))

### Networking & Communication

- [Distributed Training Metric Aggregators](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/hierarchical-metric-aggregation/distributed-training-metric-aggregators.md) — Combines numerical values across multiple compute nodes to calculate global statistics and performance indicators during distributed training operations. ([source](https://github.com/microsoft/Swin-Transformer/blob/main/utils_simmim.py))
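
Combining metrics across processes can be sketched without any communication library. The example below simulates the ranks as a list of `(local_sum, local_count)` pairs; in an actual distributed run the two summations would be carried out by a collective operation such as an all-reduce. Both function names are hypothetical, and the numbers are made up for illustration.

```python
def global_mean(per_rank):
    """Exact global average from per-rank (local_sum, local_count) pairs."""
    total = sum(local_sum for local_sum, _ in per_rank)
    count = sum(local_count for _, local_count in per_rank)
    return total / count


def mean_of_means(per_rank):
    """Average of per-rank averages; exact only when all counts are equal."""
    return sum(s / n for s, n in per_rank) / len(per_rank)


# Two simulated ranks: (correct predictions, samples seen).
even = [(80, 100), (90, 100)]
uneven = [(80, 100), (50, 50)]

print(global_mean(even), mean_of_means(even))      # both agree
print(global_mean(uneven), mean_of_means(uneven))  # these differ
```

Averaging already-averaged values is the cheaper pattern because it needs only one number per rank, but it is biased whenever ranks process different numbers of samples, for example on the last, partially filled validation batch.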