# jadore801120/attention-is-all-you-need-pytorch

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/jadore801120-attention-is-all-you-need-pytorch).**

9,742 stars · 2,094 forks · Python · MIT

## Links

- GitHub: https://github.com/jadore801120/attention-is-all-you-need-pytorch
- awesome-repositories: https://awesome-repositories.com/repository/jadore801120-attention-is-all-you-need-pytorch.md

## Topics

`attention` `attention-is-all-you-need` `deep-learning` `natural-language-processing` `nlp` `pytorch`

## Description

This project is a PyTorch implementation of the Transformer, the attention-based neural network for machine translation described in "Attention Is All You Need". It works as a text-to-text translation tool that converts source-language sequences into target-language text.

The implementation focuses on neural machine translation with a sequence-to-sequence architecture. It covers the full translation pipeline: text preprocessing, vocabulary creation, model training, and inference.
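
As a rough illustration of the vocabulary-creation step, the sketch below builds a token-to-id mapping and encodes a sentence in plain Python. It is not taken from the repository: the special-token names, the whitespace tokenizer, and the helper names `build_vocab` and `encode` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical special tokens; the repository's actual token names may differ.
PAD, UNK, BOS, EOS = "<pad>", "<unk>", "<s>", "</s>"


def build_vocab(sentences, min_freq=1):
    """Map every token seen at least min_freq times to an integer id."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    counts = Counter(tok for sent in sentences for tok in sent.lower().split())
    vocab = {tok: idx for idx, tok in enumerate([PAD, UNK, BOS, EOS])}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to ids, wrapped in BOS/EOS, with unknown words as UNK."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]
    return [vocab[BOS]] + ids + [vocab[EOS]]


corpus = ["a man rides a bike", "a dog runs"]
vocab = build_vocab(corpus)
print(encode("a cat runs", vocab))  # "cat" is not in the corpus, so it maps to UNK
```

Raising `min_freq` drops rare words from the vocabulary, a common way to keep the embedding table small.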

The model includes the standard Transformer components: an encoder-decoder architecture, multi-head self-attention, positional encoding, and layer normalization. Training uses label smoothing and can evaluate the model on a validation dataset, and translation uses beam search decoding.
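
To make the positional encoding component concrete, here is a minimal sketch of the sinusoidal scheme from the original Transformer paper, written in plain Python rather than PyTorch. The function name `sinusoidal_positions` and its parameters are chosen for this example and are not the repository's API.

```python
import math


def sinusoidal_positions(n_positions, d_model):
    """Return an n_positions x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(n_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = sinusoidal_positions(n_positions=50, d_model=8)
print(pe[0])  # position 0: sin(0) = 0 in even slots, cos(0) = 1 in odd slots
```

In a model, row `pos` of this table would be added to the embedding of the token at position `pos`, which gives the otherwise order-agnostic attention layers access to sequence order.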

## Tags

### Artificial Intelligence & ML

- [Text Translation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/text-translation-tools.md) — Provides a complete system for translating text from a source language to a target language using AI models.
- [Transformer Architecture Implementation](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-architecture-implementation.md) — Provides a full PyTorch implementation of the Transformer architecture for sequence-to-sequence translation. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch#readme))
- [Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms.md) — Utilizes self-attention layers to compute weighted relevance between tokens in textual sequences.
- [Encoder-Decoder Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures.md) — Implements a classic encoder-decoder architecture to map source sequences to target language representations.
- [Neural Machine Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/neural-machine-translation.md) — Implements a neural machine translation system to convert text between different languages. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch#readme))
- [Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training.md) — Implements training loops that update model parameters via backpropagation on data batches. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/train.py))
- [Multi-Head Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-head-attention-mechanisms.md) — Uses multi-head self-attention to capture diverse relational patterns within input and output sequences.
- [Text Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/text-model-training.md) — Provides the machinery for end-to-end training of text-based sequence-to-sequence models. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/README.md))
- [Causal](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-masking/causal.md) — Employs masked multi-head attention so that each decoder position cannot attend to subsequent tokens during training.
- [Generative Text Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/generative-text-inference.md) — Produces translated text outputs from language models using beam search decoding.
- [Normalization Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/neural-network-layers/normalization-layers.md) — Incorporates layer normalization to stabilize training and improve convergence across transformer layers.
- [Beam Search Decoders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/sequence-to-sequence-tasks/beam-search-decoders.md) — Implements a beam search decoder that tracks several candidate translations in parallel and keeps the highest-scoring ones.
- [Model Training Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-optimizers.md) — Optimizes translation training through hyperparameter tuning and convergence acceleration techniques.
- [Positional Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/positional-encodings.md) — Injects sine and cosine wave patterns as positional encodings to provide sequence order information.
- [PyTorch Model Components](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-components.md) — Implements the neural network using PyTorch building blocks and tensor operations.
- [Text Preprocessing Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/text-preprocessing-pipelines.md) — Provides a pipeline for cleaning and tokenizing raw text data to create model vocabularies. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/README.md))
- [Text Translation Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/text-translation-inference.md) — Transforms source sequences into target language text using a trained model and beam search. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/translate.py))
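
Several of the tags above refer to beam search decoding. The sketch below shows the general idea in plain Python, under stated assumptions: `step_log_probs` is a hypothetical callback standing in for a trained model, `toy_model` is a made-up probability table, and no length normalization is applied. It is not the repository's decoder.

```python
import math


def beam_search(step_log_probs, bos, eos, beam_size=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_log_probs(prefix) is a hypothetical callback that returns a dict
    {token: log_prob} for the next token given the prefix so far.
    """
    beams = [([bos], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the beam_size best hypotheses at this step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Hypotheses ending in EOS are set aside; the rest keep expanding.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    # Fall back to unfinished hypotheses if nothing reached EOS.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]


def toy_model(prefix):
    """Made-up next-token distribution in which greedy decoding is suboptimal."""
    table = {
        ("<s>",): {"a": 0.6, "b": 0.4},
        ("<s>", "a"): {"x": 0.5, "y": 0.5},
        ("<s>", "b"): {"z": 0.9, "x": 0.1},
    }
    probs = table.get(tuple(prefix), {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


print(beam_search(toy_model, "<s>", "</s>", beam_size=2))
print(beam_search(toy_model, "<s>", "</s>", beam_size=1))  # equivalent to greedy decoding
```

With `beam_size=1` the search commits to `a` (probability 0.6) and ends with total probability 0.3, while `beam_size=2` keeps `b` alive long enough to find the `b z` path with probability 0.36. Beam search is still a heuristic, so a wider beam improves the odds of finding a better sequence without guaranteeing the best one.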

### Part of an Awesome List

- [Text Sequence Generators](https://awesome-repositories.com/f/awesome-lists/ai/sequence-to-sequence-models/text-sequence-generators.md) — Generates natural language translation sequences using a trained sequence-to-sequence model. ([source](https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/README.md))
