# facebookresearch/fasttext

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/facebookresearch-fasttext).**

26,543 stars · 4,809 forks · HTML · MIT · archived

## Links

- GitHub: https://github.com/facebookresearch/fastText
- Homepage: https://fasttext.cc/
- awesome-repositories: https://awesome-repositories.com/repository/facebookresearch-fasttext.md

## Description

fastText is a library for learning word embeddings, vectorizing text, and training supervised text classifiers. It provides tools to transform raw text into fixed-length vector representations and to train models that assign category labels to sentences or documents.

fastText represents each word through its character n-grams, which lets it build meaningful vectors for words that never appeared in the training data. To reduce resource usage, it can compress trained models with product quantization and dimensionality reduction, shrinking their memory footprint.
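
The subword idea can be illustrated with a minimal Python sketch. It is not fastText's implementation: the function names, the toy n-gram table, and the default lengths of 3 and 6 are assumptions made for illustration, and the real library additionally hashes n-grams into a fixed number of buckets.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word`, wrapped in < and > boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def oov_vector(word, ngram_table, minn=3, maxn=6):
    """Average the vectors of the word's n-grams that exist in `ngram_table`.

    `ngram_table` is a hypothetical dict mapping n-gram -> list of floats.
    Returns None when no n-gram of the word is known.
    """
    known = [ngram_table[g] for g in char_ngrams(word, minn, maxn) if g in ngram_table]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy table: only two n-grams have (made-up) vectors.
TOY_NGRAMS = {"<wh": [1.0, 0.0], "re>": [0.0, 1.0]}

print(char_ngrams("where", 3, 3))
print(oov_vector("where", TOY_NGRAMS, 3, 3))
```

Because the vector is assembled from pieces of the word, an unseen word still gets a representation as long as it shares some n-grams with the training vocabulary.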

Beyond word embeddings, the project covers text classifier training, label prediction, and the generation of vectors for full sentences or paragraphs.
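
To make the classification step concrete, here is a hedged sketch of a classifier that averages word vectors and applies a linear layer followed by a softmax. The embeddings and label weights are hand-picked toy values, the function names are hypothetical, and the code does not call the fastText library; only the `__label__` prefix follows fastText's labelling convention.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(tokens, embeddings, label_weights):
    """Return (label, probability) pairs sorted by descending probability."""
    hidden = sentence_vector(tokens, embeddings)
    labels = list(label_weights)
    scores = [sum(w * h for w, h in zip(label_weights[label], hidden)) for label in labels]
    probs = softmax(scores)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Toy parameters (assumptions for illustration; a trained model learns these).
EMBEDDINGS = {"great": [1.0, 0.0], "awful": [0.0, 1.0], "movie": [0.5, 0.5]}
LABEL_WEIGHTS = {
    "__label__positive": [2.0, -2.0],
    "__label__negative": [-2.0, 2.0],
}

print(predict(["great", "movie"], EMBEDDINGS, LABEL_WEIGHTS))
```

The same averaged vector doubles as a fixed-length sentence representation, which is why classification and sentence embedding share most of their machinery in this kind of model.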

## Tags

### Artificial Intelligence & ML

- [Word Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings.md) — Generates mathematical vector representations of words and character n-grams to capture semantic meaning.
- [Classification Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/language-tools/text-classification/text-classifier-initializers/classification-frameworks.md) — Offers a supervised framework for assigning category labels to text using efficient linear models.
- [Text Vectorizations](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/text-vectorizations.md) — Transforms raw sentences and paragraphs into fixed-length numerical vectors for machine learning.
- [Character N-Grams](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/skip-gram-model-architectures/n-gram-co-occurrence-models/character-n-grams.md) — Uses character n-gram embeddings to represent words and handle out-of-vocabulary terms.
- [Subword Representation Models](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/subword-representation-models.md) — Decomposes words into subword units to generate vectors for unseen terms based on internal structure.
- [Out-of-Vocabulary Vector Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/new-word-discovery/dynamic-word-generation/out-of-vocabulary-vector-generation.md) — Generates word representations for terms missing from the training vocabulary using subword-based embeddings. ([source](https://github.com/facebookresearch/fasttext#readme))
- [Supervised Classification](https://awesome-repositories.com/f/artificial-intelligence-ml/supervised-classification.md) — Provides supervised learning capabilities to automatically assign category labels to text documents.
- [Text Vectorization Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/text-vectorization-tools.md) — Provides tools to transform raw text into fixed-length vector representations for downstream AI models.
- [Sentence Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings/sentence-embeddings.md) — Transforms full paragraphs or sentences into single fixed-size vector representations. ([source](https://github.com/facebookresearch/fasttext#readme))
- [Word Embedding Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/word-embedding-libraries.md) — Provides a comprehensive library for training vectors that represent words and character n-grams.
- [Hierarchical Softmax](https://awesome-repositories.com/f/artificial-intelligence-ml/hierarchical-softmax.md) — Implements hierarchical softmax to optimize label prediction and reduce training complexity.
- [Linear Classifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/linear-regression/linear-classifiers.md) — Employs a linear model that averages word embeddings for fast text classification.
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Reduces model memory requirements using quantization while preserving prediction functionality. ([source](https://github.com/facebookresearch/fasttext#readme))
- [Negative Sampling Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-optimizers/negative-sampling-techniques.md) — Uses negative sampling to update a small subset of weights, accelerating the training process.
- [Model Compression](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-networks/model-compression.md) — Reduces the memory footprint of text models through quantization and dimensionality reduction.
- [Quantized Model Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization/quantized-model-implementations.md) — Implements a memory-efficient language model using quantization to reduce the overall footprint.
- [Text Classifier Training](https://awesome-repositories.com/f/artificial-intelligence-ml/text-model-training/text-classifier-training.md) — Trains supervised models to categorize text and evaluates them using precision and recall. ([source](https://github.com/facebookresearch/fasttext#readme))
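
As a rough illustration of the precision and recall evaluation mentioned in the last entry, the following sketch computes both metrics over the top `k` predictions per example. The function name and input layout are hypothetical, and pooling counts across all examples is an assumption about the aggregation rather than a statement of how fastText computes its reported figures.

```python
def precision_recall_at_k(examples, k=1):
    """Compute pooled precision and recall over the top-k predictions.

    `examples` is an iterable of (gold_labels, ranked_predictions) pairs,
    where gold_labels is a set and ranked_predictions is ordered best-first.
    """
    correct = predicted = gold_total = 0
    for gold, ranked in examples:
        top_k = ranked[:k]
        correct += sum(1 for label in top_k if label in gold)
        predicted += len(top_k)
        gold_total += len(gold)
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold_total if gold_total else 0.0
    return precision, recall


# Two toy examples with made-up labels.
EXAMPLES = [
    ({"a", "b"}, ["a", "c"]),
    ({"c"}, ["c", "a"]),
]

print(precision_recall_at_k(EXAMPLES, k=1))
print(precision_recall_at_k(EXAMPLES, k=2))
```

Raising `k` adds more predictions per example, which can only keep recall level or raise it, while precision typically falls.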

### Data & Databases

- [Single-Label Prediction](https://awesome-repositories.com/f/data-databases/data-categorization/classification-labelers/multi-label-classifiers/multi-label-prediction-analysis/single-label-prediction.md) — Predicts the most likely labels or probabilities for text using a trained supervised model. ([source](https://github.com/facebookresearch/fasttext#readme))
- [Product Quantization](https://awesome-repositories.com/f/data-databases/product-quantization.md) — Implements product quantization to compress large vector matrices into smaller codebooks.
- [Embedding Dimension Reduction](https://awesome-repositories.com/f/data-databases/vector-quantization/high-dimensional-vector-compressors/embedding-dimension-reduction.md) — Decreases the dimensionality of pre-trained word vectors to lower the overall model footprint. ([source](https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md))
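
The product quantization entry above can be sketched in a few lines. This is a simplified illustration, not fastText's quantizer: the codebooks are hard-coded toy values (real codebooks are typically learned, for example with k-means), and all names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    if len(vector) % len(codebooks) != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for m, centroids in enumerate(codebooks):
        sub = vector[m * sub_dim:(m + 1) * sub_dim]
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(sub, centroids[i]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for code, centroids in zip(codes, codebooks):
        approx.extend(centroids[code])
    return approx


# Two sub-spaces of dimension 2, each with a toy 2-entry codebook.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

codes = pq_encode([0.9, 1.1, 0.1, 0.8], CODEBOOKS)
print(codes)
print(pq_decode(codes, CODEBOOKS))
```

Storing one small integer per sub-vector instead of several floats is where the memory saving comes from; the price is that decoding returns only an approximation of the original vector.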

### Part of an Awesome List

- [NLP](https://awesome-repositories.com/f/awesome-lists/ai/nlp.md) — Efficient text classification and representation learning.
- [Text Embeddings](https://awesome-repositories.com/f/awesome-lists/ai/text-embeddings.md) — Library for efficient learning of word representations and classification.
