awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Subword Tokenization · Awesome GitHub Repositories

1 repo


Methods for breaking text into subword units to handle out-of-vocabulary words and maintain consistent representations.

Distinguishing note: Focuses on the subword-level granularity of input processing.

Explore 1 awesome GitHub repository matching artificial intelligence & ml · Subword Tokenization. Refine with filters or upvote what's useful.

Awesome Subword Tokenization GitHub Repositories

  • google-research/bert

    39,869 · View on GitHub

    This project is a transformer-based language model and natural language processing toolkit that generates deep contextual representations of text. Its encoder processes input sequences through stacked self-attention layers, capturing the meaning of each token from the surrounding sentence. The model is distinguished by bidirectional contextual processing, which conditions on both left and right context at once, and by masked language modeling, which trains the system by predicting hidden tokens within a seque

    Breaks raw text into subword units drawn from a frequency-based vocabulary, so that out-of-vocabulary words can be represented as sequences of known pieces.

    Python · google · natural-language-processing · natural-language-understanding
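
To make the subword step in the entry above concrete, here is a minimal sketch of greedy longest-match-first splitting in the WordPiece style that BERT uses. The function name `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical and chosen for illustration only; the real model ships a vocabulary of roughly 30,000 entries and applies lowercasing and punctuation splitting before this step.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one whitespace-delimited word into subword pieces.

    Illustrative sketch: at each position, take the longest vocabulary
    entry that matches; pieces after the first carry a "##" prefix.
    """
    if len(word) > max_chars:
        return [unk_token]

    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is known: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (an assumption, not BERT's actual vocab file).
toy_vocab = {"un", "##aff", "##able", "play", "##ing", "##ed"}

print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", toy_vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```

A word missing from the vocabulary as a whole, such as "unaffable", is still represented by known pieces, which is how subword tokenization sidesteps the out-of-vocabulary problem.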