3 repos
Utilities that convert raw text into numerical sequences or sub-word units for machine learning models.
Explore 3 awesome GitHub repositories matching artificial intelligence & ml · Tokenizers. Refine with filters or upvote what's useful.
Codex is an automated programming tool and generative code assistant designed to interpret developer intent through a natural language interface. It functions as a machine learning model trained on public code repositories to provide intelligent code completion, suggestions, and refactoring within development environme
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
nanoGPT is a lightweight engine for training and fine-tuning transformer-based language models from scratch. It provides a minimalist codebase designed for educational exploration and rapid experimentation with neural network architectures, utilizing self-attention and feed-forward layers to process sequences and predi