30 open-source projects similar to nielsrogge/transformers-tutorials, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Transformers Tutorials alternative.
Transformers.js is a JavaScript library and web machine learning framework designed to run pretrained transformer models directly in the browser. It serves as a client-side inference engine and a wrapper for the ONNX Runtime, enabling the execution of multimodal AI tasks on user devices without the need for a backend server. The library distinguishes itself by providing a unified toolkit for processing text, image, and audio data locally. This architecture supports privacy-preserving model inference and reduces latency by performing all computations on the client's hardware. Its capabilities
Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning. The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
This project is a comprehensive deep learning framework and educational platform designed for constructing, training, and evaluating neural network architectures. It provides a modular environment for building models through tensor operations and automatic differentiation, supporting a wide range of tasks from image classification and object detection to sequential data processing. Beyond its core technical capabilities, the project distinguishes itself by integrating professional career development resources directly into its learning ecosystem. It offers structured guidance, resume reviews,
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
This project is an educational course and learning curriculum for implementing and fine-tuning transformer models using the Hugging Face ecosystem. It serves as a structured guide and technical walkthrough for processing multimodal data, adapting pre-trained neural networks, and deploying models. The material includes a guide for managing, versioning, and distributing model weights and datasets through a centralized asset hub. It also provides a practical tutorial on adapting models to specific datasets using parameter-efficient methods and an implementation guide for solving natural language
This project is a generative AI educational resource and natural language processing course. It serves as a technical implementation guide for building, pre-training, and fine-tuning a large language model from scratch using PyTorch. The curriculum provides a step-by-step tutorial on large language model development, focusing specifically on the design of transformer-based text generation models. It includes dedicated instruction on parameter-efficient fine-tuning to optimize training by updating only a small subset of model weights. The material covers the end-to-end generative AI training
This project is an educational repository providing a collection of implementations and guides for machine learning and deep learning algorithms. It serves as a deep learning model library and a reference for training workflows, covering foundational machine learning, convolutional, recurrent, and transformer architectures. The collection includes a generative adversarial network suite for synthesizing realistic images and performing image-to-image translation. It also functions as a computer vision implementation guide for object detection and semantic segmentation, alongside
This project is a comprehensive collection of practical code examples and implementation libraries for machine learning. It provides a wide array of reference materials for building supervised, unsupervised, and reinforcement learning algorithms. The repository serves as a multi-domain resource, featuring specific implementation suites for financial AI, Bayesian statistical modeling, and deep learning architectures. It includes a framework for training intelligent agents using policy gradients and actor-critic models, as well as practical guides for fine-tuning transformers and utilizing larg
RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams. The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge
Swin-Transformer is a deep learning framework designed for training and deploying hierarchical vision transformer models. It serves as a research library and toolkit for computer vision tasks, providing the infrastructure to build models that replace standard convolution operations with sliding window self-attention mechanisms. By utilizing a multi-scale feature hierarchy, the framework enables the processing of visual data at varying resolutions and spatial scales. The project distinguishes itself through its implementation of shifted window partitioning, which facilitates global information
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
This project is a collection of pre-trained machine learning models and conversion pipelines designed for running inference directly in the browser using TensorFlow.js. It provides a library of ready-to-use models for computer vision, audio classification, and natural language processing tasks. The suite includes specialized tools for transforming Python-based Keras models into JSON formats compatible with web environments. It enables the deployment of these models by fetching architectures and weight shards via HTTP for client-side execution. The project covers a broad range of capabilities
This project is a collection of educational resources and implementation frameworks providing deep learning model recipes, code samples, and step-by-step guides for computer vision tasks. It organizes complex workflows into modular recipes and implementation guides to facilitate the building of image and video analysis models. The framework focuses on specialized vision capabilities, including an image similarity framework for fast retrieval and re-ranking, human pose estimation, and video action recognition. It also provides specific tools for crowd density estimation and document image clea
Detectron2 is a PyTorch computer vision framework designed for training and deploying models for object detection, image segmentation, and other visual recognition tasks. It provides a research-oriented environment for training complex vision models with multi-GPU acceleration. The project includes a specialized object detection library for identifying and locating multiple objects via bounding boxes, as well as an image segmentation toolkit for creating pixel-level masks through instance, semantic, and panoptic segmentation. Additionally, it features a human pose estimati
Ludwig is a declarative machine learning framework designed for training neural networks and large language models using configuration files instead of manual coding. It functions as a multimodal model builder and a low-code tool for supervised fine-tuning, allowing users to build models that process mixed inputs of text, images, audio, and tabular data. The project distinguishes itself through an automated hyperparameter optimizer and a system for large language model fine-tuning using parameter-efficient adapters. It features a multimodal data pipeline and the ability to automatically gener
This project is a self-supervised vision foundation model based on a vision transformer architecture. It is designed to learn dense visual representations from unlabeled images, serving as a general-purpose backbone for a wide variety of downstream vision tasks. The system is distinguished by its use of self-distillation and masked image modeling to extract semantic and geometric features. It also incorporates an image-text alignment model that maps visual embeddings to textual descriptions, enabling zero-shot image recognition, zero-shot segmentation, and cross-modal retrieval. The project
LMFlow is a comprehensive suite for large language model fine-tuning, context extension, multimodal processing, and inference execution. It provides a toolkit for updating model parameters through full tuning or memory-efficient adapter algorithms, alongside an inference engine for executing tuned models via command-line or web-based interfaces. The framework includes a dedicated alignment suite for supervised tuning and reward model training to refine model behavior. It features a context window extender to increase maximum input lengths and a multimodal framework for building chatbots that
PaddleNLP is a development library and toolkit for training, fine-tuning, and deploying large and small language models using the PaddlePaddle framework. It provides a comprehensive suite for the entire natural language processing lifecycle, from model development to high-performance inference. The project features a standardized model zoo for loading and managing pre-trained models and tokenizers through a unified interface. It distinguishes itself with a specialized model compression framework that reduces memory footprints via weight precision conversion and lossless size optimization, alo
Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and tools for synthetic data generation and model distillation. The platform is distinguished by its iterative, failure-driven synthesis approach, which analyzes model weaknesses during evaluation to generate targeted training data. It utilizes an LLM-based judge framework to programmatically score respo
Yi is a bilingual language model and foundation model designed for natural language processing, reasoning, and reading comprehension in both English and Chinese. It is built as a transformer-based architecture capable of general-purpose text generation and conversational tasks. The model is distinguished by its ability to function as a long-context system, processing and analyzing extended input sequences of up to 200k tokens. It also supports quantized versions that use low-bit precision to reduce memory footprints, enabling execution on consumer-grade hardware. The project covers a broad rang
LARK is a development toolkit for training, fine-tuning, and deploying large language models and multimodal models based on PaddlePaddle. It functions as a comprehensive framework that includes an LLM training orchestrator, an inference server, and a multimodal model framework for processing text, image, and video inputs. The project features a retrieval-augmented generation system for building conversational applications that integrate web search and private knowledge bases. It provides specific capabilities for multimodal reasoning and complex logic, enabling the extraction of structured da
HanLP is a natural language processing library and deep learning framework specifically optimized for the Chinese language, while also functioning as a multilingual text processor. It serves as a toolkit for performing linguistic analysis, semantic understanding, and script conversion. The project distinguishes itself through a dedicated focus on Chinese linguistic structures, including a specialized script converter for transforming text between Simplified Chinese, Traditional Chinese, and Pinyin. It further supports domain-specific model training to improve the recognition of professional t
YSDA course in Natural Language Processing
This project is a collection of educational resources and technical guides focused on the development and implementation of large language models. It provides a comprehensive curriculum covering transformer architectures, training methods, and deployment strategies. The materials provide detailed instructions for building autonomous agents using reasoning loops and tool integration, as well as guides for fine-tuning models through supervised learning and preference optimization. It also includes tutorials for constructing retrieval-augmented generation pipelines and implementing transformer m