30 open-source projects similar to s3nh/pytorch-text-recognition, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best PyTorch Text Recognition alternative.
High-level, batteries-included neural network training library for PyTorch
YOLO-World is a vision-language framework and open-vocabulary object detection model. It identifies objects in images and video from free-form text prompts without requiring predefined category labels, detecting arbitrary objects by fusing image features with text embeddings. It includes a specialized tool for automated image labeling, which generates bounding box annotations for custom datasets from text prompts. The project provides a deployment pipeline for converting models into quantized ONNX and TFLite formats, supporting real-time inference on…
Albumentations is an image augmentation library and computer vision preprocessing tool designed to expand datasets for deep learning models. It provides a collection of transformations that modify pixel values and spatial geometry to increase the diversity of training samples and improve model generalization. The library supports both 2D image augmentation and 3D volumetric data augmentation. It handles a variety of labels alongside images, ensuring that bounding boxes, keypoints, and segmentation masks remain accurately aligned when spatial transformations are applied. The tool incorporates…
Albumentations is a computer vision image augmentation library designed to increase training data diversity for deep learning models. It provides a toolset for applying geometric and color transformations to images and annotations, including a specialized collection of 3D operations for volumetric data used in medical and scientific imaging. The library transforms masks, bounding boxes, and keypoints together with each image, updating them automatically whenever the image undergoes a geometric change. This ensures that spatial alterations remain synchronized across images and their…
imgaug is a Python library for machine learning data augmentation and computer vision dataset expansion. It provides tools to increase the volume and variety of training sets by applying random geometric, color, and noise transformations to images. The library ensures spatial consistency by synchronizing transformations across images and their associated annotations, such as bounding boxes, keypoints, and segmentation maps. It uses a compositional pipeline pattern to chain multiple augmentations into sequences and employs deterministic seed management to reproduce specific data samples.
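The two ideas the Albumentations and imgaug descriptions share, keeping box annotations aligned with geometric transforms and reproducing a random pipeline from a fixed seed, can be illustrated without either library. The sketch below is a minimal pure-Python illustration under stated assumptions: images are plain lists of pixel rows, boxes are `(x_min, y_min, x_max, y_max)` in pixel-edge coordinates, and `hflip`, `vflip`, `random_step`, and `compose` are hypothetical helpers, not part of the Albumentations or imgaug APIs. Each geometric step updates the image and its boxes together, and all randomness comes from one seeded `random.Random`.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy types: an image is a list of pixel rows, and a box is
# (x_min, y_min, x_max, y_max) in continuous pixel-edge coordinates.
Image = List[List[int]]
Box = Tuple[float, float, float, float]
Step = Callable[[random.Random, Image, List[Box]], Tuple[Image, List[Box]]]


def hflip(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror the image left to right and remap every box to match."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    # An edge at x moves to width - x, so x_min and x_max swap roles.
    return flipped, [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]


def vflip(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror the image top to bottom and remap every box to match."""
    height = len(image)
    flipped = [list(row) for row in reversed(image)]
    return flipped, [(x1, height - y2, x2, height - y1) for x1, y1, x2, y2 in boxes]


def random_step(transform, p: float) -> Step:
    """Wrap a deterministic transform so it fires with probability p."""
    def step(rng: random.Random, image: Image, boxes: List[Box]):
        if rng.random() < p:
            return transform(image, boxes)
        return image, boxes
    return step


def compose(*steps: Step) -> Step:
    """Chain steps into one pipeline; each step receives the previous output."""
    def pipeline(rng: random.Random, image: Image, boxes: List[Box]):
        for step in steps:
            image, boxes = step(rng, image, boxes)
        return image, boxes
    return pipeline


image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
boxes = [(0.0, 0.0, 1.0, 2.0)]  # a box around the left-most pixel column

augment = compose(random_step(hflip, p=0.5), random_step(vflip, p=0.5))
sample_a = augment(random.Random(7), image, boxes)
sample_b = augment(random.Random(7), image, boxes)
print(sample_a == sample_b)  # True: same seed, same augmented sample
```

Passing the generator explicitly, rather than relying on the global `random` module, keeps each pipeline run independent and reproducible.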
Darknet is a high-performance, C-based inference engine and computer vision library designed for real-time object identification and localization. It serves as a neural network framework for training and deploying detection models based on the YOLO architecture. The project differentiates itself through its C and CUDA implementation, which uses GPU acceleration to speed up matrix multiplication and inference. It provides a shared library interface for embedding detection capabilities into external applications and…
An all-in-one toolkit for computer vision
FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) Dataset
This is a PyTorch object detection framework that implements the Single Shot MultiBox Detector for identifying and localizing multiple objects within images and video. The project provides a neural network architecture designed for single-shot object detection, which predicts bounding boxes and class labels in one pass. The implementation includes a real-time object detector capable of processing live video streams to track and label objects across sequential frames. It also features a complete computer vision training pipeline for preparing image datasets and training model weights.
Clojure wrapper for the Tesseract OCR software
Fast Neural Style for Image Style Transfer in PyTorch
Pixel-wise segmentation on the VOC2012 dataset using PyTorch.
DroneAid uses machine learning to detect calls for help placed on the ground by people in need. At the heart of DroneAid is a Symbol Language used to train a visual recognition model. That model analyzes video from a drone to detect and count specific images. A dashboard can be used to…
An open-source application for biological image analysis
A computer vision closed-loop learning platform where code can be run interactively online. Closed-loop learning: the Chinese-language e-book "Computer Vision in Action: Algorithms and Applications" (《计算机视觉实战演练:算法与应用》), with source code and a reader community (continuously updated ...). 📘 Online e-book: https://charmve.github.io/computer-vision-in-action/
DroneNet is Joseph Redmon's YOLO real-time object detection system retrained on 2,664 labeled images of DJI drones. The original and labeled images used for retraining can be found in the image and label folders, respectively.
OpenPose is a real-time pose estimation engine designed to detect and track human body, face, hand, and foot landmarks. It functions as a multi-person motion tracker, identifying the spatial coordinates of multiple individuals simultaneously within video streams or static images. Beyond two-dimensional detection, the software acts as a three-dimensional kinematics processor, reconstructing spatial movement data from single or multiple synchronized camera perspectives. The system distinguishes itself through a bottom-up approach that utilizes part-affinity fields to associate body parts across…
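The part-affinity-field idea can be sketched as scoring each candidate limb by sampling the field along the segment between two detected keypoints and averaging the field's projection onto the segment direction; the candidate with the strongest support wins. This is a simplified, hypothetical illustration: `paf_score`, `best_partner`, and `toy_field` are invented names, the field is a hand-written function rather than a network output, and OpenPose itself applies additional checks and a matching step across all candidates that are omitted here.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# One limb type's part-affinity field: maps a location to a 2D vector.
Field = Callable[[float, float], Point]


def paf_score(field: Field, a: Point, b: Point, samples: int = 10) -> float:
    """Approximate the line integral of the field along a -> b, projected onto
    the segment's unit direction, by averaging evenly spaced samples."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0 or samples < 2:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)  # 0.0 at a, 1.0 at b
        fx, fy = field(a[0] + t * dx, a[1] + t * dy)
        total += fx * ux + fy * uy
    return total / samples


def best_partner(field: Field, a: Point, candidates: List[Point]) -> Point:
    """Pick the candidate whose connection to a is best supported by the field."""
    return max(candidates, key=lambda b: paf_score(field, a, b))


def toy_field(x: float, y: float) -> Point:
    """Hypothetical field for a horizontal limb lying along y = 0."""
    return (1.0, 0.0) if abs(y) < 1.0 else (0.0, 0.0)


print(paf_score(toy_field, (0.0, 0.0), (10.0, 0.0)))  # 1.0
print(best_partner(toy_field, (0.0, 0.0), [(0.0, 10.0), (10.0, 0.0)]))  # (10.0, 0.0)
```

The projection onto the segment direction is what lets the field reject a connection that crosses the limb region at the wrong angle, even when it passes through high-magnitude pixels.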
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution images by iteratively denoising within a compressed latent space. Text prompts are encoded into embeddings that condition the denoising process, enabling both text-to-image generation and text-guided transformation of existing images. The architecture uses a modular execution environment that decouples model loading, scheduler logic, and inference components to support diverse hardware configurations.
Fine-tune pretrained Convolutional Neural Networks with PyTorch
PyTorch implementation of fast-neural-style
YOLACT is a computer vision framework and real-time instance segmentation model. It uses a fully convolutional neural network to detect objects and generate pixel-level masks for images and video feeds. It produces a set of global prototype masks and linearly combines them with per-instance coefficients to obtain each instance's mask. It incorporates deformable convolutional layers and deformable region-of-interest pooling to adapt spatial sampling to the irregular shapes of objects. The framework covers the full model development lifecycle, including training on custom datasets…
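YOLACT's mask assembly can be sketched as a per-pixel weighted sum of the shared prototype maps, using one instance's coefficients as the weights, followed by a sigmoid. The pure-Python sketch below assumes tiny hand-written 2x2 prototypes; `assemble_mask`, `sigmoid`, and `binarize` are hypothetical helpers, the 0.5 threshold is an assumption, and the real model also crops each mask to its predicted box, which is omitted here.

```python
import math
from typing import List

Grid = List[List[float]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def assemble_mask(prototypes: List[Grid], coeffs: List[float]) -> Grid:
    """Weighted sum of k prototype maps using one instance's k coefficients,
    passed through a sigmoid to give per-pixel mask probabilities."""
    if len(prototypes) != len(coeffs):
        raise ValueError("need exactly one coefficient per prototype")
    h, w = len(prototypes[0]), len(prototypes[0][0])
    logits = [[0.0] * w for _ in range(h)]
    for proto, c in zip(prototypes, coeffs):
        for y in range(h):
            for x in range(w):
                logits[y][x] += c * proto[y][x]
    return [[sigmoid(v) for v in row] for row in logits]


def binarize(mask: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Turn probabilities into a 0/1 mask (the threshold is an assumption)."""
    return [[int(p > threshold) for p in row] for row in mask]


# Two hypothetical 2x2 prototypes: one fires on the left column, one on the right.
left = [[4.0, -4.0], [4.0, -4.0]]
right = [[-4.0, 4.0], [-4.0, 4.0]]
prototypes = [left, right]

# Each detected instance gets its own coefficient vector.
print(binarize(assemble_mask(prototypes, [1.0, 0.0])))  # [[1, 0], [1, 0]]
print(binarize(assemble_mask(prototypes, [0.0, 1.0])))  # [[0, 1], [0, 1]]
```

In this sketch the prototypes are shared by every instance, so adding an instance costs only one more weighted sum over them.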
InsightFace is a comprehensive deep learning framework designed for face recognition, biometric identity verification, and feature extraction. It provides a specialized engine for one-to-one verification and one-to-many identification tasks, utilizing convolutional neural networks to transform raw image pixels into high-dimensional vector embeddings. The project includes a complete toolkit for detecting, aligning, and processing facial data to ensure consistent identity discrimination. Beyond core recognition, the platform distinguishes itself through an extensive model management and…
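The difference between one-to-one verification and one-to-many identification can be sketched with cosine similarity between embedding vectors. This is a minimal illustration, not InsightFace's API: `cosine`, `verify`, and `identify` are hypothetical helpers, the three-dimensional embeddings stand in for a model's real high-dimensional output, and the 0.5 threshold is an arbitrary assumption that would need calibration for any real model.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two embedding vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def verify(a: Embedding, b: Embedding, threshold: float = 0.5) -> bool:
    """One-to-one verification: do the two faces belong to the same identity?"""
    return cosine(a, b) >= threshold


def identify(query: Embedding, gallery: Dict[str, Embedding],
             threshold: float = 0.5) -> Optional[str]:
    """One-to-many identification: the closest enrolled identity, or None
    when even the best match falls below the threshold."""
    best_name, best_score = None, -1.0
    for name, emb in gallery.items():
        score = cosine(query, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical 3-D embeddings standing in for real model output.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(verify([0.9, 0.1, 0.0], gallery["alice"]))  # True
print(identify([0.9, 0.1, 0.0], gallery))         # alice
print(identify([0.0, 0.0, 1.0], gallery))         # None
```

Verification compares against a single claimed identity, while identification searches the whole gallery; both reduce to the same similarity score plus a threshold.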
An efficient video loader for deep learning with smart shuffling that's super easy to digest
This is a PyTorch-based computer vision library for detecting 2D and 3D facial landmark coordinates. It functions as a facial landmark detector and reconstruction tool, using deep learning to locate precise geometric points on human faces in images. The library lets users select a face detection backend to balance accuracy against processing speed. It also accepts precomputed bounding box files, which lets the system skip the initial detection phase and go directly to landmark extraction. The toolkit includes capabilities for batch image…