30 open-source projects similar to shibing624/pycorrector, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best pycorrector alternative.
This project is a Chinese automatic speech recognition framework and deep learning system designed to convert spoken Chinese audio into written text. It functions as a toolkit for training, evaluating, and deploying speech-to-text models, utilizing a specialized pinyin-to-text converter that transforms phonetic sequences into Chinese characters using a probability graph model. The system is distinguished by its deployment flexibility, offering a dockerized recognition server that provides transcription capabilities as a remote API. It supports high-performance streaming through a gRPC speech-
This is a structured deep learning curriculum for programmers, delivered as a collection of Jupyter notebooks. It teaches the fundamentals of training neural networks for computer vision, natural language processing, tabular data analysis, and collaborative filtering using PyTorch and the fastai library. The course is designed to be hands-on, guiding learners from building a training loop from scratch to fine-tuning pretrained models for a variety of practical tasks. The curriculum distinguishes itself by covering the full lifecycle of a deep learning project, from data preparation and augmen
This project is a CJK input method framework and configuration set designed for the Rime input engine. It provides a comprehensive system of schemas and dictionary packs to optimize Chinese character entry through pinyin and double-pinyin workflows. The framework is distinguished by its use of Lua-powered extensions that add dynamic utilities, such as inline mathematical calculators, automated timestamps, and text formatting, directly to the input interface. It also features refined word libraries and language models specifically tuned to improve prediction accuracy and first-choice hit rates
This is a Chinese natural language processing toolkit providing a suite of tools for word segmentation, part-of-speech tagging, and named entity recognition. It includes a neural dependency parser for analyzing syntactic and semantic relationships between words and a machine learning training suite for creating custom linguistic models using annotated datasets. The toolkit distinguishes itself through its deployment flexibility, offering a dockerized server and a web service interface that exposes processing capabilities via API. It supports the use of pretrained models and allows for the int
LanguageTool is a multilingual grammar and style checking engine designed to detect spelling, grammar, and writing errors across multiple languages. It provides automated proofreading capabilities that can be deployed as a self-hosted server or executed as a standalone local desktop application. The project distinguishes itself through a flexible rule development framework, allowing linguistic patterns to be defined via XML or implemented as custom Java classes. It utilizes n-gram frequency modeling for confused word detection and supports neural word embeddings to improve disambiguation betw
RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams. The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge
oh-my-rime is a comprehensive configuration framework and collection of optimized schemes, lexicons, and layouts for the Rime Chinese input method. It provides a system for customizing text entry behaviors, candidate list layouts, and symbol mappings across different platforms. The project distinguishes itself through an automated deployment model, utilizing a command-line tool and recipe-driven installation to apply configuration packages to system directories. It also implements advanced logic via Lua scripting, enabling features such as mathematical expression calculation, date and time in
Qwen-Image is a text-to-image model and large language model image generation framework. It functions as an AI image editing suite and a personalized image trainer, capable of producing high-fidelity visuals and accurate typography from natural language descriptions. The system is distinguished by its precision text rendering engine, which integrates multi-script calligraphy and layout-coherent alphabetic text into images. It provides specialized capabilities for subject identity preservation and consistent subject generation across different poses and viewpoints, alongside a training pipelin
This project is a deep learning face classification system that detects human faces and classifies gender and emotion. It utilizes convolutional neural networks and computer vision tools to analyze facial attributes in both static images and live video streams. The system includes specialized classifiers for emotions based on the FER2013 dataset and gender based on IMDB datasets. These models are integrated into a containerized web service, allowing the classification logic to be exposed as an API that processes image data via network requests. The technical surface covers the entire pipelin
This project is a cloud-based AI deployment system and latent diffusion model trainer. It provides a framework for launching image generation interfaces and training pipelines on remote GPU infrastructure, specifically serving as a text-to-image model fine-tuner. The system features a specialized training interface for fine-tuning Stable Diffusion models on custom image datasets. It allows for the creation of personalized visual outputs by training models on specific subjects or artistic styles using a small set of reference images. The software covers generative AI deployment, custom style
MockingBird is an AI voice cloning tool and text-to-speech system designed to generate synthetic speech. It functions as a voice synthesis trainer for building custom models from audio datasets, a command-line generator for producing audio files, and a text-to-speech server for remote application integration. The project specializes in real-time voice cloning, which extracts vocal characteristics from short audio samples to mimic a target speaker's unique timbre. It utilizes reference-driven audio synthesis to condition pre-trained models on specific audio samples, allowing for the generation
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, enabling the execution of large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by utilizing GPU acceleration to minimize latency during model predictions and employing a hybrid storage engine that maintains relational data alongside high-dimensional vectors. It allows for the building and fin
ml5-library is a JavaScript machine learning library that functions as a browser-based inference engine. It provides a high-level wrapper for implementing neural networks and data models, allowing users to execute machine learning predictions directly on the client side. The library simplifies the integration of machine learning into web applications and creative coding projects by removing the requirement for deep mathematical expertise. It specifically enables web-based image classification through the use of pretrained deep learning models to identify and label objects within images. The
Mistral Inference is a library for running Mistral large language models on a GPU, generating text from prompts with token streaming. It loads pretrained model weights from local disk or a remote registry into GPU memory, then produces output tokens one by one for real-time display in interactive applications. The library supports multimodal prompts that accept image URLs alongside text, enabling visual description and reasoning. It includes content safety guardrails that scan generated text against predefined policies to block or flag policy violations. For structured interactions, it provid
Darkflow is an object detection framework and computer vision pipeline that provides a programmatic interface for performing real-time image analysis and object identification. It functions as a tool for loading weights, fine-tuning models, and executing inference on both static images and video feeds. The project serves as a converter that translates Darknet configurations and weights into TensorFlow graphs to enable retraining and deployment. It includes a model exporter that saves trained graphs into portable protobuf files for use on mobile and native devices. The system covers capabilit
This project is a comprehensive educational curriculum and structured learning path covering the full lifecycle of large language models. It provides a guided progression through the theory, architecture, training, and deployment of these models. The curriculum includes specialized guides on transformer architecture, model training tutorials, and frameworks for designing autonomous agents. It also provides dedicated resources for studying model safety and ethics. The material covers a wide range of technical capabilities, including distributed training strategies, parameter-efficient fine-tu
SimSwap is a deep learning face swapping framework and computer vision media processor built with PyTorch. It functions as an image synthesis tool designed to replace a person's identity in images and videos with a target face using a single trained model. The system operates as a video identity replacement tool that swaps identities across frames while preserving the original expressions and lighting of the source media. It enables digital identity manipulation and the production of synthetic media through automated facial feature mapping. The framework supports both the application of trai
Lama is an image restoration framework and deep learning model designed for image inpainting and object removal. It provides the tools necessary to train and evaluate neural networks that fill masked areas and repair corrupted visual data. The system utilizes a Fourier convolution neural network to maintain global image structure and reconstruct periodic patterns. This architecture allows for resolution-independent inference, enabling the processing of high-resolution images without increasing memory or computational requirements. The project includes a synthetic dataset generator that creat
This project is a deep learning text-to-speech toolkit used for training and deploying neural speech synthesis models. It provides a comprehensive framework for converting written text into spoken audio, utilizing neural vocoders to transform synthesized spectrograms into high-fidelity audio waveforms. The toolkit includes a voice cloning system that replicates specific human voices by extracting speaker embeddings from short audio samples. It also supports multi-speaker audio synthesis, allowing the generation of speech across different vocal identities using specialized model architectures.
Azure Docs is the official technical documentation repository for Microsoft Azure, the cloud computing platform. It provides comprehensive guidance on the full spectrum of Azure services, covering everything from core infrastructure components like virtual machines, Kubernetes clusters, and serverless computing to platform services for AI, machine learning, data analytics, and storage. The documentation details how to provision, manage, and govern cloud resources at scale, including policy enforcement, identity management, and cost optimization. The documentation distinguishes Azure through i
This project provides a cloud-based notebook configuration for deploying a Stable Diffusion web interface. It functions as a specialized environment for image generation, incorporating a model trainer for fine-tuning weights and creating training datasets. The system emphasizes infrastructure persistence by saving software installations and model files to cloud storage, avoiding repetitive setups between sessions. It uses a tunnel-based interface to expose the web dashboard to a public URL for remote interaction. The project covers end-to-end AI workflows, including dataset preparation and t
DeepDanbooru is a deep learning tool for tagging anime-style images with Danbooru-style tags. It uses a pre-trained convolutional neural network to analyze images and predict tags identifying characters, attributes, and artwork details. The project provides a complete pipeline for training custom tag recognition models. Users can prepare datasets by downloading tag definitions from a remote Danbooru server using authenticated API requests, then store image-tag pairs in a structured SQLite database. The training workflow supports filtering datasets by rating or score criteria, configuring hype
seed-vc is an AI voice conversion tool and voice cloning system designed to transform the timbre, accent, and emotion of speech recordings. It provides a framework for replicating specific speaker identities and singing styles using short reference audio samples. The project includes a voice fine-tuning framework for training models on custom audio datasets to increase the accuracy of voice clones. It also features speech anonymization tools that remove unique speaker traits to produce a generic average voice for identity protection. The system covers a broad range of audio processing capabi
pkuseg-python is a Chinese word segmentation toolkit and natural language processing library. It provides specialized models for splitting Chinese text into words across various domains, including news, medical, and web content, and includes a tool for assigning part-of-speech tags to segmented words. The library allows for the training of custom segmentation models using annotated datasets and supports the integration of user-defined dictionaries to ensure specialized terminology is recognized correctly. It employs a multi-threaded execution engine to process large volumes of Ch
SwiftOCR is a native Swift library designed for extracting text and alphanumeric characters from images. It functions as a neural network text recognizer that identifies characters and strings from visual data. The library includes a custom OCR model trainer and tools for custom font recognition. These capabilities allow for the generation of specialized neural networks tailored to specific fonts and character sets to improve recognition accuracy. The system utilizes connected-component labeling to identify individual character regions and employs image processing to convert short alphanumer
The official PyTorch implementation of Google's Gemma models.
This project is a PyTorch sentiment analysis tutorial and a deep learning implementation for analyzing text. It provides a natural language processing sequence classification pipeline designed to clean text data and train neural networks to categorize sequences of words. The implementation focuses on adapting pretrained language models for specific text classification tasks using custom datasets. It includes a process for fine-tuning large-scale language models and implementing recurrent networks and transformers for emotional tone detection. The project covers the broader surface of text se
This project is a comprehensive software suite for voice synthesis and model management, providing a framework for training custom acoustic models and performing voice conversion. It utilizes deep-learning-based acoustic modeling to map source audio characteristics to target voice identities, enabling the transformation of input audio into specific vocal profiles. The system distinguishes itself through a feature-retrieval-based inference mechanism, which employs vector index files to perform nearest-neighbor searches on acoustic features for high-fidelity timbre matching. Users can manage th
Second-Me is a framework for orchestrating local agent tasks and fine-tuning personal language models. It provides a system for training specialized assistants on local datasets to support custom knowledge retrieval and task execution requirements. The project distinguishes itself through a modular architecture that manages the lifecycle of machine learning tasks. It includes a state manager that persists intermediate training progress to local storage, allowing for the interruption and resumption of long-running configuration processes. Furthermore, the system utilizes standardized protocols
Magenta is a comprehensive toolkit for training, synthesizing, and performing music through neural models and hardware-integrated engines. It functions as a machine learning framework that enables the generation, manipulation, and real-time performance of audio, providing the structural foundations for musical intelligence through hierarchical sequence modeling and symbolic processing. The project distinguishes itself by enabling real-time, low-latency neural audio synthesis that can be integrated directly into professional digital audio workstations. It supports interactive musical jamming a