2 repositories
High-throughput processing of massive text datasets using multi-threaded execution.
Distinct from Large-Scale Text Handling: that category focuses on memory-efficient string handling or UI rendering, not on multi-threaded execution engines for bulk file processing.
Explore 2 awesome GitHub repositories matching data & databases · Parallel Text Processing.
pkuseg-python is a Chinese word segmentation toolkit and natural language processing library. It provides specialized models for splitting Chinese text into words across various domains, including news, medical, and web content, and includes a tool for assigning grammatical parts of speech tags to segmented words. The library allows for the training of custom segmentation models using annotated datasets and supports the integration of user-defined dictionaries to ensure specialized terminology is recognized correctly. It employs a multi-threaded execution engine to process large volumes of Ch
Employs a multi-threaded execution engine to process large volumes of Chinese text files in parallel.
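As a rough illustration of what processing text files in parallel can look like, here is a minimal sketch that fans a list of files out over a thread pool. It does not use pkuseg's own API: `whitespace_segment`, `segment_file`, and `segment_files` are hypothetical names, and the whitespace splitter is only a stand-in for a real word segmenter.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def whitespace_segment(line: str) -> list[str]:
    """Hypothetical stand-in for a real segmenter: splits on whitespace."""
    return line.split()


def segment_file(src: Path, dst: Path, segment=whitespace_segment) -> int:
    """Segment one file line by line and return the number of lines written."""
    count = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            # Write the tokens of each input line separated by single spaces.
            fout.write(" ".join(segment(line)) + "\n")
            count += 1
    return count


def segment_files(sources: list[Path], out_dir: Path, max_workers: int = 4) -> dict[str, int]:
    """Fan the files out over a thread pool; map each file name to its line count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            src.name: pool.submit(segment_file, src, out_dir / src.name)
            for src in sources
        }
        # result() re-raises any exception that occurred in a worker.
        return {name: fut.result() for name, fut in futures.items()}
```

In CPython, threads mainly help with the file I/O here. A CPU-bound segmenter would likely need a process pool to see a real speed-up, so treat this as a sketch of the fan-out pattern rather than a tuned implementation.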
This repository is a comprehensive educational program and deep learning framework designed to teach practical deep learning using PyTorch through notebooks and code examples. It serves as a high-level library for building, training, and deploying neural networks, acting as a model training orchestrator that coordinates PyTorch models, optimizers, and loss functions. The project provides specialized toolkits for computer vision, natural language processing, and tabular data preprocessing. It distinguishes itself through advanced training controls such as discriminative learning rates, a two-w
Distributes tokenization tasks across multiple CPU workers to accelerate processing of large text datasets.
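Distributing tokenization across CPU workers can be sketched with the standard library alone. The example below is not the repository's implementation: `tokenize_batch`, `make_batches`, and `tokenize_parallel` are hypothetical names, and the lowercase-and-split tokenizer is an assumption chosen to keep the example self-contained.

```python
from concurrent.futures import ProcessPoolExecutor


def tokenize_batch(batch: list[str]) -> list[list[str]]:
    """Hypothetical tokenizer: lowercase each text and split on whitespace."""
    return [text.lower().split() for text in batch]


def make_batches(texts: list[str], batch_size: int) -> list[list[str]]:
    """Cut the dataset into consecutive batches of at most batch_size texts."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


def tokenize_parallel(texts, n_workers=4, batch_size=1000,
                      executor_cls=ProcessPoolExecutor):
    """Tokenize texts in batches across workers, keeping the input order."""
    batches = make_batches(texts, batch_size)
    with executor_cls(max_workers=n_workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        results = pool.map(tokenize_batch, batches)
        return [tokens for batch in results for tokens in batch]
```

Batching keeps the per-task overhead of sending data between processes small relative to the tokenization work. When using the default process pool in a script, call `tokenize_parallel` under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.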