ToolGood.Words


Features

Character-to-Pinyin Converters - Ships a high-performance engine that converts Chinese characters into phonetic pinyin and initials to detect obfuscated language.
Obfuscation Detection - Identifies sensitive terms that use character skipping, repetitions, or case variations to bypass standard filters.
Chinese Natural Language Processing - Provides computational linguistics tools for Chinese text, including pinyin transliteration and script transformation.
Phonetic Representations - Generates pinyin letters or first-letter initials from Chinese characters to uncover hidden sensitive words.
Sensitive Word Filters - Detects and replaces banned terms in user input with asterisks or custom placeholder messages.
Chinese Script Normalizers - Standardizes Chinese text by converting between simplified and traditional scripts and adjusting character widths.
Pinyin Transliterations - Provides a comprehensive system for converting Chinese characters into Pinyin phonetic representations for content moderation.
Phonetic Dictionaries - Implements dictionary-based mapping of Chinese characters to pinyin to uncover phonetic aliases of banned words.
Chinese Character Simplifiers - Converts text between Simplified and Traditional Chinese scripts to ensure consistent sensitive word detection.
Sensitive Text Masking - Implements the replacement of detected sensitive patterns and banned words with placeholders to sanitize content.
Obfuscated Content Detection - Finds sensitive words hidden by phonetic pinyin substitutions, character repetitions, or intentional misspellings.
Substring Replacements - Replaces detected sensitive substrings with placeholder characters to sanitize the output text.
Filter Evasion Normalization - Standardizes text scripts, widths, and case to ensure consistent detection regardless of formatting tricks.
Variant Character Normalizers - Converts text between traditional and simplified Chinese character sets to ensure consistent detection across writing styles.
Regex Pattern Matching - Employs regular expressions and wildcards to identify sensitive terms that include character repetitions or spacing.
Multi-Stage Text Normalizers - Uses a sequential processing chain involving normalization and transliteration to clean and analyze text for sensitive content.
Pinyin Initialism Matchers - Matches text against a dictionary of pinyin acronyms and initials to catch obfuscated language.
Pinyin Sequence Pattern Matching - Identifies specific keywords or patterns within pinyin sequences using customizable index and splitting configurations.
Wildcard Pattern Matching - Identifies sensitive words using partial regular expressions, including dots and question marks.
Keyword Wildcard Filters - Detects sensitive words using regular expression patterns and wildcards to match various forms of a term.
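
The normalization, skip-character detection, and masking features listed above can be illustrated with a minimal Python sketch. This is not ToolGood.Words' own implementation or API; the names `fold_char`, `match_at`, `mask`, and the `SKIP` filler set are hypothetical and chosen only for illustration. The sketch folds full-width and upper-case characters one at a time so positions stay aligned with the original text, tolerates filler characters between the letters of a banned word, and replaces each hit with a placeholder.

```python
import unicodedata

# Hypothetical set of filler characters tolerated between keyword characters.
SKIP = set(" *-_.")


def fold_char(ch: str) -> str:
    """Fold one character: full-width to half-width (NFKC), then lower-case.

    If folding would change the length, keep the original character so that
    indices in the folded text still line up with the original text.
    """
    folded = unicodedata.normalize("NFKC", ch).lower()
    return folded if len(folded) == 1 else ch


def match_at(folded: list, start: int, word: str):
    """Return the end index if `word` matches at `start`, else None.

    Filler characters from SKIP may appear between (not before) the
    characters of the word.
    """
    i = start
    for k, target in enumerate(word):
        while k > 0 and i < len(folded) and folded[i] in SKIP:
            i += 1
        if i >= len(folded) or folded[i] != target:
            return None
        i += 1
    return i


def mask(text: str, banned: list, placeholder: str = "*") -> str:
    """Replace every banned word found in `text` with `placeholder` characters."""
    folded = [fold_char(c) for c in text]
    words = ["".join(fold_char(c) for c in w) for w in banned]
    out = list(text)
    for start in range(len(folded)):
        for word in words:
            end = match_at(folded, start, word)
            if end is not None:
                for i in range(start, end):
                    out[i] = placeholder
    return "".join(out)


print(mask("This is ＢＡＤ and b-a-d", ["bad"]))
```

The scan is a naive loop over every start position and every banned word, which is fine for a demonstration but slow for large word lists; a production filter would typically use a trie or automaton instead. Pinyin-based matching and simplified/traditional conversion are omitted because they need character dictionaries.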

Open-source alternatives to ToolGood.Words

Similar open-source projects, ranked by how many features they share with ToolGood.Words.

mozillazg/python-pinyin
GitHub stars: 5,325
python-pinyin is a Python library for transliterating simplified and traditional Chinese characters into phonetic pinyin. It functions as a transliteration system that converts text while supporting tone sandhi and providing utilities to transform pinyin between different formats, such as numeric tones, accent marks, or phonetic initials. The library features a polyphonic character resolver that analyzes surrounding word context to select the correct pronunciation for characters with multiple sounds. It also includes a customizable dictionary system that allows the extension of default transl
Python · chinese · hanzi · hanzi-pinyin
zh-lx/pinyin-pro
GitHub stars: 4,646
pinyin-pro is a Chinese pinyin transcription library and text segmentation tool. It converts Chinese characters into pinyin with support for tones, initials, and finals, while resolving polyphonic characters based on context. The project includes a pinyin pattern matching engine that enables searching Chinese text using full spellings, initials, or hybrid phonetic patterns. It also features a pinyin HTML generator that wraps characters and their transcriptions in markup tags for styled web display. The library provides capabilities for Chinese text segmentation, surname pronunciation priorit
TypeScript · hanzi · hanzi-pinyin · hanzi2pinyin
BYVoid/OpenCC
GitHub stars: 9,772
OpenCC is a library and command-line tool for converting text between Simplified Chinese, Traditional Chinese, and Japanese Kanji. It operates at both the individual character and multi-character phrase levels, and applies region-specific vocabulary choices for Mainland China, Taiwan, and Hong Kong during conversion. The conversion engine resolves ambiguous character mappings using semantic and contextual rules, normalizes variant character forms for consistent orthography, and sequences multiple dictionary files into a configurable pipeline. It supports embedding custom conversion rules dire
C++ · chinese · chinese-conversion · chinese-translation
isnowfy/snownlp
GitHub stars: 6,631
SnowNLP is a Python library for Chinese natural language processing. It provides tools for text segmentation, sentiment analysis, document classification, and phonetic transliteration. The library includes capabilities for training and saving custom machine learning models for tokenization and sentiment analysis using raw training datasets. It covers a range of linguistic processing areas, including parts of speech tagging, sentence splitting, and text similarity measurement. The toolkit also provides utilities for extracting key information through text summarization and calculating word im
Python

See all 30 alternatives to ToolGood.Words
