# konsheng/sensitive-lexicon

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/konsheng-sensitive-lexicon).**

3,137 stars · 348 forks · MIT

## Links

- GitHub: https://github.com/konsheng/Sensitive-lexicon
- Homepage: https://github.com/konsheng/Sensitive-lexicon
- awesome-repositories: https://awesome-repositories.com/repository/konsheng-sensitive-lexicon.md

## Description

Sensitive-lexicon is a sensitive-word detection service and content moderation tool for identifying prohibited text. It combines a curated lexicon of thousands of categorized terms with a fuzzy-matching text scanner that detects restricted words and phrases.

The project features specialized filters for Chinese language content across political, social, and adult domains. It supports approximate string matching to identify terms that use noise characters or whitespace to evade standard keyword filters.
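As an illustration of matching that tolerates noise characters and whitespace, here is a minimal sketch. It is not the project's code: the function names `build_pattern` and `find_terms` and the `max_gap` parameter are assumptions made for this example, and it uses Python's standard `re` module rather than whatever engine the project ships.

```python
import re

def build_pattern(term, max_gap=2):
    """Compile a regex matching `term` with up to `max_gap` noise characters
    (whitespace, punctuation, symbols, underscore) between adjacent characters."""
    gap = r"[\W_]{0,%d}" % max_gap
    return re.compile(gap.join(re.escape(ch) for ch in term), re.IGNORECASE)

def find_terms(text, terms, max_gap=2):
    """Return (term, start, end) for every noise-tolerant hit, in term order."""
    hits = []
    for term in terms:
        if not term:  # an empty term would match everywhere
            continue
        for m in build_pattern(term, max_gap).finditer(text):
            hits.append((term, m.start(), m.end()))
    return hits

# "敏感词" simply means "sensitive word"; it stands in for a real lexicon entry.
print(find_terms("敏 感-词", ["敏感词"]))   # [('敏感词', 0, 5)]
print(find_terms("b a*d word", ["bad"]))    # [('bad', 0, 5)]
```

Compiling one regex per term is simple but scales poorly with thousands of terms. A production scanner would more likely strip or skip noise characters while walking a single shared automaton.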

The system includes a network interface for hosting the detection service, allowing for real-time lexicon updates without interrupting the active process. It organizes sensitive terms into domain labels to provide context for flagged text.
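The domain-label idea can be sketched as follows. This is a hypothetical illustration, not the project's data model: the label names, the placeholder terms, and the helpers `build_index` and `flag` are all assumptions, and matching here is plain substring search.

```python
def build_index(lexicon):
    """Invert {label: [terms]} into {term: {labels}} so each hit carries its context."""
    index = {}
    for label, terms in lexicon.items():
        for term in terms:
            index.setdefault(term, set()).add(label)
    return index

def flag(text, index):
    """Return {label: sorted terms found in text} using plain substring checks."""
    found = {}
    for term, labels in index.items():
        if term in text:
            for label in labels:
                found.setdefault(label, []).append(term)
    return {label: sorted(terms) for label, terms in found.items()}

# Placeholder labels and terms, chosen only for the example.
lexicon = {"political": ["alpha"], "adult": ["beta", "alpha"]}
print(flag("xx alpha yy", build_index(lexicon)))
```

Because one term may belong to several domains, the index maps each term to a set of labels rather than a single label.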

## Tags

### Artificial Intelligence & ML

- [Sensitive Word Filters](https://awesome-repositories.com/f/artificial-intelligence-ml/stop-word-filters/sensitive-word-filters.md) — Implements a network interface for detecting and filtering banned terms using fuzzy matching and dynamic lexicons. ([source](https://github.com/konsheng/Sensitive-lexicon/blob/main/README.md))
- [Moderation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/chinese/moderation-tools.md) — Provides a specialized moderation tool for detecting sensitive and prohibited text within Chinese communications.

### Content Management & Publishing

- [Content Moderation Tools](https://awesome-repositories.com/f/content-management-publishing/content-moderation-tools.md) — Identifies and filters inappropriate or restricted text in user-generated content to maintain community safety.

### Data & Databases

- [Content Moderation Lexicons](https://awesome-repositories.com/f/data-databases/content-moderation-lexicons.md) — Ships a curated database of thousands of categorized sensitive terms to automate identification of inappropriate content.
- [Fuzzy Matching](https://awesome-repositories.com/f/data-databases/fuzzy-matching.md) — Uses fuzzy matching algorithms to identify sensitive terms that attempt to bypass filters via noise characters or whitespace.
- [Chinese Text Moderation](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/chinese-language-segmenters/chinese-text-moderation.md) — Detects prohibited terms and sensitive phrasing within Chinese text across various political and social domains.

### Security & Cryptography

- [Content Moderation Filters](https://awesome-repositories.com/f/security-cryptography/content-moderation-filters.md) — Detects inappropriate text across specialized political, social, and adult domains using categorized lists. ([source](https://github.com/konsheng/Sensitive-lexicon/tree/main/Organized))
- [Chinese Language Content Filters](https://awesome-repositories.com/f/security-cryptography/content-moderation-filters/chinese-language-content-filters.md) — Offers specialized detection capabilities to identify prohibited Chinese text across multiple domains. ([source](https://github.com/konsheng/Sensitive-lexicon))

### Part of an Awesome List

- [Categorized Keyword Mappings](https://awesome-repositories.com/f/awesome-lists/devtools/regex-and-pattern-matching/domain-pattern-matching/domain-keyword-matchers/categorized-keyword-mappings.md) — Organizes sensitive terms into distinct domain labels to provide context for flagged text.

### Development Tools & Productivity

- [Lexicon Management](https://awesome-repositories.com/f/development-tools-productivity/dictionary-and-translation-tools/lexicon-datasets/lexicon-driven-analysis/lexicon-management.md) — Manages the storage and organization of restricted word lists with support for real-time updates.

### Software Engineering & Architecture

- [Finite State Machine Engines](https://awesome-repositories.com/f/software-engineering-architecture/finite-state-machine-engines.md) — Employs an Aho-Corasick finite state machine engine to search for multiple patterns simultaneously in linear time.
- [Lexicon Storage Tries](https://awesome-repositories.com/f/software-engineering-architecture/trie-data-structures/prefix-trie-filters/lexicon-storage-tries.md) — Stores sensitive terms in a prefix tree to enable fast lookups and efficient memory usage.
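
For readers unfamiliar with the technique named in the two tags above, the following is a compact textbook sketch of an Aho-Corasick automaton built on a prefix trie. It is not taken from the repository. The class name `AhoCorasick` and its methods are assumptions for illustration. Search time is proportional to the text length plus the number of matches reported.

```python
from collections import deque

class AhoCorasick:
    def __init__(self, terms):
        # Node i: goto[i] maps char -> child id, fail[i] is the failure link,
        # out[i] lists every term that ends at this node.
        self.goto, self.fail, self.out = [{}], [0], [[]]
        for term in terms:
            if term:  # ignore empty terms
                self._add(term)
        self._build_failure_links()

    def _add(self, term):
        node = 0
        for ch in term:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(term)

    def _build_failure_links(self):
        # Breadth-first, so shallower nodes are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit terms that end at the failure target (proper suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def search(self, text):
        """Return (start_index, term) for every occurrence, in scan order."""
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for term in self.out[node]:
                hits.append((i - len(term) + 1, term))
        return hits

ac = AhoCorasick(["he", "she", "his", "hers"])
print(ac.search("ushers"))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```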

### System Administration & Monitoring

- [Dynamic Configuration Reloading](https://awesome-repositories.com/f/system-administration-monitoring/log-configuration/dynamic-configuration-reloading.md) — Supports updating the internal search tree from an external source in real time without restarting the service.
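
One common way to reload a lexicon without restarting is to build the new structure off to the side and then swap a single reference. The sketch below assumes that approach. The class `ReloadableMatcher` is hypothetical, and a `frozenset` stands in for the real search tree to keep the example short.

```python
import threading

class ReloadableMatcher:
    def __init__(self, terms):
        self._lock = threading.Lock()      # serialises concurrent reloads only
        self._terms = frozenset(terms)     # stand-in for the real search tree

    def reload(self, terms):
        """Build the replacement fully, then publish it with one assignment."""
        new_terms = frozenset(terms)
        with self._lock:
            self._terms = new_terms

    def scan(self, text):
        """Scan against a snapshot, so a reload mid-scan cannot change the result."""
        terms = self._terms
        return sorted(t for t in terms if t in text)

matcher = ReloadableMatcher(["alpha"])
print(matcher.scan("alpha beta"))   # ['alpha']
matcher.reload(["beta"])            # e.g. after fetching an updated word list
print(matcher.scan("alpha beta"))   # ['beta']
```

Readers never take the lock. Each scan sees either the old lexicon or the new one in full, never a partially updated one.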
