# byvoid/opencc

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/byvoid-opencc).**

9,772 stars · 1,051 forks · C++ · Apache-2.0

## Links

- GitHub: https://github.com/BYVoid/OpenCC
- Homepage: https://opencc.byvoid.com/
- awesome-repositories: https://awesome-repositories.com/repository/byvoid-opencc.md

## Topics

`chinese` `chinese-conversion` `chinese-translation` `simplified-chinese` `traditional-chinese`

## Description

OpenCC is a library and command-line tool for converting text between Simplified Chinese, Traditional Chinese, and Japanese Kanji. It operates at both the individual character and multi-character phrase levels, and applies region-specific vocabulary choices for Mainland China, Taiwan, and Hong Kong during conversion.
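To illustrate why phrase-level matching matters, here is a minimal sketch of longest-match conversion with a character-level fallback. It is not OpenCC's implementation: the function name `convert` and the tiny `PHRASES` and `CHARS` tables are hypothetical and exist only for this example. The simplified character 发 corresponds to either 發 or 髮 in Traditional Chinese, so a character table alone cannot pick the right form.

```python
# Toy dictionaries for illustration only (hypothetical, not OpenCC's data).
PHRASES = {"头发": "頭髮", "理发": "理髮"}   # multi-character phrase mappings
CHARS = {"头": "頭", "发": "發"}             # single-character fallback mappings


def convert(text, phrases, chars):
    """Convert text by trying the longest phrase first, then single characters."""
    max_len = max((len(key) for key in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try phrase candidates from longest to shortest (length >= 2).
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase matched here: fall back to the character table,
            # leaving unknown characters unchanged.
            ch = text[i]
            out.append(chars.get(ch, ch))
            i += 1
    return "".join(out)


phrase_aware = convert("我去理发", PHRASES, CHARS)
char_only = convert("头发", {}, CHARS)
```

With the phrase table, 理发 is converted as a unit. With an empty phrase table, 头发 falls back to per-character mapping and 发 becomes 發, which is the wrong form for "hair".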

The conversion engine resolves ambiguous character mappings using semantic and contextual rules, normalizes variant character forms for consistent orthography, and sequences multiple dictionary files into a configurable pipeline. It supports embedding custom conversion rules directly in configuration files and can integrate external C++ segmentation plugins, such as Jieba, to improve phrase boundary detection before applying conversion rules.
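The dictionary pipeline described above is declared in a JSON configuration file. The sketch below builds such a configuration in Python. The key names (`segmentation`, `conversion_chain`, `group`, `ocd2`) and the dictionary file names are modelled on the `s2t.json` file bundled with OpenCC and should be treated as assumptions to verify against the installed version; the helper `build_config` is hypothetical.

```python
import json


def build_config(name, phrase_dict, char_dict):
    """Return a config dict that tries a phrase dictionary before a character one.

    Key names are assumed from OpenCC's bundled s2t.json; verify before use.
    """
    return {
        "name": name,
        # Segmentation splits the input into phrases using the phrase dictionary.
        "segmentation": {
            "type": "mmseg",
            "dict": {"type": "ocd2", "file": phrase_dict},
        },
        # Each chain entry is one conversion stage; a "group" lists the
        # dictionaries that stage consults, in order.
        "conversion_chain": [
            {
                "dict": {
                    "type": "group",
                    "dicts": [
                        {"type": "ocd2", "file": phrase_dict},
                        {"type": "ocd2", "file": char_dict},
                    ],
                }
            }
        ],
    }


config = build_config(
    "Simplified to Traditional (sketch)", "STPhrases.ocd2", "STCharacters.ocd2"
)
config_json = json.dumps(config, ensure_ascii=False, indent=2)
```

Adding further entries to `conversion_chain` would express a multi-stage conversion, for example a script conversion followed by a regional vocabulary pass.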

OpenCC is written in C++ and exposes a C++ API, with bindings for Python and Node.js that allow direct integration into applications and scripts. It also offers a command-line interface for batch conversion with in-place file editing and custom dictionary configurations, and it can be used to convert benchmark datasets when evaluating language models across Chinese script variants.
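A minimal sketch of calling the Python binding follows. It assumes the official `OpenCC` package from PyPI is installed and that the bundled `s2t.json` (Simplified to Traditional) configuration is available; the wrapper `to_traditional` is a hypothetical helper, not part of the library.

```python
def to_traditional(text, config="s2t.json"):
    """Convert Simplified Chinese text to Traditional Chinese via OpenCC.

    Assumes the official OpenCC Python package; the import is deferred so
    that this module can still be loaded where the package is not installed.
    """
    import opencc  # third-party dependency, assumed to be installed

    converter = opencc.OpenCC(config)
    return converter.convert(text)


if __name__ == "__main__":
    # The project README uses this input as its example.
    print(to_traditional("汉字"))
```

Other bundled configurations, such as those targeting Taiwan or Hong Kong usage, can be passed through the `config` parameter in the same way.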

## Tags

### Data & Databases

- [Chinese Character Simplifiers](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/chinese-language-segmenters/traditional-chinese-support/chinese-character-simplifiers.md) — Converts text between Simplified Chinese, Traditional Chinese, and Japanese Kanji at both character and phrase levels. ([source](https://cdn.jsdelivr.net/gh/byvoid/opencc@master/README.md))
- [Chinese Language Segmenters](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/chinese-language-segmenters.md) — Segments Chinese text into phrases to improve conversion accuracy for multi-character terms and regional expressions.
- [Segmentation-Based Converters](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/chinese-language-segmenters/segmentation-based-converters.md) — Segments Chinese text into phrases before conversion to improve accuracy for multi-character terms.
- [Pre-Conversion Segmenters](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/linguistic-text-segmenters/pre-conversion-segmenters.md) — Segments input text into phrases before conversion to improve accuracy for multi-character terms and regional expressions. ([source](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md))

### Artificial Intelligence & ML

- [Script Conversion](https://awesome-repositories.com/f/artificial-intelligence-ml/script-conversion.md) — Provides command-line conversion between Chinese script variants with custom dictionary support.
- [Chinese Script Normalizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-normalization-tools/chinese-script-normalizers.md) — Normalizes Chinese text by converting script variants and standardizing variant characters for NLP pipelines.
- [Chinese Script Normalization Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/text-preprocessing-pipelines/chinese-script-normalization-pipelines.md) — Normalizes Chinese text corpora by converting between script variants during NLP preprocessing. ([source](https://github.com/BYVoid/OpenCC/blob/master/PUBLICATIONS.md))

### Business & Productivity Software

- [Regional Script Conversions](https://awesome-repositories.com/f/business-productivity-software/character-mapping-tables/regional-script-conversions.md) — Applies region-specific word choices for Mainland China, Taiwan, and Hong Kong during script conversion.
- [Regional Vocabulary Selectors](https://awesome-repositories.com/f/business-productivity-software/character-mapping-tables/regional-script-conversions/regional-vocabulary-selectors.md) — Applies region-specific word choices during script conversion to match local usage in Taiwan, Hong Kong, or mainland China. ([source](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md))
- [Chinese Character Disambiguations](https://awesome-repositories.com/f/business-productivity-software/contextual-ambiguity-resolutions/chinese-character-disambiguations.md) — Disambiguates characters with multiple script counterparts using semantic and contextual rules.
- [Script Ambiguity Resolvers](https://awesome-repositories.com/f/business-productivity-software/contextual-ambiguity-resolutions/script-ambiguity-resolvers.md) — Disambiguates characters that map to multiple counterparts in the other script using semantic and contextual rules. ([source](https://github.com/BYVoid/OpenCC/blob/master/doc/characters-easy-to-misuse.md))

### Development Tools & Productivity

- [Dual-Level Character and Phrase Mappings](https://awesome-repositories.com/f/development-tools-productivity/character-level-text-processing/dual-level-character-and-phrase-mappings.md) — Converts text at both individual character and multi-character phrase levels using separate dictionary layers.
- [Terminal Text Conversions](https://awesome-repositories.com/f/development-tools-productivity/cli-file-managers/metadata-descriptions-from-terminal/terminal-text-conversions.md) — Runs conversion directly from the terminal with in-place file editing and multiple dictionary configurations. ([source](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md))

### Software Engineering & Architecture

- [Dictionary-Chained Conversion Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/pipeline-chaining-frameworks/dictionary-chained-conversion-pipelines.md) — Sequences multiple dictionary files into a configurable conversion pipeline for script transformation.
- [Inline Conversion Rule Embeddings](https://awesome-repositories.com/f/software-engineering-architecture/contextual-validation-rules/dynamic-validation-rules/validation-rule-engines/inline-closure-rules/inline-conversion-rule-embeddings.md) — Embeds small custom conversion rules directly in configuration files without modifying external dictionaries.
- [Plugin Integrations](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/third-party-plugins/plugin-integrations.md) — Integrates external C++ segmentation plugins like Jieba to improve phrase-level conversion accuracy. ([source](https://cdn.jsdelivr.net/gh/byvoid/opencc@master/README.md))
- [Segmentation Plugin Loadings](https://awesome-repositories.com/f/software-engineering-architecture/plugin-architectures/programmatic-plugin-loading/segmentation-plugin-loadings.md) — Loads an optional Jieba segmentation plugin to improve phrase boundary detection during conversion. ([source](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md))
- [Segmentation Plugin Integrations](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/plugin-module-systems/modular-plugin-architectures/plugin-based-architectures/plugin-based-architectures/segmentation-plugin-integrations.md) — Loads external C++ segmentation plugins like Jieba to improve phrase boundary detection during conversion.
- [Variant Character Normalizers](https://awesome-repositories.com/f/software-engineering-architecture/string-validation-and-normalization/speech-to-text-normalizers/character-width-normalizers/variant-character-normalizers.md) — Replaces variant character forms with a standard representative character to ensure consistent orthography. ([source](https://github.com/BYVoid/OpenCC/blob/master/doc/characters-easy-to-misuse.md))

### Part of an Awesome List

- [Scripting Language Integration](https://awesome-repositories.com/f/awesome-lists/devtools/scripting-language-integration.md) — Provides a C++ API plus Python and Node.js bindings for integrating Chinese script conversion into applications.

### Programming Languages & Runtimes

- [Language Bindings](https://awesome-repositories.com/f/programming-languages-runtimes/language-bindings.md) — Provides a C++ API plus Python and Node.js bindings for programmatic Chinese script conversion. ([source](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md))
- [Multi-Language Bindings](https://awesome-repositories.com/f/programming-languages-runtimes/language-interoperability/foreign-function-interfaces/native-library-integrations/c-library-bindings/multi-language-bindings.md) — Provides a C++ library with Python and Node.js bindings and a command-line tool for Chinese script conversion.
- [Dictionary Chaining](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/data-text-processing/custom-dictionaries/dictionary-chaining.md) — Loads and sequences multiple dictionaries to define custom conversion paths between Chinese script variants. ([source](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md))
