# hankcs/hanlp

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/hankcs-hanlp).**

36,413 stars · 10,924 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/hankcs/HanLP
- Homepage: https://www.hanlp.com/
- awesome-repositories: https://awesome-repositories.com/repository/hankcs-hanlp.md

## Topics

`dependency-parser` `hanlp` `named-entity-recognition` `natural-language-processing` `nlp` `pos-tagging` `semantic-parsing` `text-classification`

## Description

HanLP is a natural language processing library and deep learning framework optimized for Chinese that also handles multilingual text. It provides tools for linguistic analysis, semantic understanding, and script conversion.

The project focuses on Chinese linguistic structures and includes a script converter that transforms text between Simplified Chinese, Traditional Chinese, and Pinyin. It also supports training domain-specific models to improve the recognition of professional terminology in specialized datasets.
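
To make the idea of script conversion concrete, here is a minimal sketch of character-level conversion between Simplified and Traditional Chinese. It is not HanLP's converter: the four-entry table and both function names are assumptions made up for illustration. Real conversion needs word-level context, because some Simplified characters map to more than one Traditional character (for example, 发 can become 發 or 髮).

```python
# Toy Simplified -> Traditional table. Illustrative assumption only;
# this is NOT HanLP's conversion data or API.
S2T_TABLE = {"汉": "漢", "语": "語", "简": "簡", "体": "體"}


def simplified_to_traditional(text, table=S2T_TABLE):
    """Map each character through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def traditional_to_simplified(text, table=S2T_TABLE):
    """Invert the table and apply the same per-character lookup."""
    reverse = {trad: simp for simp, trad in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(simplified_to_traditional("简体汉语"))
    print(traditional_to_simplified("簡體漢語"))
```

A per-character lookup like this only works for one-to-one pairs, which is why practical converters rely on phrase dictionaries or segmentation first.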

Its broader capabilities cover information extraction through named entity recognition and text summarization, and linguistic analysis through part-of-speech tagging and dependency syntax parsing. The toolkit also provides semantic analysis for sentiment detection and coreference resolution, along with text transformation utilities for grammar and style conversion.

## Tags

### Artificial Intelligence & ML

- [Natural Language Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing.md) — Provides a comprehensive toolkit for natural language processing specifically optimized for Chinese linguistic structures.
- [Text Tokenizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-tokenizers.md) — Provides tools and algorithms for segmenting raw text into discrete tokens using linguistic and rule-based strategies. ([source](https://github.com/hankcs/hanlp#readme))
- [Chinese NLP Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/chinese-nlp-libraries.md) — Serves as a comprehensive natural language processing library specifically optimized for Chinese linguistic structures.
- [Deep Learning NLP Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-nlp-frameworks.md) — Utilizes neural networks for advanced text analysis, sentiment detection, and semantic understanding.
- [Dependency Syntax Analyzers](https://awesome-repositories.com/f/artificial-intelligence-ml/dependency-syntax-analyzers.md) — Links words through directed edges to represent grammatical dependencies and structural relationships within a sentence.
- [Textual Entity Extractors](https://awesome-repositories.com/f/artificial-intelligence-ml/entity-extraction-pipelines/textual-entity-extractors.md) — Provides automated processes for identifying and categorizing people, organizations, and locations within unstructured text. ([source](https://github.com/hankcs/hanlp#readme))
- [Part-of-Speech Taggers](https://awesome-repositories.com/f/artificial-intelligence-ml/part-of-speech-taggers.md) — Assigns grammatical labels to words based on context and linguistic rules. ([source](https://github.com/hankcs/hanlp#readme))
- [Script Conversion](https://awesome-repositories.com/f/artificial-intelligence-ml/script-conversion.md) — Transforms text between Pinyin, Simplified Chinese, and Traditional Chinese characters using script conversion rules. ([source](https://github.com/hankcs/hanlp#readme))
- [Script Converters](https://awesome-repositories.com/f/artificial-intelligence-ml/script-converters.md) — Transforms text between Simplified Chinese, Traditional Chinese, and Pinyin phonetic representations.
- [Semantic Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-analysis.md) — Evaluates text meaning through similarity calculations, coreference resolution, and semantic role labeling.
- [Semantic Analysis Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-analysis-tools.md) — Provides tools for dependency parsing, coreference resolution, and semantic role labeling to extract deep meaning.
- [Syntactic Parsers](https://awesome-repositories.com/f/artificial-intelligence-ml/syntactic-parsers.md) — Deconstructs sentences into hierarchical phrases and nested structures to reveal underlying grammatical organization.
- [Constituent Syntax Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/constituent-syntax-analysis.md) — Deconstructs sentences into hierarchical phrases and clauses to reveal the underlying grammatical structure. ([source](https://github.com/hankcs/hanlp#readme))
- [Deep Learning Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-architectures.md) — Integrates neural network architectures to perform complex linguistic tasks such as entity recognition and syntactic parsing.
- [Dependency Syntax Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/dependency-syntax-analysis.md) — Maps the grammatical relationships and dependencies between individual words within a sentence. ([source](https://github.com/hankcs/hanlp#readme))
- [Information Extraction](https://awesome-repositories.com/f/artificial-intelligence-ml/information-extraction.md) — Implements techniques for extracting structured data and key information from unstructured text. ([source](https://github.com/hankcs/hanlp#readme))
- [Keyword and Phrase Extraction](https://awesome-repositories.com/f/artificial-intelligence-ml/information-extraction/keyword-and-phrase-extraction.md) — Isolates the most important words and key phrases that represent the primary topic of a document. ([source](https://github.com/hankcs/hanlp#readme))
- [Language Detection Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/language-detection-tools.md) — Provides utilities for identifying the specific language of provided text content. ([source](https://github.com/hankcs/hanlp#readme))
- [Linguistic Pattern Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/linguistic-pattern-analysis.md) — Performs tokenization, part-of-speech tagging, and entity recognition across multiple languages to decode linguistic patterns. ([source](https://github.com/hankcs/hanlp#readme))
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Supports adapting pre-trained models to specialized datasets to improve the recognition of professional terminology.
- [Text Classification](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/language-tools/text-classification.md) — Groups documents into categories or clusters based on content and meaning to organize large bodies of text.
- [Multilingual Text Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/multilingual-text-processing.md) — Provides a system for tokenization, part-of-speech tagging, and named entity recognition across multiple languages.
- [Text Summarization](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/nlp-applications/text-summarization.md) — Provides automated methods for condensing long documents into concise summaries. ([source](https://github.com/hankcs/hanlp#readme))
- [Semantic Similarity Calculation](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-analysis-tools/semantic-similarity-calculation.md) — Calculates the semantic relationship between two texts to determine how closely they relate in meaning. ([source](https://github.com/hankcs/hanlp#readme))
- [Sentiment Analysis Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/sentiment-analysis-tools.md) — Classifies the emotional tone of text as positive, negative, or neutral. ([source](https://github.com/hankcs/hanlp#readme))
- [Text Summarization](https://awesome-repositories.com/f/artificial-intelligence-ml/text-summarization.md) — Condenses long documents into concise summaries and extracts key phrases to isolate important information.
- [Vector Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings.md) — Represents text as high-dimensional vectors to calculate mathematical similarity between different pieces of content.
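
The Vector Embeddings and Semantic Similarity Calculation entries above both come down to comparing vectors. The sketch below shows the usual cosine similarity measure under a loud assumption: instead of learned embeddings, it uses character-bigram counts as a crude stand-in, so it measures surface overlap rather than meaning. The function names are hypothetical and are not part of HanLP.

```python
import math
from collections import Counter


def char_bigram_vector(text):
    """Count overlapping character bigrams (a stand-in for a learned embedding)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector has no direction to compare.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    left = char_bigram_vector("自然语言处理")
    right = char_bigram_vector("自然语言理解")
    print(cosine_similarity(left, right))
```

Swapping `char_bigram_vector` for a model that returns dense vectors would leave the similarity step unchanged apart from the vector type.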

### Part of an Awesome List

- [Coreference Resolution](https://awesome-repositories.com/f/awesome-lists/ai/coreference-resolution.md) — Provides tools for resolving references in text to identify when different phrases refer to the same entity. ([source](https://github.com/hankcs/hanlp#readme))
- [Domain Specific Models](https://awesome-repositories.com/f/awesome-lists/ai/domain-specific-models.md) — Supports training deep learning models on specialized datasets to recognize professional domain terminology. ([source](https://github.com/hankcs/hanlp#readme))
- [NLP Frameworks](https://awesome-repositories.com/f/awesome-lists/devtools/nlp-frameworks.md) — Multilingual library for advanced natural language processing.

### Programming Languages & Runtimes

- [Custom Dictionaries](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/data-text-processing/custom-dictionaries.md) — Allows defining custom word lists to force, merge, or correct how text is split into tokens. ([source](https://github.com/hankcs/hanlp#readme))
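
As an illustration of how a custom word list can change segmentation, the following sketch uses forward maximum matching, a classic dictionary-based technique. It is an assumption chosen for clarity, not a description of how HanLP applies custom dictionaries, and `tokenize` and the sample word sets are hypothetical.

```python
def tokenize(text, dictionary, max_len=None):
    """Forward maximum matching.

    At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing matches.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)  # Guarantee progress on every step.

    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    base = {"自然", "语言", "处理"}
    custom = base | {"自然语言处理"}  # User-supplied entry merges three words.
    print(tokenize("自然语言处理入门", base))
    print(tokenize("自然语言处理入门", custom))
```

Adding the longer entry makes the tokenizer emit the whole term as a single token, which is the "merge" behavior the entry above describes.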
