# infinilabs/analysis-ik

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/infinilabs-analysis-ik).**

17,466 stars · 3,279 forks · Java · Apache-2.0

## Links

- GitHub: https://github.com/infinilabs/analysis-ik
- awesome-repositories: https://awesome-repositories.com/repository/infinilabs-analysis-ik.md

## Topics

`analyzer` `easysearch` `elasticsearch` `ik-analysis` `java` `opensearch`

## Description

Analysis-ik is a Chinese text segmenter and analysis plugin for Lucene-based search engines. It provides a specialized analyzer for splitting Chinese sentences into meaningful words to improve indexing and search accuracy within Elasticsearch and OpenSearch.

The project features a dynamic dictionary manager that can load word libraries and stop-word files from remote HTTP endpoints. It polls these endpoints and compares response headers between polls (the upstream README names `Last-Modified` and `ETag`); a change in either header triggers a vocabulary reload without a service restart.
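The header comparison described above can be sketched in a few lines. This is a minimal illustration rather than the plugin's own code: the `DictVersion` type and both function names are hypothetical, and the one-word-per-line body format is assumed from the upstream README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictVersion:
    """Validator headers seen on the last successful poll (hypothetical type)."""
    last_modified: Optional[str] = None
    etag: Optional[str] = None


def needs_reload(previous: DictVersion, current: DictVersion) -> bool:
    """Return True when either validator header differs from the last poll."""
    return (previous.last_modified != current.last_modified
            or previous.etag != current.etag)


def parse_word_list(body: str) -> list[str]:
    """Split a response body into words: one per line, blank lines skipped."""
    return [line.strip() for line in body.splitlines() if line.strip()]


seen = DictVersion(last_modified="Mon, 01 Jan 2024 00:00:00 GMT", etag='"v1"')
unchanged = DictVersion(last_modified="Mon, 01 Jan 2024 00:00:00 GMT", etag='"v1"')
updated = DictVersion(last_modified="Tue, 02 Jan 2024 00:00:00 GMT", etag='"v2"')

print(needs_reload(seen, unchanged))         # False
print(needs_reload(seen, updated))           # True
print(parse_word_list("云计算\n\n区块链\n"))  # ['云计算', '区块链']
```

Comparing validators instead of downloading the body on every poll keeps the check cheap, which is presumably why a header-based trigger suits a frequently polled dictionary endpoint.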

The analyzer supports two segmentation modes: fine-grained exhaustive segmentation (`ik_max_word`) and coarse-grained smart segmentation (`ik_smart`). Users can further customize text processing through configuration-driven vocabulary extensions and custom dictionary definitions.
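To make the difference between the two granularities concrete, here is a toy sketch. It is not IK's actual algorithm: the fine-grained mode is approximated by emitting every dictionary word found at any offset, and the coarse-grained mode by greedy forward maximum matching. The function names and the small vocabulary are assumptions for illustration only.

```python
def exhaustive_segment(text: str, vocabulary: set[str]) -> list[str]:
    """Fine-grained toy mode: emit every vocabulary word found at any offset."""
    longest = max(map(len, vocabulary), default=0)
    tokens = []
    for start in range(len(text)):
        # Try the longest candidate first so longer words are listed first.
        for end in range(min(len(text), start + longest), start, -1):
            if text[start:end] in vocabulary:
                tokens.append(text[start:end])
    return tokens


def greedy_segment(text: str, vocabulary: set[str]) -> list[str]:
    """Coarse-grained toy mode: take the longest match, then jump past it."""
    longest = max(map(len, vocabulary), default=0)
    tokens, start = [], 0
    while start < len(text):
        for end in range(min(len(text), start + longest), start, -1):
            if text[start:end] in vocabulary:
                break
        else:
            end = start + 1  # no dictionary hit: fall back to one character
        tokens.append(text[start:end])
        start = end
    return tokens


# Hypothetical vocabulary, chosen only to show overlapping candidates.
vocab = {"中华人民共和国", "中华人民", "中华", "华人",
         "人民共和国", "人民", "共和国", "共和", "国歌"}
sentence = "中华人民共和国国歌"

print(exhaustive_segment(sentence, vocab))
print(greedy_segment(sentence, vocab))  # ['中华人民共和国', '国歌']
```

The exhaustive output contains overlapping tokens, which favors recall at index time, while the greedy output yields one non-overlapping reading, which tends to suit query-time precision.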

## Tags

### Data & Databases

- [Chinese Language Segmenters](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/chinese-language-segmenters.md) — Provides a specialized tool for splitting Chinese sentences into meaningful words for search indexing.
- [Elasticsearch Analysis Plugins](https://awesome-repositories.com/f/data-databases/elasticsearch-analysis-plugins.md) — Provides a specialized analysis plugin for Elasticsearch to customize how Chinese text is processed and indexed.
- [Search Engine Analysis Extensions](https://awesome-repositories.com/f/data-databases/search-engine-analysis-extensions.md) — Integrates the Lucene IK analyzer into Elasticsearch and OpenSearch for improved Chinese language processing.
- [Lucene-Based Search Engines](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/search-engine-platforms/lucene-based-search-engines.md) — Integrates a specialized Chinese text analyzer directly into Lucene-based search engine runtimes.
- [Linguistic Text Segmenters](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/linguistic-text-segmenters.md) — Implements language-specific segmentation rules for Chinese text using both exhaustive and smart modes.
- [Search Result Optimizations](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/matching-ranking-logic/search-result-optimizations.md) — Improves search result precision for Chinese queries through fine-grained and coarse-grained segmentation.
- [Segmentation Granularity Modes](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/text-segmentation/chinese-language-segmenters/segmentation-granularity-modes.md) — Supports both fine-grained exhaustive and coarse-grained smart segmentation for dividing Chinese sentences. ([source](https://github.com/infinilabs/analysis-ik#readme))
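
As a sketch of how the two modes are commonly combined in a Lucene-based engine, the snippet below builds a field mapping that indexes with the exhaustive analyzer and analyzes queries with the smart one, plus an `_analyze` request body for previewing segmentation. The analyzer names `ik_max_word` and `ik_smart` come from the upstream README; the helper names and the `content` field are placeholders, and the bodies are only constructed here, not sent to a cluster.

```python
import json


def build_mapping(field: str) -> dict:
    """Mapping that indexes `field` exhaustively and analyzes queries coarsely."""
    return {
        "properties": {
            field: {
                "type": "text",
                "analyzer": "ik_max_word",      # fine-grained at index time
                "search_analyzer": "ik_smart",  # coarse-grained at query time
            }
        }
    }


def build_analyze_request(text: str, smart: bool) -> dict:
    """Body for an `_analyze` call that previews how `text` is segmented."""
    return {"analyzer": "ik_smart" if smart else "ik_max_word", "text": text}


mapping = build_mapping("content")  # "content" is a placeholder field name
preview = build_analyze_request("中华人民共和国国歌", smart=True)

print(json.dumps(mapping, indent=2))
print(json.dumps(preview, ensure_ascii=False))
```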

### Artificial Intelligence & ML

- [Comprehensive Dictionary Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/language-tools/dictionary-management-utilities/comprehensive-dictionary-managers.md) — Implements a system for managing multi-layered vocabulary databases and stop-word files via remote URLs.
- [Vocabulary Management](https://awesome-repositories.com/f/artificial-intelligence-ml/vocabulary-management.md) — Manages the updating of analysis dictionaries from remote URLs to ensure domain-specific term recognition.
- [Runtime Vocabulary Reloading](https://awesome-repositories.com/f/artificial-intelligence-ml/vocabulary-management/runtime-vocabulary-reloading.md) — Updates word libraries from remote URLs without restarting the instance by monitoring HTTP headers. ([source](https://github.com/infinilabs/analysis-ik#readme))
- [Vocabulary Extension Sets](https://awesome-repositories.com/f/artificial-intelligence-ml/vocabulary-management/vocabulary-extension-sets.md) — Provides support for expanding the analyzer vocabulary via user-defined stop-word files and custom dictionary configurations.

### Programming Languages & Runtimes

- [Custom Dictionaries](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/data-text-processing/custom-dictionaries.md) — Allows defining user-provided word lists and stop-word files to override default segmentation behavior. ([source](https://github.com/infinilabs/analysis-ik/blob/master/README.md))
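
A custom-dictionary configuration can be sketched as a Java properties XML file. The entry keys below (`ext_dict`, `ext_stopwords`, `remote_ext_dict`) are recalled from the upstream README's `IKAnalyzer.cfg.xml` and should be checked against the plugin version in use; the file paths, the URL, and the `render_ik_config` helper are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def render_ik_config(entries: dict[str, str]) -> str:
    """Render dictionary settings as a Java properties XML document."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">',
        "<properties>",
    ]
    for key, value in entries.items():
        # quoteattr adds the surrounding quotes; escape handles &, < and >.
        lines.append(f"  <entry key={quoteattr(key)}>{escape(value)}</entry>")
    lines.append("</properties>")
    return "\n".join(lines)


xml_text = render_ik_config({
    "ext_dict": "custom/mydict.dic",             # hypothetical local word list
    "ext_stopwords": "custom/ext_stopword.dic",  # hypothetical stop-word file
    "remote_ext_dict": "https://dict.example.com/words.txt",  # placeholder URL
})
print(xml_text)
```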

### Software Engineering & Architecture

- [Remote Dictionary Loading](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/plugin-architectures/plugin-installation-utilities/url-based-plugin-loading/remote-dictionary-loading.md) — Loads segmentation word lists from remote HTTP endpoints to decouple dictionary management from the local file system.
