Compromise | Awesome Repository

Compromise is a natural language processing library and rule-based text parser designed to analyze unstructured text. It functions as a toolkit for identifying parts of speech, linguistic patterns, and semantic meaning, while providing specialized engines for named entity recognition and the parsing of temporal and numeric data.

The project is distinguished by its morphological engine, which conjugates verbs across tenses and inflects nouns and adjectives. A plugin system lets users customize the linguistic model by extending its lexicons and modifying the baseline grammar rules.

The library covers a broad range of computational linguistics capabilities, including part-of-speech tagging, phonetic analysis, and sentence structure detection. It provides utilities for text normalization and formatting standardization, as well as tools for pattern matching, text statistics analysis, and the conversion of written numbers and currencies into structured values.
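
To make the conversion of written numbers into structured values concrete, here is a minimal sketch in plain JavaScript. It is not Compromise's implementation and does not use its API; the function name `wordsToNumber` and the lookup tables are assumptions made for illustration, and the sketch only handles simple English cardinals up to the millions.

```javascript
// Illustrative only: a tiny written-number parser, not Compromise's own code.
// All names here (UNITS, TENS, SCALES, wordsToNumber) are hypothetical.
const UNITS = new Map([
  ['zero', 0], ['one', 1], ['two', 2], ['three', 3], ['four', 4],
  ['five', 5], ['six', 6], ['seven', 7], ['eight', 8], ['nine', 9],
  ['ten', 10], ['eleven', 11], ['twelve', 12], ['thirteen', 13],
  ['fourteen', 14], ['fifteen', 15], ['sixteen', 16], ['seventeen', 17],
  ['eighteen', 18], ['nineteen', 19],
]);
const TENS = new Map([
  ['twenty', 20], ['thirty', 30], ['forty', 40], ['fifty', 50],
  ['sixty', 60], ['seventy', 70], ['eighty', 80], ['ninety', 90],
]);
const SCALES = new Map([['thousand', 1000], ['million', 1000000]]);

function wordsToNumber(text) {
  // Treat hyphens as spaces so "forty-two" becomes two tokens.
  const tokens = text.toLowerCase().replace(/-/g, ' ').split(/\s+/).filter(Boolean);
  if (tokens.length === 0) return null;

  let total = 0;   // sum of completed groups (thousands, millions)
  let current = 0; // value of the group currently being built

  for (const token of tokens) {
    if (token === 'and') continue; // "two hundred and five"
    if (UNITS.has(token)) {
      current += UNITS.get(token);
    } else if (TENS.has(token)) {
      current += TENS.get(token);
    } else if (token === 'hundred') {
      current = (current || 1) * 100; // a bare "hundred" counts as 100
    } else if (SCALES.has(token)) {
      total += (current || 1) * SCALES.get(token);
      current = 0;
    } else {
      return null; // unknown word: give up rather than guess
    }
  }
  return total + current;
}
```

With these tables, `wordsToNumber('two hundred and five')` evaluates to 205, and any unrecognized word makes the function return `null` instead of a partial result.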

For performance, the library supports parsing text in parallel across worker threads and caching partial parses of document segments.
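
The idea of caching partial parses can be sketched in plain JavaScript as follows. This is an assumption about the general technique, not a description of how Compromise implements it: `createCachedParser` and `parseSegment` are hypothetical names, and the sentence splitter is deliberately naive.

```javascript
// Illustrative only: per-segment parse caching, not Compromise's own code.
// createCachedParser and parseSegment are hypothetical names.
function createCachedParser(parseSegment) {
  const cache = new Map(); // segment text -> parsed result
  let misses = 0;          // number of segments actually parsed

  function parse(text) {
    // Naive sentence split: runs of text ending in ., ! or ?
    const segments = (text.match(/[^.!?]+[.!?]*/g) || [])
      .map((segment) => segment.trim())
      .filter(Boolean);

    return segments.map((segment) => {
      if (!cache.has(segment)) {
        misses += 1;
        cache.set(segment, parseSegment(segment));
      }
      return cache.get(segment); // unchanged segments reuse earlier work
    });
  }

  return { parse, stats: () => ({ misses, size: cache.size }) };
}
```

When a document is edited and parsed again, only the segments whose text changed reach `parseSegment`; the rest are served from the cache.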

Features

Natural Language Processing - Analyzes raw text to identify linguistic patterns, parts of speech, and semantic meaning.
Rule-Based Text Parsers - Tokenizes strings and tags words using customizable grammars.
Textual Entity Extractors - Identifies and extracts specific categories of information such as people, places, and organizations from text.
Grammatical Inflection Engines - Transforms words programmatically into different grammatical forms, such as tenses and plurals.
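
As a rough illustration of what rule-based tokenizing and tagging with a customizable lexicon involves, the following plain-JavaScript sketch looks each word up in a small lexicon and falls back to suffix rules. It is not Compromise's API or its actual grammar; the lexicon entries, the suffix rules, and the names `tokenize` and `tagWords` are all assumptions chosen for the example.

```javascript
// Illustrative only: a toy rule-based tagger, not Compromise's own code.
// The lexicon, suffix rules, and function names are hypothetical.
const BASE_LEXICON = new Map([
  ['the', 'Determiner'],
  ['a', 'Determiner'],
  ['dog', 'Noun'],
  ['is', 'Verb'],
]);

// Ordered fallback rules: the first matching suffix wins.
const SUFFIX_RULES = [
  [/ly$/, 'Adverb'],
  [/ing$/, 'Verb'],
  [/ed$/, 'Verb'],
];

function tokenize(text) {
  // Lowercase and keep runs of letters and apostrophes.
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

function tagWords(text, extraLexicon = new Map()) {
  // User-supplied entries override the base lexicon.
  const lexicon = new Map([...BASE_LEXICON, ...extraLexicon]);

  return tokenize(text).map((word) => {
    if (lexicon.has(word)) {
      return { word, tag: lexicon.get(word) };
    }
    const rule = SUFFIX_RULES.find(([pattern]) => pattern.test(word));
    // Unknown words with no matching suffix default to Noun.
    return { word, tag: rule ? rule[1] : 'Noun' };
  });
}
```

Passing an extra lexicon, for example `tagWords('The dog barked loudly', new Map([['barked', 'PastTense']]))`, shows how extending the word list changes the tags without touching the fallback rules.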
