Compromise is a natural language processing library and rule-based text parser designed to analyze unstructured text. It functions as a toolkit for identifying parts of speech, linguistic patterns, and semantic meaning, while providing specialized engines for named entity recognition and the parsing of temporal and numeric data.
The project is distinguished by its morphological engine, which conjugates verbs across tenses and inflects nouns and adjectives. A plugin system supports customizing the linguistic model by extending lexicons and modifying the baseline grammar rules.
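To make the idea of rule-based morphology with an extensible lexicon concrete, here is a minimal sketch in TypeScript. It is not Compromise's implementation or API: `Lexicon`, `irregularPast`, `extendLexicon`, and `toPastTense` are hypothetical names chosen for illustration, and the suffix rules cover only a few common English cases.

```typescript
// Hypothetical sketch: these names are illustrative, not Compromise's API.
type Lexicon = Record<string, string>;

// Irregular past-tense forms are consulted before any suffix rule runs.
const irregularPast: Lexicon = { go: "went", eat: "ate", run: "ran" };

// A "plugin" in this sketch is just a set of extra entries merged into a lexicon.
function extendLexicon(base: Lexicon, extra: Lexicon): void {
  Object.keys(extra).forEach((word) => {
    const form = extra[word];
    if (form !== undefined) base[word] = form;
  });
}

function toPastTense(verb: string, lexicon: Lexicon = irregularPast): string {
  // Own-property check so that words like "constructor" do not hit the prototype.
  const known = Object.prototype.hasOwnProperty.call(lexicon, verb)
    ? lexicon[verb]
    : undefined;
  if (known !== undefined) return known;
  if (/e$/.test(verb)) return verb + "d"; // bake -> baked
  if (/[^aeiou]y$/.test(verb)) return verb.slice(0, -1) + "ied"; // carry -> carried
  return verb + "ed"; // walk -> walked
}

// Usage: add an irregular form the base lexicon does not know about.
extendLexicon(irregularPast, { swim: "swam" });
```

The lexicon lookup runs first so that exceptions always override the general rules, which is why extending the lexicon is enough to correct a verb the suffix rules would otherwise get wrong.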
The library covers a broad range of computational linguistics tasks, including part-of-speech tagging, phonetic analysis, and sentence structure detection. It also provides utilities for text normalization and format standardization, pattern matching, text statistics, and the conversion of written numbers and currencies into structured values.
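As an illustration of what converting written numbers into structured values involves, the following TypeScript sketch parses simple English number phrases into a numeric value. It is a simplified stand-in, not the library's parser: `parseWrittenNumber` and the lookup tables are assumed names, it handles only cardinal numbers up to the millions, and it does not cover currencies, ordinals, or decimals.

```typescript
// Hypothetical sketch: not Compromise's number parser.
const UNITS: Record<string, number> = {
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19,
};
const TENS: Record<string, number> = {
  twenty: 20, thirty: 30, forty: 40, fifty: 50,
  sixty: 60, seventy: 70, eighty: 80, ninety: 90,
};
const SCALES: Record<string, number> = { thousand: 1000, million: 1000000 };

// Own-property lookup so arbitrary words never resolve via the prototype chain.
function lookup(table: Record<string, number>, word: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(table, word) ? table[word] : undefined;
}

// Returns the numeric value, or null if any word is not a known number word.
function parseWrittenNumber(text: string): number | null {
  const words = text
    .toLowerCase()
    .replace(/-/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0 && w !== "and");
  if (words.length === 0) return null;

  let total = 0; // sum of completed thousand/million groups
  let current = 0; // group currently being built (below one thousand)
  for (const word of words) {
    const small = lookup(UNITS, word) ?? lookup(TENS, word);
    const scale = lookup(SCALES, word);
    if (small !== undefined) {
      current += small;
    } else if (word === "hundred") {
      current = (current || 1) * 100; // bare "hundred" counts as one hundred
    } else if (scale !== undefined) {
      total += (current || 1) * scale;
      current = 0;
    } else {
      return null;
    }
  }
  return total + current;
}
```

The sketch accumulates values without validating word order, so it accepts some phrases that are not well-formed English; a production parser would need grammar checks on top of this.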
For performance, the library parses text in parallel across worker threads and caches partial parses of document segments.
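The caching half of this can be sketched as follows, assuming a document is split into segments by line and that a parse result depends only on the segment's own text. `SegmentCache`, `ParsedSegment`, and `parseDocument` are hypothetical names, the tokenizer is a placeholder for a real parse, and the worker-thread side is not shown.

```typescript
// Hypothetical sketch of partial parse caching: not Compromise's internals.
interface ParsedSegment {
  text: string;
  tokens: string[];
}

class SegmentCache {
  private cache = new Map<string, ParsedSegment>();
  public misses = 0; // number of segments that had to be parsed from scratch

  parse(segment: string): ParsedSegment {
    const cached = this.cache.get(segment);
    if (cached !== undefined) return cached;

    this.misses += 1;
    // Placeholder for an expensive parse: lowercase word tokenization.
    const tokens = (segment.toLowerCase().match(/[a-z0-9']+/g) || []).slice();
    const parsed: ParsedSegment = { text: segment, tokens };
    this.cache.set(segment, parsed);
    return parsed;
  }
}

// Splits a document into non-empty lines and parses each through the cache,
// so unchanged segments are reused when an edited document is parsed again.
function parseDocument(doc: string, cache: SegmentCache): ParsedSegment[] {
  return doc
    .split(/\n+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => cache.parse(s));
}
```

Keying the cache on the segment text keeps the sketch simple; it is only valid when a segment's parse does not depend on its neighbours, which is an assumption made here rather than a documented property of the library.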