# stanfordnlp/stanza

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/stanfordnlp-stanza).**

7,809 stars · 941 forks · Python · NOASSERTION

## Links

- GitHub: https://github.com/stanfordnlp/stanza
- Homepage: https://stanfordnlp.github.io/stanza/
- awesome-repositories: https://awesome-repositories.com/repository/stanfordnlp-stanza.md

## Topics

`artificial-intelligence` `corenlp` `deep-learning` `machine-learning` `named-entity-recognition` `natural-language-processing` `nlp` `python` `pytorch` `universal-dependencies`

## Description

Stanza is a Python natural language processing library that uses neural models for tokenization, lemmatization, and dependency parsing across many human languages. Its neural pipeline converts raw text into structured `Document` objects made up of sentences, tokens, and words. A specialized set of biomedical and clinical models adds syntactic analysis and named entity recognition for medical text.
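A minimal sketch of the neural pipeline, assuming a recent Stanza release and network access for the one-time English model download. `stanza.download`, `stanza.Pipeline`, and the `sentences`/`words` attributes are Stanza's public API. The helper `word_rows` is hypothetical; it is separated from model loading so the traversal of the returned `Document` can be read (and tested) on its own.

```python
def word_rows(doc):
    """Flatten a Stanza Document into (text, lemma, upos, head, deprel) tuples.

    `word_rows` is a hypothetical helper; it only reads attributes that
    Stanza's Word objects expose.
    """
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            # head is the 1-based index of the governing word; 0 marks the root
            rows.append((word.text, word.lemma, word.upos, word.head, word.deprel))
    return rows


def main():
    import stanza  # imported lazily so word_rows works without the models

    stanza.download("en")  # one-time download of the English models
    nlp = stanza.Pipeline("en", processors="tokenize,mwt,pos,lemma,depparse")
    doc = nlp("Barack Obama was born in Hawaii.")
    for row in word_rows(doc):
        print(*row, sep="\t")


if __name__ == "__main__":
    main()
```

The processor list is ordered by dependency: the parser needs tokens, part-of-speech tags, and lemmas before it can attach each word to its head.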

The project also includes a Python client for CoreNLP, Stanford's Java-based NLP toolkit. The client can start a local CoreNLP server or connect to one that is already running, and it returns CoreNLP's linguistic annotations to Python code.
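A sketch of the CoreNLP client, assuming Java is installed, a CoreNLP distribution has been downloaded (for example with `stanza.install_corenlp()`), and `CORENLP_HOME` points at it. `CoreNLPClient`, its `annotators`, `timeout`, and `memory` arguments, and `annotate` come from `stanza.server`. The response is a CoreNLP protobuf document whose `sentence` and `token` fields are read by `token_triples`, a hypothetical helper.

```python
def token_triples(ann):
    """Collect (word, pos, ner) for every token in a CoreNLP protobuf Document.

    `token_triples` is a hypothetical helper; `sentence`, `token`, `word`,
    `pos`, and `ner` are field names from CoreNLP's protobuf schema.
    """
    return [
        (token.word, token.pos, token.ner)
        for sentence in ann.sentence
        for token in sentence.token
    ]


def main():
    from stanza.server import CoreNLPClient  # requires Java and CORENLP_HOME

    text = "Chris Manning teaches at Stanford University."
    # By default the context manager starts a local CoreNLP server
    # and shuts it down when the block exits.
    with CoreNLPClient(
        annotators=["tokenize", "ssplit", "pos", "lemma", "ner"],
        timeout=30000,
        memory="6G",
    ) as client:
        ann = client.annotate(text)
    for triple in token_triples(ann):
        print(*triple, sep="\t")


if __name__ == "__main__":
    main()
```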

The library covers a broad range of linguistic analysis, including named entity recognition, coreference resolution, and syntactic dependency parsing. Its annotation pipelines also extract part-of-speech tags and morphological features across diverse linguistic datasets.
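A sketch that pulls named entities and morphological features out of one pipeline run, assuming the English models can be downloaded. `doc.ents`, `ent.text`, `ent.type`, `word.upos`, and `word.feats` are Stanza attributes. The helpers `entities` and `parse_feats` are hypothetical; `parse_feats` splits the Universal Dependencies `Key=Value|Key=Value` feature string into a dictionary.

```python
def entities(doc):
    """Return (text, type) pairs for the named entities Stanza found."""
    return [(ent.text, ent.type) for ent in doc.ents]


def parse_feats(feats):
    """Turn a UD feature string such as 'Number=Sing|Person=3' into a dict.

    Stanza leaves `word.feats` as None when a word has no features;
    '_' is the CoNLL-U placeholder for the same case.
    """
    if not feats or feats == "_":
        return {}
    # split on the first '=' only, since values like 'Int,Rel' stay intact
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def main():
    import stanza

    stanza.download("en")
    # the pos processor assigns feats; ner adds entity spans to the Document
    nlp = stanza.Pipeline("en", processors="tokenize,mwt,pos,ner")
    doc = nlp("Angela Merkel visited Paris in 2019.")
    print(entities(doc))
    for word in doc.sentences[0].words:
        print(word.text, word.upos, parse_feats(word.feats))


if __name__ == "__main__":
    main()
```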

Users can train the neural modules on their own data to improve the accuracy of components such as the tokenizer and the parsers.
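Once a module has been trained, it can be loaded in place of the default model. This sketch assumes Stanza's per-processor option naming (`tokenize_model_path`, `pos_model_path`, and so on), Stanza 1.4 or later for `download_method=None`, and that the English resources and any pretrained embeddings the tagger needs are already on disk. The helper `custom_model_kwargs` and both model file paths are hypothetical placeholders; the training step itself is not shown.

```python
def custom_model_kwargs(model_paths):
    """Map processor names to Stanza's `<processor>_model_path` options.

    `custom_model_kwargs` is a hypothetical helper; the paths passed to it
    stand in for models you trained yourself.
    """
    return {f"{name}_model_path": path for name, path in model_paths.items()}


def main():
    import stanza

    kwargs = custom_model_kwargs({
        "tokenize": "saved_models/tokenize/my_tokenizer.pt",  # hypothetical path
        "pos": "saved_models/pos/my_tagger.pt",  # hypothetical path
    })
    # download_method=None keeps Stanza from fetching models at startup
    nlp = stanza.Pipeline("en", processors="tokenize,mwt,pos", download_method=None, **kwargs)
    print(nlp("Custom models can be swapped in per processor."))


if __name__ == "__main__":
    main()
```

Processors without an explicit path keep using the default models, so a single custom tokenizer or tagger can be dropped into an otherwise standard pipeline.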

## Tags

### Artificial Intelligence & ML

- [Natural Language Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing.md) — Provides a comprehensive library for tokenization, lemmatization, and dependency parsing across many human languages.
- [Dependency Syntax Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/dependency-syntax-analysis.md) — Maps grammatical dependencies between words to determine the overall syntactic structure of sentences. ([source](https://stanfordnlp.github.io/stanza/))
- [Named Entity Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/named-entity-recognition.md) — Identifies and classifies entities like people, organizations, and locations within raw text.
- [Transformer-Based NLP Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-based-nlp-libraries.md) — Provides a sequence of neural annotators, several of which can use transformer embeddings, that convert raw text into structured linguistic objects.
- [Transformer Models](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-models.md) — Can use transformer-based deep learning models to predict linguistic tags and dependencies.
- [Clinical Entity Recognition Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/clinical-entity-recognition-toolkits.md) — Provides a specialized analyzer for extracting medical insights from clinical and biomedical language.
- [Custom Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training.md) — Allows training of neural network modules using project-specific data to refine tokenizers and parsers. ([source](https://github.com/stanfordnlp/stanza#readme))
- [NLP-Specific](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/nlp-specific.md) — Trains neural network modules with project-specific data to refine the accuracy of tokenizers and parsers.
- [Biomedical Text Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/entity-extraction-pipelines/textual-entity-extractors/financial-entity-recognizers/biomedical-text-analysis.md) — Provides specialized syntactic analysis and entity recognition to extract medical insights from clinical language. ([source](https://github.com/stanfordnlp/stanza#readme))
- [Model Downloaders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/model-downloaders.md) — Manages the retrieval of model weights and configurations to support both online and offline environments. ([source](https://stanfordnlp.github.io/stanza/faq.html))
- [Plugin Model Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-management/plugin-model-managers.md) — Retrieves and caches language-specific model binaries from remote repositories based on configuration.
- [Word Stemming](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-stemming.md) — Lemmatizes words, reducing them to their dictionary (base) form to normalize text for consistent analysis. ([source](https://stanfordnlp.github.io/stanza/))
- [Morphological Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/sentence-structure-analysis/morphological-analysis.md) — Breaks down raw text into sentences and words while identifying parts of speech and morphological features. ([source](https://stanfordnlp.github.io/stanza/))
- [Universal Linguistic Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/universal-linguistic-analysis.md) — Performs tokenization, tagging, lemmatization, and dependency parsing across languages using universal linguistic data. ([source](https://github.com/stanfordnlp/stanza#readme))
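The clinical entity recognition listed above can be sketched as follows. The `package="mimic"` and `processors={"ner": "i2b2"}` arguments follow Stanza's biomedical and clinical documentation as I understand it; treat them as assumptions to check against the current model list. The helper `group_by_type` is hypothetical and simply buckets entity mentions by label.

```python
from collections import defaultdict


def group_by_type(doc):
    """Group entity mentions by label (i2b2 labels include PROBLEM and TREATMENT)."""
    groups = defaultdict(list)
    for ent in doc.entities:
        groups[ent.type].append(ent.text)
    return dict(groups)


def main():
    import stanza

    # 'mimic' selects the clinical syntax package; 'i2b2' is a clinical NER model
    stanza.download("en", package="mimic", processors={"ner": "i2b2"})
    nlp = stanza.Pipeline("en", package="mimic", processors={"ner": "i2b2"})
    doc = nlp("The patient had a sore throat and was treated with Cepacol lozenges.")
    print(group_by_type(doc))


if __name__ == "__main__":
    main()
```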

### Part of an Awesome List

- [Python NLP Libraries](https://awesome-repositories.com/f/awesome-lists/devtools/python-nlp-libraries.md) — Provides a comprehensive Python library for deep learning-based linguistic analysis, tokenization, and dependency parsing.
- [Coreference Resolution](https://awesome-repositories.com/f/awesome-lists/ai/coreference-resolution.md) — Implements tools for resolving coreferences to maintain entity context throughout a document. ([source](https://stanfordnlp.github.io/stanza/))
- [Natural Language Processing](https://awesome-repositories.com/f/awesome-lists/ai/natural-language-processing.md) — Python NLP library for multiple human languages.

### Data & Databases

- [Linguistic Data Processors](https://awesome-repositories.com/f/data-databases/linguistic-data-processors.md) — Implements a processing pipeline for named entity recognition and sentence segmentation across diverse datasets.
- [Text Processing Pipelines](https://awesome-repositories.com/f/data-databases/text-processing-pipelines.md) — Implements modular workflows that sequence annotators to transform raw text into structured linguistic data. ([source](https://stanfordnlp.github.io/CoreNLP/))
- [Batch Input Processing](https://awesome-repositories.com/f/data-databases/batch-input-processing.md) — Groups multiple documents into a single execution stream to increase throughput and reduce processing overhead.
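Batch processing can be sketched by wrapping each raw string in an empty `stanza.Document` and passing the whole list to the pipeline in one call, which returns a list of annotated documents. The helpers `batches` and `process_corpus` and the default chunk size of 64 are hypothetical; Stanza already batches work inside each processor, so the outer chunking only controls how many texts are handed to the pipeline per call.

```python
def batches(texts, size):
    """Yield consecutive slices of `texts`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def process_corpus(nlp, texts, size=64):
    """Annotate `texts` with a Stanza pipeline, one chunk of Documents per call."""
    import stanza

    results = []
    for chunk in batches(texts, size):
        # empty Documents carrying only raw text let the pipeline
        # annotate the whole chunk in a single call
        in_docs = [stanza.Document([], text=t) for t in chunk]
        results.extend(nlp(in_docs))
    return results


def main():
    import stanza

    stanza.download("en")
    nlp = stanza.Pipeline("en", processors="tokenize")
    texts = ["This is a test document.", "I wrote another document for fun."]
    for doc in process_corpus(nlp, texts, size=2):
        print(len(doc.sentences))


if __name__ == "__main__":
    main()
```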

### Development Tools & Productivity

- [Annotation Pipelines](https://awesome-repositories.com/f/development-tools-productivity/workflow-automations/annotation-pipelines.md) — Sequences multiple linguistic processors in a linear chain to incrementally add metadata to raw text.
