What are the best Python NLP library repositories on GitHub?

Comprehensive frameworks and toolkits for deep learning and linguistic analysis. Explore 35 GitHub repositories from the Python NLP Libraries section of an awesome list. Top picks: huggingface/transformers, explosion/spaCy, pytorch/fairseq, deepset-ai/haystack, zalandoresearch/flair, allenai/allennlp, huggingface/tokenizers, yandexdataschool/nlp_course, nebuly-ai/nebullvm, stanfordnlp/stanza.

Why is huggingface/transformers a recommended Python NLP repository on GitHub?

State-of-the-art library for Transformer-based models.

Why is explosion/spaCy a recommended Python NLP repository on GitHub?

Industrial-strength library for advanced natural language processing.

Why is pytorch/fairseq a recommended Python NLP repository on GitHub?

Facebook AI Research implementations of sequence-to-sequence models.

Why is deepset-ai/haystack a recommended Python NLP repository on GitHub?

End-to-end framework for building natural language search interfaces.

Why is zalandoresearch/flair a recommended Python NLP repository on GitHub?

Simple framework for multilingual NLP built on PyTorch.

Why is allenai/allennlp a recommended Python NLP repository on GitHub?

Research library for building deep learning models on PyTorch.

Why is huggingface/tokenizers a recommended Python NLP repository on GitHub?

High-performance tokenization for research and production.

Why is yandexdataschool/nlp_course a recommended Python NLP repository on GitHub?

Teaches NLP algorithms using Python with NumPy, PyTorch, and NLTK for all assignments and examples.

Why is nebuly-ai/nebullvm a recommended Python NLP repository on GitHub?

Optimizes inference speed for deep learning models.

Why is stanfordnlp/stanza a recommended Python NLP repository on GitHub?

Provides a comprehensive Python library for deep learning-based linguistic analysis, tokenization, and dependency parsing.

Python NLP Libraries (35 repos)


huggingface/transformers
161,630 stars
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
State-of-the-art library for Transformer-based models.
Python · audio · deep-learning · deepseek

explosion/spaCy
33,688 stars
spaCy is a Python natural language processing framework designed for industrial-scale text processing. It converts raw text into structured data for machine learning pipelines through a combination of statistical language model trainers, transformer-based text processors, and syntactic dependency parsers. The project enables the integration of pretrained transformer architectures to perform complex linguistic analysis and multi-task learning. It also provides a specialized system for neural named entity recognition to identify and categorize key entities within text. The framework covers a b
Industrial-strength library for advanced natural language processing.
Python · ai · artificial-intelligence · cython

pytorch/fairseq
32,228 stars
Fairseq is a deep learning research toolkit and sequence-to-sequence framework built on PyTorch. It provides a system for training and deploying models that map input sequences to output sequences, with a primary focus on neural machine translation and speech recognition. The toolkit allows for the generation of text sequences through search algorithms such as beam search and nucleus sampling. It includes capabilities for producing synthetic parallel training data by translating monolingual text using reverse sequence models. The framework supports large scale model training through multi-de
Facebook AI Research implementations of sequence-to-sequence models.
Python
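
The Fairseq description mentions generating sequences with beam search. As a rough illustration of that idea only (this is not Fairseq's implementation or API), the sketch below keeps the best `beam_size` partial hypotheses at each step. The `TABLE` of probabilities, `toy_model`, and the `beam_search` signature are all hypothetical names made up for this example.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token only.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def toy_model(seq):
    """Stand-in for a real model: log-probabilities for the next token."""
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}


def beam_search(next_log_probs, start, end, beam_size, max_len):
    """Return the best-scoring sequence found with the given beam width."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = []
        for score, seq in beams:
            for tok, log_p in next_log_probs(seq).items():
                candidates.append((score + log_p, seq + [tok]))
        if not candidates:
            break
        # Keep only the `beam_size` highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == end else beams).append((score, seq))
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]
```

With this toy table, a beam width of 1 (greedy decoding) commits to `a` and ends at probability 0.33, while a width of 2 keeps `b` alive long enough to find the `b z` path at 0.36.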

deepset-ai/haystack
24,253 stars
Haystack is an orchestration framework designed for building complex search and generative AI pipelines. It functions as an agentic workflow engine, enabling the construction of automated sequences that allow AI agents to perform multi-step reasoning and data analysis. The framework utilizes a modular, component-based architecture that connects processing steps into directed acyclic graphs. By employing a provider-agnostic integration layer, it decouples core logic from specific external AI services and vector databases, allowing for the flexible exchange of underlying technologies. This desi
End-to-end framework for building natural language search interfaces.
MDX · agent · agents · ai
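
The Haystack description talks about connecting components into directed acyclic graphs. A minimal sketch of that general pattern, using only the Python standard library, might look like the following. It is not Haystack's API: `run_pipeline`, `retriever`, `prompt_builder`, and `DOCS` are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory document store.
DOCS = ["spaCy parses text", "Haystack builds pipelines"]


def retriever(query, upstream):
    """Return documents containing the query string (case-insensitive)."""
    return [doc for doc in DOCS if query.lower() in doc.lower()]


def prompt_builder(query, upstream):
    """Combine the retriever's output with the query into one prompt."""
    context = "; ".join(upstream["retriever"])
    return f"Context: {context} | Question: {query}"


def run_pipeline(components, edges, query):
    """Run each component after all of its upstream components."""
    deps = {name: set() for name in components}
    for upstream, downstream in edges:
        deps[downstream].add(upstream)
    results = {}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(deps).static_order():
        upstream_outputs = {dep: results[dep] for dep in deps[name]}
        results[name] = components[name](query, upstream_outputs)
    return results


results = run_pipeline(
    {"retriever": retriever, "prompt_builder": prompt_builder},
    [("retriever", "prompt_builder")],
    "haystack",
)
```

Because each component only sees the outputs of the components it depends on, swapping one retriever for another does not require touching the rest of the graph.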

zalandoresearch/flair
14,378 stars
Flair is a natural language processing framework for training and applying models for sequence labeling and text classification. It provides a system for generating word embeddings and identifying semantic entities within text. The framework includes a dedicated system for zero and few-shot learning, enabling text classification and entity extraction using minimal training examples by leveraging pre-trained knowledge. Its capabilities cover named entity recognition, sentiment analysis, and the training of specialized models using custom datasets. It also includes tooling for the visual highl
Simple framework for multilingual NLP built on PyTorch.
Python

allenai/allennlp
11,889 stars
AllenNLP is a PyTorch-based research library and deep learning language toolkit designed for developing and training neural network architectures for linguistic tasks. It provides a distributed training system that coordinates data and gradients across multiple GPUs and a framework for integrating pretrained transformer architectures. The system distinguishes itself with a dedicated algorithmic bias mitigation tool used to identify and reduce bias in linguistic model predictions. It also includes model influence analysis to interpret predictions by calculating the influence of specific traini
Research library for building deep learning models on PyTorch.
Python

huggingface/tokenizers
10,825 stars
This project is a high-performance library for converting raw text into tokens and IDs for machine learning models. It functions as a fast text encoder and a text preprocessing pipeline designed to transform strings into numerical representations with high throughput for research and production. The library includes a subword tokenizer trainer used to analyze text datasets and create custom vocabularies using algorithms such as byte-pair encoding and wordpiece. It provides capabilities for subword vocabulary training and text alignment, allowing character offsets to be tracked during normaliz
High-performance tokenization for research and production.
Rust · bert · gpt · language-model
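
The tokenizers description mentions training subword vocabularies with byte-pair encoding. To make the idea concrete, here is a toy pure-Python sketch of BPE merge learning. It is an illustration of the algorithm under simplifying assumptions (whitespace-free words, character-level start), not the library's implementation; `train_bpe`, `merge_pair`, and `encode` are hypothetical helper names.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words."""
    # Each distinct word starts as a tuple of characters with a frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties break on the pair for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = train_bpe(["low", "low", "lower", "lowest", "new", "newer"], 2)
tokens = encode("lowly", merges)
```

On this tiny corpus the first two merges build up the subword `low`, so the unseen word `lowly` is split into `low`, `l`, and `y`.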

yandexdataschool/nlp_course
10,591 stars
YSDA course in Natural Language Processing
Teaches NLP algorithms using Python with NumPy, PyTorch, and NLTK for all assignments and examples.
Jupyter Notebook

nebuly-ai/nebullvm
8,338 stars
Nebullvm is an AI inference accelerator, GPU resource orchestrator, and performance optimization library for large language models. It functions as an optimization layer designed to lower operational costs by aligning model execution with underlying hardware architectures. The system maximizes cluster efficiency through real-time dynamic partitioning and elastic quotas for shared hardware resources. It employs alignment methods and techniques to reduce the hardware and data requirements necessary for tuning large language models. The project covers broad capability areas including AI infrast
Optimizes inference speed for deep learning models.
Python

stanfordnlp/stanza
7,809 stars
Stanza is a Python natural language processing library designed for tokenization, lemmatization, and dependency parsing across many human languages using neural models. It provides a neural processing pipeline that converts raw text into structured linguistic data objects, alongside a specialized analyzer for extracting medical insights from clinical and biomedical language. The project includes a wrapper that connects Python scripts to Java-based natural language processing tools and remote annotation servers. This enables a bridge for extracting linguistic annotations and analysis data from
Provides a comprehensive Python library for deep learning-based linguistic analysis, tokenization, and dependency parsing.
Python · artificial-intelligence · corenlp · deep-learning

MervinPraison/PraisonAI
5,592 stars
PraisonAI is an autonomous AI agent platform that coordinates multiple LLM-powered agents for research, planning, and execution of complex workflows. It functions as a multi-agent orchestration framework, a workflow builder, and a Model Context Protocol server, while also providing retrieval-augmented generation through vector knowledge bases. Agents can interact via CLI, web, or standardized protocols with sandboxed code execution. The platform distinguishes itself with a rich set of agent communication protocols, including A2A, REST, WebSocket, voice and telephony integration, and MCP, allo
Multi-agent framework with LLM support and agentic workflows.
Python · agents · ai · ai-agent-framework

snipsco/snips-nlu
3,972 stars
snips-nlu is a Python library and natural language understanding engine designed to convert unstructured text into structured data. It identifies user intents and extracts the associated entities from natural-language sentences to enable machine-readable command processing. The engine acts as a multilingual parser capable of processing text in several languages. It maps identified entities to canonical values or standardized ISO formats, such as timestamps, to ensure data consistency. The project covers intent classification and named entity recognition, using sequence labeling and tokenization to identify user goals and specific data slots.
Production-ready library for intent parsing and slot filling.
Python

QData/TextAttack
3,435 stars
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. Documentation: https://textattack.readthedocs.io/en/master/
Framework for adversarial attacks and data augmentation in NLP.
Python

NervanaSystems/nlp-architect
2,934 stars
A Deep Learning NLP/NLU library by Intel® AI Lab
Library for exploring state-of-the-art deep learning topologies.
Python

dmlc/gluon-nlp
2,546 stars
Deep learning toolkit for research and industrial NLP deployment.
Python

BrikerMan/Kashgari
2,383 stars
Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification. It includes Word2Vec, BERT, and GPT-2 language embeddings.
Keras-powered framework for named entity recognition and classification.
Python

JasonKessler/scattertext
2,330 stars
Beautiful visualizations of how language differs among document types.
Visualizes language differences between corpora using D3.
Python · computational-social-science · d3 · eda

chartbeat-labs/textacy
2,242 stars
NLP, before and after spaCy
Higher-level NLP utilities built on top of spaCy.
Python

PetrochukM/PyTorch-NLP
2,226 stars
Basic Utilities for PyTorch Natural Language Processing (NLP)
Toolkit for rapid prototyping with data loaders and metrics.
Python

deepset-ai/FARM
1,752 stars
Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Transfer learning framework focused on industrial question answering.
Python