30 open-source projects similar to loadfive/knwl.js, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Knwl.js alternative.
Compromise is a natural language processing library and rule-based text parser designed to analyze unstructured text. It functions as a toolkit for identifying parts of speech, linguistic patterns, and semantic meaning, while providing specialized engines for named entity recognition and the parsing of temporal and numeric data. The project is distinguished by its linguistic morphological engine, which can conjugate verbs across different tenses and inflect nouns and adjectives. It further allows for linguistic model customization through a plugin system that enables the extension of lexicons
CoreNLP is a Java natural language processing library designed to convert raw human language text into structured data. It utilizes a suite of linguistic annotators to analyze text through a pipeline, extracting grammatical structures, sentiment, and linguistic patterns. The project includes a coreference resolution engine that links multiple mentions of the same entity to maintain contextual consistency across documents. It also provides tools for named entity recognition to categorize people, companies, and locations, and a part-of-speech tagger to assign grammatical categories and base for
Stanza is a Python natural language processing library designed for tokenization, lemmatization, and dependency parsing across many human languages using neural models. It provides a neural processing pipeline that converts raw text into structured linguistic data objects, alongside a specialized analyzer for extracting medical insights from clinical and biomedical language. The project includes a wrapper that connects Python scripts to Java-based natural language processing tools and remote annotation servers. This enables a bridge for extracting linguistic annotations and analysis data from
KnowledgeGraphData is a collection of structured datasets and corpora designed to provide a foundational layer for cognitive intelligence and artificial intelligence systems. It primarily consists of large-scale Chinese knowledge graph datasets, including entity-relation data and NLP training sets used to drive semantic understanding and automated question answering. The project focuses on the construction and export of massive entity-attribute-value graphs, organizing knowledge into portable formats. It provides specialized domain partitioning to tailor information retrieval for professional
nlp.js is a JavaScript natural language processing library and development framework used to build natural language understanding engines. It provides a toolkit for creating local machine learning models for intent classification and acts as a multilingual text processor that detects languages and normalizes text across various dialects. The framework distinguishes itself by supporting local execution on both servers and mobile devices, enabling chatbot functionality without an internet connection. It features a specialized system for conversational slot filling to collect mandatory informati
DeepKE is a knowledge extraction toolkit and framework designed to transform unstructured text into structured knowledge graphs. It provides a pipeline for identifying and classifying named entities, semantic relations, and events, converting raw datasets into structured triples. The project utilizes large language models as tool callers through a standardized context protocol to drive automated data extraction processes. It supports schema-driven extraction across multiple domains and bilingual text, employing joint entity and relation extraction to identify components in a single structured
nlp-recipes is a collection of implementation guides and reference templates for applying natural language processing techniques to real-world tasks. It provides standardized workflows and code examples for developing NLP pipelines, from dataset preparation and model training to performance evaluation. The project focuses on the practical application of transformer-based models, offering patterns for fine-tuning pretrained architectures for tasks such as text classification, named entity recognition, and question answering. It also includes a toolkit for model interpretability, allowing users
Agriculture Knowledge Graph is a structured triple-store system and decision support platform designed to transform raw agricultural documents into a machine-readable graph. It functions as a domain information retrieval system that extracts and queries agricultural data to provide intelligent answers and planning support. The project implements a full pipeline for knowledge graph construction, featuring a relation extraction framework and named entity recognition tools. It utilizes distant supervision and machine learning to identify and classify relationships between entities, converting uns
DeepPavlov is a conversational AI framework and deep learning NLP library designed for building end-to-end dialogue systems and chatbots. It functions as an NLP pipeline orchestrator that allows users to compose pre-trained models and text processing components into sequential data flows for complex linguistic tasks. The system is distinguished by its ability to act as a chatbot deployment server, exposing trained conversational models as web services via REST and Socket APIs. It utilizes JSON-based pipeline configurations and dynamic variable interpolation to decouple model logic from infras
This project is a natural language processing system designed for named entity recognition and text classification. It uses a machine learning approach to identify specific names and key information from raw text to organize unstructured content into a structured format. The system implements a multi-layer architecture that combines a pre-trained transformer for embeddings, bidirectional long short-term memory for sequence modeling, and a conditional random field for label transitions. It supports transfer learning through the fine-tuning of these models on task-specific datasets. The projec
This repository provides a collection of deep learning models and neural network architectures built for natural language processing tasks. It functions as a library of pre-trained models designed to process, analyze, and generate human language data using the TensorFlow framework. The project utilizes sequence-to-sequence modeling and layered neural architectures to handle variable-length language data. By employing static dataflow graphs and tensor-based representations, the models transform input features into abstract linguistic representations. Users can loa
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
This project is a collection of tutorial notebooks and educational resources on transformer-based natural language processing. It provides a guide for using the Hugging Face Transformers library through interactive coding exercises and demonstrations. The repository contains ready-to-run Jupyter notebooks with practical examples that show how to run specific workflows using pre-trained models. The notebooks cover a range of natural language processing tasks, including text classification, automatic text sum
bert4keras is a lightweight reimplementation of the BERT transformer architecture for the Keras deep learning framework. It serves as a natural language processing toolkit and transformer model library used for text classification, sequence labeling, and semantic embedding extraction. The framework includes a sequence-to-sequence model system for question answering and text generation, as well as a model inference server to deploy trained transformers as web APIs for real-time predictions. Capabilities cover a broad range of natural language understanding tasks, including reading comprehensi
Flair is a transformer-based natural language processing framework used to build and train models for text classification and sequence tagging. It provides a specialized library for generating contextual text embeddings and performing linguistic analysis. The framework includes dedicated tools for named entity recognition, including the identification of specialized biomedical entities across multiple languages. It further supports entity linking to map identified text mentions to unique entries within general or biomedical knowledge bases. The project covers a broad range of language analys
Natural is a natural language processing library for Node.js that provides tools for text analysis, tokenization, and phonetic matching. It functions as a collection of specialized toolsets for word stemming, string similarity quantification, and pattern-based text classification. The library includes a phonetic sound analyzer that converts words into phonetic representations to identify matches based on sound rather than literal spelling. It also features a text classification engine that assigns categories to text inputs using trained models and pattern recognition. Additional capabilities
PyText is an extensible PyTorch-based framework for building, training, and deploying custom natural language processing models, including text classifiers, sequence taggers, and intent-slot predictors. It provides a modular toolkit that allows developers to assemble these models using pluggable registries for model architectures, data formats, and tensorizers, all configurable through YAML files without requiring code changes. The framework distinguishes itself through its comprehensive support for the full NLP model lifecycle, from training to production inference. It includes pre-built neu
This project is a comprehensive Python toolkit designed for natural language processing, research, and education. It functions as a linguistic data processor that provides a standardized framework for managing, cleaning, and analyzing large collections of annotated text corpora and lexical resources. The library distinguishes itself through its integration of both symbolic and statistical methods, allowing users to perform complex tasks ranging from rule-based grammar parsing to machine learning-driven classification. It offers a modular pipeline for text processing, enabling the transformati
SentencePiece is a text segmentation engine and tokenization library designed for machine learning workflows. It provides a comprehensive toolkit for transforming raw text into subword units or numerical identifiers, enabling consistent data representation for neural network training and inference. The library supports the training of segmentation models from raw text, allowing for the creation of custom vocabularies tailored to specific domain requirements. The project distinguishes itself through its byte-level encoding and fallback mechanisms, which ensure that every input can be represent
Open Llama is an open source large language model and pre-trained transformer designed as a permissively licensed alternative to proprietary weights. It serves as a base model reproduction of the Llama architecture, providing a set of weights for a decoder-only transformer. The project provides a transparently trained model based on the RedPajama dataset, supporting unrestricted commercial and research use. It includes systems for serving pre-trained weights in various sizes. The project covers natural language processing research and performance benchmarking through text quality evaluation
This is a machine learning framework for treating diverse natural language processing tasks as a unified text-to-text problem. It provides a toolkit for pre-training and fine-tuning large-scale transformer models, utilizing a system where both inputs and outputs are formatted as raw text sequences. The framework is distinguished by its distributed training system, which uses mesh-based strategies to scale model weights and training batches across multiple TPU cores. It supports multi-task learning by combining diverse datasets into a single training stream using configurable mixture rates, al
This is a Chinese natural language processing toolkit providing a suite of tools for word segmentation, part-of-speech tagging, and named entity recognition. It includes a neural dependency parser for analyzing syntactic and semantic relationships between words and a machine learning training suite for creating custom linguistic models using annotated datasets. The toolkit distinguishes itself through its deployment flexibility, offering a dockerized server and a web service interface that exposes processing capabilities via API. It supports the use of pretrained models and allows for the int
Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning. The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
Pattern is a Python web mining library that functions as an HTML web scraper, a natural language processing toolkit, and a network analysis tool. It provides a mathematical framework for categorizing datasets through a vector space model library. The project enables the extraction of structured data from web services and the creation of searchable web content indexes. It processes unstructured text using sentiment analysis, part-of-speech tagging, and n-gram searching. The library covers machine learning classification through the training of models using perceptron algorithms and support ve
Rasa is a chatbot development platform and conversational AI framework used to design, deploy, and integrate multi-turn conversational agents. It functions as an LLM orchestration engine and NLU dialogue manager, combining large language model fluency with structured business logic to control agent behavior. The framework enables the development of conversational assistants that automate text and voice interactions. It allows for the definition of conversational flows using flexible sequences and provides tools to inspect agent decisions to debug and validate the internal reasoning process.
This project is a Chinese text segmentation library and tokenizer designed to split Chinese sentences into individual words. Beyond segmentation, it tags parts of speech and extracts keywords using statistical analysis. The library distinguishes itself through support for custom dictionary configuration and vocabulary file management, allowing users to override default segmentation rules for domain-specific accuracy. It also includes a TF-IDF keyword extractor to identify significant words and core topics within documents. Th
TextBlob is a natural language processing library that provides a unified interface for common linguistic tasks. It operates as a wrapper-based API, simplifying the use of complex processing libraries by delegating core operations to specialized external frameworks. The project features a pluggable processing pipeline that allows for the integration of custom logic and alternative language engines. It supports the extension of processing models through plugins to add specific language support or custom data processing. The library covers a broad range of linguistic capabilities, including se
This project is a TensorFlow implementation of a transformer model, providing a text-to-text deep learning framework designed to recognize and generate sequence patterns. It functions as an attention-based sequence model and a neural machine translation framework for converting text from one language to another. The system implements the transformer network architecture, utilizing multi-head attention and positional encoding to process sequential data. It provides the necessary tools for transformer model training and machine translation inference, allowing for the execution of trained models
This repository serves as an educational resource for learning the foundational architectures of natural language processing through concise code implementations. It provides a structured collection of deep learning models designed to process and understand human language, focusing on the core mechanics of neural network sequence modeling and text analysis. The project distinguishes itself by offering direct, hands-on implementations of complex architectures, including Transformers, attention mechanisms, and word embedding generation. By utilizing tensor-based computational graphs and gradien
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation