30 open-source projects similar to dakrone/clojure-opennlp, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Clojure OpenNLP alternative.
Rails-like inflection library for Clojure and ClojureScript
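Inflection libraries of this kind map words between singular and plural forms using ordered regex rules plus tables of irregular and uncountable words. The following is a minimal Python sketch of that idea. The rule set, word lists, and the `pluralize` function are hypothetical and much smaller than a real inflector's; this is not the Clojure library's API.

```python
import re

# Hypothetical, heavily reduced rule set in the spirit of Rails-style inflectors.
# Each rule is (pattern, replacement); rules are tried top to bottom and the first match wins.
PLURAL_RULES = [
    (re.compile(r"(quiz)$", re.IGNORECASE), r"\1zes"),
    (re.compile(r"([^aeiouy]|qu)y$", re.IGNORECASE), r"\1ies"),
    (re.compile(r"(x|ch|ss|sh|us)$", re.IGNORECASE), r"\1es"),
    (re.compile(r"$"), "s"),  # default rule: append "s"
]
IRREGULARS = {"person": "people", "man": "men", "child": "children"}
UNCOUNTABLES = {"equipment", "information", "rice", "sheep", "fish"}


def pluralize(word: str) -> str:
    lower = word.lower()
    if lower in UNCOUNTABLES:
        return word
    if lower in IRREGULARS:
        return IRREGULARS[lower]  # note: does not preserve the input's capitalization
    for pattern, replacement in PLURAL_RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word, count=1)
    return word
```

With these rules, `pluralize("city")` yields `"cities"` and `pluralize("sheep")` is returned unchanged. A full inflector would also need the inverse rule set for singularization.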
Knwl.js is a JavaScript library for named entity recognition and text entity extraction. It works as an extensible text-parsing engine that scans unstructured strings for specific data patterns and converts them into structured information. A modular plugin system for pattern matching lets developers add custom logic to recognize new data types. The library identifies and isolates entities such as dates, times, phone numbers, emails, and locations
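The plugin idea described here can be sketched in a few lines: each plugin is a named function that pulls one kind of entity out of raw text, and the scanner runs every registered plugin. This is a Python sketch under that assumption. `EntityScanner`, `register`, `scan`, `regex_plugin`, the regexes, and the "TKT-" ticket format are all hypothetical and do not reflect Knwl.js's actual API.

```python
import re
from typing import Callable, Dict, List

# A plugin takes raw text and returns the matched entity strings.
Extractor = Callable[[str], List[str]]


class EntityScanner:
    """Hypothetical plugin-based scanner; not Knwl.js's actual API."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Extractor] = {}

    def register(self, name: str, extractor: Extractor) -> None:
        self._plugins[name] = extractor

    def scan(self, text: str) -> Dict[str, List[str]]:
        # Run every plugin and collect results under the plugin's name.
        return {name: extract(text) for name, extract in self._plugins.items()}


def regex_plugin(pattern: str) -> Extractor:
    compiled = re.compile(pattern)
    return lambda text: [m.group(0) for m in compiled.finditer(text)]


scanner = EntityScanner()
scanner.register("emails", regex_plugin(r"[\w.+-]+@[\w-]+\.[\w.]+"))  # simplified, not RFC-complete
scanner.register("times", regex_plugin(r"\b\d{1,2}:\d{2}\b"))
# A custom data type: a hypothetical ticket ID format such as "TKT-1234".
scanner.register("tickets", regex_plugin(r"\bTKT-\d+\b"))

result = scanner.scan("Mail ana@example.com before 14:30 about TKT-42.")
```

Because plugins are plain functions keyed by name, adding a new entity type does not touch the scanner itself, which is the extensibility the description emphasizes.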
Ohm is a parser generator and domain-specific language framework based on formal grammars. It provides a system for defining custom languages to parse, validate, and extract data from input text, transforming raw strings into hierarchical syntax trees according to the grammar's rules. Ohm grammars are parsing expression grammars (PEGs), and Ohm extends the PEG formalism with support for left recursion, so rules such as arithmetic expressions can be written in their natural left-recursive form. It also includes a dedicated debugging toolkit for tracing and visualizing the step-by-step state transitions
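Ohm is organized around two separate pieces: a grammar that turns text into a tree, and semantics that interpret that tree. The sketch below illustrates that split in Python with a hand-written recursive-descent parser for a tiny arithmetic grammar. It does not reproduce Ohm's DSL, its parsing algorithm, or its API. The grammar, `Num`, `BinOp`, `parse`, and `evaluate` are hypothetical, and the left-recursive rules are rewritten as loops, which is the usual workaround in hand-written parsers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical grammar, written informally:
#   Exp    = Exp ("+" | "-") Term | Term      (left-recursive)
#   Term   = Term ("*" | "/") Factor | Factor  (left-recursive)
#   Factor = "(" Exp ")" | number


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src: str) -> List[str]:
    tokens: List[str] = []
    pos, src = 0, src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Left recursion becomes a loop, which keeps + - * / left-associative."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def eat(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def exp(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = BinOp(op, node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self) -> Node:
        if self.peek() == "(":
            self.eat("(")
            node = self.exp()
            self.eat(")")
            return node
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return Num(int(tok))


def parse(src: str) -> Node:
    parser = Parser(tokenize(src))
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


def evaluate(node: Node) -> float:
    """A separate 'semantics' pass over the tree, kept apart from the grammar."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    return left / right
```

Keeping `evaluate` outside the parser means a second interpretation of the same tree (pretty-printing, type checking) can be added without touching the grammar code.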
nom is a parser combinator framework for Rust used to build complex parsers by combining small, reusable parsing functions. It functions as a zero-copy parsing tool that minimizes memory overhead by returning slices of the original input instead of allocating new memory. The framework is designed for diverse data formats, serving as a binary data parser with configurable endianness and a bitstream processing library capable of extracting values of arbitrary bit length. It also functions as a streaming data parser that can process data arriving in chunks and signal when additional input is required.
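The three ideas in this description (combinators, zero-copy slices, and an "incomplete input" signal for streaming) can be illustrated together. The following is a Python sketch loosely modeled on those ideas; it is not nom's API. `tag`, `take`, `u16`, `pair`, `length_prefixed`, `Incomplete`, `ParseError`, and the 0xCAFE frame format are hypothetical. Python `memoryview` slices stand in for Rust slices, since they reference the original buffer rather than copying it.

```python
from typing import Any, Callable, Tuple

# A parser takes the remaining input and returns (rest, value).
Parser = Callable[[memoryview], Tuple[memoryview, Any]]


class Incomplete(Exception):
    """Streaming signal: the input ended early and `needed` more bytes are required."""

    def __init__(self, needed: int) -> None:
        super().__init__(f"need {needed} more byte(s)")
        self.needed = needed


class ParseError(Exception):
    """The input is present but does not match."""


def tag(expected: bytes) -> Parser:
    def parse(inp: memoryview):
        head = inp[: len(expected)]
        if head != expected[: len(head)]:
            raise ParseError(f"expected {expected!r}")
        if len(head) < len(expected):
            raise Incomplete(len(expected) - len(head))  # prefix matches so far
        return inp[len(expected):], head
    return parse


def take(n: int) -> Parser:
    def parse(inp: memoryview):
        if len(inp) < n:
            raise Incomplete(n - len(inp))
        return inp[n:], inp[:n]  # zero-copy: both results are views into the same buffer
    return parse


def u16(endian: str) -> Parser:
    """Unsigned 16-bit integer with configurable byte order ("big" or "little")."""
    def parse(inp: memoryview):
        rest, raw = take(2)(inp)
        return rest, int.from_bytes(raw, endian)
    return parse


def pair(first: Parser, second: Parser) -> Parser:
    def parse(inp: memoryview):
        rest, a = first(inp)
        rest, b = second(rest)
        return rest, (a, b)
    return parse


def length_prefixed(length: Parser) -> Parser:
    """Read a length, then return exactly that many bytes as a view."""
    def parse(inp: memoryview):
        rest, n = length(inp)
        return take(n)(rest)
    return parse


# Hypothetical frame format: 2-byte magic 0xCAFE, big-endian u16 length, payload.
frame = pair(tag(b"\xca\xfe"), length_prefixed(u16("big")))
```

A caller that receives `Incomplete` would buffer more bytes from the stream and retry, while `ParseError` means the data is malformed regardless of how much more arrives.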
Gibran is an Elixir natural language processor, and a port of WordsCounted.
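Word-counting libraries in this family typically report token counts, frequencies, the most frequent and longest tokens, and per-token density. This Python sketch shows those statistics, assuming a simple lowercase-letters-and-apostrophes tokenizer. `count_words`, `WordStats`, and its field names are hypothetical and are not Gibran's or WordsCounted's API.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

WORD = re.compile(r"[a-z']+")  # hypothetical tokenizer: runs of lowercase letters and apostrophes


@dataclass
class WordStats:
    token_count: int
    unique_count: int
    frequencies: Dict[str, int]
    most_frequent: List[str]  # all tokens tied for the highest count
    longest: List[str]        # all tokens tied for the greatest length
    density: Dict[str, float]  # share of all tokens taken by each token


def count_words(text: str) -> WordStats:
    tokens = WORD.findall(text.lower())
    freq = Counter(tokens)
    top = max(freq.values(), default=0)
    longest_len = max((len(t) for t in freq), default=0)
    return WordStats(
        token_count=len(tokens),
        unique_count=len(freq),
        frequencies=dict(freq),
        most_frequent=sorted(t for t, c in freq.items() if c == top),
        longest=sorted(t for t in freq if len(t) == longest_len),
        density={t: c / len(tokens) for t, c in freq.items()},
    )
```

Returning every tied token, rather than an arbitrary one, keeps the results deterministic for short texts.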
Multilingual text (NLP) processing toolkit
ChatALL is a desktop application that functions as a multi-model chat client and aggregator for artificial intelligence services. It enables users to send a single prompt to multiple AI models simultaneously, allowing for the side-by-side comparison of generated responses within a unified interface. The application distinguishes itself through a local-first approach to data management, ensuring that all conversation logs and user configurations are stored directly on the user's device. This architecture supports privacy and offline access while providing a centralized system for managing and
AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It functions as a multimodal audio generator and processing system, integrating a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system is distinguished by its ability to generate audio from diverse inputs, including text and images, and its capacity to produce synchronized talking-head videos. It also operates as a neural speech translator, converting speech from one language to another while pre
RSS/Atom Feed Parsing and Generating for Clojure. Bidirectional. Data-driven.
Chat with your favourite LLaMA models in a native macOS app
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)
OpenHands is an autonomous AI software engineer and coding assistant designed to execute software engineering tasks by interacting directly with codebases and development environments. It functions as a platform for running AI agents that can write code and manage files to automate complex development workflows. The system distinguishes itself through a container-based execution environment that isolates agent actions within a sandboxed Linux environment. It employs an autonomous agent loop of observation, planning, and action, supported by a standardized communication protocol that allows it
AllenNLP is a PyTorch-based research library and deep learning language toolkit designed for developing and training neural network architectures for linguistic tasks. It provides a distributed training system that coordinates data and gradients across multiple GPUs and a framework for integrating pretrained transformer architectures. The system distinguishes itself with a dedicated algorithmic bias mitigation tool used to identify and reduce bias in linguistic model predictions. It also includes model influence analysis to interpret predictions by calculating the influence of specific traini
MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
This repository contains custom pipes and models related to using spaCy for scientific documents.
Summary of Responses to Questionnaire on Annotation Platform https://forms.gle/iZk8kehkjAWmB8xe9
This repository consists of all my NLP Projects
DocsGPT is a retrieval-augmented generation platform and private knowledge base used to build AI agents that perform grounded search and analysis. It functions as a multi-model AI orchestrator and enterprise agent builder, allowing for the integration of various local and cloud language models to customize reasoning and text generation. The project provides a visual environment for developing automated assistants using conditional logic and third-party API connectivity. It enables the creation of private AI agents capable of performing enterprise search and detailed document analysis using pr
Argilla is a collaborative AI feedback tool and data curation system. It serves as a human-in-the-loop dataset platform designed to coordinate annotators and domain experts in labeling, rating, and refining data samples for machine learning projects. The platform focuses on large language model dataset curation and reinforcement learning from human feedback workflows. It provides a shared workspace for integrating human expertise into AI development to validate model outputs and correct data errors. The system manages the end-to-end machine learning data pipeline, includ
Implementation of various topic models
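Latent Dirichlet allocation (LDA) is the most common topic model, and collapsed Gibbs sampling is one standard way to fit it. The following Python sketch assumes that technique; the repository may implement different models or inference methods. Each token's topic is resampled with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where the counts exclude the token being resampled. `lda_gibbs` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def lda_gibbs(docs: List[List[int]], vocab_size: int, num_topics: int,
              alpha: float = 0.1, beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                                # total tokens per topic
    z: List[List[int]] = []                              # current topic of every token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimates of document-topic (theta) and topic-word (phi) distributions.
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(num_topics)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
             for d, doc in enumerate(docs)]
    return theta, phi
```

The sampler is seeded, so repeated runs with the same inputs return identical estimates. Each row of `theta` and `phi` is a probability distribution.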
Lecture notes on probabilistic topic models using IPython Notebook
This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory requirements needed to train large models. The framework utilizes four-bit quantization and low-rank adapters to enable model training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset
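The two core mechanisms named here, a frozen 4-bit base weight and a trainable low-rank adapter, can be shown in a forward pass: y = W x + (alpha / r) * B (A x), where W is stored quantized and only A and B would be trained. The sketch below makes several assumptions. It uses plain symmetric absmax quantization to integer codes in [-7, 7] with one scale per block, which is simpler than the specialized 4-bit data types such frameworks typically use. It omits double quantization, the paged optimizer, and training. `quantize_blockwise`, `dequantize_blockwise`, and `LoRALinear` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric absmax quantization to signed 4-bit codes in [-7, 7], one scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero on all-zero blocks
        scale = absmax / 7
        scales.append(scale)
        codes.extend(max(-7, min(7, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen quantized base weight plus low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, A: Matrix, B: Matrix, alpha: float, block_size: int = 4) -> None:
        self.rows, self.cols = len(weight), len(weight[0])
        flat = [v for row in weight for v in row]
        self.codes, self.scales = quantize_blockwise(flat, block_size)  # only codes + scales are kept
        self.block_size = block_size
        self.A, self.B = A, B          # A: r x cols, B: rows x r (the trainable parts)
        self.scaling = alpha / len(A)  # alpha / r

    def forward(self, x: List[float]) -> List[float]:
        flat = dequantize_blockwise(self.codes, self.scales, self.block_size)
        W = [flat[r * self.cols:(r + 1) * self.cols] for r in range(self.rows)]
        base = matvec(W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]
```

Storing only 4-bit codes plus one scale per block is what cuts the base model's memory footprint. Because the adapter matrices are small, they can be kept and trained in higher precision.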