30 open-source projects similar to 2shou/textgrocery, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best TextGrocery alternative.
Gibran is an Elixir natural language processor, and a port of WordsCounted.
Multilingual text (NLP) processing toolkit
Open Academic Research on Improving LLaMA to SOTA LLM
ChatALL is a desktop application that functions as a multi-model chat client and aggregator for artificial intelligence services. It enables users to send a single prompt to multiple AI models simultaneously, allowing for the side-by-side comparison of generated responses within a unified interface. The application distinguishes itself through a local-first approach to data management, ensuring that all conversation logs and user configurations are stored directly on the user's device. This architecture supports privacy and offline access while providing a centralized system for managing and
AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It functions as a multimodal audio generator and processing system, integrating a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system is distinguished by its ability to generate audio from diverse inputs, including text and images, and its capacity to produce synchronized talking head videos. It also operates as a neural speech translator, converting spoken language between different tongues while pre
Chat with your favourite LLaMA models in a native macOS app
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023)
OpenHands is an autonomous AI software engineer and coding assistant designed to execute software engineering tasks by interacting directly with codebases and development environments. It functions as a platform for running AI agents that can write code and manage files to automate complex development workflows. The system distinguishes itself through a container-based execution environment that isolates agent actions within a sandboxed Linux environment. It employs an autonomous agent loop of observation, planning, and action, supported by a standardized communication protocol that allows it
AllenNLP is a PyTorch-based research library and deep learning language toolkit designed for developing and training neural network architectures for linguistic tasks. It provides a distributed training system that coordinates data and gradients across multiple GPUs and a framework for integrating pretrained transformer architectures. The system distinguishes itself with a dedicated algorithmic bias mitigation tool used to identify and reduce bias in linguistic model predictions. It also includes model influence analysis to interpret predictions by calculating the influence of specific traini
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
This repository contains custom pipes and models related to using spaCy for scientific documents.
Summary of Responses to Questionnaire on Annotation Platform https://forms.gle/iZk8kehkjAWmB8xe9
This repository contains all my NLP projects
DocsGPT is a retrieval-augmented generation platform and private knowledge base used to build AI agents that perform grounded search and analysis. It functions as a multi-model AI orchestrator and enterprise agent builder, allowing for the integration of various local and cloud language models to customize reasoning and text generation. The project provides a visual environment for developing automated assistants using conditional logic and third-party API connectivity. It enables the creation of private AI agents capable of performing enterprise search and detailed document analysis using pr
Argilla is a collaborative AI feedback tool and data curation management system. It serves as a human-in-the-loop dataset platform designed to coordinate workforce annotators and domain experts in labeling, rating, and refining data samples for machine learning projects. The platform focuses on large language model dataset curation and reinforcement learning from human feedback workflows. It provides a shared workspace for integrating human expertise into AI development to validate model outputs and correct data errors. The system manages the end-to-end machine learning data pipeline, includ
Implementation of various topic models
Lecture notes for probabilistic topic models using IPython Notebook
This project is a quantized fine-tuning framework for large language models. It implements a low-rank adaptation library and a four-bit quantizer to reduce the GPU memory requirements needed to train large models. The framework utilizes four-bit quantization and low-rank adapters to enable model training on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.
💁 Awesome Treasure of Transformers Models for Natural Language Processing contains papers, videos, blogs, and official repos, along with Colab notebooks. 🛫☑️
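A library of 103,976 English words (SQL, CSV, and Excel versions) containing English words, Chinese translations, parts of speech, and multiple senses. Running the SQL statements generates the tables; supports SQL Server, MySQL, and other databases.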