30 open-source projects similar to opencoder-4/opencoder, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best OpenCoder alternative.
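The page does not say how the "features in common" ranking is computed. As a minimal sketch, assuming each project is tagged with a set of feature labels and candidates are ordered by how many labels they share with the target, the idea might look like the following. The function name, project names, and feature labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_by_shared_features(
    target: Set[str], candidates: Dict[str, Set[str]]
) -> List[Tuple[str, int]]:
    """Score each candidate by the number of feature labels it shares with the target."""
    scored = [(name, len(target & features)) for name, features in candidates.items()]
    # Highest overlap first; ties are broken alphabetically by project name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical feature labels, for illustration only.
target_features = {"code-generation", "llm", "open-weights"}
candidates = {
    "project-a": {"llm", "code-generation", "chat"},
    "project-b": {"llm", "medical"},
    "project-c": {"word-segmentation"},
}

ranking = rank_by_shared_features(target_features, candidates)
for name, shared in ranking:
    print(f"{name}: {shared} shared feature(s)")
```

A plain overlap count favors projects with many tags; a normalized measure such as Jaccard similarity would be one alternative if that bias matters.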
pkuseg-python is a Chinese word segmentation toolkit and natural language processing library. It provides specialized models for splitting Chinese text into words across various domains, including news, medical, and web content, and includes a tool for assigning grammatical parts of speech tags to segmented words. The library allows for the training of custom segmentation models using annotated datasets and supports the integration of user-defined dictionaries to ensure specialized terminology is recognized correctly. It employs a multi-threaded execution engine to process large volumes of Ch
WizardLM is a large language model and instruction-tuning framework designed to execute sophisticated coding, mathematical, and conversational tasks. It functions as an AI system for mathematical reasoning and code generation, as well as a synthetic dataset generator used to train other language models. The project is distinguished by its evolutionary instruction tuning, which uses a method to rewrite simple instructions into complex tasks. This process expands training dataset difficulty and produces a high volume of open-domain tasks across various difficulty levels. The system covers capa
LMOps is a research-driven operations framework for optimizing the deployment, fine-tuning, and performance of large language models. It provides a specialized toolkit for foundation model adaptation, inference acceleration, prompt optimization, and context orchestration. The framework distinguishes itself through an inference accelerator that reduces token generation latency by verifying and copying overlapping text spans from reference documents. It also features a prompt engineering optimizer that employs reinforcement learning, beam search, and non-natural language markers to automaticall
HanLP is a natural language processing library and deep learning framework specifically optimized for the Chinese language, while also functioning as a multilingual text processor. It serves as a toolkit for performing linguistic analysis, semantic understanding, and script conversion. The project distinguishes itself through a dedicated focus on Chinese linguistic structures, including a specialized script converter for transforming text between Simplified Chinese, Traditional Chinese, and Pinyin. It further supports domain-specific model training to improve the recognition of professional t
Marqo is an ecommerce product discovery platform, multimodal vector database, and AI search merchandising tool. It provides infrastructure for implementing semantic search and recommendations, allowing shoppers to find products using natural language and images. The platform distinguishes itself through a hybrid ranking pipeline that combines neural semantic scores with business-defined boosting and pinning rules. It features a conversational commerce engine that uses large language models to process user intent and provides a search performance analytics suite for measuring conversion uplift
HuatuoGPT, Towards Taming Language Models To Be a Doctor. (An Open Medical GPT)
HuatuoGPT2, One-stage Training for Medical Adaption of LLMs. (An Open Medical GPT)
Repository of DISC-MedLLM, a comprehensive solution that leverages large language models (LLMs) to provide accurate and truthful medical responses in end-to-end conversational healthcare services.
A Scientific Large Language Model in Geoscience
Code for the paper "InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery"
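夫子•明察 (Fuzi-Mingcha) is a Chinese judicial large language model jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law. It uses ChatGLM as its base model and is trained on a large corpus of unsupervised Chinese judicial text together with supervised judicial fine-tuning data. The model supports statute retrieval, case analysis, syllogistic judgment reasoning, and judicial dialogue, and aims to give users comprehensive, high-precision legal consultation and answers.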
All recommendation experiments are conducted in our content-based recommendation repository, Legommenders. It includes a set of news recommenders and click-through rate prediction models. It is a modular framework that supports integration with pretrained language models (PLMs)…
Authors: Yunxiang Li (1), Zihan Li (2), Kai Zhang (3), Ruilong Dan (4), Steve Jiang (1), You Zhang (1). Affiliations: (1) UT Southwestern Medical Center, USA; (2) University of Illinois at Urbana-Champaign, USA; (3) Ohio State University, USA; (4) Hangzhou Dianzi University, China.
Organization: University of New South Wales (UNSW) AI4Science & GreenDynamics AI
An Open-sourced Knowledgeable Large Language Model Framework.
Chinese Legal LLaMA (LLaMA for the Chinese legal domain)
The official code for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
Code and datasets for the paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" (WSDM 2024)
🌟 Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters.
ChatLaw is a specialized large language model legal assistant designed to provide automated consulting and question answering within Chinese legal frameworks. It functions as a system for legal knowledge management, processing complex legal texts to deliver accurate statutory answers and advisory services. The system utilizes a mixture-of-experts modeling approach and multi-agent coordination to research information and generate professional consultation reports. To ensure factual reliability and minimize hallucinations, it integrates a legal knowledge graph and a standardized operating proce
Qwen3 is a transformer-based large language model designed as a generative AI foundation for understanding, reasoning, and generating human language. It functions as a comprehensive ecosystem for model training, fine-tuning, and production-ready inference, providing the underlying architecture and weights necessary to build diverse artificial intelligence applications. The project distinguishes itself through extensive support for model quantization and distributed inference, enabling efficient execution across a wide range of hardware from consumer-grade devices to scalable cloud infrastruct
Huatuo-Llama-Med-Chinese is a medical large language model specialized in processing and generating natural language text in Chinese. It is an instruction-tuned system designed to answer professional healthcare questions by leveraging a dedicated medical knowledge base. The model integrates structured medical literature and knowledge graphs to ensure clinical accuracy during response generation. It employs knowledge-graph augmented inference to combine structured entity relationships with neural network outputs. The system is developed through domain-specific weight adaptation, cross-lingual
ChiMed-GPT is a Chinese medical large language model (LLM) built by continually training Ziya-v2 on Chinese medical data, covering pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).
CodeGeeX is an open-source code model and multilingual large language model designed to generate, translate, and complete source code across multiple programming languages. It functions as an AI coding assistant and a cross-lingual code translator that produces executable code and technical documentation. The project enables natural language programming by turning plain English descriptions into functional programs. It also provides the ability to convert source code from one programming language to another while preserving the original logic and functionality. The system covers a range of c
This repository contains the implementation of the LLM-Prop model. LLM-Prop is a large language model (T5 encoder) efficiently fine-tuned on text descriptions of crystals to predict their properties. Given a text sequence that describes the crystal structure, LLM-Prop encodes the underlying crystal…