27 repositories
Capabilities for analyzing and extracting information from extremely large input sequences in a single pass.
Distinct from Large Scale Training: that category covers large-scale training, data computation, or image processing, whereas this one specifically concerns a model's inference-time context window size.
Explore 27 awesome GitHub repositories matching artificial intelligence & ml · Long Context Processing. Refine with filters or upvote what's useful.
Qwen2.5 is a suite of foundation large language models designed for natural language generation, code generation, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging. The framework is distinguished by its long context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo
Processing and extracting information from massive inputs up to one million tokens in a single pass.
Qwen-7B is a pretrained causal language model designed for natural language generation, text processing, and complex reasoning tasks. It is available as an instruction-tuned model optimized for conversational interactions and a tool-use model capable of executing function calls and interacting with external APIs. The project provides a quantized version of the model to reduce GPU memory usage and supports the development of autonomous agents that can execute code and perform functions to complete complex goals. The system covers a wide range of capabilities including model fine-tuning throug
Handles extended input sequences using interpolation and attention scaling for long-context processing.
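As a rough illustration of what "interpolation and attention scaling" can mean, the sketch below shows one simple form of each: linear position interpolation and a log-length attention scale. The function names and the `trained_len` parameter are assumptions made for this example, and the project itself may use a different variant (for instance an NTK-aware one).

```python
import math

def interpolate_positions(seq_len, trained_len):
    """Linear position interpolation (illustrative, hypothetical helper).

    Squeezes position indices of a long sequence back into the range
    the model saw during training, instead of extrapolating past it.
    """
    scale = min(1.0, trained_len / seq_len)
    return [i * scale for i in range(seq_len)]

def logn_attention_scale(seq_len, trained_len):
    """Log-length attention scaling factor (illustrative, hypothetical helper).

    Returns 1.0 inside the trained range and grows with the logarithm
    of the sequence length beyond it.
    """
    if seq_len <= trained_len:
        return 1.0
    return math.log(seq_len) / math.log(trained_len)

positions = interpolate_positions(seq_len=8, trained_len=4)
scale = logn_attention_scale(seq_len=8, trained_len=4)
print(positions)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(scale)
```

With a sequence twice the trained length, every position is halved, so the largest index stays below the trained limit.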
This project is a collection of implementation guides, recipes, and developer resources for building applications with Llama models. It serves as a comprehensive kit for developing autonomous agents, establishing retrieval-augmented generation systems, and executing model fine-tuning. The resource provides specific patterns for multimodal workflows that process text, images, and audio. It includes specialized guidance on adapting pre-trained model weights for targeted tasks and implementing tool-calling orchestration to connect models with external APIs and functions. The codebase covers a b
Implements capabilities for analyzing extremely large input sequences using expanded context windows.
ChatGLM2-6B is an open-weight large language model designed for natural language conversations and text generation in both English and Chinese. It functions as a bilingual chat model capable of processing and maintaining coherence across text sequences up to 32K tokens. The model is optimized for local deployment through precision quantization, which reduces memory requirements to allow execution on consumer-grade hardware. It supports distributing model weights across multiple graphics cards to handle parameters that exceed the memory of a single device. The project covers capabilities for
Analyzes and extracts information from extensive documents using a 32K token context window.
ChatGLM2-6B is a bilingual chat large language model designed for natural conversation and text generation in both English and Chinese. It functions as a fine-tunable language model that supports updating weights via specialized scripts to adapt to specific datasets and tasks. The project serves as a quantized inference engine and multi-GPU model orchestrator, enabling the execution of large models on consumer-grade hardware. It is capable of processing long context sequences up to 32K tokens to maintain understanding across extended documents. The system covers capabilities for multilingual
Processes and analyzes extended input sequences up to 32K tokens in a single pass.
ChatGLM3 is an open-weights large language model designed for bilingual conversational interactions in English and Chinese. It functions as a tool-augmented system capable of calling external functions and executing internal code to resolve complex tasks. The model utilizes four-bit quantization to reduce memory requirements, enabling inference on consumer hardware and diverse processing units including GPUs and CPUs. It features an expanded context window for processing and summarizing long documents and includes a supervised fine-tuning pipeline for adapting the model to specialized domains
Supports the processing of extremely large input sequences to analyze and summarize long documents.
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project
Processes sequences exceeding 8K tokens using ring-attention and sequence parallelism across the compute cluster.
MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks. The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughp
Combines sparse and linear attention with hybrid embeddings to handle million-token windows.
This project is a platform for the deployment of open source large language and multimodal models. It provides a unified interface to serve text, image, and speech models across local or cloud hardware. The system enables distributed AI inference by orchestrating model workloads across multiple nodes and devices. It includes a unified API adapter layer to standardize inputs and outputs, as well as tools for multimodal chat and structural image generation. The platform covers a broad capability surface including request batching for throughput optimization, dynamic model loading, and integrat
Includes benchmarking tools to measure how inference performance scales as input context size increases.
Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and a low-bit model quantization tool for converting weights into INT4, FP8, and GGUF formats. The project features a parameter-efficient finetuning framework that enables model adaptation using QLoRA and DPO on Intel hardware. It distinguishes itself by providing specialized optimizations for Intel XP
Processes and generates text using extended context windows on compatible graphics hardware.
vibe-coding-cn is an AI software development workflow and prompt engineering framework designed to transform product ideas into functional applications using natural language. It functions as an AI agent orchestration system that coordinates specialized skills and quality gates to guide the incremental creation of software. The framework distinguishes itself through a project memory system that maintains architectural and design documentation to preserve context during long-term collaborations. It employs a prompt optimization library that utilizes recursive loops, chain-of-thought reasoning,
Organizes large datasets by placing documents at the start of the prompt and requiring source citations.
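The prompt layout described above (documents first, then the question, with citations required) might look like the following minimal sketch. The function name, the numbering scheme, and the instruction wording are assumptions for illustration, not the repository's actual templates.

```python
def build_long_context_prompt(documents, question):
    """Place numbered documents at the top, then the question and a citation rule.

    `documents` is a list of (title, text) pairs; the layout is hypothetical.
    """
    parts = []
    for index, (title, text) in enumerate(documents, start=1):
        parts.append(f"[{index}] {title}\n{text}")
    doc_block = "\n\n".join(parts)
    instructions = (
        "Answer using only the documents above. "
        "Cite the supporting document after each claim, e.g. [1]."
    )
    return f"{doc_block}\n\nQuestion: {question}\n{instructions}"

prompt = build_long_context_prompt(
    [("Release notes", "Version 2 adds streaming."),
     ("FAQ", "Streaming is off by default.")],
    "Is streaming enabled by default?",
)
print(prompt)
```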
Yi is a bilingual language model and foundation model designed for natural language processing, reasoning, and reading comprehension in both English and Chinese. It is built as a transformer-based architecture capable of general purpose text generation and conversational tasks. The model is distinguished by its ability to function as a long context system, processing and analyzing extended input sequences up to 200k tokens. It also supports quantized versions that use low-bit precision to reduce memory footprints, enabling execution on consumer-grade hardware. The project covers a broad rang
Processes and analyzes extended input sequences up to 200k tokens in a single pass.
This project provides a foundational framework and reference implementation for running causal language modeling and multimodal reasoning on local systems. It includes a set of core components for managing model assets, a fine-tuning framework, and the structural definitions needed to instantiate transformer-based architectures. The system is distinguished by its ability to process combined text and image inputs through multimodal transformer models for visual reasoning and document analysis. It also supports deploying quantized models, reducing the memory footprint through low-precision techniques to enable inference on edge devices. The project covers a broad range of capabilities, including supervised fine-tuning and low-rank adaptation for domain customization, as well as a comprehensive asset manager for downloading, verifying, and organizing model weights and tokenizers. Additional features include multilingual text generation, long-context processing, and visual language grounding.
Handles large volumes of input text in single requests to maintain coherence across extended documents.
This project is a long context inference engine and optimizer designed to process infinite text streams using large language models without memory growth or performance degradation. It serves as a system for maintaining constant memory usage during the generation of text from arbitrarily long input sequences. The implementation utilizes a rolling key-value cache manager and attention sink mechanisms to stabilize the attention process during continuous stream processing. By retaining initial tokens and employing a sliding window of key-value pairs, the system enables constant-time inference an
Generates text from extremely long inputs while keeping memory usage constant and avoiding performance degradation.
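The retention policy described above (keep the initial tokens as attention sinks, plus a sliding window of recent key-value pairs) can be sketched as follows. This is a toy model of the eviction rule only; the class name and parameters are hypothetical, and plain integers stand in for real key-value tensors.

```python
from collections import deque

class SinkWindowCache:
    """Toy cache: the first `num_sinks` entries are kept forever,
    and only the most recent `window` entries are kept after that."""

    def __init__(self, num_sinks=4, window=8):
        self.num_sinks = num_sinks
        self.sinks = []
        self.recent = deque(maxlen=window)

    def append(self, kv):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(kv)      # initial tokens are never evicted
        else:
            self.recent.append(kv)     # deque drops the oldest entry when full

    def entries(self):
        return self.sinks + list(self.recent)

cache = SinkWindowCache(num_sinks=2, window=3)
for token_id in range(10):
    cache.append(token_id)
print(cache.entries())  # [0, 1, 7, 8, 9]
```

The cache size is bounded by `num_sinks + window` no matter how many tokens are streamed, which is what keeps memory usage constant.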
InternLM is a large language model and a comprehensive suite of weights designed for text generation and complex reasoning. It functions as an inference engine for serving responses, a fine-tuning framework for adjusting model weights, and a platform for building autonomous AI agents. The system is capable of processing long-context input sequences up to one million tokens for document analysis. It employs chain-of-thought reasoning to solve knowledge-intensive tasks by generating intermediate logic steps before producing a final answer. The project covers model weight optimization through s
Supports processing extended input sequences up to one million tokens for comprehensive document analysis.
This project provides a Chinese large language model based on the LLaMA architecture. It is an instruction-tuned model optimized for natural language processing and multi-turn conversations in Chinese. The system includes a framework for parameter-efficient fine-tuning using low-rank adaptation and quantization to reduce memory requirements. It also implements retrieval augmented generation for local document question answering and supports long-context processing for sequences up to 64K tokens. The project covers a broad set of capabilities including supervised instruction tuning, reinforce
Supports extended input sequences up to 32K tokens using interpolation to maintain coherence.
GLM-4 is a large language model and fine-tuning framework designed for human-like text production, complex reasoning, and multilingual conversation. It functions as a multimodal system capable of processing high-resolution visual content and as a long-context model designed to analyze documents with a context window of up to one million tokens. The project differentiates itself through a function calling interface that enables AI agent development by connecting the model to external APIs and real-time web browsing. It includes specialized capabilities for generating functional programming cod
Analyzes and extracts information from extremely large input sequences in a single pass.
Handles input sequences up to 128,000 tokens for reasoning over entire codebases in a single pass.
This project is a natural language processing framework centered on a generalized autoregressive pretraining method designed for unsupervised language representation. It implements a language model that combines permutation-based training with a Transformer-XL backbone to function as a long-context text processor. The system is distinguished by its ability to handle text sequences that exceed standard length limits by using segment-level recurrence and relative positional encoding. It scales high-performance pretraining across multiple GPUs and TPU clusters using distributed training implementations. The codebase covers the entire machine learning workflow, including text cleaning and subword tokenization for data preprocessing, as well as task-specific fine-tuning for question answering, reading comprehension, and text classification. It includes utilities for parameter optimization, learning rate scheduling, and evaluating answer probabilities through precision-recall metrics. The project provides configurations for managing model hyperparameters and hardware-accelerated training across multiple hosts.
Processes and analyzes sequences of text that exceed standard length limits by managing long-range dependencies.
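One common way to manage long-range dependencies past a fixed length limit is segment-level recurrence: the input is processed in fixed-size segments, and each segment can also see a cached memory of what came before. The sketch below shows only that bookkeeping, with token ids standing in for hidden states; the function name and parameters are assumptions for illustration.

```python
def segments_with_memory(tokens, segment_len, mem_len):
    """Yield (memory, segment) pairs for segment-level recurrence.

    `memory` holds up to `mem_len` most recent items from earlier
    segments, which the current segment may attend to in addition
    to its own tokens.
    """
    memory = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        yield memory, segment
        combined = memory + segment
        memory = combined[-mem_len:] if mem_len > 0 else []

steps = list(segments_with_memory(list(range(6)), segment_len=2, mem_len=3))
for memory, segment in steps:
    print(memory, segment)
```

Each step sees more context than a single segment holds, while the work per step stays bounded by `segment_len + mem_len`.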
This is an open-source Python SDK for building and orchestrating production-grade AI agents. It provides a unified framework for creating conversational agents that can use tools, maintain state, and coordinate across multiple language model providers including OpenAI, Anthropic, Google, Amazon Bedrock, and locally-hosted models. The SDK supports multi-agent orchestration through graphs, teams, and swarms, allowing several specialized agents to collaborate on complex tasks. Agents can be composed as callable tools that other agents invoke, and the framework includes policy handlers that inspe
Processes documents up to one million tokens in a single context for summarization and analysis of lengthy texts.