Mistral Src

Mistral Src - run multimodal LLM inference | Awesome Repos

Features

AI Model Inference - Executes large language models to generate text, solve mathematical problems, and provide coding assistance.
Inference Libraries - Provides a programmatic library to load and execute large language models for text generation and problem solving.
Multimodal Frameworks - Ships a framework for processing combined image and text inputs to describe visual content and answer questions.
Distributed Model Execution - Spreads large model workloads across multiple GPUs to increase processing speed and memory capacity.
Function Calling Interfaces - Provides interfaces that enable language models to execute external tools and API functions.
Large Language Models - Provides a programmatic interface for running Mistral models to generate text and solve problems.
Multimodal Inference Engines - Ships an engine capable of processing combined image and text inputs to describe visual content.
Model Inference Execution - Executes model weights through a processing pipeline to generate text completions and predictions.
Multi-GPU Inference Runtimes - Implements a runtime that distributes model execution across multiple GPUs using tensor parallelism to handle large model weights.
Multi-GPU Distribution - Employs techniques to split model parameters across multiple graphics cards to overcome memory limitations and increase speed.
Tensor Parallelism - Splits large model weight matrices across multiple GPUs to distribute memory load.
Tool-Use Function Mapping - Maps model-generated structured text to external API calls to extend capabilities beyond internal knowledge.
Multimodal Analysis Engines - Processes images and text together to describe visual content or answer questions about images.
Fill-In-Middle Sequence Masking - Adjusts attention masks to enable the prediction of missing tokens between two existing blocks of text.
Input Sequence Attentions - Calculates weighted relationships between tokens to determine the context for the next prediction.
Distribution-Based Sampling - Selects tokens from a probability distribution using temperature and top-p filtering.
Fill-In-The-Middle Coding - Predicts and inserts missing code segments within existing text blocks to assist with software development.
Code In-filling - Implements fill-in-the-middle techniques to predict and insert missing code segments within existing text blocks.
KV Cache Management - Manages key-value caches for transformer models to avoid redundant calculations during text generation.
Tool-Using Model Inference - Enables the model to reason about and trigger external function calls to extend capabilities beyond text generation.
Interactive Model Inference Sessions - Provides a command-line interface for maintaining interactive conversational sessions with models.
Model-to-Image Packaging - Provides utilities to create container images for serving high-performance inference engines.
Containerized Model Serving - Ships pre-packaged images and dependencies for serving inference engines in isolated container environments.
GPU Linear Algebra Libraries - Offloads matrix multiplications to specialized GPU kernels for high-throughput inference.
Large Language Models - Reference implementation for Mistral model architectures.
Large Language Models (LLMs) - Listed in the “Large Language Models (LLMs)” section of The Incredible PyTorch awesome list.
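
As an illustration of the "Distribution-Based Sampling" entry in the list above, here is a minimal sketch of temperature scaling followed by top-p (nucleus) filtering. It is not taken from the Mistral Src code base: the function name `top_p_sample`, its default values, and the choice to treat a non-positive temperature as greedy decoding are all assumptions made for this example, and it uses only the Python standard library rather than a tensor framework.

```python
import math
import random


def top_p_sample(logits, temperature=0.7, top_p=0.9, rng=None):
    """Sample one token index from raw logits (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    if temperature <= 0:
        # Assumption of this sketch: non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-probable tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens; random.choices normalizes the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperatures sharpen the distribution before filtering, and a smaller `top_p` shrinks the candidate set; passing a seeded `random.Random` as `rng` makes a run reproducible.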

Open-source alternatives to Mistral Src

Similar open-source projects, ranked by how many features they share with Mistral Src.

facebookresearch/llama
59,466 · View on GitHub
Llama is a large language model runtime and inference engine designed to load and execute autoregressive transformer models. It enables the generation of natural language text completions from prompts using pretrained weights. The system features multi-GPU model parallelism, which distributes model weights and workloads across multiple graphics processors to support larger parameter counts. It also incorporates a content safety filter that uses classifiers to intercept and block unsafe inputs or outputs during the inference process. The project covers broad capabilities in distributed model
Python
pytorch-labs/gpt-fast
6,225 · View on GitHub
gpt-fast is a PyTorch transformer inference engine designed for low-latency text generation. It functions as a distributed GPU inference library, a quantized model runner, and a speculative decoding framework. The system utilizes a speculative decoding workflow where a small draft model predicts token sequences for verification by a larger model to accelerate generation. It supports quantized model execution to reduce memory footprint and implements tensor parallelism to split computations across multiple GPUs. The project includes a standardized evaluation harness to measure the accuracy an
Python
intel/ipex-llm
8,836 · View on GitHub
Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and a low-bit model quantization tool for converting weights into INT4, FP8, and GGUF formats. The project features a parameter-efficient finetuning framework that enables model adaptation using QLoRA and DPO on Intel hardware. It distinguishes itself by providing specialized optimizations for Intel XP
Python
ModelTC/LightLLM
3,901 · View on GitHub
LightLLM is a high-performance serving framework for deploying and executing large language models. It functions as a multi-GPU inference engine and server capable of handling dense architectures, mixture-of-experts designs, and multimodal models that process both text and images. The system is distinguished by its specialized support for Mixture-of-Experts models using expert parallelism and fused kernels. It implements structured text generation through deterministic state machines and pushdown automata to enforce precise output formats. To optimize throughput, the framework employs specula
Python · deep-learning · gpt · llama

See all 30 alternatives to Mistral Src

mistralai/mistral-src (Archived)
