What are the best GitHub repositories for Dynamic Inference Batching?

Question 1

Accepted Answer

Combines short requests into batches and splits long sequences across GPUs to balance throughput and latency.

**Distinct from Request Batching:** Focuses on dynamic batching for inference workloads with sequence splitting, not on general batching of data operations.
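The two ideas in this answer (grouping short requests, splitting long sequences) can be illustrated with a small planning routine. This is a minimal sketch under stated assumptions, not the method of any repository listed here: the names `Request`, `plan_work`, `max_batch_size`, `split_threshold`, and `num_gpus` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: str       # hypothetical identifier for an incoming request
    num_tokens: int   # length of the request's input sequence


def plan_work(requests, max_batch_size=4, split_threshold=1024, num_gpus=2):
    """Group short requests into batches and split long ones across GPUs.

    Returns (batches, splits):
      batches: list of lists of request ids, each at most max_batch_size long
      splits:  dict mapping a long request's id to its per-GPU token counts
    """
    batches, current, splits = [], [], {}
    for req in requests:
        if req.num_tokens > split_threshold:
            # Long sequence: divide tokens as evenly as possible across GPUs.
            base, extra = divmod(req.num_tokens, num_gpus)
            splits[req.req_id] = [
                base + (1 if i < extra else 0) for i in range(num_gpus)
            ]
        else:
            # Short request: add to the batch being filled.
            current.append(req.req_id)
            if len(current) == max_batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)  # flush the final, partially filled batch
    return batches, splits
```

A real scheduler would also bound how long a partially filled batch may wait, since holding requests for a fuller batch trades latency for throughput.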

This list covers 9 GitHub repositories tagged Data & Databases · Dynamic Inference Batching. Top picks: wang-xinyu/tensorrtx, infrasys-ai/aiinfra, nvidia/isaac…

Question 2

Why is wang-xinyu/tensorrtx a recommended Dynamic Inference Batching repository?

Accepted Answer

Implements dynamic batching for inference workloads to optimize the balance between throughput and latency.

Question 3

Why is infrasys-ai/aiinfra a recommended Dynamic Inference Batching repository?

Accepted Answer

Combines short requests into batches and splits long sequences across GPUs for balanced throughput.

Question 4

Why is nvidia/isaac-gr00t a recommended Dynamic Inference Batching repository?

Accepted Answer

Combines dynamic batching and concurrent execution to maximize hardware utilization during model serving.

Question 5

Why is imoneoi/openchat a recommended Dynamic Inference Batching repository?

Accepted Answer

Uses dynamic request batching to group multiple API requests into a single inference pass for higher throughput.
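Running several requests in one inference pass usually means padding them to a common length and masking the padding. The following is a minimal sketch of that step, assuming token-id lists as input; `pad_batch` and `pad_id` are hypothetical names and do not come from openchat.

```python
def pad_batch(token_lists, pad_id=0):
    """Pad variable-length token lists into one rectangular batch.

    Returns (padded, mask), where mask holds 1 for real tokens and 0 for
    padding, so the model can ignore the padded positions.
    """
    max_len = max(len(tokens) for tokens in token_lists)
    padded = [t + [pad_id] * (max_len - len(t)) for t in token_lists]
    mask = [[1] * len(t) + [0] * (max_len - len(t)) for t in token_lists]
    return padded, mask
```

After the pass, each row of the output is routed back to the request that supplied it, so callers still receive individual responses.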

Question 6

Why is ztxz16/fastllm a recommended Dynamic Inference Batching repository?

Accepted Answer

Groups multiple incoming requests into single execution passes to maximize GPU utilization and reduce token latency.

Question 7

Why is tianxiaomo/pytorch-yolov4 a recommended Dynamic Inference Batching repository?

Accepted Answer

Supports both static and dynamic batch configurations to optimize GPU memory usage and inference throughput.

Question 8

Why is opennmt/ctranslate2 a recommended Dynamic Inference Batching repository?

Accepted Answer

Processes multiple requests in parallel across CPU cores or GPUs, with dynamic memory allocation per batch size.

Question 9

Why is thudm/slime a recommended Dynamic Inference Batching repository?

Accepted Answer

Packs variable-length sequences into batches up to a token limit per GPU, preserving per-sample loss while maximizing throughput.
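Packing under a per-GPU token limit can be sketched with a first-fit-decreasing heuristic. This is an illustrative assumption, not necessarily the algorithm slime uses; `pack_by_token_limit`, `sample_boundaries`, and `max_tokens` are hypothetical names. The recorded boundaries show how per-sample losses could still be separated after packing.

```python
from itertools import accumulate


def pack_by_token_limit(lengths, max_tokens):
    """Pack samples into bins so no bin exceeds max_tokens.

    lengths: list of per-sample token counts.
    Returns a list of bins; each bin is a list of sample indices.
    Uses first-fit decreasing: longest samples are placed first.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins = []  # each entry: {"used": tokens so far, "indices": [...]}
    for i in order:
        n = lengths[i]
        if n > max_tokens:
            raise ValueError(f"sample {i} ({n} tokens) exceeds the limit")
        for b in bins:
            if b["used"] + n <= max_tokens:
                b["indices"].append(i)
                b["used"] += n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append({"used": n, "indices": [i]})
    return [b["indices"] for b in bins]


def sample_boundaries(lengths, indices):
    """Cumulative token offsets of each sample inside one packed bin."""
    return list(accumulate((lengths[i] for i in indices), initial=0))
```

For example, with lengths `[5, 3, 4, 2, 6]` and a limit of 8 tokens, the samples fit into three bins, two of which are filled exactly.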

Question 10

Why is lightning-ai/litserve a recommended Dynamic Inference Batching repository?

Accepted Answer

Implements a dynamic-batching request queue to maximize GPU throughput by grouping individual requests.
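A dynamic-batching queue typically collects requests until either a size cap or a short deadline is reached. The sketch below uses only the Python standard library to show that loop; it is not LitServe's implementation, and `collect_batch`, `max_batch_size`, and `batch_timeout_s` are names assumed for this example.

```python
import queue
import time


def collect_batch(request_queue, max_batch_size=8, batch_timeout_s=0.01):
    """Gather up to max_batch_size requests, waiting at most batch_timeout_s
    after the first request arrives."""
    batch = [request_queue.get()]  # block until at least one request exists
    deadline = time.monotonic() + batch_timeout_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline passed: run with what we have
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

The timeout bounds the extra latency any single request pays for batching, while the size cap bounds memory use per pass.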

Awesome GitHub Repositories: Dynamic Inference Batching

wang-xinyu/tensorrtx

Infrasys-AI/AIInfra

NVIDIA/Isaac-GR00T

imoneoi/openchat

ztxz16/fastllm

Tianxiaomo/pytorch-YOLOv4

OpenNMT/CTranslate2

THUDM/slime

Lightning-AI/LitServe
