SakuraLLM | Awesome Repository

SakuraLLM

SakuraLLM is a multi-format document translation system that hosts large language models for translating Japanese text into other languages. It functions as an inference server that exposes translation models through an OpenAI-compatible API, allowing any tool supporting the OpenAI client format to send translation requests. The system is designed as a glossary-aware translation engine that applies user-defined term dictionaries to ensure consistent translation of proper nouns and names across outputs.

The project distinguishes itself by supporting multiple high-performance inference backends including llama.cpp, vLLM, and Ollama, enabling flexible deployment across consumer CPU and GPU hardware. It features a format-preserving translation pipeline that extracts, translates, and reassembles text from structured formats like ebooks and subtitles while retaining timestamps, line breaks, and markup. The system also supports CPU-GPU hybrid inference for memory-constrained setups, tensor parallel multi-GPU distribution for larger models, and token probability filtering to refine translation precision.

SakuraLLM provides translation capabilities for ebooks, subtitles, visual novels, galgames, RPG Maker games, manga, and plain-text novels. It processes documents by dividing long texts into manageable segments, translating each segment through the language model, and reassembling the output with original formatting intact. The system includes glossary management for maintaining terminology consistency, degeneration detection that monitors token generation and retries with adjusted parameters when output quality degrades, and multi-threaded inference for improved throughput.

The project offers a Docker-based deployment with API authentication and supports running on consumer NVIDIA and AMD GPUs.

Features

LLM Inference Servers - Hosts large language models behind an OpenAI-compatible API with multi-GPU support and multiple backend engines.

OpenAI-Compatible APIs - Exposes the translation model through an OpenAI-format API for integration with any OpenAI-compatible client.

EPUB and Subtitle Translation - Processes Japanese ebooks and subtitle files through an inference server while maintaining structural integrity.

Format-Preserving Pipelines - Extracts, translates, and reassembles text from EPUBs and subtitles while retaining timestamps, line breaks, and markup.

Subtitle Translation - Converts subtitle files via language model inference while preserving timestamps and formatting.

Japanese Text Translation - Provides a specialized pipeline for translating Japanese text using large language models while preserving formatting and timestamps.

Japanese Text Translations - Translates Japanese text into Chinese using a large language model, preserving formatting and timestamps.

Multi-Format Document Translators - Translates Japanese ebooks, subtitles, and visual novel scripts while preserving original formatting and timestamps.

Large Language Model Serving - Hosts a large language model behind an HTTP API so clients can send text and receive translated output.

Translation Servers - Hosts large language models for translating Japanese text into other languages via a standardized API.

EPUB Translation Pipelines - Processes EPUB files through a translation pipeline that extracts, translates, and reassembles content while maintaining structure.

Terminology-Aware Translation - Uses a glossary to keep proper nouns and character names consistent across translations.

Translation Servers - Starts a Docker-based inference server with authentication that accepts translation requests via HTTP.

Translation Term Mapping - Accepts a user-provided term dictionary to enforce consistent translation of specific names and phrases.

Translation - Applies a user-defined glossary to maintain consistent translation of proper nouns and pronouns.

Translation API Endpoints - Serves a standardized API endpoint that other translation tools can call to use the model.

Translation API Hosting - Loads a large language model and exposes it over an API so other tools can send text for translation.

Ollama Model Runners - Loads models from the Ollama library using Docker for simplified installation and management.

Ollama Model Servers - Pulls and runs models from the Ollama library, managing them with Docker for simplified deployment.

Multi-Backend GPU Inference Engines - Supports llama.cpp, vLLM, and Ollama backends for flexible deployment across CPU and GPU hardware.

Manga Translation Pipelines - Translates Japanese manga into Chinese, preserving the original image layout and text placement.

CPU-GPU Hybrid Runtimes - Enables translation on consumer hardware by offloading model parts to CPU when GPU memory is insufficient.

vLLM Backend Runners - Loads full-precision models using the vLLM backend with PagedAttention and tensor parallel multi-GPU acceleration.

Game Script Translations - Integrates with Galgame translation tools to provide real-time or offline translation of game dialogue and text.

Visual Novel Script Translators - Translates visual novel and galgame scripts while preserving inline formatting, control characters, and ruby annotations.

Tensor Parallelism - Splits model computation across multiple GPUs using tensor parallelism to accelerate inference and handle larger models.

Galgame Translators - Integrates with galgame translation tools to provide real-time or offline translation of game dialogue.

Ebook Translation Pipelines - Processes ebook content by extracting text, translating it, and reassembling the result while preserving the original structure.

Segmented Processing Pipelines - Divides long documents into manageable segments for translation, then reassembles them with original formatting intact.

Consumer GPU Optimizations - Runs the translation model on NVIDIA and AMD GPUs with CPU-GPU hybrid inference for lower-memory setups.

llama.cpp Backend Runners - Loads quantized GGUF models using the llama.cpp backend for efficient CPU and GPU inference.

RPG Servers - Supports RPG game translation through integration with batch translation tools for RPGMaker.

RPG Game Text Translators - Integrates with RPG translation tools to automatically translate in-game text for RPG Maker engines.

Visual Novel Script Translators - Translates visual novel scripts while preserving inline line breaks, control characters, and ruby annotations.

Document Format Preservations - Keeps original line breaks, timestamps, and markup intact when translating subtitles or ebooks.

Light Novel Translators - Processes e-book files and web novel content to produce machine-translated Chinese versions while maintaining formatting.

Novel Translators - Reads a plain-text novel file, splits it into segments, and sends each segment to a language model for translation.

Visual Novel Script Translations - Processes visual novel scripts, preserving inline line breaks, control characters, and ruby annotations.

llama.cpp Backend Servers - Loads quantized GGUF models and runs them on GPU with the llama.cpp backend for efficient inference serving.

vLLM Backend Servers - Loads full-precision models and distributes inference across multiple GPUs using tensor parallelism for faster throughput.

Subtitle Translation Utilities - Converts subtitle files via language model inference while preserving timestamps and formatting for media localization.