BabelDOC is a document translation system that translates PDF files while preserving their original layout and styling. It uses large language models to render content in the target language and is tailored to scientific and technical documents.
The system distinguishes itself through specialized handling of academic content: it identifies mathematical formulas and complex layout structures and preserves them in the output. To keep terminology consistent across the translated text, it enforces a glossary of source-to-target term mappings.
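As an illustration of glossary-driven terminology enforcement, the following is a minimal sketch, not BabelDOC's actual implementation. It assumes the glossary is a plain dictionary of source-to-target terms; the function names `find_glossary_hits` and `missing_terms` are hypothetical. The first function finds which glossary terms occur in a source passage, and the second reports terms whose required rendering is absent from a translation.

```python
import re
from typing import Dict, List


def find_glossary_hits(text: str, glossary: Dict[str, str]) -> Dict[str, str]:
    """Return the glossary entries whose source term occurs in `text`.

    Hypothetical helper: the glossary is assumed to be a simple
    source-term -> target-term mapping.
    """
    hits: Dict[str, str] = {}
    for source, target in glossary.items():
        # Whole-word, case-insensitive match, so "layer" does not hit "player".
        pattern = r"(?<!\w)" + re.escape(source) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits[source] = target
    return hits


def missing_terms(
    source_text: str, translated_text: str, glossary: Dict[str, str]
) -> List[str]:
    """List source terms whose required target rendering is absent."""
    hits = find_glossary_hits(source_text, glossary)
    return [src for src, tgt in hits.items() if tgt not in translated_text]
```

In a pipeline of this kind, the matched entries could be passed to the language model as terminology hints, and the second check could flag passages for retranslation. Both uses are assumptions about how such a check might be wired in.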
The software covers a broad range of document processing capabilities, including PDF content extraction, spatial-based text reconstruction, and layout detection. It can generate both monolingual and bilingual PDFs; the bilingual output places original and translated content side by side for comparison, using coordinate-normalized layout reflow to keep the two aligned.
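Spatial-based text reconstruction can be pictured with a small sketch. This is an assumed, simplified approach and not BabelDOC's actual algorithm: extracted characters carry page coordinates, characters with nearly equal baselines are grouped into a line, and each line is read left to right. The `Char` class, the `reconstruct_lines` function, and the `y_tolerance` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    text: str
    x: float  # left edge of the glyph
    y: float  # baseline in PDF coordinates (origin bottom-left, y grows upward)


def reconstruct_lines(chars: List[Char], y_tolerance: float = 2.0) -> List[str]:
    """Group characters into lines by baseline, then order each line by x."""
    lines: List[List[Char]] = []
    # Visit characters from the top of the page down (larger y first).
    for ch in sorted(chars, key=lambda c: -c.y):
        # Compare against the first character of the current line so that
        # small drifts do not accumulate across a long line.
        if lines and abs(lines[-1][0].y - ch.y) <= y_tolerance:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    return [
        "".join(c.text for c in sorted(line, key=lambda c: c.x))
        for line in lines
    ]
```

A real extractor also has to handle multiple columns, rotated text, and superscripts, which is where layout detection comes in; this sketch covers only the single-column case.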
The system uses TOML-based configuration files to manage processing pipelines and supports offline asset management for deployment in air-gapped environments.