1 repository
Concurrent processing of multiple file paths for text extraction in a single batch operation.
Distinct from Parallel Batch Processing: focuses on extracting text from document files rather than on large-scale data processing with grouping keys.
Explore 1 GitHub repository matching data & databases · Document Batch Processors.
Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment. What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings fo
Processes multiple file paths concurrently for text extraction in a single batch operation.
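To make the idea of a batch operation concrete, here is a minimal Python sketch of concurrent extraction over several file paths. It does not use Kreuzberg's API: `extract_text` and `batch_extract` are hypothetical names introduced for illustration, and the stand-in extractor simply reads each file as UTF-8 text. A real extractor (PDF parsing, OCR, and so on) would take its place.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_text(path):
    """Hypothetical stand-in extractor: reads the file as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def batch_extract(paths, max_workers=4):
    """Extract text from every path concurrently.

    Returns one (path, text, error) tuple per input, in input order.
    A failure on one file is recorded and does not abort the batch.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every file up front so the pool can work on them in parallel.
        futures = [pool.submit(extract_text, p) for p in paths]
        # Collect in submission order so results line up with the inputs.
        for path, future in zip(paths, futures):
            try:
                results.append((str(path), future.result(), None))
            except OSError as exc:
                results.append((str(path), None, exc))
    return results
```

Calling `batch_extract(["report.txt", "notes.txt"])` would return one tuple per path. Threads suit I/O-bound reads like this stand-in; a CPU-heavy extractor would more likely use a process pool or native concurrency.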