This project is a privacy-first backend service for retrieval-augmented generation (RAG): it processes local documents into searchable vector representations. Its modular architecture lets users ingest diverse file formats, manage document metadata, and run semantic searches whose results supply context for chat and completion requests.
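The ingest-then-search flow can be sketched in a few lines. This is a minimal, self-contained illustration, not the project's implementation. `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and `Index`, `ingest`, `search`, and `build_prompt` are illustrative names. Each document is stored with its metadata and a unit-length vector. A query is embedded the same way and ranked by cosine similarity, and the top hits are folded into the prompt sent to the language model.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 1024  # size of the toy embedding space (assumption)


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash each lowercase word
    into one of DIM buckets, count occurrences, then L2-normalize."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Index:
    # Each entry: (chunk text, metadata, embedding vector).
    docs: list[tuple[str, dict, list[float]]] = field(default_factory=list)

    def ingest(self, text: str, metadata: dict) -> None:
        self.docs.append((text, metadata, embed(text)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, dict]]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [(text, meta) for text, meta, _ in ranked[:k]]


def build_prompt(question: str, hits: list[tuple[str, dict]]) -> str:
    # Prefix each retrieved chunk with its source so answers can cite it.
    context = "\n".join(f"[{meta['source']}] {text}" for text, meta in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


index = Index()
index.ingest("Invoices are due within 30 days of receipt.", {"source": "billing.pdf"})
index.ingest("The office is closed on public holidays.", {"source": "handbook.docx"})
question = "When are invoices due?"
hits = index.search(question, k=1)
print(build_prompt(question, hits))
```

Swapping `embed` for a real embedding model changes only the vector values. The ranking and prompt-assembly logic stay the same.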
The system distinguishes itself through a database-agnostic abstraction layer that supports storage backends ranging from local disk storage to enterprise-grade vector databases. It also offers flexible deployment options: language models can run entirely on private hardware or be accessed through external cloud-based providers, in both cases behind a unified interface. To improve the quality of generated output, the engine applies a reranking step that re-scores the retrieved document chunks so that the most relevant ones are passed to the language model.
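One way to picture a database-agnostic storage layer combined with a reranking step is the sketch below, which rests on several assumptions. `VectorStore` is a hypothetical interface (not the project's API), and `InMemoryStore` stands in for a disk or vector-database backend. `rerank` uses simple term overlap where a production system would typically use a dedicated reranking model, such as a cross-encoder. The retriever over-fetches `top_k` candidates by vector similarity and keeps the `top_n` that the reranker scores highest.

```python
from typing import Protocol

Row = tuple[str, str, float]  # (chunk id, chunk text, similarity score)


class VectorStore(Protocol):
    """Hypothetical backend-agnostic interface: any storage backend that
    implements these two methods can be plugged into `retrieve`."""

    def add(self, chunk_id: str, vector: list[float], text: str) -> None: ...

    def query(self, vector: list[float], top_k: int) -> list[Row]: ...


class InMemoryStore:
    """Stand-in backend; a disk or vector-database backend would expose the same methods."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def add(self, chunk_id: str, vector: list[float], text: str) -> None:
        self._rows[chunk_id] = (vector, text)

    def query(self, vector: list[float], top_k: int) -> list[Row]:
        # Score every stored chunk by dot product and return the best top_k.
        scored = [
            (cid, text, sum(a * b for a, b in zip(vector, vec)))
            for cid, (vec, text) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[2], reverse=True)
        return scored[:top_k]


def rerank(question: str, candidates: list[Row], top_n: int) -> list[Row]:
    """Toy reranker (assumption): re-score candidates by the fraction of
    question words they contain. A real system would use a reranking model."""
    terms = set(question.lower().split())

    def overlap(row: Row) -> float:
        return len(terms & set(row[1].lower().split())) / max(len(terms), 1)

    return sorted(candidates, key=overlap, reverse=True)[:top_n]


def retrieve(store: VectorStore, question: str, qvec: list[float],
             top_k: int = 3, top_n: int = 1) -> list[Row]:
    # Over-fetch by vector similarity, then let the reranker pick the final context.
    return rerank(question, store.query(qvec, top_k), top_n)


store = InMemoryStore()
store.add("a", [1.0, 0.0], "gpu setup requires cuda drivers")
store.add("b", [0.9, 0.1], "the gpu profile enables cuda acceleration")
store.add("c", [0.0, 1.0], "invoices are due monthly")
qvec = [1.0, 0.0]  # pretend embedding of the question
question = "which profile enables gpu acceleration"
print(store.query(qvec, 1))  # vector similarity alone
print(retrieve(store, question, qvec))  # after reranking
```

In this example, vector similarity alone ranks chunk `a` first, while the reranker promotes chunk `b`, which answers the question more directly. Any backend exposing the same `add` and `query` methods could replace `InMemoryStore` without changes to `retrieve`.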
The platform includes tools for managing the full document-processing pipeline, including automated parsing, text chunking, and embedding generation. Users can configure the system through environment-based profiles that match their hardware, such as CPU-only or GPU-accelerated setups, and can stream responses incrementally as they are generated to reduce perceived latency.
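Of the pipeline stages, text chunking benefits most from a concrete example. The function below is a minimal sketch that assumes whitespace-delimited words and a fixed-size sliding window with overlap. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and the project may split on tokens, sentences, or document structure instead. Overlapping windows help ensure that a sentence straddling a boundary remains retrievable from at least one chunk.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words, where consecutive
    windows share `overlap` words (hypothetical parameters)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)  # "w0 w1 w2 w3", then "w3 w4 w5 w6", then "w6 w7 w8 w9"
```

Each chunk would then be embedded and stored with metadata pointing back to its source document.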
The application is configured via runtime settings files and environment variables, with support for building custom container images to suit specific deployment requirements.
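Layered configuration of this kind is often implemented as a base settings file, with profile-specific overrides merged on top and environment-variable placeholders resolved last. The sketch below assumes that scheme; it is not the project's documented format. Plain dictionaries stand in for parsed settings files. `load_settings`, the `${VAR:default}` placeholder syntax, the profile names, and every key shown are hypothetical.

```python
import os
import re
from collections.abc import Mapping
from typing import Any, Optional

# Hypothetical placeholder syntax: ${VAR} or ${VAR:default}
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied on top of `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand_env(value: Any, environ: Mapping[str, str]) -> Any:
    """Replace ${VAR:default} placeholders in every string value."""
    if isinstance(value, dict):
        return {k: expand_env(v, environ) for k, v in value.items()}
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: environ.get(m.group(1), m.group(2) or ""), value)
    return value


def load_settings(base: dict, profiles: dict, active: list[str],
                  environ: Optional[Mapping[str, str]] = None) -> dict:
    # Apply profiles in order (later ones win), then resolve environment placeholders.
    environ = os.environ if environ is None else environ
    settings = base
    for name in active:
        settings = deep_merge(settings, profiles.get(name, {}))
    return expand_env(settings, environ)


BASE = {
    "llm": {"mode": "local", "device": "cpu"},
    "vectorstore": {"backend": "disk", "path": "${DATA_DIR:./data}"},
}
PROFILES = {
    "gpu": {"llm": {"device": "cuda"}},
    "cloud": {"llm": {"mode": "remote"}},
}
settings = load_settings(BASE, PROFILES, ["gpu"], environ={"DATA_DIR": "/srv/data"})
print(settings)
```

Resolving placeholders after merging lets a profile override a value that itself contains a placeholder, and lets a custom container image be adjusted through environment variables at run time without rebuilding it.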