Automated workflows for ingesting, parsing, and vectorizing documents for semantic analysis.
2 GitHub repositories matching Data & Databases · Document Intelligence Pipelines.
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov
Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing