Chandra | Awesome Repository

Chandra is a document processing platform that converts images, PDFs, Word documents, spreadsheets, and other formats into structured output such as HTML, Markdown, or JSON while preserving layout. It can also extract specific data fields from invoices, contracts, or reports using user-defined JSON schemas, with citations back to source locations. The service supports form filling in PDF and image documents, document generation from Markdown, and extraction of tracked changes from Word files.
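
To make the schema-driven extraction idea concrete, here is a minimal sketch of what a user-defined JSON schema for an invoice might look like, together with a small check on a returned result. The field names, the schema contents, and the helper `missing_or_mistyped` are illustrative assumptions, not part of Chandra's documented API, and the check covers required top-level fields only.

```python
# Hypothetical schema describing the fields to pull from an invoice.
# The field names are illustrative; they are not taken from Chandra's docs.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Map JSON Schema primitive type names to Python types.
_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def missing_or_mistyped(result: dict, schema: dict) -> list:
    """Return the required fields that are absent or have the wrong type."""
    problems = []
    for name in schema.get("required", []):
        type_name = schema["properties"][name]["type"]
        expected = _JSON_TYPES[type_name]
        value = result.get(name)
        # bool is a subclass of int in Python, so reject it for "number".
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if value is None or is_stray_bool or not isinstance(value, expected):
            problems.append(name)
    return problems


if __name__ == "__main__":
    extracted = {"invoice_number": "INV-1", "total_amount": 12.5}
    print(missing_or_mistyped(extracted, INVOICE_SCHEMA))
```

A check like this is useful on the client side regardless of which extraction service produced the result, because it catches missing fields before they reach downstream systems.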

The platform distinguishes itself with pipelines that chain multiple processing steps into versioned, reusable units, managed through draft, saved, and published states. A pipeline can execute as a single request, with runtime parameter overrides and webhook callbacks for asynchronous completion. For batch workloads, multiple documents can be processed in a single request to improve throughput, and PDF segmentation splits combined or batch-scanned documents into logical sections. Security controls include API key management, data usage preferences, result auto-expiration, and authenticated webhook delivery with cryptographic signatures.
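
Signed webhook delivery is usually verified on the receiving side by recomputing the signature over the raw request body. The sketch below assumes an HMAC-SHA256 hex digest and a shared secret. The text does not specify the actual signing scheme, header name, or encoding, so the functions `sign` and `verify` are hypothetical stand-ins to be adapted to the real scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 hex digest over the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body)
    # Encode both sides so compare_digest accepts any input string.
    return hmac.compare_digest(expected.encode(), signature.encode())


if __name__ == "__main__":
    secret = b"example-secret"           # placeholder value
    body = b'{"status": "complete"}'     # placeholder payload
    print(verify(secret, body, sign(secret, body)))
```

Verification should run against the exact bytes received, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the signature.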

Additional capabilities include a typed Python SDK, automatic request retry with exponential backoff, file collection management, API health checks, and request analytics monitoring for self-hosted deployments. The service can be deployed on-premises in a containerized setup with restricted network access, TLS termination, and authentication.
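
The retry behavior mentioned above follows a general pattern that can be sketched independently of any SDK. The example below is an assumption-laden illustration of exponential backoff with full jitter; the function name `retry_with_backoff`, its parameters, and its defaults are hypothetical and do not describe how the Chandra SDK implements retries.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

In practice a client would retry only on transient failures such as timeouts or rate-limit responses, rather than on every exception as this simplified sketch does. The injectable `sleep` parameter exists so the function can be exercised without real waiting.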

Features

Document Conversion - Converts PDFs, images, Office files, and ebooks into structured HTML, Markdown, or JSON while preserving layout.
Structured Document Extraction - Converts PDFs, images, and Office files into structured HTML, Markdown, or JSON while preserving layout.
Document AI Containers - Ships a self-hosted container for on-premises document conversion, extraction, and analytics.
Schema-Driven Extraction - Extracts structured data from documents by applying user-defined JSON schemas and returning citations to source locations.
