Pdfcraft is a containerized service for self-managed PDF processing, editing, and conversion. It provides a toolkit for document manipulation, a multi-format converter, and OCR software to transform scanned documents into searchable and editable text. The project features a visual, node-based workflow editor that allows users to build automated pipelines by chaining together various PDF conversion and optimization operations. The service covers a broad range of capabilities, including document management for merging and splitting files, format conversion between PDFs and office documents or
PyMuPDF is a comprehensive PDF manipulation library and document analysis tool. It serves as a text extraction tool, OCR engine, and image converter, providing a programmatic interface to edit, merge, split, and optimize PDF and Office documents. The project distinguishes itself through performance: it uses C bindings for low-level manipulation and parallelized page processing to accelerate workloads. It provides specialized conversion paths, such as transforming PDF content into Markdown for retrieval-augmented generation and large language model pipelines. It
Docling is a multimodal content converter and document parser designed to transform PDFs, Office files, and HTML into structured Markdown or JSON for generative AI applications. It functions as an OCR document processor and a PDF layout analyzer that extracts tables, charts, and hierarchical structures while preserving the original page layout. The system operates as a local-first inference engine, allowing for the processing of sensitive data in air-gapped environments without external network connectivity. It can also be deployed as an API or a Model Context Protocol server to provide parsi
Acontext is an LLM orchestration backend and agent memory framework designed to manage session state and knowledge for AI agents. It functions as a context manager and orchestration layer that integrates model providers with a secure code sandbox and a zero-knowledge data store. The project is distinguished by its approach to knowledge distillation, capturing agent learnings as reusable Markdown skills and structured memory files. It provides a secure execution environment where shell commands and scripts run in isolated containers with the ability to mount these persistent skill files direct