# Document and Office Format Converters

> Search results for `convert between document and office formats` on awesome-repositories.com. 117 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/convert-between-document-and-office-formats

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/convert-between-document-and-office-formats).**

## Results

- [avelino/awesome-go](https://awesome-repositories.com/repository/avelino-awesome-go.md) (175,576 ⭐) — This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains.

The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing, it acts as a technical knowledge repository, aggregating professional literature, style guides, and best practices to support developer onboarding and professional growth across the entire software development lifecycle.

The directory covers a broad capability surface, including essential utilities for distributed systems engineering, application security, data processing, and development productivity. It provides access to specialized tools for database management, web framework integration, testing, and build automation, alongside educational materials that help developers master language-specific architectural patterns.

The project is maintained as a static resource aggregation, providing a holistic view of external links and documentation to orient developers within the Go ecosystem.
- [awesome-selfhosted/awesome-selfhosted](https://awesome-repositories.com/repository/awesome-selfhosted-awesome-selfhosted.md) (299,516 ⭐) — This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure.

The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It distinguishes itself through a collaborative peer-review process, where community members validate the quality and relevance of each submission to ensure the directory remains accurate and reliable.

The project covers a broad capability surface, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools assist users in maintaining reproducible server environments and managing complex service dependencies across private hardware.

The directory is maintained as a version-controlled repository, ensuring that all updates and community-driven changes are tracked and transparent.
- [codehubapp/codehub](https://awesome-repositories.com/repository/codehubapp-codehub.md) (22,662 ⭐) — CodeHub is a mobile application designed for managing remote repositories and reviewing code changes directly from a smartphone or tablet. It functions as a mobile client for GitHub, enabling users to browse repositories, monitor project progress, and interact with pull requests while away from a desktop computer.

Beyond its repository management capabilities, the application serves as a document conversion utility and software comparison platform. It provides tools for transforming files between various formats while maintaining formatting integrity, as well as resources for evaluating and ranking conversion services based on performance, pricing, and technical requirements.

The application supports collaborative workflows by allowing users to inspect code diffs and provide feedback on specific lines within a changeset. It also includes features for optimizing document conversion workflows, including guidance for batch processing and troubleshooting common formatting or compatibility issues.
- [run-llama/liteparse](https://awesome-repositories.com/repository/run-llama-liteparse.md) (10,782 ⭐) — A fast, helpful, and open-source document parser
- [deanishe/alfred-convert](https://awesome-repositories.com/repository/deanishe-alfred-convert.md) (714 ⭐) — Convert between different units in Alfred
- [docling-project/docling](https://awesome-repositories.com/repository/docling-project-docling.md) (61,674 ⭐) — Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing diverse input formats into a consistent internal representation, the library enables uniform processing across various document types.

The project distinguishes itself through a schema-driven approach that maps document regions to strongly-typed objects, ensuring data accuracy through validation against predefined templates. Its pipeline-based architecture supports pluggable processing backends, allowing for the dynamic integration of specialized engines for optical character recognition and complex visual layout analysis. Users can control parsing behavior and extraction parameters through declarative configuration files, facilitating integration into automated workflows and server-based architectures.

The library provides both a programmatic interface and a command-line toolkit to support automated document processing and format conversion. It utilizes optional dependency management to allow for modular installation of specific features, such as media rendering or advanced processing capabilities, depending on the requirements of the application.
- [haxefoundation/format](https://awesome-repositories.com/repository/haxefoundation-format.md) (0 ⭐) — The format library provides support for a range of file formats in the Haxe programming language.
- [unidoc/unioffice](https://awesome-repositories.com/repository/unidoc-unioffice.md) (4,809 ⭐) — unioffice is a comprehensive document processing suite that provides a PDF document processor, an Open XML document library, a document security toolkit, and a document content extractor. It is designed to programmatically create, read, and modify Word, Excel, and PowerPoint files, as well as generate and edit PDF documents.

The project is distinguished by its native language implementation of the Open XML standard, which removes native binary dependencies to simplify container deployments. It features advanced capabilities for digital document security, including hardware-based PDF signing, content encryption, and sensitive information redaction using regular expressions.

The library covers a broad range of capabilities including the generation and manipulation of spreadsheets with formulas and charts, the creation of presentations, and the editing of Word documents. It also provides tools for PDF form automation, HTML to PDF conversion, PDF/A compliance validation, and AI-powered structured data extraction from unstructured documents.
- [portswigger/office-open-xml-editor](https://awesome-repositories.com/repository/portswigger-office-open-xml-editor.md) (13 ⭐) — Burp extension that adds a tab for editing Office Open XML documents (xlsx, docx, pptx)
- [iofficeai/aionui](https://awesome-repositories.com/repository/iofficeai-aionui.md) (16,621 ⭐) — AionUi is an AI agent orchestration platform designed to manage and coordinate multiple autonomous assistants within a local environment. It functions as a framework for executing background processes and scheduled tasks that operate independently of the user interface, ensuring that automated workflows continue to run without manual oversight.

The platform distinguishes itself through a local-first approach to document generation and file manipulation, allowing users to create and modify office files directly on their hardware to maintain data privacy. It supports parallel agent execution, enabling multiple specialized assistants to operate concurrently within a shared workspace while accessing a centralized registry of functional tools.

Beyond core orchestration, the system provides a management layer that connects external messaging platforms to local agents. This integration allows for remote task triggering and monitoring, providing control over automated workflows even when away from the primary workstation. The architecture relies on role-based scoping to ensure that each assistant operates with defined permissions and domain expertise.
- [dokploy/dokploy](https://awesome-repositories.com/repository/dokploy-dokploy.md) (34,901 ⭐) — Dokploy is a self-hosted platform-as-a-service designed to simplify the deployment and management of containerized applications and databases. It provides a centralized control plane that decouples administrative management from application workloads, allowing users to oversee infrastructure across multiple server nodes through a unified web interface or a command-line tool.

The platform distinguishes itself through an extensive library of pre-configured application templates, enabling the rapid deployment of databases, identity providers, and various productivity or development tools. It supports complex orchestration by allowing users to define multi-container services using standard configuration files, which can be managed through automated build pipelines, Git integration, and real-time performance monitoring.

Beyond core deployment, the system includes robust infrastructure management capabilities such as automated backups to external object storage, horizontal and vertical scaling, and granular access control. It also provides secure configuration management, including environment variable synchronization, HTTPS certificate handling, and zero-downtime deployment strategies to ensure application stability and security.

The platform is designed for ease of use, offering an interactive API documentation interface and instructional resources to guide users through installation and configuration. It supports a wide range of modern web frameworks and runtimes, providing a flexible environment for hosting and maintaining services on private server hardware.
- [thangchung/awesome-dotnet-core](https://awesome-repositories.com/repository/thangchung-awesome-dotnet-core.md) (21,277 ⭐) — This project is a curated, community-driven directory of frameworks, libraries, and development tools designed for the .NET ecosystem. It serves as a comprehensive resource index for developers seeking to build, maintain, and scale software projects using .NET technologies.

The collection provides a structured catalog of utilities that support the full software development lifecycle. It covers essential capability areas including web service development, data persistence integration, and system observability. The directory also highlights tools for managing application dependencies, implementing identity and access control, and automating build and deployment pipelines.

Beyond core infrastructure, the repository includes resources for background task scheduling, media and document processing, and performance optimization. This index is maintained as an open-source reference to assist in discovering relevant components for modular application architecture and cross-platform development.
- [dotnet/format](https://awesome-repositories.com/repository/dotnet-format.md) (1,947 ⭐) — Home for the dotnet-format command
- [denoland/deno](https://awesome-repositories.com/repository/denoland-deno.md) (107,110 ⭐) — Deno is a high-performance runtime for JavaScript and TypeScript that prioritizes security and developer productivity. Built on the V8 engine, it provides a secure execution environment that enforces a default-deny security model, requiring explicit user authorization for access to system resources like the file system, network, and environment variables. The runtime natively supports modern web-standard APIs, ensuring consistent behavior and portability across different environments.

What distinguishes Deno is its integrated approach to the software development lifecycle. It bundles essential utilities—including a formatter, linter, test runner, and dependency manager—directly into the runtime, eliminating the need for external build tools or complex transpilation steps. The platform features a universal module resolution system that supports remote HTTPS URLs, local paths, and standard package registries, all backed by lockfiles to ensure build determinism and supply chain security.

Beyond its core runtime capabilities, Deno includes a built-in, persistent key-value database engine that supports atomic transactions and reactive data monitoring. It also provides a robust compatibility layer for the Node.js ecosystem, allowing for the seamless execution of legacy modules and native binary addons. For multi-tenant or distributed applications, the runtime offers isolated sandbox environments that manage resource constraints and security boundaries, facilitating secure code execution in shared infrastructure.

The project is distributed as a single binary, providing a unified toolchain for managing dependencies, executing tasks, and configuring runtime security policies.
- [benyamindsmith/ig.degree.betweenness](https://awesome-repositories.com/repository/benyamindsmith-ig-degree-betweenness.md) (40 ⭐) — Implementation of the "Node Degree+Edge" Betweenness Community Detection Algorithm for 'igraph' Objects with R
- [rustcrypto/formats](https://awesome-repositories.com/repository/rustcrypto-formats.md) (322 ⭐) — Cryptography-related format encoders/decoders: DER, PEM, PKCS, PKIX
- [cinnamon/kotaemon](https://awesome-repositories.com/repository/cinnamon-kotaemon.md) (25,139 ⭐) — Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines.

The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex queries through iterative processing and tool-calling, while its hybrid retrieval orchestration combines vector similarity and full-text search with re-ranking to improve the accuracy of retrieved context. The framework also features event-driven streaming, which delivers incremental results from long-running pipelines to the user interface in real-time.

Beyond its core reasoning capabilities, the platform includes a suite of functional modules for the entire lifecycle of document-based applications. This includes multi-modal parsing for extracting text, tables, and visual elements from diverse file formats, as well as administrative tools for managing document collections, vector stores, and multi-user access. The system is designed to be interface-agnostic, allowing developers to wrap third-party libraries and external services into standardized, reusable processing units.

The project provides a web-based user interface for interactive querying and configuration, and it supports deployment of private, isolated instances through predefined templates.
- [kreuzberg-dev/kreuzberg](https://awesome-repositories.com/repository/kreuzberg-dev-kreuzberg.md) (8,527 ⭐) — Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment.

What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings for 18 programming languages, a Model Context Protocol (MCP) server for direct AI agent integration, and a REST API with an OpenAPI schema. The extraction pipeline is plugin-based and configurable, supporting multiple OCR backends (Tesseract, PaddleOCR, EasyOCR, and vision-language models) with quality-based fallback, parallel batch processing with work-stealing, and ONNX Runtime model inference with hardware acceleration for CPU, GPU, or NPU.

Beyond core text extraction, Kreuzberg provides a document enrichment pipeline that includes page classification, named entity recognition, summarization, translation, captioning, and PII redaction. It prepares content for retrieval-augmented generation (RAG) workflows by chunking text, generating vector embeddings, and reranking results. The system also supports structured data extraction via LLMs, source code extraction from 306 programming languages, and transcription of audio and video files using Whisper ONNX models.

The project is available as a library installable via standard package managers, a CLI tool installable via Homebrew or Docker, and a production-ready deployment option with a Helm chart for Kubernetes.
- [ffmpeg/ffmpeg](https://awesome-repositories.com/repository/ffmpeg-ffmpeg.md) (61,176 ⭐) — FFmpeg is a cross-platform multimedia framework designed for the recording, conversion, and streaming of audio and video content. It functions as a comprehensive toolkit that provides both a command-line utility for direct media manipulation and a collection of low-level libraries for integration into custom applications. At its core, the project utilizes a packet-based stream engine and a format-agnostic abstraction layer to handle diverse media standards, containers, and network protocols.

The framework distinguishes itself through a modular, graph-based filter execution model that allows for complex, non-linear transformations of audio and video frames. It supports high-performance processing by offloading intensive encoding and decoding tasks to dedicated hardware and utilizing threaded parallel processing to maximize throughput across multiple processor cores. This architecture enables users to construct intricate pipelines for tasks ranging from simple format conversion to advanced real-time media filtering and analysis.

Beyond core transcoding, the project covers a broad functional surface including live streaming, hardware device capture, and secure network transport. It provides extensive capabilities for metadata management, subtitle processing, and stream synchronization, alongside diagnostic tools for inspecting media integrity and performance. The system is highly extensible, allowing for the dynamic integration of external codecs and third-party libraries to support specialized media requirements.
- [documentationjs/documentation](https://awesome-repositories.com/repository/documentationjs-documentation.md) (5,798 ⭐) — :book: documentation for modern JavaScript
- [llmware-ai/llmware](https://awesome-repositories.com/repository/llmware-ai-llmware.md) (14,838 ⭐) — llmware is a Python framework for AI agent orchestration and model management, designed to coordinate multi-model workflows and autonomous agents. It provides a unified model catalog and standardized interface to execute specialized language models for complex research, analysis, and structured data generation.

The project distinguishes itself through its emphasis on local execution and quantized inference, allowing models to run on private infrastructure using CPU, GPU, and NPU acceleration via runtimes like ONNX and OpenVINO. It can also translate natural language queries into structured SQL or CSV formats by analyzing database schemas.

The framework covers a broad range of capabilities including end-to-end retrieval-augmented generation pipelines, hybrid search engines, and multimodal content processing for PDFs, Office documents, audio, and images. It also incorporates tools for structured function calling, named entity recognition, and text risk classification to detect toxicity and prompt injections.

The system integrates with various SQL and vector database backends to manage knowledge collection indexing and document embeddings.
- [smjonas/snippet-converter.nvim](https://awesome-repositories.com/repository/smjonas-snippet-converter-nvim.md) (183 ⭐) — Bundle snippets from multiple sources and convert them to your format of choice.
- [gohugoio/hugo](https://awesome-repositories.com/repository/gohugoio-hugo.md) (88,701 ⭐) — Hugo is a high-performance static site generator that transforms source content and templates into optimized web assets. Built with a focus on speed and scalability, it provides a comprehensive framework for managing large-scale documentation and editorial projects through structured content organization, taxonomies, and a flexible template-driven rendering engine.

The project distinguishes itself through a sophisticated build system that utilizes incremental caching to minimize redundant processing during site updates. It supports complex content requirements by enabling multidimensional modeling, which allows for the generation of diverse page variations from a single source, and multi-format output rendering that can produce HTML, JSON, RSS, or CSV simultaneously. Authors can extend their content using a modular shortcode system, while the integrated asset pipeline handles the transformation, minification, and optimization of images and stylesheets directly within the build lifecycle.

Beyond its core generation capabilities, Hugo offers a robust command-line interface for managing the entire project lifecycle, including real-time development previews and automated deployment workflows. The system also features a modular dependency architecture, allowing users to import and version shared themes, layouts, and configuration components to maintain consistent design systems across multiple projects.
- [fffaraz/awesome-cpp](https://awesome-repositories.com/repository/fffaraz-awesome-cpp.md) (71,817 ⭐) — This project is a comprehensive, curated directory of high-quality libraries, tools, and educational resources for C and C++ development. It serves as an ecosystem discovery index, helping developers navigate the vast landscape of third-party components, frameworks, and technical documentation available for the language.

The collection is distinguished by its focus on high-performance systems programming and technical mastery. It provides deep coverage of specialized domains including SIMD-accelerated data processing, compile-time template metaprogramming, and asynchronous event-driven architectures. The repository also acts as a developer knowledge base, offering access to industry-standard coding guidelines, conference materials, and academic papers that support professional software engineering.

Beyond core language features, the directory catalogs a wide array of practical tools for the entire development lifecycle. This includes build systems, static analysis tooling, debuggers, and integrated development environments. It also covers a broad surface of application-level capabilities, ranging from scientific computing and embedded systems development to graphics, networking, and cross-platform library integration.
- [alibaba/roll](https://awesome-repositories.com/repository/alibaba-roll.md) (2,844 ⭐) — ROLL is a distributed reinforcement learning framework and model alignment toolkit designed for large language models. It serves as a scalable training pipeline and GPU cluster manager, providing the infrastructure to align model behavior using reinforcement learning algorithms and preference optimization techniques.

The project distinguishes itself through an agentic rollout orchestrator that generates and collects multi-turn interaction trajectories between AI agents and simulated environments. It supports specialized alignment methods including Direct Preference Optimization, reinforcement learning from verifiable rewards, and group-relative reward optimization.

The framework covers a broad range of capabilities for large-scale distributed training, including tensor, pipeline, and expert parallelism to support ultra-large-scale models. It manages hardware resources through GPU multiplexing and disaggregated deployment, while providing tools for automated reward evaluation using code sandboxes and mathematical verification.

Pre-configured environment deployments are provided for different GPU architectures and library versions to accelerate setup.
- [sasha240100/between.js](https://awesome-repositories.com/repository/sasha240100-between-js.md) (0 ⭐) — Examples collection
- [quivrhq/megaparse](https://awesome-repositories.com/repository/quivrhq-megaparse.md) (7,389 ⭐) — Megaparse is a document parsing tool and RAG data preprocessor designed to convert PDFs, Word documents, and presentations into clean text formats. It functions as a vision-based document extractor that recovers high-fidelity information from images and complex layouts to optimize data for large language model ingestion.

The system employs multimodal AI and vision models to perform schema-preserving parsing, which maintains structural hierarchies such as tables and headers. It utilizes lossless structural transformation to turn layout-heavy binary files into text sequences while preserving the semantic relationships between elements.

The project provides a document parsing API that allows these conversion and analysis capabilities to be deployed as a standalone server via standard HTTP endpoints.
- [yerongai/office-tool](https://awesome-repositories.com/repository/yerongai-office-tool.md) (12,966 ⭐) — Office-Tool is a PowerShell-based utility designed to automate the deployment, configuration, and maintenance of office productivity suites. It functions as a command-line manager that handles the full lifecycle of software installations, including initial setup, version updates, and the removal of existing applications.

The tool distinguishes itself by providing granular control over software environments through modular script execution and direct registry manipulation. It enables users to bundle installation files into standard disk images for simplified distribution, while also offering automated diagnostic routines to verify system states, repair common errors, and reset configurations to their original defaults.

Beyond core deployment tasks, the utility includes comprehensive support for software license management and activation. It facilitates the verification of product keys and registration status across multiple methods to ensure compliance and continued access to software features. The project is distributed as a collection of scripts intended for direct execution within a local host environment.
- [bytebytegohq/system-design-101](https://awesome-repositories.com/repository/bytebytegohq-system-design-101.md) (83,491 ⭐) — This project is a centralized engineering knowledge repository that provides a structured curriculum for mastering system design, architectural patterns, and fundamental software development workflows. It serves as a professional development resource for engineers, offering foundational knowledge and real-world case studies to support the design of scalable, secure, and efficient distributed systems.

The repository distinguishes itself through a visual-first approach to knowledge synthesis, distilling complex technical concepts into high-density graphical diagrams and succinct illustrations. By employing cross-domain concept mapping and modular topic decomposition, it connects disparate engineering disciplines—such as infrastructure, security, and application layers—into granular, self-contained modules that facilitate rapid mental modeling and targeted learning.

The content covers a broad spectrum of technical domains, including API and web development, database scaling strategies, networking protocols, and DevOps deployment pipelines. These educational assets are organized as a static, version-controlled repository, allowing users to consume technical insights asynchronously at their own pace.
- [fincept-corporation/finceptterminal](https://awesome-repositories.com/repository/fincept-corporation-finceptterminal.md) (26,900 ⭐) — FinceptTerminal is a quantitative finance platform and financial engineering library designed for asset valuation, risk management, and fixed-income analytics. It provides a comprehensive suite for algorithmic trading and investment strategy automation, integrating specialized language model agents and node-based workflows to automate market research and alpha generation.

The project distinguishes itself with a dedicated game theory analysis engine for calculating Nash equilibria and simulating strategic interactions in competitive markets. It also features a specialized credit risk modeling tool for estimating default probabilities, building credit scorecards, and calculating expected losses.

The system covers a broad range of capability areas, including derivatives pricing, yield curve construction, and multi-asset portfolio analysis. It incorporates machine learning tools for credit scorecard development and feature engineering, as well as economic analysis frameworks for utility theory and exchange economies.

The platform includes an algorithmic trading suite for real-time trade execution and an LLM investment agent framework for geopolitical and market modeling.
- [adamcooke/documentation](https://awesome-repositories.com/repository/adamcooke-documentation.md) (213 ⭐) — A Rails engine to provide the ability to add documentation to a Rails application
- [vikparuchuri/marker](https://awesome-repositories.com/repository/vikparuchuri-marker.md) (36,164 ⭐) — Marker is an LLM-powered document parser and OCR pipeline designed to convert PDFs and unstructured files into structured markdown, JSON, and HTML. It functions as a data preprocessor that transforms complex documents into machine-readable formats while preserving tables, equations, and layout structures.

The system utilizes large language models to refine OCR accuracy, clean mathematical notation, and merge fragmented tables across multiple pages. It employs model-based layout analysis to predict block types and bounding boxes, ensuring a more precise conversion of document elements.

Capabilities include extracting images and structured data based on predefined schemas, as well as chunking documents for retrieval-augmented generation pipelines. The project supports high-volume processing by distributing conversion tasks across multiple GPUs.
- [amruthpillai/reactive-resume](https://awesome-repositories.com/repository/amruthpillai-reactive-resume.md) (38,613 ⭐) — This project is a web-based platform designed for creating, managing, and sharing professional resumes. It functions as a structured document builder that integrates artificial intelligence to assist with content generation, editing, and analysis. Users can maintain a collection of resumes, customize their visual presentation through various templates, and export them into multiple formats for job applications.

The platform distinguishes itself through its autonomous AI agent capabilities, which can perform research, suggest incremental edits, and apply data patches directly to documents. It also provides a secure, self-hostable environment that allows users to maintain full control over their data and infrastructure. The system supports advanced authentication methods, including passkeys and federated identity providers, ensuring that personal and professional information remains protected.

Beyond core editing, the application includes tools for document organization, such as tagging, filtering, and legacy data migration. It features a robust document generation engine that separates content from design, allowing for precise layout control and styling. Users can share their resumes via password-protected public URLs and monitor document performance through integrated analytics.

The application is designed for containerized deployment, utilizing Docker Compose to facilitate consistent installation across private infrastructure. It includes built-in health monitoring and feature flagging to manage system performance and functionality without requiring code redeployments.
- [jgm/pandoc](https://awesome-repositories.com/repository/jgm-pandoc.md) (44,822 ⭐) — Pandoc is a universal document converter that translates content between a wide range of markup and binary formats. It functions by parsing input documents into a unified intermediate abstract syntax tree, which serves as the foundation for consistent manipulation and transformation across diverse output types.

The system is distinguished by its modular reader-writer pipeline, which decouples input parsing from output generation to allow for granular control over document structure. Users can programmatically manipulate this intermediate tree through a robust filter system, supporting both external JSON-based interop and an integrated scripting environment for custom transformations. This architecture enables complex document processing tasks, such as automated scholarly publishing, where citations, bibliographies, and mathematical expressions are managed through a specialized toolchain.

Beyond core conversion, the project provides a comprehensive templating engine that merges structured document data with customizable templates to produce final outputs with specific styling and layout requirements. It also offers a network-based server mode for API-driven and batch processing, allowing the tool to be integrated into automated technical content pipelines.

The software is primarily operated via a command-line interface, which provides extensive configuration options for managing input formats, citation styles, and document metadata.
- [niuiic/format.nvim](https://awesome-repositories.com/repository/niuiic-format-nvim.md) (33 ⭐) — An asynchronous, multitasking, and highly configurable formatting plugin.
- [collaboraonline/online](https://awesome-repositories.com/repository/collaboraonline-online.md) (2,985 ⭐) — This project is a cloud-based office suite and self-hosted document server that enables the creation and editing of documents, spreadsheets, and presentations. It functions as a headless office application, utilizing a server-side processing engine to handle file rendering and formatting without requiring a local graphical user interface.

The system operates as a real-time collaborative editor, employing operational transformation to allow multiple users to edit files simultaneously. It also serves as a web-based document processor capable of automating office tasks through macro execution and programmatic field population.

The platform covers a broad range of office capabilities, including advanced spreadsheet management with protected cells and filtered views, digital document certification via PDF export and electronic signatures, and tools for professional layout, grammar auditing, and mail merge.
- [aaronlidman/osm-and-geojson](https://awesome-repositories.com/repository/aaronlidman-osm-and-geojson.md) (91 ⭐) — Converts between OSM XML and GeoJSON
- [facebookresearch/nougat](https://awesome-repositories.com/repository/facebookresearch-nougat.md) (10,015 ⭐) — Nougat is a neural OCR system and LLM document parser designed to convert images of academic PDF documents into structured markdown text and mathematical formulas. It functions as a PDF to markdown converter that uses deep learning to handle layout and formula recognition.

The project provides a document training pipeline for generating datasets and training neural networks to recognize specific academic document styles. This includes utilities for training dataset generation, neural model training, and model checkpoint management to ensure reproducible deployment.

The system covers a broad range of capabilities including academic document digitization and automated text extraction. It incorporates tools for model accuracy evaluation, performance testing, and training metric logging to monitor model convergence and stability.

Programmatic access to these capabilities is available via web service endpoints for document conversion, text prediction, and structured OCR extraction.
- [duplicati/duplicati](https://awesome-repositories.com/repository/duplicati-duplicati.md) (14,283 ⭐) — Duplicati is a self-hosted backup server designed to perform encrypted, incremental, and compressed backups to a wide range of local, network, and cloud-based storage providers. It functions as a background service that automates recurring data protection tasks, ensuring that only changed data blocks are stored to maximize efficiency and minimize bandwidth usage.

The project distinguishes itself through a centralized management console that allows for the orchestration of multiple distributed backup agents from a single web-based dashboard. It supports multi-tenant management, enabling the organization of users and resources into hierarchical structures for delegated access and data isolation. Furthermore, it provides robust security features, including AES-256 encryption for data at rest, support for OIDC and SAML2 authentication, and provider-level immutability protections to prevent unauthorized modification of backup archives.

Beyond its core backup capabilities, the system includes comprehensive tools for data lifecycle management, such as automated retention policies, versioning, and integrity verification. It offers flexible configuration through both a graphical interface and a command-line utility, supporting automation scripting and dry-run simulations to verify workflows before execution. The software also handles complex environments by managing locked files and providing metadata indexing to ensure rapid restoration even if the primary configuration database is unavailable.

Duplicati is available through various installation formats, including native system packages, portable archives, and containerized deployments, allowing it to run in diverse operating environments.
- [dagwieers/unoconv](https://awesome-repositories.com/repository/dagwieers-unoconv.md) (2,748 ⭐) — Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
- [c4illin/convertx](https://awesome-repositories.com/repository/c4illin-convertx.md) (15,905 ⭐) — ConvertX is a web-based file conversion management platform designed to transform documents, images, and video files between various formats. It utilizes system-level binary orchestration to execute conversion tasks, leveraging background worker threads to handle concurrent, high-volume bulk processing without blocking the user interface.

The platform distinguishes itself through a comprehensive security and access control framework, which includes multi-user account management, session-based token authentication, and role-based permissions. Users can secure their output files with passwords and configure service visibility, while the system automatically enforces data privacy through ephemeral storage cleanup policies that purge processed files after a set duration.

Beyond core conversion capabilities, the project includes integrated tools for automated dependency lifecycle management and build maintenance. These features allow for the automated checking and merging of package updates, as well as the enforcement of consistent styling standards across the codebase. The system provides real-time progress tracking for all active jobs and allows for granular configuration of media encoding parameters to suit specific processing requirements.
- [hiddify/hiddify-app](https://awesome-repositories.com/repository/hiddify-hiddify-app.md) (30,948 ⭐) — Hiddify is a cross-platform proxy client designed to manage secure network connections and traffic routing across desktop and mobile operating systems. It functions as a unified proxy manager, providing a centralized interface to configure and control various network proxy protocols for encrypted and private internet access.

The application distinguishes itself by integrating local loopback interception, which configures the operating system network stack to route traffic through a local port for granular filtering. It also serves as a self-hosted infrastructure tool, enabling users to automate the deployment of private proxy servers on remote infrastructure through simplified command-line initialization.

The system maintains consistency across environments by synchronizing remote server states through declarative configuration files and utilizing an event-driven daemon to monitor proxy health and network state changes. It employs a shared bridge layer to interact with native system APIs and firewall rules, while bundling all necessary dependencies into a single, self-contained executable package.
- [dingjunyao/picgo-plugin-convert-heic](https://awesome-repositories.com/repository/dingjunyao-picgo-plugin-convert-heic.md) (0 ⭐) — Converts HEIC photos to other formats (e.g., JPEG).
- [alsyundawy/microsoft-office-for-macos](https://awesome-repositories.com/repository/alsyundawy-microsoft-office-for-macos.md) (5,966 ⭐) — This project provides a collection of tools for deploying, activating, and managing volume-licensed Microsoft Office suites on macOS, supporting versions from 2011 through LTSC 2024. It handles the full lifecycle of Office installation and licensing, including the ability to install a complete Office suite as a single package and activate it using volume license serializers that bypass individual product keys or Microsoft accounts.

The project includes capabilities for troubleshooting and repairing broken Office installations by cleaning remnants of previous versions using official Microsoft reset tools. It also offers license management utilities that can remove existing licenses and reinstall them to resolve activation errors or corruption. Additionally, the project provides scripts for hardening privacy by disabling telemetry, cloud login, and online connectivity across all Office applications through system preference domain manipulation.
- [iib0011/omni-tools](https://awesome-repositories.com/repository/iib0011-omni-tools.md) (9,710 ⭐) — omni-tools is a browser-based utility suite that provides client-side tools for manipulating PDFs, media files, and data formats. It functions as a collection of web-based processors and calculation engines that execute directly within the browser without requiring server-side processing.

The suite includes a client-side PDF editor for merging, splitting, and reorganizing document structures, and a web-based media processor for resizing, trimming, and converting image and video files. It also features a data format converter that transforms structured information between JSON, CSV, and XML formats using schema-based mapping.

The project further provides technical calculation utilities for date and time analysis, electrical property computations, and mathematical operations. Additional capabilities include text formatting tools for modifying casing and shuffling list items.
- [garrytan/gstack](https://awesome-repositories.com/repository/garrytan-gstack.md) (110,596 ⭐) — gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases.

The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and repository structures to maintain institutional memory across sessions.

Its capabilities extend to autonomous quality assurance, including the ability to drive physical iOS devices via USB for bug fixing and visual auditing. The system also covers automated technical documentation generation, security guardrails to prevent prompt injection and secret leakage, and the orchestration of multi-agent swarms for concurrent technical tasks.
- [wsdjeg/format.nvim](https://awesome-repositories.com/repository/wsdjeg-format-nvim.md) (15 ⭐) — An asynchronous code formatting plugin.
- [kovidgoyal/calibre](https://awesome-repositories.com/repository/kovidgoyal-calibre.md) (24,146 ⭐) — Calibre is a comprehensive suite for digital library management, serving as a centralized hub for organizing, converting, and editing e-book collections. It functions as a multi-purpose platform that combines a relational database for metadata tracking with a powerful processing engine capable of transforming document formats and restructuring internal markup. Beyond local management, the software acts as a content server, enabling users to host their libraries over a network for remote access and reading via standard web browsers.

The project distinguishes itself through its deep extensibility and automation capabilities. It features a modular plugin architecture that allows for custom code injection, alongside a sophisticated template-driven logic system that enables complex metadata manipulation, arithmetic, and conditional branching. Users can automate recurring tasks such as news aggregation and content retrieval, or utilize command-line utilities to integrate library administration into broader workflows. The system also provides specialized tools for book validation, repair, and version tracking, ensuring that digital materials remain consistent and compatible across various reading devices.

The platform covers a broad spectrum of content-related operations, including bibliographic metadata retrieval, advanced text searching, and granular control over reading appearance and page layout. It supports synchronization across multiple devices, including the management of reading progress and direct transfers to hardware readers. Security is maintained through user account management and encrypted network connections, while the interface remains accessible through both graphical and terminal-based environments.
- [deepfakes/faceswap](https://awesome-repositories.com/repository/deepfakes-faceswap.md) (55,289 ⭐) — Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames.

The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated processing and multi-stage image post-processing. It includes specialized tools for manual alignment verification, allowing users to refine detected facial data through a graphical interface to ensure high-quality results. The system also features robust batch-oriented data processing, which partitions media into standardized chunks to optimize memory usage and throughput during intensive neural network operations.

Beyond its core synthesis capabilities, the framework covers a broad range of computer vision tasks including facial landmark detection, pose estimation, and mask generation. It integrates sophisticated model management utilities, such as automated loss calculation, gradient clipping, and snapshot recovery, to ensure stable training sessions. The system also provides extensive diagnostic tools for hardware performance monitoring and environment validation, ensuring compatibility across various compute accelerators.

The software is managed through a centralized command-line and graphical toolkit that supports persistent configuration and session state management. It is designed to run on diverse hardware configurations by dynamically querying available compute resources and routing tensor operations to the optimal processor.
- [jina-ai/reader](https://awesome-repositories.com/repository/jina-ai-reader.md) (9,832 ⭐) — Reader is an AI data ingestion pipeline and web content parser designed to convert websites and documents into clean markdown for use with large language models. It functions as a headless browser content extractor and web-to-markdown converter, transforming URLs and PDF files into structured text formats while removing irrelevant web clutter.

The system supports retrieval-augmented generation by retrieving web results and re-ranking them to improve context relevance. It further enhances content accessibility by using vision models to generate descriptive captions for images and by creating vector embeddings for semantic retrieval.

The tool provides broad capabilities for document conversion, web content extraction, and data preprocessing. These include headless browser rendering for JavaScript execution, multi-format conversion of office documents, and bucket-based content caching to reduce latency.

The conversion engine can be deployed as a self-hosted container including all necessary headless browsers and document processors.
