# Self-Hosted Document Management Systems

> Search results for `self-hosted document management to scan and organize paper documents and PDFs` on awesome-repositories.com. 118 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/self-hosted-document-management-to-scan-and-organize-paper-documents-and-pdfs

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/self-hosted-document-management-to-scan-and-organize-paper-documents-and-pdfs).**

## Results

- [awesome-selfhosted/awesome-selfhosted](https://awesome-repositories.com/repository/awesome-selfhosted-awesome-selfhosted.md) (299,516 ⭐) — This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure.

The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It distinguishes itself through a collaborative peer-review process, where community members validate the quality and relevance of each submission to ensure the directory remains accurate and reliable.

The project covers a broad capability surface, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools assist users in maintaining reproducible server environments and managing complex service dependencies across private hardware.

The directory is maintained as a version-controlled repository, ensuring that all updates and community-driven changes are tracked and transparent.
- [athensresearch/athens](https://awesome-repositories.com/repository/athensresearch-athens.md) (6,298 ⭐) — Athens is no longer maintained. Athens was an open-source, collaborative knowledge graph, backed by YC W21.
- [tony-xlh/chat-with-scanned-documents](https://awesome-repositories.com/repository/tony-xlh-chat-with-scanned-documents.md) (6 ⭐) — A demo chatting with documents scanned with Dynamic Web TWAIN
- [pdfcrafttool/pdfcraft](https://awesome-repositories.com/repository/pdfcrafttool-pdfcraft.md) (3,113 ⭐) — Pdfcraft is a containerized service for self-managed PDF processing, editing, and conversion. It provides a toolkit for document manipulation, a multi-format converter, and OCR software to transform scanned documents into searchable and editable text.

The project features a visual, node-based workflow editor that allows users to build automated pipelines by chaining together various PDF conversion and optimization operations.

The service covers a broad range of capabilities, including document management for merging and splitting files, format conversion between PDFs and office documents or images, and security tools for encryption and metadata removal. It also includes utilities for content editing, interactive form creation, and file optimization.
- [formbricks/formbricks](https://awesome-repositories.com/repository/formbricks-formbricks.md) (12,391 ⭐) — Formbricks is an open-source survey and feedback platform designed to help teams capture and analyze user insights through targeted, in-app, and website-based interactions. It functions as a comprehensive customer experience analytics system that allows organizations to maintain full control over their data, user attributes, and survey workflows.

The platform distinguishes itself through its event-driven architecture, which enables precise behavioral targeting by triggering surveys based on specific user actions or application events. It supports deep integration with external ecosystems by automatically synchronizing response data to CRMs, databases, and communication tools, while providing programmatic interfaces for managing resources and automating feedback loops.

Beyond core collection, the system includes advanced logic for conditional branching, scoring, and personalized routing to create adaptive survey experiences. It offers extensive customization options, including white-labeling, CSS overrides, and multi-channel distribution across web, mobile, and email environments.

The platform is built for self-hosting, supporting containerized deployments with built-in multi-tenant data isolation and enterprise-grade security features like single sign-on and role-based access control.
- [stirling-tools/stirling-pdf](https://awesome-repositories.com/repository/stirling-tools-stirling-pdf.md) (81,109 ⭐) — Stirling-PDF is a self-hosted document processing suite designed for secure, private file management. It functions as a comprehensive transformation engine that executes complex operations—such as merging, splitting, converting, and redacting documents—directly on the host machine. The platform provides both a browser-based interface for interactive editing and a programmatic, API-first architecture that allows for the automation of document workflows through standard HTTP requests.

The project distinguishes itself through its focus on private, infrastructure-agnostic deployment and granular security. It supports role-based access control and stateless session authentication, ensuring that sensitive operations remain protected within a user-controlled environment. By offering a unified interface for sequential file transformations, it enables users to chain multiple processing tasks into single, automated pipelines while maintaining full control over document integrity and security.

The system covers a broad range of document manipulation capabilities, including optical character recognition, digital signature validation, and advanced layout operations like booklet imposition and page reorganization. It is built for flexible integration, supporting deployment across containerized environments, bare metal, or native desktop installations. Configuration is managed through environment variables, YAML files, or the web interface, allowing for consistent behavior across diverse infrastructure setups.
- [adamcooke/documentation](https://awesome-repositories.com/repository/adamcooke-documentation.md) (213 ⭐) — A Rails engine to provide the ability to add documentation to a Rails application
- [cisofy/lynis](https://awesome-repositories.com/repository/cisofy-lynis.md) (15,284 ⭐) — Lynis is an automated security auditing and system hardening framework designed for UNIX-based operating systems. It functions as a command-line utility that inspects local system configurations to identify security vulnerabilities, configuration weaknesses, and compliance gaps. By executing a series of modular tests, the tool generates actionable reports and remediation suggestions to assist in strengthening system defenses.

The project distinguishes itself through a highly modular architecture that relies on shell-script-based execution and native system inspection. Users can define custom audit profiles to standardize security policies across diverse environments, while the plugin-driven extensibility allows for the development of specialized security checks tailored to unique infrastructure requirements. This flexibility enables the tool to operate in non-interactive batch modes, facilitating integration into automated scheduling and continuous monitoring workflows.

Beyond core auditing, the framework supports enterprise-wide security management by aggregating data from multiple hosts into centralized reports. It provides capabilities for tracking system integrity, enforcing compliance baselines, and prioritizing hardening tasks based on risk assessments. The system also supports structured data serialization, allowing audit findings to be exported for external analysis and visualization.
- [frooodle/stirling-pdf](https://awesome-repositories.com/repository/frooodle-stirling-pdf.md) (81,168 ⭐) — Stirling-PDF is a web-based PDF management suite used for editing, merging, splitting, and converting PDF documents. It functions as a self-hosted document manager, providing a centralized interface for users to manipulate files on a private server.

The system features a workflow automation engine that allows for the creation of processing pipelines to handle large volumes of documents without writing custom code. It also includes an optical character recognition tool to convert scanned PDFs into searchable and editable text.

Access is managed through single sign-on integration and OIDC compatibility, which supports secure authentication and the maintenance of audit logs for compliance.

The application is delivered as a container-based deployment and exposes its functions through a REST API for external software integration.
- [documentationjs/documentation](https://awesome-repositories.com/repository/documentationjs-documentation.md) (5,798 ⭐) — :book: documentation for modern JavaScript
- [omnivore-app/omnivore](https://awesome-repositories.com/repository/omnivore-app-omnivore.md) (15,882 ⭐) — Omnivore is an open-source, self-hostable read-it-later application designed to centralize web articles, newsletters, and digital documents into a personal library. It functions as a comprehensive content archiver that captures web pages and stores them locally, ensuring permanent access and readability regardless of internet connectivity.

The platform distinguishes itself through an event-sourced synchronization engine that maintains a consistent state across multiple devices by replaying user actions. It utilizes a headless web scraping service to extract clean text and metadata from raw web pages, providing a uniform reading experience. Users can manage their collections through a research-oriented workflow that supports highlighting passages and attaching personal notes to saved content.

The application provides a full suite of content management capabilities, including offline reading, cross-device progress synchronization, and structured data persistence. It is distributed as an open-source project, allowing users to maintain full control over their personal data and reading history.
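
The event-replay idea described for Omnivore can be illustrated with a minimal sketch. This is not Omnivore's actual code: the `Event` fields, the event kinds (`save`, `highlight`, `archive`), and the `replay` function are hypothetical names chosen for illustration. It only shows why replaying the same ordered log on two devices yields the same library state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded user action (hypothetical schema, for illustration only)."""
    seq: int          # global ordering of the action
    kind: str         # "save", "highlight", or "archive"
    item: str         # identifier of the saved article
    payload: str = ""  # highlighted text, if any


def replay(events):
    """Rebuild library state by applying events in sequence order."""
    library = {}
    for e in sorted(events, key=lambda ev: ev.seq):
        if e.kind == "save":
            library[e.item] = {"archived": False, "highlights": []}
        elif e.kind == "highlight" and e.item in library:
            library[e.item]["highlights"].append(e.payload)
        elif e.kind == "archive" and e.item in library:
            library[e.item]["archived"] = True
    return library


log = [
    Event(1, "save", "article-1"),
    Event(2, "highlight", "article-1", "key passage"),
    Event(3, "archive", "article-1"),
]

# Two devices receive the same events, possibly in a different arrival order.
phone = replay(log)
laptop = replay(reversed(log))
```

Because state is derived only from the ordered log, both devices converge on the same result regardless of the order in which events arrive.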
- [hoppscotch/hoppscotch](https://awesome-repositories.com/repository/hoppscotch-hoppscotch.md) (79,618 ⭐) — Hoppscotch is an open-source API development ecosystem designed for building, testing, and debugging REST, GraphQL, and real-time APIs. It provides a unified platform that functions across web browsers, desktop applications, and command-line interfaces, allowing developers to manage the entire API lifecycle from a single environment.

The platform distinguishes itself through a highly interactive, command-driven interface that utilizes a global spotlight palette and keyboard shortcuts to streamline complex workflows. It supports advanced request manipulation and validation by executing JavaScript-based scripts and assertions within a sandboxed runtime. Furthermore, it integrates AI-assisted tools to automate the generation of request payloads, test scripts, and documentation, while maintaining compatibility with existing API definitions and collections from other formats.

Beyond core testing capabilities, the project offers a collaborative workspace for teams to organize, share, and synchronize API collections and environment variables. It includes robust support for diverse authorization methods, proxy interception for network requests, and enterprise-grade features such as SCIM user provisioning and activity auditing. The software is available for self-hosted deployment via containerized architectures, ensuring consistent behavior across various production and development environments.
- [jamiebuilds/documentation-handbook](https://awesome-repositories.com/repository/jamiebuilds-documentation-handbook.md) (299 ⭐) — How to write high-quality friendly documentation that people want to read.
- [mblouka/svelte-document](https://awesome-repositories.com/repository/mblouka-svelte-document.md) (0 ⭐) — Create documents, resumes, or presentations from a collection of Svelte files. No configuration needed, and exports to a very portable PDF file.
- [bitwarden/clients](https://awesome-repositories.com/repository/bitwarden-clients.md) (13,114 ⭐) — This project is a comprehensive zero-knowledge security suite designed for enterprise credential management, secrets orchestration, and password management. It provides a secure, end-to-end encrypted vault that allows users to store, synchronize, and manage sensitive information, including passwords, passkeys, and infrastructure secrets, across desktop, mobile, and browser environments.

The platform distinguishes itself through a strict zero-knowledge architecture where all encryption and decryption occur locally on the client, ensuring that plaintext data remains inaccessible to the server. It supports flexible deployment models, allowing organizations to choose between managed cloud services or self-hosted infrastructure to meet specific data sovereignty and compliance requirements. Furthermore, the system integrates with external identity providers to streamline user provisioning and authentication, while offering advanced administrative controls for policy enforcement and security auditing.

Beyond core storage, the platform provides extensive tools for DevOps and automated workflows, including command-line interfaces for secret injection and programmatic SDKs for custom integrations. It also includes robust collaboration features for secure data sharing, team resource management, and credential health monitoring to help organizations maintain a strong security posture.
- [bitwarden/server](https://awesome-repositories.com/repository/bitwarden-server.md) (18,074 ⭐) — This project provides a comprehensive, self-hosted platform for zero-knowledge credential management and enterprise secrets orchestration. It functions as a secure vault that ensures all encryption and decryption processes occur exclusively on the client side, preventing the server from ever accessing plaintext data. By combining identity federation with robust access controls, the system enables organizations to centralize the management of passwords, passkeys, and sensitive infrastructure credentials.

The platform distinguishes itself through its focus on both human-centric security and automated machine-to-machine workflows. It supports advanced authentication methods including hardware security keys, passkeys, and biometric unlocking, while simultaneously offering programmatic interfaces for injecting secrets directly into development pipelines and automated infrastructure deployments. This dual-purpose design allows teams to maintain strict data sovereignty through local hosting and containerized deployments while enforcing granular governance across their entire user base.

Beyond core storage, the system includes extensive observability and compliance tools, such as immutable audit logging, credential risk analysis, and integration with external security information and event management platforms. It also facilitates secure collaboration through encrypted information sharing, emergency access delegation, and automated identity provisioning. The software is designed for flexible deployment across diverse infrastructure environments and includes command-line utilities for administrative tasks, bulk data migration, and secret retrieval.
- [sitewhere/sitewhere-documentation](https://awesome-repositories.com/repository/sitewhere-sitewhere-documentation.md) (0 ⭐) — This repository contains artifacts used to generate documentation for SiteWhere Community Edition.
- [barryvdh/laravel-dompdf](https://awesome-repositories.com/repository/barryvdh-laravel-dompdf.md) (7,270 ⭐) — This project is a Laravel integration for the Dompdf rendering engine, providing a tool to convert HTML and CSS templates into PDF documents. It functions as a wrapper that allows Laravel applications to generate downloadable or streamable PDF files from web-standard content.

The library includes specialized tools for producing PDF/A-3b compliant documents intended for long-term electronic preservation. This archival capability includes the ability to embed XML metadata and attachments, which supports electronic invoicing standards for digital business transactions.

The software covers a broad range of document generation tasks, including the conversion of HTML strings, PDF file export to filesystems, and the delivery of documents via browser streams. It leverages template-driven generation and standardized storage interfaces to manage the output of rendered files.
- [the-paperless-project/paperless](https://awesome-repositories.com/repository/the-paperless-project-paperless.md) (7,917 ⭐) — Paperless is a self-hosted document management system designed to digitize, index, and archive paper documents. It functions as an optical character recognition system that converts scanned images and PDFs into a searchable digital library, providing a web-based interface for querying and retrieving documents from a database.

The system features an automated file ingestion pipeline that monitors specific directories and email inboxes to process and import documents without manual uploading. To maintain a private archive, it includes on-disk encryption for sensitive files and the ability to organize physical storage using metadata-driven filename templates.

The platform covers broad capabilities for document processing, including image cleaning to remove speckles and correct skewing for better text recognition. It also provides tools for exporting archived documents to local directories for external backups and allows for user interface customization via custom styles and scripts.

The application is packaged as a containerized deployment to ensure consistent installation across different environments.
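
As a rough illustration of metadata-driven filename templates, the sketch below fills a template from a document's metadata. The placeholder syntax (Python's `string.Template`), the field names (`correspondent`, `created`, `title`), and the `render_filename` helper are assumptions made for this example; they are not Paperless's own template format.

```python
from datetime import date
from string import Template


def render_filename(template: str, metadata: dict) -> str:
    """Fill a filename template from document metadata.

    Values are sanitized so they cannot introduce extra path separators;
    slashes written in the template itself still create subfolders.
    """
    safe = {
        key: str(value).strip().replace("/", "-").replace(" ", "_")
        for key, value in metadata.items()
    }
    return Template(template).substitute(safe) + ".pdf"


# Hypothetical metadata for one scanned document.
doc = {
    "created": date(2023, 4, 1),
    "correspondent": "City Utilities",
    "title": "Water bill",
}

path = render_filename("${correspondent}/${created}_${title}", doc)
```

Here `path` groups the file under a per-correspondent folder and prefixes it with the ISO date, so a plain directory listing sorts chronologically.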
- [datalab-to/surya](https://awesome-repositories.com/repository/datalab-to-surya.md) (20,889 ⭐) — Surya is a document processing platform designed to transform unstructured files into structured, machine-readable data. It provides a comprehensive suite of tools for text recognition, layout analysis, and reading order detection, enabling the conversion of PDFs and images into formats such as JSON, HTML, or markdown. The platform is built to handle complex document workflows, offering capabilities for data extraction, document segmentation, and automated form completion.

The platform distinguishes itself through a robust pipeline-based architecture that allows users to chain analysis tasks into versioned, reusable sequences. It supports high-volume operations through batch processing and provides granular control over data extraction via schema management and confidence scoring. For enterprise requirements, it offers containerized deployment options that allow for on-premises execution, ensuring data privacy and security while maintaining consistent performance across environments.

Beyond core analysis, the system includes integrated management for document lifecycles, storage, and event-driven notifications via webhooks. It provides a strongly-typed software development kit to facilitate programmatic interaction, alongside monitoring tools that track system health and usage metrics. Security is maintained through API access controls, request throttling, and payload validation for event notifications.
- [bendrucker/document-ready](https://awesome-repositories.com/repository/bendrucker-document-ready.md) (59 ⭐) — Document ready listener for modern browsers
- [websitebeaver/capacitor-document-scanner](https://awesome-repositories.com/repository/websitebeaver-capacitor-document-scanner.md) (0 ⭐) — This is a Capacitor plugin that lets you scan documents using Android and iOS. You can use it to create apps that let users scan notes, homework, business cards, receipts, or anything with a rectangular shape.
- [dokploy/dokploy](https://awesome-repositories.com/repository/dokploy-dokploy.md) (34,901 ⭐) — Dokploy is a self-hosted platform-as-a-service designed to simplify the deployment and management of containerized applications and databases. It provides a centralized control plane that decouples administrative management from application workloads, allowing users to oversee infrastructure across multiple server nodes through a unified web interface or a command-line tool.

The platform distinguishes itself through an extensive library of pre-configured application templates, enabling the rapid deployment of databases, identity providers, and various productivity or development tools. It supports complex orchestration by allowing users to define multi-container services using standard configuration files, which can be managed through automated build pipelines, Git integration, and real-time performance monitoring.

Beyond core deployment, the system includes robust infrastructure management capabilities such as automated backups to external object storage, horizontal and vertical scaling, and granular access control. It also provides secure configuration management, including environment variable synchronization, HTTPS certificate handling, and zero-downtime deployment strategies to ensure application stability and security.

The platform is designed for ease of use, offering an interactive API documentation interface and instructional resources to guide users through installation and configuration. It supports a wide range of modern web frameworks and runtimes, providing a flexible environment for hosting and maintaining services on private server hardware.
- [datalab-to/marker](https://awesome-repositories.com/repository/datalab-to-marker.md) (36,137 ⭐) — Marker is a comprehensive document processing platform designed to automate the conversion, extraction, and structuring of data from complex files. It functions as an orchestration engine that chains modular processing steps into versioned, reusable pipelines, allowing organizations to standardize document handling and automate repetitive business tasks at scale.

The platform distinguishes itself through its support for secure, private infrastructure deployment, enabling users to run containerized services within their own environments to maintain strict data privacy. It features specialized engines for schema-driven data extraction and programmatic form automation, which map unstructured content from PDFs, images, and office files into predefined data structures. Additionally, the system provides robust change tracking and analysis tools to simplify collaborative review cycles by exporting redlines and comments into structured formats.

Beyond core extraction, the platform includes a wide range of operational capabilities for managing document lifecycles. This includes asynchronous task queueing for high-throughput batch processing, granular concurrency and rate-limiting controls to ensure system stability, and event-driven webhook notifications for real-time integration with external systems. The platform also offers built-in usage analytics and monitoring tools to track performance metrics and infrastructure health.

The project provides a complete set of client-side primitives and configuration utilities to manage the entire document processing workflow. Users can interact with the service through a documented API, supported by automatic retry logic and secure credential management to ensure reliable and authorized access to processing capabilities.
- [infisical/infisical](https://awesome-repositories.com/repository/infisical-infisical.md) (27,374 ⭐) — Infisical is a centralized secrets management platform designed to store, synchronize, and control access to sensitive credentials and configuration data across distributed development, staging, and production environments. It employs client-side encryption to ensure that secrets remain unreadable to the underlying storage infrastructure, while providing a hierarchical permission model to govern both user and machine access.

The platform distinguishes itself through dynamic credential provisioning, which generates short-lived access tokens that are automatically revoked after use. It supports complex security workflows by integrating with external identity providers for federated authentication and offering a reverse tunneling gateway that allows secure access to private network resources without exposing inbound ports. Additionally, the system includes an event-driven audit engine that maintains an immutable record of all configuration changes and access requests to support compliance requirements.

Beyond core secret storage, the platform provides comprehensive orchestration capabilities, including automated secret injection into containerized environments and infrastructure pipelines. It also features integrated public key infrastructure management for the lifecycle of digital certificates and automated scanning to detect hardcoded secrets in source code and CI pipelines.

The platform supports flexible deployment models, allowing teams to either utilize managed cloud services or self-host the infrastructure within their own private networks. It provides a broad ecosystem of SDKs and a command-line interface to facilitate integration across various programming languages and deployment workflows.
- [tpn/pdfs](https://awesome-repositories.com/repository/tpn-pdfs.md) (9,828 ⭐) — This project is a digital document repository and technical PDF library. It serves as a computer science reference archive designed to store a curated collection of academic papers, specifications, and manuals focused on computing and software engineering.

The archive functions as an engineering knowledge base for technical research archiving. It manages a structured library of documents to preserve institutional knowledge and ensure technical documentation remains accessible.

The system employs a curated content pipeline and metadata-driven indexing to organize materials. Documents are managed through flat-file storage and folder-based category mapping, with the resulting assets served via static hosting.
- [lissy93/awesome-privacy](https://awesome-repositories.com/repository/lissy93-awesome-privacy.md) (9,500 ⭐) — This project is a curated directory and catalog of privacy-respecting software and security-focused services. It serves as a structured resource for finding alternatives to corporate services, focusing on tools that prioritize data sovereignty, end-to-end encryption, and user anonymity.

The directory is maintained as a markdown-based resource list and rendered via a static site generator. It further extends its utility through a CORS-enabled public API and a JSON-based data schema, allowing the curated catalog of tools and providers to be retrieved programmatically.

The collection covers a wide range of capability areas, including secure communication tools, network privacy configuration, digital identity protection, and system security hardening. It also lists resources for personal data sovereignty, such as encrypted storage, private note management, and self-hosting options.
- [dusk-labs/dim](https://awesome-repositories.com/repository/dusk-labs-dim.md) (4,062 ⭐) — Dim is a self-hosted media server and manager designed to index and organize local media libraries for remote access and playback. It functions as a private web-based portal that allows users to stream locally stored video and audio content over a network.

The system operates as a local media indexer that scans storage to structure and beautify collections, creating a consistent user interface for managing digital content. It uses metadata-driven beautification to enrich raw file lists into organized libraries.

The application is deployed as a containerized service, utilizing static path mapping to bind host system directories to the internal environment for persistent media storage.
- [duplicati/duplicati](https://awesome-repositories.com/repository/duplicati-duplicati.md) (14,283 ⭐) — Duplicati is a self-hosted backup server designed to perform encrypted, incremental, and compressed backups to a wide range of local, network, and cloud-based storage providers. It functions as a background service that automates recurring data protection tasks, ensuring that only changed data blocks are stored to maximize efficiency and minimize bandwidth usage.

The project distinguishes itself through a centralized management console that allows for the orchestration of multiple distributed backup agents from a single web-based dashboard. It supports multi-tenant management, enabling the organization of users and resources into hierarchical structures for delegated access and data isolation. Furthermore, it provides robust security features, including AES-256 encryption for data at rest, support for OIDC and SAML2 authentication, and provider-level immutability protections to prevent unauthorized modification of backup archives.

Beyond its core backup capabilities, the system includes comprehensive tools for data lifecycle management, such as automated retention policies, versioning, and integrity verification. It offers flexible configuration through both a graphical interface and a command-line utility, supporting automation scripting and dry-run simulations to verify workflows before execution. The software also handles complex environments by managing locked files and providing metadata indexing to ensure rapid restoration even if the primary configuration database is unavailable.

Duplicati is available through various installation formats, including native system packages, portable archives, and containerized deployments, allowing it to run in diverse operating environments.
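
The idea of storing only changed data blocks can be sketched as content-addressed block deduplication. This is a simplified illustration, not Duplicati's actual storage format: the fixed `BLOCK_SIZE`, the use of SHA-256, the in-memory `store` dictionary, and the `backup`/`restore` helpers are all assumptions for the example, and encryption and compression are omitted.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, to keep the example readable


def backup(data: bytes, store: dict) -> tuple:
    """Split data into blocks and store only blocks not seen before.

    Returns (manifest, new_blocks): the ordered list of block hashes
    needed to rebuild this version, and how many blocks were uploaded.
    """
    manifest = []
    new_blocks = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            new_blocks += 1
        manifest.append(digest)
    return manifest, new_blocks


def restore(manifest: list, store: dict) -> bytes:
    """Reassemble a version from its manifest."""
    return b"".join(store[digest] for digest in manifest)


store = {}
v1, added_v1 = backup(b"AAAABBBBCCCC", store)  # first run stores every block
v2, added_v2 = backup(b"AAAAXXXXCCCC", store)  # only the middle block changed
```

The second run adds a single block, yet both versions remain restorable because each manifest references shared blocks by hash.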
- [getsentry/self-hosted](https://awesome-repositories.com/repository/getsentry-self-hosted.md) (9,426 ⭐) — This project is a containerized error tracking platform and monitoring suite designed for self-hosted deployment on private infrastructure. It provides a collection of services for capturing and analyzing software crashes and exceptions, ensuring that sensitive application data remains within a controlled environment.

The system includes specialized tooling for air-gapped deployment, allowing the software to be installed and operated on servers without internet access through the manual transfer of container images. It also supports corporate network integration via proxy configurations to maintain connectivity within restricted firewall environments.

The operational surface covers infrastructure health monitoring through dedicated status endpoints and request routing via a reverse proxy. Persistent storage is managed through volume mapping to decouple data from container lifecycles.
- [docmost/docmost](https://awesome-repositories.com/repository/docmost-docmost.md) (19,049 ⭐) — Docmost is an open-source knowledge management system designed as a collaborative documentation platform for teams. It functions as an enterprise wiki that centralizes organizational information into structured, searchable workspaces, enabling users to create, organize, and share content through a hierarchical system of spaces and pages.

The platform distinguishes itself by integrating artificial intelligence directly into the documentation lifecycle. It utilizes vector-based semantic search to allow for natural language queries across stored content and provides AI-assisted tools for drafting, summarizing, and refining documents. To support team workflows, it features a block-based editor for rich text authoring and visual diagramming, paired with real-time collaboration capabilities that synchronize changes across multiple users.

The system is built for enterprise environments, offering granular access control, multi-factor authentication, and identity provider integration for centralized user management. It also includes programmatic access through a REST API, allowing for the automation of resource management and integration with external software tools.

The platform supports flexible deployment with configurable storage backends and automated security certificate management. It is designed to be self-hosted, providing the necessary infrastructure to manage documentation security and lifecycle workflows within an organization.
- [stoatchat/self-hosted](https://awesome-repositories.com/repository/stoatchat-self-hosted.md) (2,497 ⭐) — This project is a self-hosted communication suite and private messaging infrastructure. It is a containerized chat platform designed for deployment on independent hardware to maintain full control over user data and server dependencies.

The system features a modular plugin framework that allows custom features and behaviors to be loaded into the client at runtime via manifest files. It is designed as a proxy-compatible service, supporting configurable network port routing to operate behind external reverse proxy servers.

The platform covers capabilities for containerized service orchestration, private communication infrastructure deployment, and custom plugin development.
- [pulsejet/memories](https://awesome-repositories.com/repository/pulsejet-memories.md) (3,697 ⭐) — Memories is a self-hosted photo and video management system designed for organizing, indexing, and sharing media libraries from a private server. It functions as an AI-powered media organizer that uses artificial intelligence for face recognition and object tagging to automatically categorize large collections.

The system distinguishes itself through deep metadata integration and specialized processing, featuring a geographic photo viewer that plots media on a map using GPS data and reverse geocoding. It also includes a self-hosted video transcoder that converts files into adaptive HLS streams using hardware acceleration for optimized web playback.

The platform covers broad capability areas including chronological timeline browsing and EXIF metadata editing for maintaining library accuracy. It provides tools for mobile media synchronization, batch selection, and secure external sharing for users without accounts.

The system supports the import of existing media collections and the migration of external metadata into image and video files.
- [tubearchivist/tubearchivist](https://awesome-repositories.com/repository/tubearchivist-tubearchivist.md) (7,561 ⭐) — TubeArchivist is a self-hosted YouTube video archiving system and metadata indexer. It functions as a personal media library and download manager that allows users to create a searchable offline collection of videos, channels, and playlists.

The system distinguishes itself by indexing subtitles, comments, and channel information for full-text search and retrieval. It features automated media synchronization to track subscriptions and playlists, ensuring new content is automatically queued and downloaded as it is published.

The project provides a broad set of capabilities for digital asset management, including download traffic shaping to prevent IP blocking, role-based access control, and LDAP authentication. It also manages media integrity through filesystem-database synchronization, metadata embedding directly into files, and a prioritized queue for high-volume content processing.

The application includes a web-based interface for organizing archived media, tracking viewing progress, and monitoring library statistics.
- [gohugoio/hugo](https://awesome-repositories.com/repository/gohugoio-hugo.md) (88,701 ⭐) — Hugo is a high-performance static site generator that transforms source content and templates into optimized web assets. Built with a focus on speed and scalability, it provides a comprehensive framework for managing large-scale documentation and editorial projects through structured content organization, taxonomies, and a flexible template-driven rendering engine.

The project distinguishes itself through a sophisticated build system that utilizes incremental caching to minimize redundant processing during site updates. It supports complex content requirements by enabling multidimensional modeling, which allows for the generation of diverse page variations from a single source, and multi-format output rendering that can produce HTML, JSON, RSS, or CSV simultaneously. Authors can extend their content using a modular shortcode system, while the integrated asset pipeline handles the transformation, minification, and optimization of images and stylesheets directly within the build lifecycle.

Beyond its core generation capabilities, Hugo offers a robust command-line interface for managing the entire project lifecycle, including real-time development previews and automated deployment workflows. The system also features a modular dependency architecture, allowing users to import and version shared themes, layouts, and configuration components to maintain consistent design systems across multiple projects.
- [allenai/document-qa](https://awesome-repositories.com/repository/allenai-document-qa.md) (0 ⭐) — This repository contains the code for the paper "Simple and Effective Multi-Paragraph Reading Comprehension". It can be used to train neural question answering models in TensorFlow, in particular for the case where the model is run over multiple paragraphs for each question. Code is included to…
- [meeb/tubesync](https://awesome-repositories.com/repository/meeb-tubesync.md) (2,625 ⭐) — TubeSync is a containerized media management tool and self-hosted archiver designed to automate the downloading and organization of video content from online sources, such as YouTube channels and playlists, into a local library for offline access. It functions as a download manager and metadata generator, utilizing a web interface to manage video subscriptions and synchronization settings.

The system features a rule-based content filter that evaluates video metadata against user-defined conditions to determine which items enter the download queue. To handle restricted or age-gated content, it supports browser cookie authentication to bypass standard OAuth flows.

The platform covers broad capabilities including automated content mirroring, the generation of sidecar metadata files for portable organization, and synchronization with external media servers. It includes a database-backed task queue for sequential processing, support for external database connectors, and a command-line interface for task management.

The service is provided as a Dockerized application with a web-based dashboard protected by HTTP basic authentication.
- [tom0li/collection-document](https://awesome-repositories.com/repository/tom0li-collection-document.md) (0 ⭐) — A collection of quality security articles (to be rebuilt). Some are inconvenient to release, and some have not been kept up to date; see the author's starred repositories for more. Most of the earlier links are not high quality, the penetration testing section is no longer updated, and updates are slow because of limited time. Author: tom0li. Blog: https://tom0li.github.io. Contents: Project Description, Github-list, Awesome-list, Development…
- [coollabsio/coolify](https://awesome-repositories.com/repository/coollabsio-coolify.md) (57,055 ⭐) — This project is a self-hosted platform-as-a-service that provides a centralized management interface for deploying, configuring, and monitoring containerized applications and databases on private infrastructure. It functions as a visual control plane, automating the end-to-end lifecycle of services from source code to production. By managing container orchestration, networking, and resource allocation, it allows users to maintain full control over their own hardware while streamlining the delivery of software.

The platform distinguishes itself through its agentless architecture, which uses secure shell connections to execute administrative tasks and manage remote servers without requiring a persistent agent on the managed servers. It integrates directly with version control systems to trigger automated build and deployment pipelines, including the creation of temporary, isolated preview environments for every pull request. This workflow is supported by a declarative engine that uses templates to standardize the deployment of complex multi-container architectures and persistent database engines.

Beyond core orchestration, the system handles the operational requirements of hosted services by managing dynamic reverse-proxy routing and automated SSL certificate lifecycles. It provides a comprehensive suite of infrastructure management tools, including browser-based terminal access for debugging, automated system dependency installation, and persistent state management via a central database. These capabilities ensure that infrastructure remains synchronized and consistent across multiple remote environments.
- [stashapp/stash](https://awesome-repositories.com/repository/stashapp-stash.md) (11,855 ⭐) — Stash is a self-hosted platform designed for organizing, cataloging, and managing personal video collections. It functions as a local server that indexes media files from your file system and stores their relationships, tags, and metadata within a relational database.

The platform distinguishes itself through a modular, plugin-driven scraper engine that automatically retrieves and normalizes metadata from community-maintained online sources. This system eliminates manual data entry by populating your library with detailed information about performers, studios, and content tags.

Beyond basic organization, the software provides tools for analyzing library statistics and patterns. Users can navigate their collections through a responsive, browser-based interface that communicates with the server via a strongly typed query layer. The project is distributed as a compiled binary, providing a centralized environment for managing large-scale media libraries.
- [redisventures/llm-document-chat](https://awesome-repositories.com/repository/redisventures-llm-document-chat.md) (113 ⭐) — Uses LlamaIndex, Redis, and OpenAI to chat with PDF documents. Supplementary material for a blog post on the Microsoft Developer Blog.
- [addyosmani/agent-skills](https://awesome-repositories.com/repository/addyosmani-agent-skills.md) (60,849 ⭐) — Agent-skills is a collection of structured instructions and behavioral personas designed to standardize how AI coding agents perform engineering tasks. It functions as a workflow orchestrator that maps natural language intent to repeatable technical sequences and verification checklists.

The project distinguishes itself through the use of specialized markdown-defined roles, such as security auditors or test engineers, to apply targeted domain expertise. It employs an evidence-based verification model that requires runtime data or passing tests as mandatory exit criteria to ensure AI-generated code meets production standards.

The system covers a broad range of engineering capabilities, including technical specification automation, multi-axis code reviews, and test-driven development. It also provides frameworks for context management, security auditing, and the orchestration of parallel agent tasks to synthesize findings into consolidated reports.

These skills are implemented as standardized instructions and commands that can be loaded into an agent via auto-discovery or explicit installation.
- [jxxghp/moviepilot](https://awesome-repositories.com/repository/jxxghp-moviepilot.md) (11,254 ⭐) — MoviePilot is a self-hosted media orchestrator and NAS media library automator. It coordinates workflows between downloaders, metadata scrapers, and file systems to automate the discovery, downloading, renaming, and organization of movie and television content.

The system functions as an LLM media management agent, allowing users to control subscriptions, searches, and file organization through conversational text commands. It also acts as a Model Context Protocol server, exposing internal media management tools via a standardized interface for external AI clients and agents.

The project includes a plugin-based automation framework that supports dynamic module loading and a hot-reloading architecture for extending system logic. Additional capabilities include automated content subscriptions, event-driven media monitoring, metadata scraping pipelines, and configurable event notifications across messaging platforms.

The system provides health diagnostics for configuration recovery and supports custom interface styling via CSS.
- [apigee-127/a127-documentation](https://awesome-repositories.com/repository/apigee-127-a127-documentation.md) (113 ⭐) — Documentation for Apigee-127
- [carbon-language/carbon-lang](https://awesome-repositories.com/repository/carbon-language-carbon-lang.md) (33,829 ⭐) — Carbon is an experimental, compiled systems programming language designed as a successor to C++. It focuses on providing a high-performance environment for modern software development while prioritizing memory safety and expressive generic programming. The language is built to support performance-critical engineering, allowing for precise control over memory layout and execution flow.

A primary differentiator of the project is its bidirectional interoperability with existing C++ codebases. This allows developers to call functions and share data between languages without manual wrappers, facilitating a gradual migration path for legacy systems. The language architecture is generic-first, utilizing checked generic constraints and interface requirements to ensure type safety and code reusability at compile time.

The language incorporates an incremental memory safety model that prevents common errors through initialization tracking, bounds checking, and the explicit isolation of unsafe code blocks. Its syntax is expression-oriented, treating control flow structures like loops and branches as values to maintain type consistency. The project also enforces a nominal type system and uses canonical source representation to ensure consistent interpretation across different build environments.
- [sqzw-x/mdcx](https://awesome-repositories.com/repository/sqzw-x-mdcx.md) (3,394 ⭐) — MDCx is a self-contained media management tool that automatically scrapes metadata from online sources and renames local video files to match standardized naming schemes. It organizes personal media libraries by attaching descriptive information to each file, enabling easy browsing and search through a browser-based interface.

The tool operates through a modular backend that includes a configurable rule engine for defining naming patterns, plugin-based scrapers that load metadata from multiple sources, and a filesystem watcher that triggers processing workflows automatically when new media files are detected. A background task queue offloads long-running operations to keep the web interface responsive, while SQLite provides persistent storage for metadata, scraping history, and user configuration.

MDCx also includes error-handling mechanisms that resolve metadata retrieval failures to ensure complete data collection, and it enforces consistent Python code style through configurable linting and auto-formatting rules. The web-based management interface communicates with the backend exclusively through RESTful API endpoints, allowing remote browsing, searching, and management of media collections without command-line tools.
- [silvestv/migration-planificator-documentation](https://awesome-repositories.com/repository/silvestv-migration-planificator-documentation.md) (3 ⭐) — Documentation of @silvestv migration-planificator
- [dubinc/dub](https://awesome-repositories.com/repository/dubinc-dub.md) (23,722 ⭐) — This project is a comprehensive link management and marketing attribution platform designed for creating, tracking, and analyzing shortened URLs. It functions as a centralized hub for marketing analytics, providing tools to monitor link performance, visualize conversion funnels, and manage affiliate programs through a unified dashboard.

The platform distinguishes itself by integrating advanced attribution modeling and partner management directly into the link infrastructure. It supports complex marketing workflows, including automated commission calculations, fraud detection, and payout distribution for affiliates, alongside granular traffic redirection based on device, location, or A/B testing requirements. By utilizing custom domains and reverse proxy configurations, it ensures reliable data collection that bypasses common browser-based tracking restrictions.

Beyond core link operations, the system offers extensive programmatic capabilities, including a robust API, SDKs, and event-driven webhooks for real-time integration with external services. It also incorporates enterprise-grade administrative features such as multi-tenant workspace isolation, role-based access control, and single sign-on integration to support collaborative team environments.

The platform is built to be deployed within private infrastructure, allowing organizations to maintain full control over their data and system configuration.
- [requarks/wiki](https://awesome-repositories.com/repository/requarks-wiki.md) (27,909 ⭐) — This project is an enterprise knowledge platform designed for teams to create, manage, and publish structured documentation. Built on Node.js, it provides a centralized environment where contributors can author content using markdown, HTML, or a visual editor. The system is engineered to handle collaborative workflows, ensuring that technical and non-technical users alike can maintain documentation with consistent rendering and version control.

What distinguishes this platform is its focus on secure, scalable, and synchronized content management. It features granular path-based access controls and pluggable authentication middleware that integrates with enterprise identity providers, social logins, and multi-factor security layers. To ensure data persistence and collaborative safety, the platform supports direct synchronization with version control repositories and automated backups to cloud object storage.

The system offers a comprehensive suite of tools for managing organizational knowledge, including multilingual localization, media asset management, and detailed content version history. Administrators can further tailor the platform through custom visual themes, site-wide style injections, and flexible search configurations that support both built-in engines and external indexing services. The architecture is designed to scale across diverse infrastructure, from low-power hardware to high-performance cloud environments.
- [ruanyf/document-style-guide](https://awesome-repositories.com/repository/ruanyf-document-style-guide.md) (12,608 ⭐) — This project is a technical writing style guide and comprehensive specification for professional documentation, with a primary focus on standards for Chinese technical prose. It provides a structured framework for organizing document hierarchies, software manuals, and API references to ensure a consistent user experience.

The guide distinguishes itself through detailed linguistic specifications, including rules for integrating English terms into non-English text and precise standards for punctuation, spacing, and grammar tailored for the Chinese language. It also defines quantitative formatting for currency, numeric ranges, and the description of incremental changes.

Broadly, the project covers document architecture through standardized heading hierarchies and file naming conventions, as well as content design strategies for paragraph structuring and external source attribution. It further addresses prose optimization by establishing rules for sentence length, tone, and visual alignment across different character sets.
