30 open-source projects similar to github/markup, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Markup alternative.
Magika is an AI content type classifier and MIME type prediction engine that uses deep learning to identify file formats based on binary data. It analyzes byte sequences through a neural network to predict the content type of a file and provide associated confidence scores. The system features a foreign function interface that allows the core detection logic to be integrated across different programming languages. It includes a mechanism for configuring detection sensitivity and per-type thresholds to balance precision and recall. The project provides capabilities for bulk file analysis via
PyMuPDF is a comprehensive PDF manipulation library and document analysis tool. It serves as a text extraction tool, OCR engine, and image converter, providing a programmatic interface to edit, merge, split, and optimize PDF and Office documents. The project distinguishes itself through high-performance capabilities, including the use of C-bindings for low-level manipulation and parallelized page processing to accelerate workloads. It provides specialized conversion paths, such as transforming PDF content into Markdown for retrieval-augmented generation and large language model pipelines. It
Translate-shell is a command-line translation tool and terminal dictionary client. It allows for the translation of words, phrases, and sentences between multiple languages and provides dictionary definition retrieval and language metadata display directly within the terminal. The tool functions as a shell-based text translator that can process input from standard streams, local files, or URLs. It includes text-to-speech capabilities to play audio pronunciations of source and translated text and can automatically detect the source language of a given string. The system supports interactive s
Libpostal is a C library designed for international address parsing and normalization. It utilizes statistical NLP and a language classifier to decompose unstructured global address strings into structured components and standardize street addresses by expanding abbreviations and resolving regional naming variations across multiple languages. The project provides tools for text transliteration, converting various scripts into standardized Latin-ASCII or NFD forms. It also includes capabilities for address deduplication, using symmetric fuzzy matching to identify whether different address reco
PasteMD is a clipboard-based document processor and productivity tool designed to convert Markdown or HTML content into formatted office documents. It transforms markup and mathematical formulas from the clipboard into rich text for direct insertion into word processors and spreadsheets. The system functions as a style orchestrator, using reference documents and templates to apply specific fonts, colors, layouts, and margins to the converted text. This allows for the customization of output appearances to match specific document requirements. The tool handles technical document composition b
Tika is a content analysis toolkit and Java library designed for detecting and extracting metadata and text from thousands of different file types. It functions as a universal document text extractor and metadata extraction engine, converting complex files into plain text or XHTML. The system employs a specialized MIME type detector that identifies document formats using magic bytes and metadata to determine the correct parser. It serves as an OCR integration gateway, connecting to external text recognition tools to extract content from image files. The project covers a broad range of extrac
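As a rough illustration of the magic-byte detection mentioned in the Tika entry, here is a minimal Python sketch. It is not Tika's implementation or API (Tika is a Java library); the `MAGIC_SIGNATURES` table and the `detect_mime` function are hypothetical names used only for this example, and the table covers just a handful of well-known file signatures.

```python
# Hypothetical signature table: (leading bytes, MIME type).
# Only a few well-known signatures are listed; a real detector knows far more.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the MIME type whose signature prefixes `data`, else `default`."""
    for prefix, mime in MAGIC_SIGNATURES:
        if data.startswith(prefix):
            return mime
    return default


if __name__ == "__main__":
    print(detect_mime(b"%PDF-1.7\n..."))
    print(detect_mime(b"plain text with no signature"))
```

Once a type is detected, a toolkit of this kind would typically use it to select a parser; the sketch stops at detection.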
Vale is a markup-aware prose linter and command-line interface tool designed to enforce editorial style guides and grammar rules across various document formats. It functions as a YAML-based style guide engine that analyzes text for consistency in tone, spelling, and terminology while ignoring non-prose elements like code blocks. The project distinguishes itself through a flexible extensibility model that allows users to define custom linting rules using YAML configurations, regular expressions, and external scripts for complex validation logic. It supports a wide array of documentation forma
TastyIgniter is a comprehensive restaurant management system and digital ordering engine. Built as a modular application framework, it provides the tools necessary to operate online food ordering, table reservation systems, and multi-vendor e-commerce platforms. The platform is designed to handle complex restaurant operations, including multi-location networking and multi-vendor marketplace management. It distinguishes itself through specialized restaurant automation, such as coordinating guest limits and time slots for bookings, managing ingredient and allergen catalogs, and implementing mul
markdown-it is a token-based Markdown compiler and CommonMark-compliant parser that converts structured plaintext markup into HTML. It functions as an extensible markup processor designed to transform text into browser-ready content while managing security and preventing cross-site scripting. The project is distinguished by a modular plugin system that allows for the extension of parsing capabilities and the addition of custom syntax, such as footnotes, tables, or emojis. It utilizes a two-stage tokenization process to break documents into structural tokens before rendering them into final HT
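The two-stage idea in the markdown-it entry (tokenize first, render second) can be sketched in a few lines of Python. This is a toy under stated assumptions, not markdown-it's API (markdown-it is a JavaScript library): the `Token` class and the `tokenize` and `render` functions are hypothetical, and only headings and single-line paragraphs are handled. Escaping at render time stands in for the XSS protection the entry mentions.

```python
import html
from dataclasses import dataclass


@dataclass
class Token:
    kind: str       # "heading" or "paragraph"
    content: str    # raw, unescaped text
    level: int = 0  # heading level, 1-6; 0 for paragraphs


def tokenize(text: str) -> list:
    """Stage 1: turn source lines into structural tokens."""
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate blocks in this toy
        if stripped.startswith("#"):
            hashes = len(stripped) - len(stripped.lstrip("#"))
            tokens.append(Token("heading", stripped[hashes:].strip(), min(hashes, 6)))
        else:
            tokens.append(Token("paragraph", stripped))
    return tokens


def render(tokens: list) -> str:
    """Stage 2: render tokens to HTML, escaping all content."""
    out = []
    for tok in tokens:
        body = html.escape(tok.content)
        if tok.kind == "heading":
            out.append(f"<h{tok.level}>{body}</h{tok.level}>")
        else:
            out.append(f"<p>{body}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(render(tokenize("# Title\n\nSome <b>text</b>")))
```

Keeping the token list as an intermediate form is what lets plugins inspect or rewrite tokens before any HTML is produced.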
OpenWeChat is a Go language software development kit and API wrapper designed for integrating WeChat messaging and account management into Go applications. It serves as a bot framework and messaging library for handling real-time chat events and programmatic interactions with the platform. The project provides a comprehensive system for session management, including QR code authentication and the persistence of session cookies in JSON format to maintain access across restarts. It distinguishes itself by offering capabilities to intercept and preserve messages that senders attempt to revoke, a
Aglio is a command-line interface tool and static HTML renderer designed to convert API Blueprint specification files into readable, web-based documentation. It transforms structured API specifications into standalone HTML pages that can be hosted and distributed without a backend server. The project includes a theme engine that allows for the customization of visual styles through CSS variables and layout template overrides. Users can apply built-in themes or integrate external modules to change how the documentation is rendered. The tool supports modular document composition, enabling the
This repository contains the HTML specification, which defines the core standards for web page structuring, content organization, and document rendering. It establishes the fundamental algorithms for state-machine-based tokenization, tree construction for the document object model, and origin-based security isolation. The specification provides a framework for defining custom elements with independent lifecycles and registries. It also details the requirements for cross-document communication, session history management, and the synchronization of interface properties with content attributes.
RmlUi is a cross-platform UI renderer and middleware library that enables the creation of user interfaces using a subset of HTML and CSS. It functions as a rendering-agnostic layer designed to integrate web-standard layout and styling into custom game engines and embedded applications. The framework is distinguished by its integration of Lua for dynamic logic and control, as well as a specialized toolkit for rendering SVG images and Lottie animations. It utilizes a pluggable rendering backend that decouples geometry generation from the final display, allowing it to generate textured geometry
Blackfriday is a Go library for parsing and converting Markdown text into HTML, LaTeX, and other structured formats. It functions as an extensible Markdown processor that transforms syntax into target markup languages. The project is distinguished by its pluggable rendering architecture, which allows for the production of diverse output targets such as Slack message styles, Confluence Wiki Markup, and GitHub Flavored Markdown. It supports custom syntax extensions including definition lists, footnotes, autolinks, and strikethroughs. The processor includes utilities for generating automatic ta
kkFileView is a Spring Boot-based file preview server that provides a universal document viewer for rendering office files, PDFs, images, and 3D models directly in a web browser. It functions as a secure document rendering service that allows users to view a wide variety of file formats without requiring local software installations. The project distinguishes itself through specialized CAD to SVG conversion, transforming complex drawings into web-compatible formats. It includes a RESTful file preview API that allows these rendering capabilities to be integrated into external business applicat
anx-reader is a cross-platform e-book reader and cloud-synced library manager. It renders various electronic book formats into a standardized HTML view with customizable themes and fonts for a consistent experience across different operating systems. The project integrates a large language model as a reading assistant to summarize text and answer questions about book content. It also functions as a digital annotation tool for creating color-coded highlights and detailed notes for external research export. The system includes capabilities for organizing digital library collections, synchroniz
Markdoc is a documentation content framework that extends standard Markdown with custom tags, typed schemas, and reusable components, parsing content into an abstract syntax tree and rendering it as React elements or HTML. It provides a structured authoring system where documents are processed through an AST-based pipeline, enabling validation, transformation, and flexible output generation. The framework distinguishes itself through a schema-driven validation pipeline that checks document structure and attribute values against defined rules, and a pluggable renderer architecture that accepts
Metalsmith is a Node.js static site generator and static content processor that transforms source files into websites, eBooks, or technical documentation. It functions as a file-to-object transformer, converting directory trees into plain JavaScript objects that can be programmatically manipulated in memory. The project is built around a pluggable build pipeline where files are passed through a sequence of custom functions to transform content and metadata incrementally. This architecture allows users to extend functionality by writing their own plugins or using third-party modules to define
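The pluggable pipeline described in the Metalsmith entry, where a directory tree becomes plain objects that pass through a sequence of functions, might look roughly like this in Python. It is a sketch of the pattern only, not Metalsmith's API (Metalsmith is a Node.js tool); `build`, `uppercase_titles`, and `rename_md_to_html` are hypothetical names, and the file tree is supplied as an in-memory dict rather than read from disk.

```python
def build(files: dict, plugins: list) -> dict:
    """Pass the in-memory file map through each plugin in order."""
    for plugin in plugins:
        plugin(files)  # each plugin mutates the map in place
    return files


def uppercase_titles(files: dict) -> None:
    """Example plugin: transform metadata."""
    for meta in files.values():
        meta["title"] = meta.get("title", "").upper()


def rename_md_to_html(files: dict) -> None:
    """Example plugin: change output paths."""
    for path in list(files):  # copy the keys, since the dict is modified
        if path.endswith(".md"):
            files[path[:-3] + ".html"] = files.pop(path)


if __name__ == "__main__":
    site = {"index.md": {"title": "home", "contents": "hello"}}
    print(build(site, [uppercase_titles, rename_md_to_html]))
```

Because every plugin sees the same plain mapping, new behaviour is added by appending another function to the list rather than changing the build loop.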
This project is a Ruby library for defining and managing object lifecycles through states, events, and transition rules. It functions as a declarative workflow engine that enforces business logic by restricting attribute changes to predefined, valid paths within Ruby classes. The library distinguishes itself through deep integration with database persistence layers, allowing it to automatically synchronize state changes with data models, validation frameworks, and transaction management. It supports dynamic configuration, enabling the construction of lifecycle rules at runtime from external d
The most accurate natural language detection library for Go, suitable for short text and mixed-language text.
OfficeCLI is a headless office suite and automation tool designed for programmatically reading, editing, and generating Microsoft Office documents. It functions as an OOXML manipulation library and a document templating engine, providing a standalone binary that allows for the management of Word, Excel, and PowerPoint files without requiring a local installation of office software. The project distinguishes itself by exposing document operations as tools for AI agents via a JSON-RPC server and the Model Context Protocol. It enables advanced customization through raw XML manipulation using XPa
GOT-OCR2.0 is an end-to-end optical character recognition system and document text extractor. It utilizes a unified transformer architecture to recognize and extract plain and formatted text from diverse images and documents. The system features a multi-crop processing method that divides high-resolution or dense documents into smaller sections to maintain recognition detail. It also includes a renderer that transforms recognized text into HTML to preserve the original structure and layout of the document. The project provides a framework for fine-tuning pre-trained models on custom datasets
This project is a static personal website template designed to showcase professional experience, career history, and technical skills. It provides a structured layout for individuals to establish an online presence for potential employers or clients. The template utilizes a responsive design approach, ensuring the interface adapts to various screen sizes through a grid system and mobile-first styling principles. By incorporating the Bootstrap framework, the project offers a consistent set of utility styles and components that facilitate the creation of a professional portfolio without requiri