awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Content Processing and Transformation · Awesome GitHub Repositories

21 repos

Middleware and utility layers that handle the conversion, formatting, and programmatic manipulation of content between different formats or schemas.

21 GitHub repositories tagged Content Management & Publishing · Content Processing and Transformation.

Awesome Content Processing and Transformation GitHub Repositories

  • avelino/awesome-go

    165,543 · GitHub

    This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently di

    Go · awesome · awesome-list · go
  • jaywcjlove/awesome-mac

    99,007 · GitHub

    This project is a comprehensive, curated collection of software resources designed for the macOS ecosystem. It serves as a centralized directory for discovering applications across a wide range of functional domains, including professional development, system management, and personal productivity. The directory distin

    JavaScript · app · apple · application
  • oven-sh/bun

    87,491 · GitHub

    Bun is a high-performance runtime environment designed to execute JavaScript and TypeScript applications with minimal latency and high throughput. Built on a native core implemented in Zig, it provides a unified execution engine that leverages JavaScriptCore for efficient memory management and low-latency startup. The

    Zig · bun · bundler · javascript
  • microsoft/markitdown

    87,305 · GitHub

    This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine

    Python · autogen · autogen-extension · langchain
  • Stirling-Tools/Stirling-PDF

    74,357 · GitHub

    Stirling-PDF is a self-hosted document processing suite designed for secure, private file management. It functions as a comprehensive transformation engine that executes complex operations—such as merging, splitting, converting, and redacting documents—directly on the host machine. The platform provides both a browser-

    TypeScript · docker · hacktoberfest · java
  • infiniflow/ragflow

    73,425 · GitHub

    This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin

    Python · agent · agentic · agentic-ai
  • tesseract-ocr/tesseract

    72,460 · GitHub

    Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d

    C++ · hacktoberfest · lstm · machine-learning
  • hakimel/reveal.js

    70,586 · GitHub

    This project is a web-native presentation framework that renders slide decks from standard HTML or Markdown. It functions as a declarative slide engine, managing navigation, state persistence, and lifecycle events through a configuration-driven interface. By leveraging standard web technologies, it enables the creation

    JavaScript · presentations · slides · slideshow
  • binary-husky/gpt_academic

    70,112 · GitHub

    This project provides a self-hosted, web-based interface designed to integrate large language models into academic and research workflows. It functions as a modular platform for document analysis, literature processing, and data handling, allowing users to maintain full control over their data and model connectivity th

    Python · academic · chatglm-6b · chatgpt
  • fffaraz/awesome-cpp

    69,832 · GitHub

    This project is a comprehensive, curated directory of high-quality libraries, tools, and educational resources for C and C++ development. It serves as an ecosystem discovery index, helping developers navigate the vast landscape of third-party components, frameworks, and technical documentation available for the languag

    awesome · awesome-list · c
  • adam-p/markdown-here

    60,151 · GitHub

    Markdown Here is a browser extension that enables rich text composition within web-based editors that lack native formatting support. By transforming plain text markdown syntax into rendered HTML, it allows users to draft professional emails and documents using standard markup, including headers, tables, and footnotes,

    JavaScript
  • Solido/awesome-flutter

    59,015 · GitHub

    This project is a community-curated directory of resources, libraries, and tools designed to support developers working with the Flutter framework. It functions as a centralized knowledge base, organizing high-quality external references into a structured, human-readable format to assist in the discovery of technical m

    Dart · android · awesome · awesome-list
  • zylon-ai/private-gpt

    57,116 · GitHub

    This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov

    Python
  • Textualize/rich

    55,540 · GitHub

    Rich is a comprehensive library for building sophisticated command-line interfaces and terminal applications. It provides a robust console formatting engine and a layout framework that enables developers to render rich text, syntax-highlighted code, and complex data structures directly in the terminal. By utilizing a r

    Python · ansi-colors · emoji · markdown
  • obra/superpowers

    55,426 · GitHub

    Superpowers is a browser-based game development engine and collaborative integrated development environment. It provides a unified workspace for building two-dimensional interactive experiences, allowing users to manage code, assets, and scene logic directly within a web browser without the need for local compilers or

    Shell
  • opendatalab/MinerU

    54,523 · GitHub

    MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It utilizes deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences w

    Python · ai4science · document-analysis · extract-data
  • marktext/marktext

    53,968 · GitHub

    Marktext is a cross-platform desktop application designed for markdown document authoring and structured note-taking. It functions as a WYSIWYG text processor, providing a distraction-free interface that renders formatted content in real-time while hiding the underlying markup syntax. The application utilizes a multi-

    JavaScript · dark-mode · editor · electron
  • docling-project/docling

    53,584 · GitHub

    Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing

    Python · ai · convert · document-parser
  • mozilla/pdf.js

    52,848 · GitHub

    This project is a portable document rendering engine designed to parse and display complex document layouts directly within standard web browser environments. It functions as a web-native viewer that enables the presentation of documents without requiring external software or browser plugins. The engine utilizes a can

    JavaScript
  • typst/typst

    51,468 · GitHub

    Typst is a programmable, markup-based typesetting engine designed for professional document creation. It functions as a scriptable publishing toolchain that transforms plain text and code into complex, paginated outputs. By utilizing a high-performance compiler, the system automates document assembly, mathematical rend

    Rust · compiler · markup · typesetting

Explore sub-tags

  • Content Processing (7 sub-tags): Utilities that manipulate, parse, or transform text and data content into different formats or structures.
  • Document Processing and Conversion (6 sub-tags): Engines and APIs that automate the conversion and processing of documents between various file formats.
  • Markdown and Markup Tools (3 sub-tags): Tools that parse, render, and format Markdown and other lightweight markup languages into structured output.