4 repos

Awesome GitHub Repositories: Intelligent Extraction Frameworks

Systems utilizing machine learning and spatial analysis to interpret document structure and extract data from complex layouts.

Four GitHub repositories matching Content Management & Publishing · Intelligent Extraction Frameworks.

microsoft/markitdown
87,305 · GitHub
This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine
Python · autogen · autogen-extension · langchain
Stirling-Tools/Stirling-PDF
74,357 · GitHub
Stirling-PDF is a self-hosted document processing suite designed for secure, private file management. It functions as a comprehensive transformation engine that executes complex operations—such as merging, splitting, converting, and redacting documents—directly on the host machine. The platform provides both a browser-
TypeScript · docker · hacktoberfest · java
infiniflow/ragflow
73,425 · GitHub
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin
Python · agent · agentic · agentic-ai
tesseract-ocr/tesseract
72,460 · GitHub
Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d
C++ · hacktoberfest · lstm · machine-learning

