awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Document and Data Intelligence · Awesome GitHub Repositories

5 repos

AI-driven systems for parsing, extracting, and structuring information from unstructured documents or text.

Explore 5 awesome GitHub repositories in Artificial Intelligence & ML · Document and Data Intelligence.

Awesome Document and Data Intelligence GitHub Repositories

  • public-apis/public-apis

    399,192 · GitHub

    This project is a comprehensive, community-driven directory of public service endpoints designed to facilitate the discovery and integration of external data sources. It serves as a centralized registry where developers can locate reliable third-party APIs to augment their applications with specialized functionality, r

    Python · api · apis · dataset
  • huggingface/transformers

    156,730 · GitHub

    Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering

    Python · audio · deep-learning · deepseek
  • microsoft/markitdown

    87,305 · GitHub

    This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine

    Python · autogen · autogen-extension · langchain
  • infiniflow/ragflow

    73,425 · GitHub

    This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin

    Python · agent · agentic · agentic-ai
  • tesseract-ocr/tesseract

    72,460 · GitHub

    Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d

    C++ · hacktoberfest · lstm · machine-learning

Explore sub-tags

  • AI-Powered Data Extraction: Tools that automatically parse and extract structured data from unstructured documents like invoices, forms, and reports.
  • Automated Digitization Engines: Automated pipelines for converting scanned documents into searchable text formats.
  • Document Intelligence Services: Cloud-based services that analyze, classify, and summarize large volumes of complex document-based information.
  • Model-Driven Text Extraction: Using multimodal models to interpret layouts and extract text.
  • Multimodal Layout Analysis: Techniques for interpreting visual document structures and embedded image content using multimodal models.
  • Question Answering (1 sub-tag): Automated systems designed to extract specific answers from provided documents or knowledge bases.
  • Semantic Parsing Tools: Tools that extract and interpret structured data, such as text and tables, from complex document formats.
  • Text Analysis APIs (1 sub-tag): Web services that provide programmatic access to natural language processing for analyzing and classifying text.