5 repos
AI-driven systems for parsing, extracting, and structuring information from unstructured documents or text.
Explore 5 awesome GitHub repositories matching artificial intelligence & ml · Document and Data Intelligence. Refine with filters or upvote what's useful.
This project is a comprehensive, community-driven directory of public service endpoints designed to facilitate the discovery and integration of external data sources. It serves as a centralized registry where developers can locate reliable third-party APIs to augment their applications with specialized functionality, r
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering
This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin
Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d