12 repositories
Tools and patterns for analyzing and extracting information from documents using AI.
Distinguishing note: Focuses on document-specific analysis patterns rather than general chat.
Explore 12 GitHub repositories matching Artificial Intelligence & ML · Document Analysis.
gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases. The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and repos
Tags missing technical information in pull requests by mapping coverage gaps to documentation quadrants.
Label Studio is a multi-modal data annotation platform designed to create and manage high-quality training datasets for machine learning. It functions as a self-hosted, containerized environment that supports secure, private deployments, including air-gapped configurations. The platform provides a centralized workspace for labeling diverse media types, such as images, text, audio, and time-series data, to support supervised and reinforcement learning workflows. The platform distinguishes itself through deep integration with machine learning backends, enabling active learning loops, automated
Label Studio extracts information through named entity recognition and optical character recognition for complex, large-scale document analysis.
This project is a comprehensive framework and toolkit for developing, optimizing, and deploying transformer-based models across multimodal, document intelligence, and natural language processing tasks. It provides a unified neural architecture that processes text, vision, audio, and document layout data through a shared set of weights, enabling researchers and developers to build foundational models that align cross-modal representations. The platform distinguishes itself through advanced training and inference strategies designed for large-scale deep learning. It incorporates specialized mec
Provides large-scale datasets of document images paired with ground-truth reading order information to evaluate and train document analysis models.
Surya is a document processing platform designed to transform unstructured files into structured, machine-readable data. It provides a comprehensive suite of tools for text recognition, layout analysis, and reading order detection, enabling the conversion of PDFs and images into formats such as JSON, HTML, or markdown. The platform is built to handle complex document workflows, offering capabilities for data extraction, document segmentation, and automated form completion. The platform distinguishes itself through a robust pipeline-based architecture that allows users to chain analysis tasks
Performs text recognition, layout analysis, and reading order detection using typed clients and asynchronous requests.
This project is a collection of implementation guides, recipes, and developer resources for building applications with Llama models. It serves as a comprehensive kit for developing autonomous agents, establishing retrieval-augmented generation systems, and executing model fine-tuning. The resource provides specific patterns for multimodal workflows that process text, images, and audio. It includes specialized guidance on adapting pre-trained model weights for targeted tasks and implementing tool-calling orchestration to connect models with external APIs and functions. The codebase covers a b
Provides patterns for extracting information and mapping relationships within research papers and books.
Transformers.js is a JavaScript library and web machine learning framework designed to run pretrained transformer models directly in the browser. It serves as a client-side inference engine and a wrapper for the ONNX Runtime, enabling the execution of multimodal AI tasks on user devices without the need for a backend server. The library distinguishes itself by providing a unified toolkit for processing text, image, and audio data locally. This architecture supports privacy-preserving model inference and reduces latency by performing all computations on the client's hardware. Its capabilities
Extracts answers from visual scans of documents by combining image recognition with text analysis.
This platform is an automated documentation and codebase analysis system designed to generate structured wikis, technical guides, and interactive diagrams from source code repositories. It functions as a retrieval-augmented generation framework that connects codebases to language models, enabling context-aware answers, deep research, and automated documentation updates through semantic vector search. The system distinguishes itself through a self-hosted, containerized architecture that supports both cloud-based and local AI model execution. It provides sophisticated model orchestration, allow
Generates documentation in multiple languages with cultural adaptations for contextually appropriate output.
ChatGLM3 is an open-weights large language model designed for bilingual conversational interactions in English and Chinese. It functions as a tool-augmented system capable of calling external functions and executing internal code to resolve complex tasks. The model utilizes four-bit quantization to reduce memory requirements, enabling inference on consumer hardware and diverse processing units including GPUs and CPUs. It features an expanded context window for processing and summarizing long documents and includes a supervised fine-tuning pipeline for adapting the model to specialized domains
Analyzes and extracts information from extensive documents using AI and a large context window.
Scira is an AI-powered search and synthesis engine that uses agentic research workflows to find and organize information from the web and academic sources. The system breaks complex queries into multi-step plans and generates grounded answers with inline citations for verification. The platform distinguishes itself by executing Python code within isolated sandboxes to perform data analysis and create visual charts from retrieved data. It also implements retrieval-augmented generation to perform semantic searches across uploaded documents, including PDFs and CSV files, and integrates with clou
Analyzes uploaded PDFs and CSV files using semantic embeddings to answer specific questions.
Bytebot is an LLM desktop automation framework and virtual Linux desktop environment. It enables AI agents to plan and execute mouse and keyboard actions on a virtual computer using natural language, allowing for autonomous desktop automation and the integration of legacy systems that lack native APIs. The system operates as an LLM API gateway and a Model Context Protocol server, routing requests across multiple language model providers with integrated load balancing and rate limiting. It provides isolated, containerized environments where agents use visual reasoning to interpret screenshots
Reads documents and spreadsheets directly into the AI context for detailed extraction and analysis.
BERTopic is a topic modeling library used to extract interpretable themes from collections of text documents and images. It functions as a document clustering framework that transforms unstructured data into numerical vectors to group semantically similar content. The project distinguishes itself through a multimodal embedding tool that allows for joint clustering of text and images in a shared vector space. It also features a class-based TF-IDF representation engine to identify representative words for clusters and an integrated system for using large language models to generate natural lang
Groups semantically similar documents into dense clusters while identifying and excluding noise as outliers.
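The class-based TF-IDF idea mentioned in the BERTopic entry can be illustrated with a minimal sketch. This is not BERTopic's own code: the helper names `class_tfidf` and `top_terms` are hypothetical, tokenization is a plain whitespace split, and the weighting (term frequency within a cluster times log(1 + average words per cluster / term frequency across all clusters)) is assumed from the commonly documented form of the technique. The clusters are supplied by hand here; in practice they would come from an embedding and clustering step.

```python
import math
from collections import Counter


def class_tfidf(clusters):
    """Score terms per cluster; `clusters` maps a label to a list of documents.

    Hypothetical helper for illustration, not part of any library.
    """
    # Treat each cluster as one long document and count its terms.
    counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in clusters.items()
    }
    # Frequency of each term across all clusters combined.
    total = Counter()
    for cluster_counts in counts.values():
        total.update(cluster_counts)
    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for label, cluster_counts in counts.items():
        n_words = sum(cluster_counts.values())
        scores[label] = {
            # Normalized in-cluster frequency times the class-based IDF term.
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in cluster_counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, s in scores.items()
    }


if __name__ == "__main__":
    example = {
        "invoices": ["invoice total due", "invoice payment due date"],
        "contracts": ["contract party signature", "contract term party"],
    }
    print(top_terms(class_tfidf(example), k=2))
```

Terms that are frequent inside one cluster but rare elsewhere score highest, which is what makes them usable as cluster labels.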
myGPTReader is a suite of applications built on large language models, including a chat interface, a document analysis tool, and a news aggregator. The system focuses on extracting information from digital files and web content to enable conversational analysis and automatic content condensation. The project includes a prompt template manager for structuring conversation flows and improving response accuracy. It also includes a multilingual voice chat client that integrates speech-to-text and text-to-speech for real-time interactive tutoring and language practice. The platform covers broader capabilities in retrieval-augmented conversation, automated daily news delivery, and summarization of websites and video content.
Provides a comprehensive system for analyzing and extracting information from documents using AI for question answering.