75 repositories
Tools for querying and manipulating tabular data structures.
Distinguishing note: Focuses on row-level table operations rather than full database administration.
75 GitHub repositories matching Data & Databases · Table Data Processing.
Docling is a multimodal content converter and document parser designed to transform PDFs, Office files, and HTML into structured Markdown or JSON for generative AI applications. It functions as an OCR document processor and a PDF layout analyzer that extracts tables, charts, and hierarchical structures while preserving the original page layout. The system operates as a local-first inference engine, allowing for the processing of sensitive data in air-gapped environments without external network connectivity. It can also be deployed as an API or a Model Context Protocol server to provide parsi
Utilizes vision models to interpret graphical chart elements and convert them into descriptive text or tables.
Filament is a full-stack framework for building administrative panels and management interfaces within the Laravel ecosystem. It provides a declarative, component-based architecture that allows developers to construct complex, data-driven applications using server-side configuration objects rather than manual HTML. By inspecting database model structures and relationships, the framework automates the generation of CRUD interfaces, forms, and data tables, significantly reducing boilerplate code. The project distinguishes itself through a highly modular and extensible design that supports custo
Enables the construction of complex, responsive row structures using grid and stack components.
This project is an AI agent orchestration platform that provides a visual environment for building, testing, and deploying complex automation workflows. It functions as a low-code development interface where users can chain discrete functional blocks into dependency-aware pipelines to integrate artificial intelligence with external data and services. The platform supports the creation of intelligent conversational agents, automated business processes, and multi-service API orchestrations within a unified workspace. The platform distinguishes itself through its event-driven integration engine,
Retrieves, inserts, and modifies data items within database tables.
TanStack Table is a headless, framework-agnostic engine designed for building complex data grids and managing tabular state. By decoupling data processing logic from the visual rendering layer, it allows developers to implement custom user interfaces while offloading sophisticated operations like sorting, filtering, grouping, and pagination to a unified, performant core. The library distinguishes itself through its commitment to type safety and environment flexibility. It leverages strict type definitions to ensure data integrity across the entire application and utilizes an adapter pattern t
Provides tools for querying and manipulating tabular data structures including sorting, filtering, and aggregation.
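To make the "headless" idea concrete, here is a minimal Python sketch of a row pipeline that filters, sorts, and paginates without rendering anything. It is not TanStack Table's API; `TableState` and `row_model` are hypothetical names invented for illustration, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Row = dict[str, Any]


@dataclass
class TableState:
    """Hypothetical table state: which filter, sort, and page are active."""
    predicate: Optional[Callable[[Row], bool]] = None
    sort_key: Optional[str] = None
    descending: bool = False
    page_index: int = 0
    page_size: int = 10


def row_model(rows: list[Row], state: TableState) -> list[Row]:
    """Return the rows a UI layer should draw; no rendering happens here."""
    # 1. Filter: keep rows that satisfy the predicate, if one is set.
    result = [r for r in rows if state.predicate is None or state.predicate(r)]
    # 2. Sort: order by a single column, if one is set.
    if state.sort_key is not None:
        result = sorted(result, key=lambda r: r[state.sort_key], reverse=state.descending)
    # 3. Paginate: slice out the requested page.
    start = state.page_index * state.page_size
    return result[start:start + state.page_size]


rows = [
    {"name": "ada", "age": 36},
    {"name": "bob", "age": 25},
    {"name": "cy", "age": 41},
    {"name": "dee", "age": 19},
]
state = TableState(predicate=lambda r: r["age"] >= 21, sort_key="age",
                   descending=True, page_size=2)
page = row_model(rows, state)
```

Because `row_model` only returns data, any rendering layer (HTML, terminal, native widgets) can consume the same result.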
Ramda is a functional JavaScript standard library and toolset for immutable data transformation and composition. It provides a comprehensive suite of pure utility functions designed to enable declarative data processing pipelines. The library is distinguished by its use of automatic function currying and a data-last argument order. These design patterns allow multi-argument functions to be partially applied, simplifying the construction of processing chains where data is passed through a sequence of operations. The toolkit covers broad data manipulation capabilities, including list processin
Transforms lists of key-value pairs into pivoted table formats to reorganize data.
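As an illustration of the pivoting idea, the following Python sketch reshapes long-format key-value records into a nested row-by-column table. It is a plain-Python analogue, not Ramda code; the `pivot` helper and the sample records are hypothetical.

```python
from collections import defaultdict


def pivot(records, row_field, column_field, value_field):
    """Turn long-format records into {row_key: {column_key: value}}."""
    table = defaultdict(dict)
    for rec in records:
        table[rec[row_field]][rec[column_field]] = rec[value_field]
    return dict(table)


long_rows = [
    {"city": "Oslo", "metric": "temp", "value": 4},
    {"city": "Oslo", "metric": "rain", "value": 12},
    {"city": "Rome", "metric": "temp", "value": 17},
]
wide = pivot(long_rows, "city", "metric", "value")
```

If two records share the same row and column key, the later one overwrites the earlier one in this sketch; a real pivot would usually need an explicit aggregation rule for that case.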
Beekeeper Studio is a cross-platform desktop application designed for database management and SQL development. It provides a unified graphical interface to connect to, query, and modify data across a wide range of relational and NoSQL database systems. The application functions as a comprehensive workspace, integrating tools for schema design, record editing, and data visualization. The project distinguishes itself through a focus on secure, flexible connectivity and AI-assisted workflows. It supports advanced authentication methods, including enterprise single sign-on, multi-factor authentic
Streams data from database tables into separate files using formats like SQL, CSV, or JSON.
Excelize is a library for reading and writing spreadsheet files in the Office Open XML format. It provides a comprehensive suite of tools for programmatically creating, modifying, and analyzing workbooks, worksheets, and cell data, ensuring compatibility across various office software suites through structured XML serialization. The library distinguishes itself with a built-in formula calculation engine that evaluates complex mathematical and logical expressions directly against workbook data. It also features a memory-mapped streaming architecture, which allows for the efficient processing o
Aggregates and groups large datasets into summary tables using configurable statistical functions.
TOML is a configuration file format designed for human readability and unambiguous mapping to hash tables. It serves as a standardized language for structured data, enabling consistent parsing and data exchange across diverse programming environments. The format distinguishes itself through a strict type-system specification that ensures data is interpreted identically regardless of the implementation. It utilizes a line-oriented lexical structure that supports both hierarchical organization through bracketed sections and compact inline embedding for nested objects. This approach allows for t
Uses dot-notation keys and bracketed headers to structure hierarchical configuration data.
Knex is a multi-dialect database client that provides a programmatic SQL query builder, a connection pool manager, and a versioned schema migration tool. It enables programmatic database interaction across multiple SQL engines, including PostgreSQL, MySQL, SQLite3, SQL Server, CockroachDB, and Oracle. The project distinguishes itself through a fluent interface for constructing complex SQL statements and a dedicated framework for database seeding. It utilizes specialized dialects to translate generic query representations into database-specific syntax while maintaining a consistent API across
Includes utilities to delete all records from specific database tables to reset environment state.
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Combines metrics from multiple fact tables sharing common dimensions without causing row multiplication or data duplication.
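The row-multiplication problem mentioned above is easiest to see with numbers. The Python sketch below uses two made-up fact tables that share a `customer` dimension. It contrasts a naive join-then-sum with aggregating each fact table first. This is a general illustration of the technique, not Cube's implementation, and all names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact tables that share the dimension 'customer'.
orders = [("acme", 100), ("acme", 50), ("zeta", 70)]   # (customer, order_amount)
refunds = [("acme", 10), ("acme", 5), ("acme", 5)]      # (customer, refund_amount)


def naive_join_totals(orders, refunds):
    """Join first, then sum: each order row repeats once per matching refund row."""
    totals = defaultdict(int)
    for order_customer, amount in orders:
        for refund_customer, _ in refunds:
            if order_customer == refund_customer:
                totals[order_customer] += amount
    return dict(totals)


def sum_by_customer(rows):
    """Collapse one fact table to a single total per customer."""
    totals = defaultdict(int)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


def aggregate_then_join(orders, refunds):
    """Aggregate each fact table separately, then combine on the shared dimension."""
    order_totals = sum_by_customer(orders)
    refund_totals = sum_by_customer(refunds)
    customers = sorted(set(order_totals) | set(refund_totals))
    return {c: (order_totals.get(c, 0), refund_totals.get(c, 0)) for c in customers}


inflated = naive_join_totals(orders, refunds)
correct = aggregate_then_join(orders, refunds)
```

In this data, the naive join reports 450 in orders for `acme` because each of the two order rows is counted three times, while aggregating first yields the true total of 150 alongside 20 in refunds.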
Luckysheet has been upgraded to Univer.
Summarizes and visualizes data through interactive pivot tables and chart components.
Floci is a local emulator for AWS services and cloud infrastructure designed for developing and testing applications without a live internet connection. It serves as a containerized cloud emulator and a serverless runtime emulator, allowing users to run high-fidelity replicas of cloud databases, queues, and compute services on a local machine. The project distinguishes itself by using real container images instead of simple mocks to ensure behavioral accuracy. It functions as a local API gateway simulator with proxy-based routing for REST and WebSocket APIs, and provides a serverless environm
Emulates NoSQL database operations including specialized query languages and point-in-time recovery.
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Converts tabular data into structured HTML format to facilitate accurate data extraction and rendering.
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Configures target states using SQL statements that the warehouse maintains through incremental refreshes.
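Dependency-ordered execution over a directed acyclic graph can be sketched with Python's standard `graphlib` module. This is a generic topological sort, not dbt's scheduler, and the model names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each key depends on the models listed in its set.
dependencies = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_report": {"orders_enriched"},
}

# static_order() yields every model after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
```

The relative order of independent models such as `stg_orders` and `stg_customers` is not guaranteed; only the dependency constraints are. A cycle in the graph raises `graphlib.CycleError`.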
Agent-S is a multimodal AI agent and LLM desktop automation framework designed to control operating systems through graphical user interface interactions. It functions as a computer use interface, utilizing vision-language grounding to translate natural language goals into precise screen coordinates and system actions. The project differentiates itself by combining structured accessibility tree inspection with vision-based element localization. It manages cross-application workflows by mapping conceptual descriptions to physical pixels and simulating low-level keyboard and mouse events to mov
Extracts cell values, labels, and column data from structured tables and spreadsheets.
Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser. The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
Combines two source tables on a shared key to create a reactive read-only joined table.
FerretDB is an open-source database emulator and protocol translator that mimics a MongoDB environment to support existing drivers and client tools on a relational backend. It functions as a stateless database proxy that converts binary wire protocol messages into SQL statements, allowing a relational engine to handle document-oriented requests. The project serves as a migration tool for moving applications from MongoDB to PostgreSQL without rewriting queries or changing client drivers. It achieves this by using PostgreSQL as a document store, storing and querying BSON documents through a tra
Mimics a MongoDB environment to support existing drivers and client tools on a relational backend.
PDF-Extract-Kit is a document extraction toolkit designed to convert PDF documents into structured formats such as Markdown, HTML, and LaTeX. It functions as a multi-stage parsing framework that combines a document layout analyzer, a formula recognition engine, an OCR text extractor, and a table extraction system. The project focuses on recovering complex document elements by translating images of mathematical formulas and tabular structures into editable source code. It utilizes model-driven layout analysis to identify structural elements in reports and textbooks while ignoring noise like wa
Transforms images of tables into structured source code using LaTeX, HTML, or Markdown formats.
pgweb is a web-based database client and graphical administration tool for PostgreSQL. It provides a browser-based interface for executing SQL queries, inspecting schemas, and managing database objects. The tool includes a read-only mode that prevents destructive operations by blocking specific SQL keywords. It supports secure remote access to private instances through native SSH tunneling and encrypted database connections. The application covers a broad range of management capabilities, including multi-environment session management, database structure inspection, and the export of query r
Enables browsing of table contents with pagination, column sorting, and row filtering.
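A keyword-based read-only guard of the kind described above might look like the following Python sketch. The keyword list and the function name are assumptions made for illustration; they are not taken from pgweb's source.

```python
import re

# Hypothetical keyword list for illustration; not pgweb's actual list.
BLOCKED_KEYWORDS = ("insert", "update", "delete", "drop", "truncate", "alter", "create")

# Match any blocked keyword as a whole word, ignoring case.
_BLOCKED_PATTERN = re.compile(r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b", re.IGNORECASE)


def is_allowed_in_read_only(sql: str) -> bool:
    """Return True when the statement contains none of the blocked keywords."""
    return _BLOCKED_PATTERN.search(sql) is None
```

Whole-word matching keeps identifiers such as `updated_at` usable, but a keyword check is still coarse: a query like `SELECT 'delete'` is rejected even though it modifies nothing. A database role with read-only privileges is a stronger safeguard where one is available.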
SeaTunnel is a distributed data integration engine designed to synchronize structured and unstructured data across diverse sources and sinks. It functions as a multi-engine execution framework that can run data integration tasks across different distributed computing backends to optimize workload performance. The project is distinguished by a visual data pipeline designer for configuring workflows without manual code and a specialized change data capture tool for streaming incremental database updates. It also includes an enrichment pipeline that integrates large language models and embedding
Enables transformation logic to be applied across multiple tables simultaneously using a single configuration.