Why is ds4sd/docling a recommended Table Data Processing repository?

Utilizes vision models to interpret graphical chart elements and convert them into descriptive text or tables.

Why is filamentphp/filament a recommended Table Data Processing repository?

Enables the construction of complex, responsive row structures using grid and stack components.

Why is simstudioai/sim a recommended Table Data Processing repository?

Retrieves, inserts, and modifies data items within database tables.

Why is tanstack/table a recommended Table Data Processing repository?

Provides tools for querying and manipulating tabular data structures including sorting, filtering, and aggregation.

Why is ramda/ramda a recommended Table Data Processing repository?

Transforms lists of key-value pairs into pivoted table formats to reorganize data.

Why is beekeeper-studio/beekeeper-studio a recommended Table Data Processing repository?

Streams data from database tables into separate files using formats like SQL, CSV, or JSON.

Why is qax-os/excelize a recommended Table Data Processing repository?

Aggregates and groups large datasets into summary tables using configurable statistical functions.

Why is toml-lang/toml a recommended Table Data Processing repository?

Uses dot-notation keys and bracketed headers to structure hierarchical configuration data.

Why is knex/knex a recommended Table Data Processing repository?

Includes utilities to delete all records from specific database tables to reset environment state.

Why is cube-js/cube a recommended Table Data Processing repository?

Combines metrics from multiple fact tables sharing common dimensions without causing row multiplication or data duplication.

75 repositories

Awesome GitHub Repositories · Table Data Processing

Tools for querying and manipulating tabular data structures.

Distinguishing note: Focuses on row-level table operations rather than full database administration.

Explore 75 awesome GitHub repositories matching data & databases · Table Data Processing.


DS4SD/docling
62,172 stars
Docling is a multimodal content converter and document parser designed to transform PDFs, Office files, and HTML into structured Markdown or JSON for generative AI applications. It functions as an OCR document processor and a PDF layout analyzer that extracts tables, charts, and hierarchical structures while preserving the original page layout. The system operates as a local-first inference engine, allowing for the processing of sensitive data in air-gapped environments without external network connectivity. It can also be deployed as an API or a Model Context Protocol server to provide parsi
Utilizes vision models to interpret graphical chart elements and convert them into descriptive text or tables.
Python

filamentphp/filament
31,215 stars
Filament is a full-stack framework for building administrative panels and management interfaces within the Laravel ecosystem. It provides a declarative, component-based architecture that allows developers to construct complex, data-driven applications using server-side configuration objects rather than manual HTML. By inspecting database model structures and relationships, the framework automates the generation of CRUD interfaces, forms, and data tables, significantly reducing boilerplate code. The project distinguishes itself through a highly modular and extensible design that supports custo
Enables the construction of complex, responsive row structures using grid and stack components.
PHP · admin · alpine-js · builder

simstudioai/sim
28,796 stars
This project is an AI agent orchestration platform that provides a visual environment for building, testing, and deploying complex automation workflows. It functions as a low-code development interface where users can chain discrete functional blocks into dependency-aware pipelines to integrate artificial intelligence with external data and services. The platform supports the creation of intelligent conversational agents, automated business processes, and multi-service API orchestrations within a unified workspace. The platform distinguishes itself through its event-driven integration engine,
Retrieves, inserts, and modifies data items within database tables.
TypeScript · agent-workflow · agentic-workflow · agents

TanStack/table
28,119 stars
TanStack Table is a headless, framework-agnostic engine designed for building complex data grids and managing tabular state. By decoupling data processing logic from the visual rendering layer, it allows developers to implement custom user interfaces while offloading sophisticated operations like sorting, filtering, grouping, and pagination to a unified, performant core. The library distinguishes itself through its commitment to type safety and environment flexibility. It leverages strict type definitions to ensure data integrity across the entire application and utilizes an adapter pattern t
Provides tools for querying and manipulating tabular data structures including sorting, filtering, and aggregation.
TypeScript · datagrid · datagrids · datatable

ramda/ramda
24,072 stars
Ramda is a functional JavaScript standard library and toolset for immutable data transformation and composition. It provides a comprehensive suite of pure utility functions designed to enable declarative data processing pipelines. The library is distinguished by its use of automatic function currying and a data-last argument order. These design patterns allow multi-argument functions to be partially applied, simplifying the construction of processing chains where data is passed through a sequence of operations. The toolkit covers broad data manipulation capabilities, including list processin
Transforms lists of key-value pairs into pivoted table formats to reorganize data.
JavaScript · javascript · ramda

beekeeper-studio/beekeeper-studio
22,030 stars
Beekeeper Studio is a cross-platform desktop application designed for database management and SQL development. It provides a unified graphical interface to connect to, query, and modify data across a wide range of relational and NoSQL database systems. The application functions as a comprehensive workspace, integrating tools for schema design, record editing, and data visualization. The project distinguishes itself through a focus on secure, flexible connectivity and AI-assisted workflows. It supports advanced authentication methods, including enterprise single sign-on, multi-factor authentic
Streams data from database tables into separate files using formats like SQL, CSV, or JSON.
TypeScript · bigquery · cassandra · cockroachdb

qax-os/excelize
20,682 stars
Excelize is a library for reading and writing spreadsheet files in the Office Open XML format. It provides a comprehensive suite of tools for programmatically creating, modifying, and analyzing workbooks, worksheets, and cell data, ensuring compatibility across various office software suites through structured XML serialization. The library distinguishes itself with a built-in formula calculation engine that evaluates complex mathematical and logical expressions directly against workbook data. It also features a memory-mapped streaming architecture, which allows for the efficient processing o
Aggregates and groups large datasets into summary tables using configurable statistical functions.
Go · agent · ai · analytics

toml-lang/toml
20,525 stars
TOML is a configuration file format designed for human readability and unambiguous mapping to hash tables. It serves as a standardized language for structured data, enabling consistent parsing and data exchange across diverse programming environments. The format distinguishes itself through a strict type-system specification that ensures data is interpreted identically regardless of the implementation. It utilizes a line-oriented lexical structure that supports both hierarchical organization through bracketed sections and compact inline embedding for nested objects. This approach allows for t
Uses dot-notation keys and bracketed headers to structure hierarchical configuration data.

knex/knex
20,300 stars
Knex is a multi-dialect database client that provides a programmatic SQL query builder, a connection pool manager, and a versioned schema migration tool. It enables programmatic database interaction across multiple SQL engines, including PostgreSQL, MySQL, SQLite3, SQL Server, CockroachDB, and Oracle. The project distinguishes itself through a fluent interface for constructing complex SQL statements and a dedicated framework for database seeding. It utilizes specialized dialects to translate generic query representations into database-specific syntax while maintaining a consistent API across
Includes utilities to delete all records from specific database tables to reset environment state.
JavaScript
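
The idea of deleting all records from specific tables to reset state can be sketched in a language-neutral way. The example below is not Knex's API; it uses Python's standard `sqlite3` module, and the `reset_tables` helper and the `users` table are hypothetical names introduced for illustration.

```python
import sqlite3


def reset_tables(conn, tables):
    """Delete every row from each listed table while keeping the schema."""
    for name in tables:
        # Identifiers cannot be bound as parameters, so quote them explicitly.
        quoted = '"' + name.replace('"', '""') + '"'
        conn.execute(f"DELETE FROM {quoted}")
    conn.commit()


# Hypothetical in-memory database standing in for a test environment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",)])

reset_tables(conn, ["users"])
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)
```

Running such a reset before each test keeps the table definitions in place and removes only the rows left behind by earlier runs.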

cube-js/cube
20,251 stars
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Combines metrics from multiple fact tables sharing common dimensions without causing row multiplication or data duplication.
Rust · agentic-analytics · agents · ai
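
The row-multiplication problem described above can be shown with a small sketch. This is not Cube's implementation; it is a general illustration, in plain Python, of why aggregating each fact table before combining avoids inflated totals. The `orders` and `refunds` tables and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact tables that share the dimension "customer".
orders = [("alice", 100), ("alice", 50), ("bob", 70)]   # (customer, order_amount)
refunds = [("alice", 10), ("alice", 5), ("bob", 7)]     # (customer, refund_amount)


def naive_join_totals(orders, refunds):
    """Join row-to-row on customer first, then sum: matching rows multiply."""
    order_total = defaultdict(int)
    refund_total = defaultdict(int)
    for o_cust, o_amt in orders:
        for r_cust, r_amt in refunds:
            if o_cust == r_cust:
                order_total[o_cust] += o_amt
                refund_total[o_cust] += r_amt
    return {c: (order_total[c], refund_total[c]) for c in order_total}


def aggregate_then_join(orders, refunds):
    """Sum each fact table per customer first, then combine on the shared key."""
    order_total = defaultdict(int)
    for cust, amt in orders:
        order_total[cust] += amt
    refund_total = defaultdict(int)
    for cust, amt in refunds:
        refund_total[cust] += amt
    customers = sorted(set(order_total) | set(refund_total))
    return {c: (order_total.get(c, 0), refund_total.get(c, 0)) for c in customers}


print(naive_join_totals(orders, refunds))    # alice's totals are doubled
print(aggregate_then_join(orders, refunds))  # each amount is counted once
```

In the naive version each of alice's two orders pairs with each of her two refunds, so both totals are counted twice; aggregating first reduces every fact table to one row per customer before the tables are combined.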

dream-num/Luckysheet
16,644 stars
Luckysheet upgraded to Univer
Summarizes and visualizes data through interactive pivot tables and chart components.
JavaScript · canvas · chart · conditional-formatting

floci-io/floci
14,168 stars
Floci is a local emulator for AWS services and cloud infrastructure designed for developing and testing applications without a live internet connection. It serves as a containerized cloud emulator and a serverless runtime emulator, allowing users to run high-fidelity replicas of cloud databases, queues, and compute services on a local machine. The project distinguishes itself by using real container images instead of simple mocks to ensure behavioral accuracy. It functions as a local API gateway simulator with proxy-based routing for REST and WebSocket APIs, and provides a serverless environm
Emulates NoSQL database operations including specialized query languages and point-in-time recovery.
Java · aws · aws-emulation · devops

Unstructured-IO/unstructured
14,019 stars
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Converts tabular data into structured HTML format to facilitate accurate data extraction and rendering.
HTML · data-pipelines · deep-learning · document-image-analysis

dbt-labs/dbt-core
13,051 stars
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Configures target states using SQL statements that the warehouse maintains through incremental refreshes.
Rust · analytics · business-intelligence · data-modeling

simular-ai/Agent-S
11,855 stars
Agent-S is a multimodal AI agent and LLM desktop automation framework designed to control operating systems through graphical user interface interactions. It functions as a computer use interface, utilizing vision-language grounding to translate natural language goals into precise screen coordinates and system actions. The project differentiates itself by combining structured accessibility tree inspection with vision-based element localization. It manages cross-application workflows by mapping conceptual descriptions to physical pixels and simulating low-level keyboard and mouse events to mov
Extracts cell values, labels, and column data from structured tables and spreadsheets.
Python · agent-computer-interface · ai-agents · computer-automation

perspective-dev/perspective
10,981 stars
Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser. The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
Combines two source tables on a shared key to create a reactive read-only joined table.
C++ · analytics · bi · data-visualization

FerretDB/FerretDB
10,976 stars
FerretDB is an open-source database emulator and protocol translator that mimics a MongoDB environment to support existing drivers and client tools on a relational backend. It functions as a stateless database proxy that converts binary wire protocol messages into SQL statements, allowing a relational engine to handle document-oriented requests. The project serves as a migration tool for moving applications from MongoDB to PostgreSQL without rewriting queries or changing client drivers. It achieves this by using PostgreSQL as a document store, storing and querying BSON documents through a tra
Mimics a MongoDB environment to support existing drivers and client tools on a relational backend.
Go

opendatalab/PDF-Extract-Kit
9,724 stars
PDF-Extract-Kit is a document extraction toolkit designed to convert PDF documents into structured formats such as Markdown, HTML, and LaTeX. It functions as a multi-stage parsing framework that combines a document layout analyzer, a formula recognition engine, an OCR text extractor, and a table extraction system. The project focuses on recovering complex document elements by translating images of mathematical formulas and tabular structures into editable source code. It utilizes model-driven layout analysis to identify structural elements in reports and textbooks while ignoring noise like wa
Transforms images of tables into structured source code using LaTeX, HTML, or Markdown formats.
Python

sosedoff/pgweb
9,399 stars
pgweb is a web-based database client and graphical administration tool for PostgreSQL. It provides a browser-based interface for executing SQL queries, inspecting schemas, and managing database objects. The tool includes a read-only mode that prevents destructive operations by blocking specific SQL keywords. It supports secure remote access to private instances through native SSH tunneling and encrypted database connections. The application covers a broad range of management capabilities, including multi-environment session management, database structure inspection, and the export of query r
Enables browsing of table contents with pagination, column sorting, and row filtering.
Go · cross-platform · golang · pgweb

apache/seatunnel
9,427 stars
SeaTunnel is a distributed data integration engine designed to synchronize structured and unstructured data across diverse sources and sinks. It functions as a multi-engine execution framework that can run data integration tasks across different distributed computing backends to optimize workload performance. The project is distinguished by a visual data pipeline designer for configuring workflows without manual code and a specialized change data capture tool for streaming incremental database updates. It also includes an enrichment pipeline that integrates large language models and embedding
Enables transformation logic to be applied across multiple tables simultaneously using a single configuration.
Java · apache · batch · cdc
