# High-Performance Tabular Data Libraries

> Search results for `fast dataframe library for crunching tabular data` on awesome-repositories.com. 112 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/fast-dataframe-library-for-crunching-tabular-data

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/fast-dataframe-library-for-crunching-tabular-data).**

## Results

- [hosseinmoein/dataframe](https://awesome-repositories.com/repository/hosseinmoein-dataframe.md) (2,917 ⭐) — DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets.

The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
- [apache/datafusion](https://awesome-repositories.com/repository/apache-datafusion.md) (8,908 ⭐) — Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules.

The engine distinguishes itself through its modular extension framework, which enables building custom query e
- [nushell/nushell](https://awesome-repositories.com/repository/nushell-nushell.md) (39,743 ⭐) — Nushell is a cross-platform shell and programming language designed to treat all input and output as structured data rather than raw text streams. By enforcing data types and command signatures, it provides a consistent environment for building robust, pipeline-oriented workflows. The shell allows users to chain commands that pass structured objects between stages, enabling complex data processing and automation tasks that remain predictable across different operating systems.

What distinguishes the project is its focus on interactive data exploration and modular extensibility. Users can quer
- [dask/dask](https://awesome-repositories.com/repository/dask-dask.md) (13,746 ⭐) — Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements.

The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
- [microsoft/data-science-for-beginners](https://awesome-repositories.com/repository/microsoft-data-science-for-beginners.md) (35,657 ⭐) — This project is a comprehensive educational curriculum designed to teach the fundamental concepts, workflows, and tools of data science. It provides a structured learning path that covers the end-to-end data science lifecycle, including data acquisition, maintenance, processing, and pattern discovery, while grounding theoretical knowledge in practical, real-world applications.

The curriculum distinguishes itself through a data-driven pedagogical design that utilizes interactive, notebook-based lessons. By combining narrative text with live code blocks, the platform allows learners to experime
- [bvaughn/react-virtualized](https://awesome-repositories.com/repository/bvaughn-react-virtualized.md) (27,072 ⭐) — react-virtualized is a library of components for rendering massive lists and tables by drawing only the elements visible in the viewport. It provides specialized layout managers including a windowed grid component and a dynamic height list manager.

The project includes a masonry layout engine for packing items of varying heights and widths, as well as an infinite scroll interface for incrementally fetching and appending data.

The library covers a broad range of virtualization capabilities, including frozen grid elements, reverse list rendering, and synchronized viewport scrolling. It also su
- [iamseancheney/python_for_data_analysis_2nd_chinese_version](https://awesome-repositories.com/repository/iamseancheney-python-for-data-analysis-2nd-chinese-version.md) (8,937 ⭐) — This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data.

The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
- [fastai/fastai](https://awesome-repositories.com/repository/fastai-fastai.md) (27,862 ⭐) — Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models.

The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
- [diyago/gan-for-tabular-data](https://awesome-repositories.com/repository/diyago-gan-for-tabular-data.md) (0 ⭐) — Generative networks are well known for their success in realistic image generation. However, they can also be applied to generate tabular data. We introduce major improvements for generating high-fidelity tabular data, giving the opportunity to try GANs, TimeGANs, diffusion models, and LLMs for tabular data…
- [diyago/tabular-data-generation](https://awesome-repositories.com/repository/diyago-tabular-data-generation.md) (570 ⭐) — GANs are well known for their success in realistic image generation. However, they can also be applied to tabular data generation. We will review and examine some recent papers about tabular GANs in action.
- [kanaries/pygwalker](https://awesome-repositories.com/repository/kanaries-pygwalker.md) (15,628 ⭐) — Pygwalker is a library that transforms tabular data into interactive, drag-and-drop interfaces for exploratory analysis and visualization. It functions as a grammar-based framework that translates user interactions into declarative chart definitions, allowing for the creation of dynamic data exploration environments directly within notebooks or embedded web applications.

The system distinguishes itself by offloading heavy analytical computations to backend kernels, which maintains responsiveness when visualizing large datasets. It supports the serialization of visual states into portable conf
- [wesm/pydata-book](https://awesome-repositories.com/repository/wesm-pydata-book.md) (24,668 ⭐) — This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis.

The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
- [bukosabino/ta](https://awesome-repositories.com/repository/bukosabino-ta.md) (4,890 ⭐) — This is a pandas-based technical analysis library and financial feature engineering tool. It serves as a vectorized indicator calculator that transforms raw price and volume data into derived metrics for time series analysis.

The library uses a NumPy-based engine to perform mathematical operations across entire arrays, avoiding iterative loops to maintain high performance. It organizes technical indicators into a modular class hierarchy with a consistent interface, allowing for bulk feature generation and the direct appending of results as new columns to a pandas DataFrame.

The system covers
- [jaykali/maskphish](https://awesome-repositories.com/repository/jaykali-maskphish.md) (3,020 ⭐) — Maskphish is a comprehensive security toolkit that integrates capabilities for digital forensics, network vulnerability scanning, open-source intelligence, penetration testing, and social engineering. It functions as a multi-purpose framework for automating reconnaissance and executing security audits across diverse network environments.

The project features a specialized phishing and social engineering toolkit used for cloning websites, masking URLs, and deploying deceptive pages to capture user credentials. It also includes a remote access Trojan builder for generating platform-specific exe
- [rocketlaunchr/dataframe-go](https://awesome-repositories.com/repository/rocketlaunchr-dataframe-go.md) (1,287 ⭐) — DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
- [vonng/ddia](https://awesome-repositories.com/repository/vonng-ddia.md) (22,648 ⭐) — This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure.

The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
- [juliastats/dataframes.jl](https://awesome-repositories.com/repository/juliastats-dataframes-jl.md) (1,830 ⭐) — In-memory tabular data in Julia
- [chainlit/chainlit](https://awesome-repositories.com/repository/chainlit-chainlit.md) (12,213 ⭐) — Chainlit is a Python framework designed for building and deploying interactive, stateful conversational AI interfaces. It provides a backend-driven platform that connects language models and agent frameworks to a web-based chat frontend, managing the complexities of session state, message history, and real-time communication.

The framework distinguishes itself by offering a component-based UI builder that allows developers to inject interactive widgets, rich media, and data visualizations directly into the chat stream. It supports the visualization of complex agent workflows, enabling users t
- [saulpw/visidata](https://awesome-repositories.com/repository/saulpw-visidata.md) (8,834 ⭐) — VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations.

The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
- [juliadata/dataframes.jl](https://awesome-repositories.com/repository/juliadata-dataframes-jl.md) (1,830 ⭐) — In-memory tabular data in Julia
- [jondot/crunch](https://awesome-repositories.com/repository/jondot-crunch.md) (212 ⭐) — A fast to iterate, fast to run, Go based toolkit for ETL and feature extraction on Hadoop.
- [polakowo/vectorbt](https://awesome-repositories.com/repository/polakowo-vectorbt.md) (6,720 ⭐) — VectorBT is a vectorized trading strategy backtesting framework that simulates thousands of strategy configurations in a single pass over historical price data. It operates as a parameter optimization engine, a portfolio performance analyzer, a technical indicator calculator, and a financial data fetcher, all built around a DataFrame-centric data model that uses NumPy broadcasting for signal alignment and compiled code acceleration for performance.

The framework distinguishes itself through its ability to run large-scale parameter sweeps by constructing every combination of strategy parameter
- [aldeed/meteor-tabular](https://awesome-repositories.com/repository/aldeed-meteor-tabular.md) (360 ⭐) — aldeed:tabular
- [apache/airflow](https://awesome-repositories.com/repository/apache-airflow.md) (45,902 ⭐) — Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments.

The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
- [modin-project/modin](https://awesome-repositories.com/repository/modin-project-modin.md) (10,389 ⭐) — Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors.

The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
- [jordipolo/dataframe](https://awesome-repositories.com/repository/jordipolo-dataframe.md) (63 ⭐) — Package providing functionality similar to Python's Pandas or R's data.frame()
- [lancedb/lancedb](https://awesome-repositories.com/repository/lancedb-lancedb.md) (9,031 ⭐) — LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines.

The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
- [fastly/fastly-magento2](https://awesome-repositories.com/repository/fastly-fastly-magento2.md) (156 ⭐) — Thank you for using the "Fastly CDN module for Magento2" (Fastly_Cdn).
- [avelino/awesome-go](https://awesome-repositories.com/repository/avelino-awesome-go.md) (175,576 ⭐) — This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains.

The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing,
- [superwhiskers/crunch](https://awesome-repositories.com/repository/superwhiskers-crunch.md) (98 ⭐) — take bytes out of things easily ✨🍪
- [pola-rs/polars](https://awesome-repositories.com/repository/pola-rs-polars.md) (38,855 ⭐) — Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters.

The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
- [haifengl/smile](https://awesome-repositories.com/repository/haifengl-smile.md) (6,387 ⭐) — Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models.

The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encodin
- [emotion-js/emotion](https://awesome-repositories.com/repository/emotion-js-emotion.md) (18,017 ⭐) — This project is a styling library and framework designed for component-based architectures, enabling developers to define and manage visual styles directly within JavaScript or TypeScript. It functions as a styling engine that generates unique class names from style definitions, ensuring encapsulated, predictable, and maintainable visual presentation across applications. By integrating with component logic, it allows for the creation of reusable UI elements with styles defined through template literals or object syntax.

The library distinguishes itself through a comprehensive suite of build-t
- [pandas-dev/pandas](https://awesome-repositories.com/repository/pandas-dev-pandas.md) (49,039 ⭐) — Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations.

The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
- [chrissimpkins/crunch](https://awesome-repositories.com/repository/chrissimpkins-crunch.md) (3,422 ⭐) — Insane(ly slow but wicked good) PNG image optimization
- [guipsamora/pandas_exercises](https://awesome-repositories.com/repository/guipsamora-pandas-exercises.md) (12,180 ⭐) — This repository is a collection of structured coding challenges designed to build proficiency in data manipulation, cleaning, and transformation using the Python data analysis library. It functions as a hands-on tutorial for learning how to process and analyze tabular datasets through a series of practical, real-world exercises.

The project utilizes interactive documents that combine live code cells with narrative text, allowing users to execute data manipulation logic in a persistent environment. The content is organized into modular, progressive units that increase in complexity, enabling u
- [invisionapp/tabular](https://awesome-repositories.com/repository/invisionapp-tabular.md) (0 ⭐)
- [expo/expo](https://awesome-repositories.com/repository/expo-expo.md) (50,111 ⭐) — Expo is a universal mobile framework designed to build native iOS and Android applications from a single codebase using web-standard technologies. It provides a comprehensive development environment that includes a unified runtime for testing, cloud-based infrastructure for compiling and signing native binaries, and automated tools for managing the entire mobile release lifecycle, including app store submission.

The framework distinguishes itself through a plugin-based native configuration engine that programmatically modifies project files, allowing developers to integrate native modules wit
- [jetbrains/kotlin](https://awesome-repositories.com/repository/jetbrains-kotlin.md) (52,880 ⭐) — Kotlin is a statically typed, general-purpose programming language designed for type safety and concise syntax. It functions as a cross-platform development toolkit that enables the sharing of business logic across mobile, web, and server-side environments by compiling a unified intermediate representation into platform-specific machine code, bytecode, or source code.

The project distinguishes itself through a multi-target build orchestration model that manages complex compilation units and hierarchical source sets. Developers can define common interface logic that is satisfied by platform-sp
- [qax-os/excelize](https://awesome-repositories.com/repository/qax-os-excelize.md) (20,682 ⭐) — Excelize is a library for reading and writing spreadsheet files in the Office Open XML format. It provides a comprehensive suite of tools for programmatically creating, modifying, and analyzing workbooks, worksheets, and cell data, ensuring compatibility across various office software suites through structured XML serialization.

The library distinguishes itself with a built-in formula calculation engine that evaluates complex mathematical and logical expressions directly against workbook data. It also features a memory-mapped streaming architecture, which allows for the efficient processing o
- [etcimon/fast](https://awesome-repositories.com/repository/etcimon-fast.md) (111 ⭐) — fast
- [x-extends/vxe-table](https://awesome-repositories.com/repository/x-extends-vxe-table.md) (8,595 ⭐) — vxe-table is a high-performance data table component and UI library for Vue, designed for building data-heavy applications. It functions as a virtualized data grid and spreadsheet UI framework capable of rendering millions of rows by mounting only the visible elements of a dataset.

The project distinguishes itself through spreadsheet-like functionality, including cell selection, copy-paste support, and the generation of cross-tabulated pivot tables. It also provides specialized tools for managing complex data hierarchies using virtual trees, row grouping, and cell merging.

The library covers
- [azuread/microsoft-authentication-library-for-js](https://awesome-repositories.com/repository/azuread-microsoft-authentication-library-for-js.md) (4,084 ⭐) — Microsoft Authentication Library (MSAL) for JS
- [fastai/fastbook](https://awesome-repositories.com/repository/fastai-fastbook.md) (24,587 ⭐) — This project is an interactive educational textbook and comprehensive machine learning resource designed for deep learning education. It provides a structured curriculum that combines narrative prose with executable code, utilizing literate programming to create reproducible learning experiences within a collection of Jupyter Notebooks.

The repository distinguishes itself by teaching machine learning through applied research and modular design. It demonstrates a callback-driven training loop, a declarative data-block pipeline, and a layered abstraction API that allows users to transition betw
- [honojs/hono](https://awesome-repositories.com/repository/honojs-hono.md) (30,994 ⭐) — Hono is a lightweight web framework built on Web Standard APIs that executes across JavaScript runtimes including Cloudflare Workers, Deno, Bun, and Node.js.
- [xuri/excelize](https://awesome-repositories.com/repository/xuri-excelize.md) (20,668 ⭐) — Excelize is a Go library designed for reading, writing, and modifying Microsoft Excel files in XML-based formats. It functions as a spreadsheet file parser and generator that enables the programmatic extraction and modification of data.

The library includes a streaming spreadsheet processor to handle massive datasets incrementally, preventing system memory exhaustion during large-scale read and write operations. It also provides a chart generator to convert worksheet values or external data sources into visual representations within the spreadsheet.

Beyond core file processing, the project c
- [fast-crud/fast-crud](https://awesome-repositories.com/repository/fast-crud-fast-crud.md) (1,138 ⭐) — Options-oriented CRUD framework; develop CRUD as fast as lightning; based on Vue 3; super table
- [phpoffice/phpspreadsheet](https://awesome-repositories.com/repository/phpoffice-phpspreadsheet.md) (13,932 ⭐) — PhpSpreadsheet is a PHP library used for reading and writing spreadsheet files across various formats. It functions as a spreadsheet file generator and an Excel file parser, allowing for the programmatic creation and manipulation of documents compatible with software such as Microsoft Excel and LibreOffice Calc.

The library provides capabilities for programmatic spreadsheet generation and data extraction, enabling the conversion of data from spreadsheet files into programmable PHP formats. It also facilitates cross-format spreadsheet conversion, allowing data to be moved between different sta
- [ollama/ollama](https://awesome-repositories.com/repository/ollama-ollama.md) (174,300 ⭐) — Ollama provides a framework for running and managing local machine learning models. It includes a command-line interface for model lifecycle management, such as creation, embedding generation, and configuration, alongside a stable API for programmatic interaction across multiple programming languages.

The platform supports the import of models and adapters in various formats, including GGUF and Safetensors. Users can define custom model behaviors, prompt templates, and system messages through a configuration file format. It also offers tools for fine-tuning models with LoRA adapters and apply
- [jodosoft/libraries](https://awesome-repositories.com/repository/jodosoft-libraries.md) (12 ⭐) — Simple, reliable .NET libraries covering numbers, geometry and data structures