5 repositories
Two-dimensional labeled data structures with ordered columns sharing a common index.
Distinct from DataFrame Analysis: Existing candidates focus on exporting, integrating, or analyzing dataframes rather than the core construction of the structure itself.
Explore 5 GitHub repositories matching Data & Databases · Tabular DataFrames.
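As a rough illustration of the definition above (a two-dimensional labeled structure whose ordered columns share a common index), here is a minimal pure-Python sketch. The `TinyFrame` class, its `loc` and `row` methods, and the sample data are all hypothetical and invented for illustration; they are not the API of pandas, DataFusion, or any repository listed here.

```python
class TinyFrame:
    """Hypothetical minimal table: ordered columns that share one row index."""

    def __init__(self, columns, index):
        self.index = list(index)
        # Plain dicts keep insertion order (Python 3.7+), so column order is preserved.
        self.columns = {name: list(values) for name, values in columns.items()}
        for name, values in self.columns.items():
            if len(values) != len(self.index):
                raise ValueError(f"column {name!r} does not match the index length")
        # Map each row label to its position so every column can be addressed by label.
        self._pos = {label: i for i, label in enumerate(self.index)}

    def loc(self, row_label, column_name):
        """Return one cell, addressed by row label and column name."""
        return self.columns[column_name][self._pos[row_label]]

    def row(self, row_label):
        """Return one row as a dict whose keys follow the column order."""
        i = self._pos[row_label]
        return {name: values[i] for name, values in self.columns.items()}


# Made-up sample data: two columns sharing the index ["a", "b"].
frame = TinyFrame(
    columns={"name": ["Ada", "Bo"], "score": [91, 82]},
    index=["a", "b"],
)
cell = frame.loc("b", "score")
first_row = frame.row("a")
```

Real DataFrame libraries add typed columnar storage, vectorized operations, and index alignment on top of this basic shape.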
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directl
Normalizes access to tabular data structures through a consistent API for statistical analysis.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
Constructs two-dimensional labeled table structures with ordered columns sharing a common index.
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Constructs and manipulates tabular data through a lazy DataFrame API with filtering, aggregation, and joins.
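This is a hands-on pandas data analysis cookbook and Python data science guide. It provides a collection of programming recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment, ensuring a consistent workspace and reproducible dependencies when running data processing scripts. It covers a broad range of data science functionality, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. The recipes demonstrate structured data analysis through techniques such as filtering, aggregating grouped data, and processing text data.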
Implements data modeling using tabular DataFrames with labeled axes for efficient indexing and slicing.
This repository serves as an educational resource and structured curriculum for performing statistical analysis using Python. It provides a comprehensive guide to the scientific computing workflow, focusing on the practical application of data cleaning, numerical modeling, and distribution visualization. The tutorial covers the end-to-end process of transforming raw tabular data into actionable insights. It demonstrates how to manipulate structured datasets through merging and aggregation, perform descriptive and inferential statistical calculations, and fit regression models to evaluate rela
Organizes structured information into labeled rows and columns to facilitate complex filtering, merging, and statistical aggregation.