7 repositories
High-performance tools for cleaning and transforming structured datasets.
Distinguishing note: Focuses on in-memory data analysis rather than database engine operations.
Explore 7 GitHub repositories matching Data & Databases · Data Analysis Libraries.
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
Offers a comprehensive suite for cleaning and transforming structured data.
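The automatic data alignment mentioned in the Pandas entry can be illustrated with a minimal sketch. The series names, labels, and values below are made up for illustration; only the pandas calls themselves (`Series`, `Series.add`, `DataFrame.merge`) are real API.

```python
import pandas as pd

# Two Series whose labels overlap only partially and appear in different orders.
# (Hypothetical data, chosen only to show the behavior.)
q1 = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
q2 = pd.Series([1.0, 2.0, 3.0], index=["c", "a", "d"])

# Arithmetic aligns on index labels, not on position.
# Labels present on only one side ("b" and "d") produce NaN.
total = q1 + q2

# Series.add with fill_value treats a label missing on one side as 0.
total_filled = q1.add(q2, fill_value=0)

# A relational merge matches rows by key value rather than by row position.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["c", "a", "d"], "y": [30, 10, 40]})
merged = left.merge(right, on="key", how="inner")

print(total)
print(total_filled)
print(merged)
```

Here `total["a"]` is 12.0 because the value labeled `a` in `q1` (10.0) is paired with the value labeled `a` in `q2` (2.0), even though they sit at different positions.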
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
Implements high-performance tools for cleaning, transforming, and analyzing structured tabular datasets in memory.
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to ma
Provides a library that uses large language models for conversational data analysis and querying on structured datasets.
This repository is a collection of structured coding challenges designed to build proficiency in data manipulation, cleaning, and transformation using the Python data analysis library. It functions as a hands-on tutorial for learning how to process and analyze tabular datasets through a series of practical, real-world exercises. The project utilizes interactive documents that combine live code cells with narrative text, allowing users to execute data manipulation logic in a persistent environment. The content is organized into modular, progressive units that increase in complexity, enabling u
Focuses on mastering high-level data analysis libraries for efficient manipulation of tabular datasets.
Statsmodels is a comprehensive Python library designed for statistical modeling, econometric research, and data analysis. It provides a robust framework for estimating and diagnosing a wide range of statistical models, enabling users to perform rigorous hypothesis testing, regression analysis, and complex data exploration within structured environments. The library distinguishes itself through its support for advanced statistical methodologies, including state space representation for dynamic systems and generalized linear frameworks that accommodate non-normal response variables. It offers s
Models correlated data structures using generalized estimating equations for longitudinal analysis.
This project is a Python data analysis library and exploratory data analysis framework designed for processing raw datasets. It provides a suite of tools for examining data, identifying anomalies, and applying statistical methods to uncover patterns. The repository functions as a machine learning modeling toolkit and a statistical data modeling suite. It includes predictive algorithms and mathematical models used to analyze relationships between data variables and derive insights from complex datasets. The project covers a broad range of capabilities including data science, machine learning
Provides a collection of scripts and tools for processing raw datasets and applying statistical methods.
Danfo.js is a data analysis and preprocessing library for JavaScript that provides high-performance labeled data structures. It implements dataframes and series to enable complex data analysis, statistical computing, and manipulation of structured tabular data. The project serves as a preprocessing library for machine learning, offering utilities for categorical label encoding, one-hot encoding, and scaling and standardization of numerical features. It specifically facilitates converting labeled data structures into tensors for model training and evaluation. The library covers a broad set of capabilities, including descriptive statistics, relational operations such as merge and join, and time series processing. It includes tools for data cleaning, filtering, and grouping, as well as a visualization interface for generating interactive charts directly from dataframes. The system supports importing and exporting data in CSV, JSON, and Excel formats.
Serves as a high-performance library for cleaning and transforming structured datasets within JavaScript environments.
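The preprocessing steps named in the Danfo.js entry (label encoding, one-hot encoding, scaling, standardization) can be sketched as follows. This is not Danfo.js code: it uses pandas as a stand-in to show what each operation computes, and the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy dataset: one categorical and one numeric feature.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1.0, 2.0, 3.0]})

# Label encoding: map each category to an integer code in order of first appearance.
codes, uniques = pd.factorize(df["color"])

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Min-max scaling: rescale the feature to the [0, 1] range.
size = df["size"]
scaled = (size - size.min()) / (size.max() - size.min())

# Standardization: subtract the mean and divide by the population
# standard deviation (ddof=0), giving zero mean and unit variance.
standardized = (size - size.mean()) / size.std(ddof=0)

print(codes, list(uniques))
print(one_hot)
print(scaled.tolist())
print(standardized.tolist())
```

The resulting numeric columns are the kind of input that would then be converted to tensors for model training, which is the step the entry attributes to Danfo.js.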