Why is pandas-dev/pandas recommended in the Data Analysis Libraries category?

Offers a comprehensive suite for cleaning and transforming structured data.

Why is wesm/pydata-book recommended in the Data Analysis Libraries category?

Provides the companion code and notebooks for a textbook on cleaning, transforming, and analyzing structured tabular datasets in memory with Python.

Why is gventuri/pandas-ai recommended in the Data Analysis Libraries category?

Provides a library that uses large language models for conversational data analysis and querying on structured datasets.

Why is guipsamora/pandas_exercises recommended in the Data Analysis Libraries category?

Offers progressive, hands-on exercises for mastering pandas and manipulating tabular datasets efficiently.

Why is statsmodels/statsmodels recommended in the Data Analysis Libraries category?

Provides statistical modeling and econometrics tools, including generalized estimating equations (GEE) for correlated, longitudinal data.

Why is willkoehrsen/data-analysis recommended in the Data Analysis Libraries category?

Provides a collection of scripts and tools for processing raw datasets and applying statistical methods.

7 repositories

Awesome GitHub Repositories: Data Analysis Libraries

High-performance tools for cleaning and transforming structured datasets.

Distinguishing note: Focuses on in-memory data analysis rather than database engine operations.

Explore 7 awesome GitHub repositories in the Data & Databases · Data Analysis Libraries category.


pandas-dev/pandas
★ 49,039
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional (Series) and two-dimensional (DataFrame) data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized …
Offers a comprehensive suite for cleaning and transforming structured data.
Python · alignment, data-analysis, data-science
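The label alignment and relational merges described above can be illustrated with a minimal sketch. It assumes pandas is installed; the product names and numbers are made up for illustration.

```python
import pandas as pd

# Illustrative data (made up): unit prices and quantities keyed by product label.
prices = pd.Series({"apple": 1.5, "banana": 0.25, "cherry": 4.0})
quantities = pd.Series({"banana": 12, "apple": 4, "durian": 1})

# Arithmetic aligns on index labels, not positions: the result's index is the
# union of both indexes, and labels present on only one side become NaN.
totals = prices * quantities
print(totals)

# Relational merge: attach a category to each order by matching the "product" key.
orders = pd.DataFrame({"product": ["apple", "banana", "apple"], "qty": [1, 2, 3]})
catalog = pd.DataFrame({"product": ["apple", "banana"], "category": ["fruit", "fruit"]})
merged = orders.merge(catalog, on="product", how="left")
print(merged)
```

Because alignment is by label, reordering the entries of `quantities` would not change `totals`.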
wesm/pydata-book
★ 24,668
This project is the companion repository for Wes McKinney's book "Python for Data Analysis" and serves as an educational resource for data analysis with the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to communicate …
Provides companion code examples and notebooks for a textbook on cleaning, transforming, and analyzing structured tabular datasets in memory.
Jupyter Notebook
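As a small taste of the time-indexed data handling the book covers, the following sketch resamples and smooths a daily series with pandas. It is not taken from the book; the dates and values are arbitrary, and pandas and NumPy are assumed to be installed.

```python
import numpy as np
import pandas as pd

# Six daily observations with illustrative values 0..5.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series(np.arange(6, dtype=float), index=idx)

# Downsample to two-day buckets, summing the values within each bucket.
two_day = ts.resample("2D").sum()

# Trailing 3-day moving average; the first two windows are incomplete, so NaN.
rolling = ts.rolling(window=3).mean()

print(two_day)
print(rolling)
```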
gventuri/pandas-ai
★ 23,587
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to …
Provides a library that uses large language models for conversational data analysis and querying on structured datasets.
Python
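The question-to-code loop described for Pandas AI can be sketched in a few lines. This is NOT the pandas-ai API: `generate_code` and `ask` are hypothetical names, and a lookup table stands in for the language model so the example runs offline.

```python
import pandas as pd

def generate_code(question: str) -> str:
    """Hypothetical stand-in for the language-model call.

    A real system would send the question (plus the dataframe's schema) to an
    LLM and receive code back; here a lookup table plays that role.
    """
    canned = {
        "What is the average score per team?":
            "result = df.groupby('team')['score'].mean()",
    }
    return canned[question]

def ask(df: pd.DataFrame, question: str):
    """Translate a question into code, run it, and return its `result` variable."""
    code = generate_code(question)
    # The generated code sees only `df` and `pd` in its namespace.
    # exec() is NOT a security boundary: untrusted model output belongs in an
    # isolated environment (the Pandas AI description mentions a sandbox container).
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)
    return namespace["result"]

scores = pd.DataFrame({"team": ["red", "red", "blue"], "score": [1, 3, 5]})
answer = ask(scores, "What is the average score per team?")
print(answer)
```

In a real deployment the generated code would come from a model and should be validated and executed in isolation before its result is trusted.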
guipsamora/pandas_exercises
★ 12,180
This repository is a collection of structured coding challenges designed to build proficiency in data manipulation, cleaning, and transformation using pandas, the Python data analysis library. It functions as a hands-on tutorial for learning how to process and analyze tabular datasets through a series of practical, real-world exercises. The project uses Jupyter notebooks, which combine live code cells with narrative text and let users execute data manipulation logic in a persistent environment. The content is organized into modular, progressive units that increase in complexity, enabling users …
Focuses on mastering pandas for efficient manipulation of tabular datasets.
Jupyter Notebook · jupyter-notebooks, pandas, pandas-tutorial
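A typical exercise in this style combines cleaning and aggregation. The sketch below is illustrative only: the data and column names are invented, it is not copied from the repository, and it assumes pandas is installed.

```python
import pandas as pd

# Illustrative raw data with a duplicated row and a missing quantity.
raw = pd.DataFrame({
    "item": ["pen", "pen", "pen", "ink"],
    "qty": [2, 2, None, 5],
})

clean = (
    raw.drop_duplicates()        # drop the exact duplicate of the first row
       .fillna({"qty": 0})       # treat a missing quantity as zero
       .astype({"qty": int})     # restore an integer dtype once NaN is gone
)

# Total quantity per item.
totals = clean.groupby("item")["qty"].sum()
print(totals)
```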
statsmodels/statsmodels
★ 11,260
Statsmodels is a comprehensive Python library designed for statistical modeling, econometric research, and data analysis. It provides a robust framework for estimating and diagnosing a wide range of statistical models, enabling users to perform rigorous hypothesis testing, regression analysis, and complex data exploration within structured environments. The library distinguishes itself through its support for advanced statistical methodologies, including state space representation for dynamic systems and generalized linear frameworks that accommodate non-normal response variables. It offers …
Provides statistical modeling and econometrics tools, including generalized estimating equations (GEE) for correlated, longitudinal data.
Python · count-model, data-analysis, data-science
WillKoehrsen/Data-Analysis
★ 5,543
This project is a collection of Python notebooks and tools for data analysis and exploratory data analysis of raw datasets. It provides tools for examining data, identifying anomalies, and applying statistical methods to uncover patterns. It also covers machine learning and statistical modeling, including predictive algorithms and mathematical models used to analyze relationships between variables and derive insights from complex datasets. The project covers a broad range of capabilities, including data science, machine learning …
Provides a collection of scripts and tools for processing raw datasets and applying statistical methods.
Jupyter Notebook
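Anomaly identification in exploratory analysis often starts with summary statistics and a simple outlier rule. The sketch below uses Tukey's 1.5 × IQR rule as one common choice; it is an assumption, not necessarily the method used in the repository, and the readings are invented.

```python
import pandas as pd

# Illustrative measurements with one suspicious value.
readings = pd.Series([10, 12, 11, 13, 12, 95], name="reading")

print(readings.describe())  # count, mean, std, min, quartiles, max

# Tukey's rule: flag points more than 1.5 * IQR beyond the quartiles.
q1, q3 = readings.quantile(0.25), readings.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = readings[(readings < lower) | (readings > upper)]
print(outliers)
```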
javascriptdata/danfojs
★ 5,050
Danfo.js is a data analysis and preprocessing library for JavaScript that provides high-performance labeled data structures. It implements DataFrames and Series to support complex data analysis, statistical computation, and manipulation of structured tabular data. The project also serves as a machine learning preprocessing library, offering utilities for label encoding of categorical values, one-hot encoding, and scaling and standardization of numeric features. In particular, it facilitates converting labeled data structures into tensors for model training and evaluation. The library covers a broad set of capabilities, including descriptive statistics, relational operations such as merge and join, and time series processing. It includes tools for cleaning, filtering, and grouping data, as well as a plotting interface for generating interactive charts directly from DataFrames. It supports importing and exporting data in CSV, JSON, and Excel formats.
Serves as a high-performance library for cleaning and transforming structured datasets within JavaScript environments.
TypeScript · danfojs, data-analysis, data-analytics
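Danfo.js itself is used from JavaScript or TypeScript. To keep a single language across this page's examples, the sketch below illustrates the same preprocessing ideas (label encoding, one-hot encoding, and standardization) with pandas in Python. It does not show Danfo.js's API, and the feature table is invented.

```python
import pandas as pd

# Illustrative feature table: one categorical and one numeric column.
features = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.0, 3.0]})

# Label encoding: map each category to an integer code (order of first appearance).
codes, categories = pd.factorize(features["color"])

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(features["color"], prefix="color").astype(int)

# Standardization: zero mean and unit (population) standard deviation.
size = features["size"]
size_z = (size - size.mean()) / size.std(ddof=0)

# Assemble a purely numeric matrix, the kind of array a model or tensor expects.
matrix = pd.concat([one_hot, size_z.rename("size_z")], axis=1).to_numpy()
print(matrix)
```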

Awesome Data Analysis Libraries GitHub Repositories

pandas-dev/pandas

wesm/pydata-book

gventuri/pandas-ai

guipsamora/pandas_exercises

statsmodels/statsmodels

WillKoehrsen/Data-Analysis

javascriptdata/danfojs
