Why is pandas-dev/pandas a recommended repository for Data Analysis Libraries?

Offers a comprehensive suite for cleaning and transforming structured data.

Why is wesm/pydata-book a recommended repository for Data Analysis Libraries?

Teaches, through a textbook's practical code examples, the tools for cleaning, transforming, and analyzing structured tabular datasets in memory.

Why is gventuri/pandas-ai a recommended repository for Data Analysis Libraries?

Provides a library that uses large language models for conversational data analysis and querying on structured datasets.

Why is guipsamora/pandas_exercises a recommended repository for Data Analysis Libraries?

Focuses on mastering high-level data analysis libraries for efficient manipulation of tabular datasets.

Why is statsmodels/statsmodels a recommended repository for Data Analysis Libraries?

Models correlated data structures using generalized estimating equations for longitudinal analysis.

Why is WillKoehrsen/Data-Analysis a recommended repository for Data Analysis Libraries?

Provides a collection of scripts and tools for processing raw datasets and applying statistical methods.

7 repositories

Awesome GitHub Repositories: Data Analysis Libraries

High-performance tools for cleaning and transforming structured datasets.

Distinguishing note: Focuses on in-memory data analysis rather than database engine operations.

Explore 7 awesome GitHub repositories matching data & databases · Data Analysis Libraries.


pandas-dev/pandas
49,039
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
Offers a comprehensive suite for cleaning and transforming structured data.
Python · alignment · data-analysis · data-science
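The automatic alignment and relational merge mentioned in the pandas description can be illustrated with a minimal sketch. The fruit labels, counts, and prices below are made-up illustrative data, not taken from the repository; only the standard `Series` arithmetic, `Series.add`, and `DataFrame.merge` calls are used.

```python
import pandas as pd

# Two Series whose labels only partly overlap (illustrative data).
q1 = pd.Series({"apples": 10, "pears": 4, "plums": 7})
q2 = pd.Series({"pears": 6, "plums": 1, "quinces": 3})

# Arithmetic aligns on labels, not on position; a label missing from
# either side yields NaN instead of a silently misaligned sum.
total = q1 + q2

# fill_value treats a missing label as 0 rather than propagating NaN.
total_filled = q1.add(q2, fill_value=0)

# A relational merge joins two tables on a shared key column.
prices = pd.DataFrame({"fruit": ["pears", "plums"], "price": [1.5, 2.0]})
stock = pd.DataFrame({"fruit": ["pears", "plums", "quinces"], "count": [6, 1, 3]})
merged = stock.merge(prices, on="fruit", how="left")

print(total, total_filled, merged, sep="\n\n")
```

With `how="left"`, every row of `stock` is kept, and rows without a matching price get a missing value in the `price` column.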
wesm/pydata-book
24,668
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
Teaches, through a textbook's practical code examples, the tools for cleaning, transforming, and analyzing structured tabular datasets in memory.
Jupyter Notebook
gventuri/pandas-ai
23,587
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to ma
Provides a library that uses large language models for conversational data analysis and querying on structured datasets.
Python
guipsamora/pandas_exercises
12,180
This repository is a collection of structured coding challenges designed to build proficiency in data manipulation, cleaning, and transformation using the Python data analysis library. It functions as a hands-on tutorial for learning how to process and analyze tabular datasets through a series of practical, real-world exercises. The project utilizes interactive documents that combine live code cells with narrative text, allowing users to execute data manipulation logic in a persistent environment. The content is organized into modular, progressive units that increase in complexity, enabling u
Focuses on mastering high-level data analysis libraries for efficient manipulation of tabular datasets.
Jupyter Notebook · jupyter-notebooks · pandas · pandas-tutorial
statsmodels/statsmodels
11,260
Statsmodels is a comprehensive Python library designed for statistical modeling, econometric research, and data analysis. It provides a robust framework for estimating and diagnosing a wide range of statistical models, enabling users to perform rigorous hypothesis testing, regression analysis, and complex data exploration within structured environments. The library distinguishes itself through its support for advanced statistical methodologies, including state space representation for dynamic systems and generalized linear frameworks that accommodate non-normal response variables. It offers s
Models correlated data structures using generalized estimating equations for longitudinal analysis.
Python · count-model · data-analysis · data-science
WillKoehrsen/Data-Analysis
5,543
This project is a Python data analysis library and exploratory data analysis framework designed to process raw datasets. It provides a suite of tools for examining data, identifying anomalies, and applying statistical methods to uncover patterns. The repository functions as a machine learning modeling toolkit and a statistical data modeling suite. It includes predictive algorithms and mathematical models used to analyze relationships between data variables and draw insights from complex datasets. The project covers a broad range of capabilities, including data science, machine learning modeling, and exploratory data analysis. These are implemented through data manipulation, numerical computing, and data visualization.
Provides a collection of scripts and tools for processing raw datasets and applying statistical methods.
Jupyter Notebook
javascriptdata/danfojs
5,050
Danfo.js is a data analysis and preprocessing library for JavaScript that provides high-performance labeled data structures. It implements dataframes and series to support complex data analysis, statistical computation, and manipulation of structured tabular data. The project serves as a preprocessing library for machine learning, offering utilities for categorical label encoding, one-hot encoding, and scaling and standardization of numerical features. It specifically facilitates converting labeled data structures into tensors for model training and evaluation. The library covers a broad set of capabilities including descriptive statistics, relational operations such as merge and join, and time series processing. It includes tools for cleaning, filtering, and grouping data, as well as a visualization interface for generating interactive charts directly from dataframes. The system supports importing and exporting data in CSV, JSON, and Excel formats.
Serves as a high-performance library for cleaning and transforming structured datasets within JavaScript environments.
TypeScript · danfojs · data-analysis · data-analytics

