Why is donnemartin/data-science-ipython-notebooks a recommended Exploratory Data Analysis repository?

Provides techniques for cleaning and manipulating tabular data to visualize trends and extract statistical insights.

Why is saulpw/visidata a recommended Exploratory Data Analysis repository?

Provides tools for generating summary statistics, pivot tables, and frequency distributions to identify patterns in datasets.

Why is tidyverse/ggplot2 a recommended Exploratory Data Analysis repository?

Enables discovery of patterns and statistical insights through the creation of layered plots and faceted grids.

Why is hadley/ggplot2 a recommended Exploratory Data Analysis repository?

Facilitates the rapid generation of various plots to discover patterns and statistical insights in datasets.

Why is observablehq/plot a recommended Exploratory Data Analysis repository?

Provides an API for rapidly transforming tabular data into charts to discover patterns and statistical insights.

Why is man-group/dtale a recommended Exploratory Data Analysis repository?

Provides a visual interface for identifying patterns, outliers, and missing values in datasets.

Why is hadley/r4ds a recommended Exploratory Data Analysis repository?

Teaches the iterative process of manipulating and visualizing datasets to discover statistical patterns and insights.

Why is javascriptdata/danfojs a recommended Exploratory Data Analysis repository?

Provides tools for calculating descriptive statistics and generating charts to discover patterns in datasets.

13 Repos

Awesome GitHub Repositories: Exploratory Data Analysis

The process of cleaning and manipulating datasets to discover patterns and statistical insights.

Distinct from Automated Exploratory Analysis: focuses on the manual exploratory process using pandas/NumPy rather than on automated analysis frameworks.

Explore 13 awesome GitHub repositories matching data & databases · Exploratory Data Analysis.

donnemartin/data-science-ipython-notebooks
29,166
This project is a collection of interactive Python notebooks and educational resources designed for mastering data science, machine learning, and numerical computing. It provides a series of practical guides and tutorials covering deep learning, big data processing, and statistical analysis. The repository features specialized instructional suites for implementing classical machine learning algorithms, building deep learning model architectures, and managing AWS cloud infrastructure. It includes dedicated notebooks for data visualization and numerical computing exercises. The project covers
Provides techniques for cleaning and manipulating tabular data to visualize trends and extract statistical insights.
Python · aws · big-data · caffe
saulpw/visidata
8,834
VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations. The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
Provides tools for generating summary statistics, pivot tables, and frequency distributions to identify patterns in datasets.
Python · cli · csv · datajournalism
jvns/pandas-cookbook
7,086
This project is a pandas data analysis cookbook and a Python data science guide. It offers a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when running data processing scripts. It covers a broad range of data science skills, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. The recipes demonstrate how to perform structured data analysis through techniques such as filtering, aggregating grouped data, and processing text data.
Uses pandas for cleaning and manipulating datasets to discover patterns and statistical insights.
Jupyter Notebook
tidyverse/ggplot2
6,948
ggplot2 is a data visualization library for R based on a formal grammar of graphics. It provides a declarative plotting framework that allows users to create complex graphics by combining geometric objects, statistical summaries, and coordinate systems. The system is distinguished by a layered approach to composition, where visualizations are built incrementally by stacking independent geometric, statistical, and coordinate layers. It utilizes a hierarchical styling engine to manage non-data elements such as backgrounds, fonts, and margins, and includes a multi-panel faceting tool for splitti
Enables discovery of patterns and statistical insights through the creation of layered plots and faceted grids.
R
hadley/ggplot2
6,948
ggplot2 is an R data visualization library and statistical graphics engine. It implements a grammar of graphics that functions as a declarative plotting framework, allowing users to specify what a plot should contain rather than how to draw it. The system builds visualizations by mapping data variables to visual aesthetics through a structured set of layering rules. This approach enables the composition of complex graphics by stacking independent components, such as geometric objects and scales, on top of a shared coordinate system. The framework supports scientific plotting and exploratory
Facilitates the rapid generation of various plots to discover patterns and statistical insights in datasets.
R
WillKoehrsen/Data-Analysis
5,543
This project is a Python library for data analysis and a framework for exploratory data analysis, designed for processing raw datasets. It offers a suite of tools for examining data, identifying anomalies, and applying statistical methods to uncover patterns. The repository serves as a machine learning modeling toolkit and statistical data modeling suite. It contains predictive algorithms and mathematical models used to analyze relationships between data variables and derive insights from complex datasets. The project covers a broad range of functions, including data science, machine learning modeling, and exploratory data analysis. These are implemented through data manipulation, numerical computation, and data visualization.
Provides a framework for cleaning and manipulating datasets to discover patterns and identify statistical anomalies.
Jupyter Notebook
observablehq/plot
5,305
This is a visualization library based on the grammar of graphics, used to create charts by mapping tabular data to visual marks. It serves as an SVG data visualization tool and an API for exploratory data analysis, letting users render complex visualizations and geographic maps. The library has a GeoJSON map renderer that projects spherical coordinates into a two-dimensional pixel space, as well as an Apache Arrow visualization interface for highly efficient data processing. Its feature set includes data transformation through binning and grouping, visual encoding through automatic scale inference and the application of color schemes, and the generation of small multiples. It supports rendering geometric shapes in layered views and exporting static images in server-side environments.
Provides an API for rapidly transforming tabular data into charts to discover patterns and statistical insights.
HTML · charts · d3 · data-visualization
man-group/dtale
5,170
dtale is a web-based interactive grid and visualizer for pandas dataframes, designed as an exploratory data analysis tool. It provides a browser-based interface for analyzing tabular data structures, allowing users to calculate statistics, detect outliers, and compute correlations without writing manual code. The project functions as an embedded data viewer that can be integrated into web applications via iframes or custom routes, with specific support for Django, Flask, and Streamlit. It enables the exploration of datasets through a combination of an interactive data grid and a data visualiz
Provides a visual interface for identifying patterns, outliers, and missing values in datasets.
TypeScript · data-analysis · data-science · data-visualization
hadley/r4ds
5,070
r4ds is a data science curriculum and educational resource designed for mastering the R programming language. It offers a structured learning path for the end-to-end process of importing, cleaning, transforming, and visualizing data. The project emphasizes a guide to reproducible data science and a comprehensive data wrangling curriculum. It contains specialized tutorials on the grammar of graphics for layered data visualization, as well as technical publications created with Quarto that combine executable code with narrative text. The material covers a broad range of analytical functions, including data ingestion from diverse sources, relational data joining, and the management of categorical variables. It also covers data cleaning, mathematical modeling, and the creation of professional reports and presentations in various formats. The curriculum focuses on the practical application of functional programming and tidy data principles to produce transparent and repeatable analyses.
Teaches the iterative process of manipulating and visualizing datasets to discover statistical patterns and insights.
R
javascriptdata/danfojs
5,050
Danfo.js is a data analysis and preprocessing library for JavaScript that provides powerful labeled data structures. It implements dataframes and series to enable complex data analysis, statistical computation, and the manipulation of structured tabular data. The project serves as a preprocessing library for machine learning, offering utilities for categorical label encoding, one-hot encoding, and the scaling and standardization of numeric features. In particular, it facilitates converting labeled data structures into tensors for model training and evaluation. The library covers a broad range of functions, including descriptive statistics, relational operations such as merging and joining, and time series processing. It includes tools for data cleaning, filtering, and grouping, as well as a visualization interface for creating interactive charts and plots directly from dataframes. The system supports importing and exporting data via CSV, JSON, and Excel formats.
Provides tools for calculating descriptive statistics and generating charts to discover patterns in datasets.
TypeScript · danfojs · data-analysis · data-analytics
Nyandwi/machine_learning_complete
4,983
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Guides users through cleaning and manipulating datasets to discover patterns and optimize features for modeling.
Jupyter Notebook · computer-vision · data-analysis · data-science
ResidentMario/missingno
4,209
missingno is a Python library for visualizing and analyzing patterns of missing data. It offers a set of tools to profile dataset completeness, map data gaps, and quantify the volume of null values across variables. The library is distinguished by a nullity correlation analyzer and a hierarchical data clustering tool. These components enable the detection of systemic dependencies and trends by measuring how the absence of one variable relates to the absence of another. The toolset covers broader functions for data quality auditing and exploratory analysis. It includes features for summarizing column nullity using linear and logarithmic scales, as well as matrix-based mappings for identifying systemic gaps in datasets.
Enables exploratory data analysis by visualizing the distribution and volume of null values.
Python · data-analysis · data-visualization · missing-data
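
The missingno entry above describes measuring how the absence of one variable relates to the absence of another. As a rough illustration of that idea only, the sketch below computes a Pearson correlation between the missingness indicators of two columns using the Python standard library. It is not missingno's API or implementation; the function name `nullity_correlation`, the list-of-dicts row layout, and the sample data are all hypothetical.

```python
from math import sqrt


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns.

    Hypothetical helper for illustration; rows is a list of dicts and a value
    counts as missing when it is None or the key is absent.
    Returns None when either column is always present or always missing,
    because the correlation is undefined without variation.
    """
    a = [1 if row.get(col_a) is None else 0 for row in rows]
    b = [1 if row.get(col_b) is None else 0 for row in rows]
    n = len(rows)
    if n == 0:
        return None
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Made-up sample: age and income are always missing together,
# while city is missing only where age is present.
rows = [
    {"age": 31, "income": 52000, "city": "Oslo"},
    {"age": None, "income": None, "city": "Lima"},
    {"age": 45, "income": 61000, "city": None},
    {"age": None, "income": None, "city": "Kyiv"},
]

age_income = nullity_correlation(rows, "age", "income")
age_city = nullity_correlation(rows, "age", "city")
print(age_income, age_city)
```

A value near 1 suggests two columns tend to be missing in the same rows, a negative value suggests one tends to be present when the other is absent, and `None` marks pairs where the measure is undefined.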
IBM/mcp-context-forge
3,310
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for
Performs descriptive statistical analysis to identify data distributions and correlations.
Python · agents · ai · api-gateway