26 repositories

Awesome GitHub Repositories: Data Analysis

Tools for examining metrics to identify patterns and inform strategy.

Distinguishing note: Focuses on strategic data analysis rather than raw data processing.

Explore 26 awesome GitHub repositories matching data & databases · Data Analysis. Refine with filters or upvote what's useful.


awesome-selfhosted/awesome-selfhosted
299,516
This project is a community-curated directory of open-source software designed to be deployed in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, letting users retain full ownership of their data and control of their digital infrastructure. The directory is structured by a hierarchical taxonomy that organizes a large collection of applications into logical categories, ranging from media management and data analysis to private communication and team productivity tools. It is distinguished by a collaborative peer-review process in which community members validate the quality and relevance of each submission so that the directory remains accurate and reliable. The project covers a broad range of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies on private hardware. The directory is maintained as a version-controlled repository, so all community-driven updates and changes are tracked and transparent.
Performs systematic computational analysis of data to discover and interpret meaningful patterns.
awesome, awesome-list, cloud
imartinez/privateGPT
57,281
PrivateGPT is a private AI document assistant and local knowledge base manager designed for querying private files and documents using retrieval-augmented generation. It functions as a local language model application and API gateway, allowing users to obtain cited answers from unstructured data without sending information to external servers. The system differentiates itself by acting as a tool integrator that connects language models to external functions, including web search, tabular data analysis, and custom action extensions. It provides a standardized API layer that allows local infere
Extracts structured insights from CSV files using a local model to ensure sensitive data remains offline.
Python
OtterMind/Chat2DB
25,784
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large
Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.
Java, ai, bi, chatgpt
gventuri/pandas-ai
23,587
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to ma
Allows complex queries and comparisons across several data tables simultaneously to identify related insights.
Python
TransformerOptimus/SuperAGI
17,572
SuperAGI is a comprehensive marketing automation platform and customer data system designed to orchestrate multi-channel engagement workflows. It functions as a no-code workflow orchestrator, allowing users to build complex, automated task sequences triggered by real-time user behavior, transactional data, or scheduled events. By centralizing customer profiles and interaction history, the platform enables businesses to manage end-to-end marketing operations from a single interface. The platform distinguishes itself through its deep integration with e-commerce storefronts and its ability to ex
Provides tools for examining campaign metrics to identify patterns and inform marketing strategy.
Python, agents, agi, ai
davisking/dlib
14,399
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
Offers a specialized toolkit of algorithms for processing and evaluating large-scale complex datasets.
C++, c-plus-plus, computer-vision, deep-learning
Data-Centric-AI-Community/ydata-profiling
13,618
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directl
Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.
Python
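As a rough illustration of the kind of summary such a profiling report contains (missing values, duplicate rows, basic statistics per column), here is a minimal standard-library sketch. It does not use the ydata-profiling API; the function name `profile_rows` and the sample rows are hypothetical and exist only for this example.

```python
from statistics import mean, median


def profile_rows(rows):
    """Summarize a list of dict rows: duplicates, missing counts, numeric stats."""
    columns = list(rows[0].keys()) if rows else []
    summary = {"n_rows": len(rows), "columns": {}}

    # Count exact duplicate rows (same value in every column).
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    summary["n_duplicates"] = duplicates

    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]  # None marks a missing value
        stats = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        # Only report numeric statistics when every present value is numeric.
        if numeric and len(numeric) == len(present):
            stats.update(
                min=min(numeric),
                max=max(numeric),
                mean=mean(numeric),
                median=median(numeric),
            )
        summary["columns"][col] = stats
    return summary


# Hypothetical sample data: one missing value and one duplicate row.
sample = [
    {"city": "Paris", "temp": 21.0},
    {"city": "Lyon", "temp": None},
    {"city": "Paris", "temp": 21.0},
    {"city": "Nice", "temp": 25.0},
]
report = profile_rows(sample)
```

The real library produces far richer output (visual reports, correlations, warnings); the sketch only shows the basic per-column bookkeeping behind a statistical summary.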
ydataai/pandas-profiling
13,610
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis
Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.
Python
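The description above mentions auto-correlation as one of the time-series statistics. As a hedged sketch of what that statistic measures, the following computes the sample autocorrelation at a given lag in plain Python. It is not taken from the project's code; the function name `autocorrelation` and the toy series are assumptions made for illustration.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    avg = sum(series) / n
    # Total variation around the mean; zero means a constant series.
    denom = sum((x - avg) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Covariation between each point and the point `lag` steps later.
    num = sum((series[i] - avg) * (series[i + lag] - avg) for i in range(n - lag))
    return num / denom


# Hypothetical series with a repeating pattern of period 3.
seasonal = [1, 2, 3] * 4
lag1 = autocorrelation(seasonal, lag=1)
lag3 = autocorrelation(seasonal, lag=3)
```

For a series with period 3, the value at lag 3 is strongly positive while the value at lag 1 is negative, which is the kind of pattern an auto-correlation plot makes visible when looking for seasonality.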
Data-Centric-AI-Community/fg-data-profiling
13,609
This project is a data profiling and exploratory data analysis tool designed to generate automated quality reports for Pandas and Spark dataframes. It serves as a system for computing descriptive statistics, identifying correlations, and analyzing univariate and multivariate data patterns. The tool provides specialized capabilities for comparing different versions of datasets to identify changes in data quality and distributions. It includes a dedicated profiler for time-dependent data to extract statistical information such as seasonality and auto-correlation. The software covers a broad an
Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.
Python
pandas-profiling/pandas-profiling
13,609
This project is an exploratory data analysis library and profiling tool for Pandas and Spark DataFrames. It automates the initial investigation of datasets by generating comprehensive descriptive analysis reports, statistical summaries, and data quality warnings. The system functions as a data quality profiler to detect missing values, duplicate rows, and type inconsistencies. It includes a dataset comparison tool for identifying structural and content shifts between different versions of the same data, as well as specialized tools for time-series analysis to calculate auto-correlation and se
Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.
Python
ydataai/ydata-profiling
13,388
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Provides an automated framework for discovering data distributions, correlations, and quality issues within large datasets.
Python, big-data-analytics, data-analysis, data-exploration
dbcli/pgcli
13,231
pgcli is an interactive command-line interface and database management tool for PostgreSQL. It functions as an interactive SQL shell and query editor that allows users to inspect schemas, manage connections, and run queries against PostgreSQL data sources. The tool is distinguished by its real-time, schema-aware autocompletion for keywords, tables, and columns, as well as dynamic SQL syntax highlighting. It provides safety mechanisms through transaction-aware guardrails that warn against or block destructive statements when no active transaction is detected. Broad capabilities include secure
Facilitates quick data inspection by formatting query results into readable tables for analysis.
Python, database, postgres, postgresql
microsoft/vscode-copilot-chat
9,493
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Automatically generates a complete data analysis workflow, including notebook scaffolding and visualization code.
TypeScript
growthbook/growthbook
7,351
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
Generates automated drill-down analyses for a single metric across multiple dimensions.
TypeScript, ab-testing, abtest, abtesting
jvns/pandas-cookbook
7,086
This project is a pandas data analysis cookbook and Python data science guide. It provides a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when running data processing scripts. It covers a wide range of data science capabilities, including ingesting data from external sources, cleaning raw data, and exploratory data analysis. The recipes demonstrate how to analyze structured data using techniques such as filtering, grouped aggregation, and text data processing.
Provides tools for examining real-world datasets to identify patterns and extract meaningful insights.
Jupyter Notebook
j3ssie/Osmedeus
6,425
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Sends prompts to language models and exports generated analysis for use in subsequent workflow steps.
Go
lux-org/lux
5,380
Lux is an automated exploratory data analysis tool designed to generate intelligent visual representations of pandas dataframes. It identifies patterns and trends by recommending optimal chart types and axis mappings based on the statistical attributes of a dataset. The tool works as an interactive data profiling layer that lets users browse and query collections of charts using filters and wildcards. It also serves as a visualization code generator, translating automatically produced charts into programmatic code or HTML for manual refinement in external libraries. The system covers a wide range of exploratory analysis capabilities, including automatic chart encoding, guided discovery through next-step recommendations, and the ability to export visual configurations as declarative specifications. The project integrates directly into pandas to replace the default dataframe display with interactive visualization components.
Automates the exploratory data analysis process by recommending optimal chart types and axis mappings based on dataset attributes.
Python
Nyandwi/machine_learning_complete
4,983
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Implements a complete suite for examining metrics and identifying patterns through data analysis.
Jupyter Notebook, computer-vision, data-analysis, data-science
ObservedObserver/visual-insights
4,653
Visual Insights is an automated exploratory data analysis platform and causal inference tool designed to uncover patterns and cause-and-effect relationships within datasets. It works as an interactive data visualization library that uses a grammar-of-graphics approach to generate multidimensional charts and dashboards. The project is distinguished by a natural language interface that translates plain-text questions into data answers and visualizations via a language model. It provides a specialized framework for causal discovery and inference, letting users identify links between variables through interactive causal graphs and run "what if" analyses to validate hypotheses. The platform covers a wide range of capabilities, including visual data cleaning, statistical profiling, and automated dataset transformation. It supports integrating diverse data from local files and remote databases, and has a high-performance processing engine for handling large datasets locally. In addition, the system allows interactive analysis components to be embedded in web applications and notebooks.
Discovers patterns and trends in unfamiliar datasets using automated agents to generate multi-dimensional visualizations.
TypeScript
iflow-ai/iflow-cli
4,609
iflow-cli is a command-line interface and suite of AI tools designed for software engineering, workflow orchestration, and multimodal data analysis. It functions as an LLM command line interface that enables users to execute AI workflows, analyze codebase structures, and interact with large language models directly from the terminal. The project features a plugin-based agent architecture that allows for the integration of specialized domain experts and custom instruction sets from an external marketplace. It distinguishes itself through a multimodal AI terminal capable of processing visual da
Extracts information from spreadsheets to merge data into tables or generate visual charts.
Shell
