Why is awesome-selfhosted/awesome-selfhosted a recommended Data Analysis repository?

Performs systematic computational analysis of data to discover and interpret meaningful patterns.

Why is imartinez/privateGPT a recommended Data Analysis repository?

Extracts structured insights from CSV files using a local model to ensure sensitive data remains offline.

Why is OtterMind/Chat2DB a recommended Data Analysis repository?

Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.

Why is gventuri/pandas-ai a recommended Data Analysis repository?

Allows complex queries and comparisons across several data tables simultaneously to identify related insights.

Why is TransformerOptimus/SuperAGI a recommended Data Analysis repository?

Provides tools for examining campaign metrics to identify patterns and inform marketing strategy.

Why is davisking/dlib a recommended Data Analysis repository?

Offers a specialized toolkit of algorithms for processing and evaluating large-scale complex datasets.

Why is Data-Centric-AI-Community/ydata-profiling a recommended Data Analysis repository?

Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.

Why is ydataai/pandas-profiling a recommended Data Analysis repository?

Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.

Why is Data-Centric-AI-Community/fg-data-profiling a recommended Data Analysis repository?

Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.

Why is pandas-profiling/pandas-profiling a recommended Data Analysis repository?

Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.

26 repositories

Awesome GitHub Repositories: Data Analysis

Tools for examining metrics to identify patterns and inform strategy.

Distinguishing note: Focuses on strategic data analysis rather than raw data processing.

Explore 26 awesome GitHub repositories matching Data & Databases · Data Analysis.


awesome-selfhosted/awesome-selfhosted
Stars: 299,516
This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, allowing users to retain full ownership of their data and control over their digital infrastructure. The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analysis to private communication and team productivity tools. It is distinguished by a collaborative peer-review process in which community members validate the quality and relevance of each submission to ensure the directory remains accurate and reliable. The project covers a broad surface of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies on private hardware. The directory is maintained as a version-controlled repository, ensuring that all community-driven updates and changes are tracked and transparent.
Performs systematic computational analysis of data to discover and interpret meaningful patterns.
awesome, awesome-list, cloud
imartinez/privateGPT
Stars: 57,281
PrivateGPT is a private AI document assistant and local knowledge base manager designed for querying private files and documents using retrieval-augmented generation. It functions as a local language model application and API gateway, allowing users to obtain cited answers from unstructured data without sending information to external servers. The system differentiates itself by acting as a tool integrator that connects language models to external functions, including web search, tabular data analysis, and custom action extensions. It provides a standardized API layer that allows local infere
Extracts structured insights from CSV files using a local model to ensure sensitive data remains offline.
Python
OtterMind/Chat2DB
Stars: 25,784
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large
Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.
Java · ai, bi, chatgpt
gventuri/pandas-ai
Stars: 23,587
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to ma
Allows complex queries and comparisons across several data tables simultaneously to identify related insights.
Python
TransformerOptimus/SuperAGI
Stars: 17,572
SuperAGI is a comprehensive marketing automation platform and customer data system designed to orchestrate multi-channel engagement workflows. It functions as a no-code workflow orchestrator, allowing users to build complex, automated task sequences triggered by real-time user behavior, transactional data, or scheduled events. By centralizing customer profiles and interaction history, the platform enables businesses to manage end-to-end marketing operations from a single interface. The platform distinguishes itself through its deep integration with e-commerce storefronts and its ability to ex
Provides tools for examining campaign metrics to identify patterns and inform marketing strategy.
Python · agents, agi, ai
davisking/dlib
Stars: 14,399
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
Offers a specialized toolkit of algorithms for processing and evaluating large-scale complex datasets.
C++ · c-plus-plus, computer-vision, deep-learning
Data-Centric-AI-Community/ydata-profiling
Stars: 13,618
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directl
Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.
Python
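To make the kind of report described above more concrete, here is a minimal standard-library sketch of tabular profiling: per-column missing and distinct counts, basic numeric statistics, and a duplicate-row count. It is not the ydata-profiling API; the function name `profile_rows` and the sample data are hypothetical and exist only for illustration.

```python
import csv
import io
import statistics

def profile_rows(rows):
    """Summarize a list of dict rows: missing counts, distinct counts, numeric stats, duplicates."""
    columns = list(rows[0].keys()) if rows else []
    report = {"n_rows": len(rows), "columns": {}}

    # Count rows that repeat an earlier row exactly (values compared in column order).
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)
    report["n_duplicate_rows"] = duplicates

    for col in columns:
        values = [row[col] for row in rows]
        # Treat empty strings and None as missing.
        present = [v for v in values if v not in ("", None)]
        summary = {"missing": len(values) - len(present), "distinct": len(set(present))}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = None  # at least one value is not numeric
        if numbers:
            summary["mean"] = statistics.fmean(numbers)
            summary["min"] = min(numbers)
            summary["max"] = max(numbers)
        report["columns"][col] = summary
    return report

# Hypothetical sample data: one duplicated row and one missing temperature.
SAMPLE_CSV = """city,temp
Oslo,3
Rome,15
Oslo,3
Lima,
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_rows(rows)
print(report)
```

A real profiler adds type inference, correlations, and visual output on top of summaries like these; the sketch only shows the shape of the underlying computation.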
ydataai/pandas-profiling
Stars: 13,610
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis
Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.
Python
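The auto-correlation statistic mentioned in the description above can be sketched in a few lines. This is the standard sample autocorrelation formula, not code taken from the project; the function name and the `readings` series are hypothetical.

```python
import statistics

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = statistics.fmean(series)
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: co-movement between the series and itself shifted by `lag`.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Hypothetical series that repeats every 4 steps.
readings = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
acf = {lag: autocorrelation(readings, lag) for lag in range(1, 6)}
best_lag = max(acf, key=acf.get)
print(best_lag, round(acf[best_lag], 3))
```

For a series with a repeating pattern, the lag with the highest autocorrelation is a simple hint at the seasonal period, which is the kind of signal a time-series profile would surface.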
Data-Centric-AI-Community/fg-data-profiling
Stars: 13,609
This project is a data profiling and exploratory data analysis tool designed to generate automated quality reports for Pandas and Spark dataframes. It serves as a system for computing descriptive statistics, identifying correlations, and analyzing univariate and multivariate data patterns. The tool provides specialized capabilities for comparing different versions of datasets to identify changes in data quality and distributions. It includes a dedicated profiler for time-dependent data to extract statistical information such as seasonality and auto-correlation. The software covers a broad an
Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.
Python
pandas-profiling/pandas-profiling
Stars: 13,609
This project is an exploratory data analysis library and profiling tool for Pandas and Spark DataFrames. It automates the initial investigation of datasets by generating comprehensive descriptive analysis reports, statistical summaries, and data quality warnings. The system functions as a data quality profiler to detect missing values, duplicate rows, and type inconsistencies. It includes a dataset comparison tool for identifying structural and content shifts between different versions of the same data, as well as specialized tools for time-series analysis to calculate auto-correlation and se
Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.
Python
ydataai/ydata-profiling
Stars: 13,388
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Provides an automated framework for discovering data distributions, correlations, and quality issues within large datasets.
Python · big-data-analytics, data-analysis, data-exploration
dbcli/pgcli
Stars: 13,231
pgcli is an interactive command-line interface and database management tool for PostgreSQL. It functions as an interactive SQL shell and query editor that allows users to inspect schemas, manage connections, and run queries against PostgreSQL data sources. The tool is distinguished by its real-time, schema-aware autocompletion for keywords, tables, and columns, as well as dynamic SQL syntax highlighting. It provides safety mechanisms through transaction-aware guardrails that warn against or block destructive statements when no active transaction is detected. Broad capabilities include secure
Facilitates quick data inspection by formatting query results into readable tables for analysis.
Python · database, postgres, postgresql
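As a rough illustration of the transaction-aware guardrail described above, the sketch below flags a statement for confirmation when its first keyword is destructive and no transaction is open. This is an assumption about how such a check could work, not pgcli's actual implementation; the keyword list and both function names are hypothetical.

```python
# Hypothetical keyword list; a real tool would likely make this configurable.
DESTRUCTIVE_KEYWORDS = ("drop", "delete", "truncate", "alter", "update")

def is_destructive(statement):
    """Return True if the statement's first keyword is in the destructive list."""
    words = statement.strip().split(None, 1)
    return bool(words) and words[0].lower() in DESTRUCTIVE_KEYWORDS

def needs_confirmation(statement, in_transaction):
    """Ask for confirmation only when a destructive statement runs outside a transaction."""
    return is_destructive(statement) and not in_transaction

print(needs_confirmation("DROP TABLE users;", in_transaction=False))
print(needs_confirmation("DROP TABLE users;", in_transaction=True))
```

Checking only the first keyword is deliberately naive: it misses destructive statements hidden behind comments or CTEs, so a production guard would need real SQL parsing.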
microsoft/vscode-copilot-chat
Stars: 9,493
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Automatically generates a complete data analysis workflow, including notebook scaffolding and visualization code.
TypeScript
growthbook/growthbook
Stars: 7,351
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
Generates automated drill-down analyses for a single metric across multiple dimensions.
TypeScript · ab-testing, abtest, abtesting
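A drill-down of a single metric across several dimensions, as mentioned above, amounts to one grouped aggregate per dimension. The sketch below shows that idea in plain Python; it does not use GrowthBook's API or its SQL generation, and the function name, field names, and records are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def drill_down(records, metric, dimensions):
    """Average `metric` per value of each dimension, one breakdown per dimension."""
    breakdowns = {}
    for dim in dimensions:
        groups = defaultdict(list)
        for rec in records:
            groups[rec[dim]].append(rec[metric])
        breakdowns[dim] = {value: fmean(vals) for value, vals in groups.items()}
    return breakdowns

# Hypothetical event records.
records = [
    {"country": "US", "device": "mobile", "revenue": 10},
    {"country": "US", "device": "desktop", "revenue": 30},
    {"country": "DE", "device": "mobile", "revenue": 20},
]
result = drill_down(records, "revenue", ["country", "device"])
print(result)
```

In a warehouse-native setup the same grouping would run as SQL against the warehouse rather than in application memory.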
jvns/pandas-cookbook
Stars: 7,086
This project is a pandas data analysis cookbook and Python data science guide. It provides a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when executing data processing scripts. It covers a broad range of data science capabilities, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. These recipes demonstrate how to perform structured data analysis through
Provides tools for examining real-world datasets to identify patterns and extract meaningful insights.
Jupyter Notebook
j3ssie/Osmedeus
Stars: 6,425
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Sends prompts to language models and exports generated analysis for use in subsequent workflow steps.
Go
lux-org/lux
Stars: 5,380
Lux is an automated exploratory data analysis tool designed to generate intelligent visual representations of pandas dataframes. It identifies patterns and trends by recommending optimal chart types and axis mappings based on the statistical attributes of a dataset. The tool works as an interactive data profiling layer that lets users browse and query collections of charts using filters and wildcards. It also serves as a visualization code generator, translating automatically produced charts into programmatic code or HTML for manual refinement in external libraries. The system covers a broad range of exploratory analysis capabilities, including automatic chart encoding, guided discovery through recommended next steps, and the ability to export visual configurations as declarative specifications. The project integrates directly into pandas to override the default dataframe printout with interactive visualization components.
Automates the exploratory data analysis process by recommending optimal chart types and axis mappings based on dataset attributes.
Python
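Recommending a chart type and axis mapping from dataset attributes, as described above, can be approximated with a few type-based rules. The sketch below is a simplified illustration, not Lux's recommendation engine; the rules, function names, and sample table are all hypothetical.

```python
def infer_kind(values):
    """Classify a column as 'quantitative' (all numeric) or 'nominal' (anything else)."""
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    return "quantitative" if numeric else "nominal"

def recommend_chart(table, columns):
    """Suggest a chart type and axis mapping for one or two columns of `table`."""
    kinds = [infer_kind(table[c]) for c in columns]
    if kinds == ["quantitative"]:
        return {"chart": "histogram", "x": columns[0]}
    if kinds == ["nominal"]:
        return {"chart": "bar", "x": columns[0], "y": "count"}
    if kinds == ["quantitative", "quantitative"]:
        return {"chart": "scatter", "x": columns[0], "y": columns[1]}
    if sorted(kinds) == ["nominal", "quantitative"]:
        nominal = columns[kinds.index("nominal")]
        quantitative = columns[kinds.index("quantitative")]
        return {"chart": "bar", "x": nominal, "y": f"mean({quantitative})"}
    # Fall back to a plain table when no rule applies.
    return {"chart": "table", "columns": list(columns)}

# Hypothetical column-oriented table.
table = {"price": [120.0, 95.5, 210.0], "rooms": [2, 1, 4], "city": ["Oslo", "Rome", "Lima"]}
print(recommend_chart(table, ["price", "rooms"]))
print(recommend_chart(table, ["city", "price"]))
```

A fuller recommender would also rank candidate charts by how interesting they are, for example by correlation or skew, rather than returning the first rule that matches.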
Nyandwi/machine_learning_complete
Stars: 4,983
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Implements a complete suite for examining metrics and identifying patterns through data analysis.
Jupyter Notebook · computer-vision, data-analysis, data-science
ObservedObserver/visual-insights
Stars: 4,653
Visual Insights is an automated exploratory data analysis platform and causal inference tool designed to discover patterns and cause-effect relationships in datasets. It works as an interactive data visualization library using a grammar-of-graphics approach to generate multidimensional charts and dashboards. The project is distinguished by a natural language interface that translates plain-text questions into answers and data visualizations by means of a language model. It offers a specialized framework for causal discovery and inference, allowing users to identify links between variables through interactive causal graphs and to run what-if analyses to validate hypotheses. The platform covers a broad range of capabilities, including visual data cleaning, statistical profiling, and automated dataset transformation. It supports integrating diverse data from local files and remote databases and has a high-performance processing engine for handling large datasets locally. In addition, the system allows interactive analysis components to be embedded in web applications and notebooks.
Discovers patterns and trends in unfamiliar datasets using automated agents to generate multi-dimensional visualizations.
TypeScript
iflow-ai/iflow-cli
Stars: 4,609
iflow-cli is a command-line interface and suite of AI tools designed for software engineering, workflow orchestration, and multimodal data analysis. It functions as an LLM command line interface that enables users to execute AI workflows, analyze codebase structures, and interact with large language models directly from the terminal. The project features a plugin-based agent architecture that allows for the integration of specialized domain experts and custom instruction sets from an external marketplace. It distinguishes itself through a multimodal AI terminal capable of processing visual da
Extracts information from spreadsheets to merge data into tables or generate visual charts.
Shell