26 repositories
Tools for examining metrics to identify patterns and inform strategy.
Distinguishing note: Focuses on strategic data analysis rather than raw data processing.
Explore 26 awesome GitHub repositories matching data & databases · Data Analysis.
This project is a community-curated directory of open-source software designed for deployment on private servers and in home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, allowing users to retain full ownership of their data and control over their digital infrastructure. The directory uses a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analysis to private communication and team productivity tools. It stands out for its collaborative peer-review process, in which community members validate the quality and relevance of each submission so that the directory stays accurate and reliable. The project covers a broad range of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies on private hardware. The directory is maintained as a version-controlled repository, so all community-driven updates and changes are tracked and transparent.
Performs systematic computational analysis of data to discover and interpret meaningful patterns.
PrivateGPT is a private AI document assistant and local knowledge base manager designed for querying private files and documents using retrieval-augmented generation. It functions as a local language model application and API gateway, allowing users to obtain cited answers from unstructured data without sending information to external servers. The system differentiates itself by acting as a tool integrator that connects language models to external functions, including web search, tabular data analysis, and custom action extensions. It provides a standardized API layer that allows local inference…
Extracts structured insights from CSV files using a local model to ensure sensitive data remains offline.
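The listing does not describe how PrivateGPT ranks documents. The sketch below shows the general retrieval step of retrieval-augmented generation. It assumes plain bag-of-words vectors and cosine similarity in place of the learned embeddings a real system would use. The `retrieve` helper and the sample chunk ids are hypothetical, and the ids stand in for citations.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the question, best first.

    A full RAG system would pass the selected chunk texts plus the question
    to a local language model; that generation step is not shown here.
    """
    q = Counter(tokenize(question))
    scored = [(cid, cosine(q, Counter(tokenize(text)))) for cid, text in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical document chunks keyed by a citation-style id.
docs = {
    "contract.pdf#p3": "The contract renewal date is March 1 and notice is 30 days.",
    "handbook.md#leave": "Employees accrue 20 days of paid leave per year.",
    "notes.txt#1": "Quarterly revenue grew in the third quarter.",
}
print(retrieve("When is the contract renewal date?", docs, k=1))
```

Keeping the chunk id next to each score is what lets an answer cite its source.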
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large…
Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to…
Allows complex queries and comparisons across several data tables simultaneously to identify related insights.
SuperAGI is a comprehensive marketing automation platform and customer data system designed to orchestrate multi-channel engagement workflows. It functions as a no-code workflow orchestrator, allowing users to build complex, automated task sequences triggered by real-time user behavior, transactional data, or scheduled events. By centralizing customer profiles and interaction history, the platform enables businesses to manage end-to-end marketing operations from a single interface. The platform distinguishes itself through its deep integration with e-commerce storefronts and its ability to…
Provides tools for examining campaign metrics to identify patterns and inform marketing strategy.
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of machine…
Offers a specialized toolkit of algorithms for processing and evaluating large-scale complex datasets.
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directly…
Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.
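The entry does not name the library or show its API. As a rough illustration of what an automated profile computes, here is a minimal pure-Python sketch. It takes a table as a list of dict rows, with `None` marking a missing value, and both the `profile` function and that row format are assumptions. It counts missing values, exact duplicate rows, and distinct values per column, and adds mean, min, and max for fully numeric columns.

```python
import statistics


def profile(rows: list[dict]) -> dict:
    """Summarize a table given as a list of dict rows (None = missing)."""
    columns = sorted({key for row in rows for key in row})
    report = {"n_rows": len(rows), "duplicate_rows": 0, "columns": {}}

    # Count exact duplicate rows by comparing each row's values in column order.
    seen = set()
    for row in rows:
        key = tuple(row.get(c) for c in columns)
        if key in seen:
            report["duplicate_rows"] += 1
        seen.add(key)

    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats = {"missing": len(values) - len(present), "distinct": len(set(present))}
        # Only fully numeric columns (bools excluded) get descriptive statistics.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            stats.update(mean=statistics.fmean(numeric), min=min(numeric), max=max(numeric))
        report["columns"][col] = stats
    return report


# Hypothetical sample data with one duplicate row and two missing cells.
rows = [
    {"city": "Cluj", "sales": 10},
    {"city": "Iasi", "sales": None},
    {"city": "Cluj", "sales": 10},
    {"city": None, "sales": 40},
]
report = profile(rows)
print(report)
```

A real profiler adds histograms, type inference, and alerts on top of counts like these.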
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis…
Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.
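The entry above mentions auto-correlation plots for time-series data. The usual sample autocorrelation at lag k is r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2. The sketch below implements that textbook formula in pure Python. It is not taken from the profiling tool's code, and the periodic sample series is invented.

```python
def autocorrelation(series: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelation r_k for k = 0..max_lag.

    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(series))")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("a constant series has undefined autocorrelation")
    # Each lag sums products of deviations that are k steps apart.
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]


# Hypothetical series with period 4: a strong positive peak at lag 4 hints at seasonality.
series = [0.0, 1.0, 0.0, -1.0] * 6
acf = autocorrelation(series, max_lag=4)
print([round(r, 3) for r in acf])
```

Plotting r_k against k gives the auto-correlation plot, where repeating peaks point to a seasonal period.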
This project is a data profiling and exploratory data analysis tool designed to generate automated quality reports for Pandas and Spark dataframes. It serves as a system for computing descriptive statistics, identifying correlations, and analyzing univariate and multivariate data patterns. The tool provides specialized capabilities for comparing different versions of datasets to identify changes in data quality and distributions. It includes a dedicated profiler for time-dependent data to extract statistical information such as seasonality and auto-correlation. The software covers a broad…
Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.
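Correlation checks like the one described above typically compare every pair of numeric columns. As a minimal sketch, the code below flags column pairs whose Pearson correlation magnitude reaches a threshold. The 0.9 default threshold, the `high_correlations` helper, and the sample columns are assumptions, not the tool's actual settings.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns NaN when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def high_correlations(columns: dict[str, list[float]], threshold: float = 0.9):
    """Return (col_a, col_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:  # NaN never passes this comparison
            flagged.append((a, b, round(r, 4)))
    return flagged


# Hypothetical columns: two prices in different currencies are redundant.
data = {
    "price_eur": [10.0, 20.0, 30.0, 40.0],
    "price_usd": [11.0, 22.0, 33.0, 44.0],
    "rating": [4.0, 2.0, 5.0, 3.0],
}
print(high_correlations(data))
```

A flagged pair usually means one column is redundant, which is why profilers surface it as a warning rather than a finding.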
This project is an exploratory data analysis library and profiling tool for Pandas and Spark DataFrames. It automates the initial investigation of datasets by generating comprehensive descriptive analysis reports, statistical summaries, and data quality warnings. The system functions as a data quality profiler to detect missing values, duplicate rows, and type inconsistencies. It includes a dataset comparison tool for identifying structural and content shifts between different versions of the same data, as well as specialized tools for time-series analysis to calculate auto-correlation and…
Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.
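Several profilers in this list compare two versions of a dataset, but the listing does not show how. The sketch below is a simplified, hypothetical diff over column-oriented tables, where `None` marks a missing value. It reports added and removed columns, the change in row count, and, for shared columns, the change in missing values and the shift in the mean of the numeric entries.

```python
def _n_rows(table: dict[str, list]) -> int:
    return max((len(col) for col in table.values()), default=0)


def _mean(values: list):
    # Mean of numeric entries only; None and non-numbers are skipped.
    nums = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return sum(nums) / len(nums) if nums else None


def compare_versions(old: dict[str, list], new: dict[str, list]) -> dict:
    """Diff two column-oriented versions of a dataset (None = missing)."""
    report = {
        "added_columns": sorted(new.keys() - old.keys()),
        "removed_columns": sorted(old.keys() - new.keys()),
        "row_count_change": _n_rows(new) - _n_rows(old),
        "shared_columns": {},
    }
    for col in sorted(old.keys() & new.keys()):
        before, after = old[col], new[col]
        diff = {"missing_change": after.count(None) - before.count(None)}
        m_old, m_new = _mean(before), _mean(after)
        if m_old is not None and m_new is not None:
            diff["mean_shift"] = m_new - m_old
        report["shared_columns"][col] = diff
    return report


# Hypothetical versions: a row was added, a column renamed, and a value went missing.
v1 = {"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0], "region": ["N", "S", "N"]}
v2 = {"id": [1, 2, 3, 4], "amount": [10.0, None, 30.0, 50.0], "channel": ["web", "web", "app", "app"]}
print(compare_versions(v1, v2))
```

Schema changes and content shifts are reported separately because they usually call for different follow-up.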
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data via distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It…
Provides an automated framework for discovering data distributions, correlations, and quality issues within large datasets.
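The entry lists outlier detection among ydata-profiling's checks without saying which rule it applies. One common rule is Tukey's fences, which flag values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The sketch below implements that rule with Python's standard `statistics.quantiles`. Treat it as a generic example rather than ydata-profiling's implementation. The latency values are made up.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical response times in milliseconds with one obvious spike.
latencies_ms = [10, 12, 12, 13, 12, 11, 14, 13, 15, 100]
print(iqr_outliers(latencies_ms))  # [100]
```

The IQR rule is robust because the quartiles barely move when a single extreme value is added, unlike the mean and standard deviation.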
pgcli is an interactive command-line interface and database management tool for PostgreSQL. It functions as an interactive SQL shell and query editor that allows users to inspect schemas, manage connections, and run queries against PostgreSQL data sources. The tool is distinguished by its real-time, schema-aware autocompletion for keywords, tables, and columns, as well as dynamic SQL syntax highlighting. It provides safety mechanisms through transaction-aware guardrails that warn against or block destructive statements when no active transaction is detected. Broad capabilities include secure…
Facilitates quick data inspection by formatting query results into readable tables for analysis.
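The listing does not describe how pgcli decides that a statement is destructive, and the code below does not reproduce its rules or configuration. It is a minimal, hypothetical sketch of a transaction-aware guard. The guard tracks BEGIN, COMMIT, and ROLLBACK, and it flags a statement whose first keyword is in an assumed destructive set when no transaction is open. Because it only inspects the first keyword, leading comments and forms like `WITH ... DELETE` slip through.

```python
import re

# Hypothetical keyword set; not pgcli's actual list.
DESTRUCTIVE = {"drop", "truncate", "delete", "update", "alter"}


class Guard:
    """Tracks whether a transaction is open and flags risky statements outside one."""

    def __init__(self) -> None:
        self.in_transaction = False

    def check(self, sql: str) -> str:
        match = re.match(r"\s*(\w+)", sql)
        keyword = match.group(1).lower() if match else ""
        if keyword in ("begin", "start"):  # BEGIN / START TRANSACTION
            self.in_transaction = True
            return "ok"
        if keyword in ("commit", "rollback", "end"):
            self.in_transaction = False
            return "ok"
        if keyword in DESTRUCTIVE and not self.in_transaction:
            return "warn"
        return "ok"


guard = Guard()
for stmt in ["SELECT * FROM users", "DELETE FROM users", "BEGIN", "DELETE FROM users", "ROLLBACK"]:
    print(f"{guard.check(stmt):4} {stmt}")
```

Inside a transaction, a mistaken DELETE can still be rolled back, which is why the sketch only warns outside one.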
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employs…
Automatically generates a complete data analysis workflow, including notebook scaffolding and visualization code.
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models…
Generates automated drill-down analyses for a single metric across multiple dimensions.
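GrowthBook computes its results by running SQL against the data warehouse. As a rough in-memory illustration of what a drill-down over one metric produces, the hypothetical `drill_down` helper below totals a metric per segment for each dimension and sorts the segments from largest to smallest. The field names and sample rows are invented for the example.

```python
from collections import defaultdict


def drill_down(rows: list[dict], metric: str, dimensions: list[str]) -> dict:
    """For each dimension, total `metric` per segment, largest first."""
    result = {}
    for dim in dimensions:
        totals = defaultdict(float)
        for row in rows:
            totals[row[dim]] += row[metric]
        result[dim] = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return result


# Hypothetical revenue events.
events = [
    {"country": "RO", "device": "mobile", "revenue": 30.0},
    {"country": "RO", "device": "desktop", "revenue": 10.0},
    {"country": "DE", "device": "mobile", "revenue": 25.0},
    {"country": "DE", "device": "desktop", "revenue": 5.0},
]
report = drill_down(events, "revenue", ["country", "device"])
print(report)
```

In SQL, each dimension would correspond to one `GROUP BY` query over the same metric.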
This project is a pandas data analysis cookbook and Python data science guide. It provides a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when executing data processing scripts. It covers a broad range of data science capabilities, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. These recipes demonstrate how to perform structured data analysis through…
Provides tools for examining real-world datasets to identify patterns and extract meaningful insights.
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It…
Sends prompts to language models and exports generated analysis for use in subsequent workflow steps.
Lux is an automated exploratory data analysis tool designed to generate intelligent visual representations of pandas dataframes. It identifies patterns and trends by recommending optimal chart types and axis mappings based on a dataset's statistical attributes. The tool works as an interactive data profiling layer that lets users browse and query collections of charts using filters and wildcards. It also serves as a visualization code generator, translating automatically produced charts into programmatic code or HTML for manual refinement in external libraries. The system covers a wide range of exploratory analysis capabilities, including automatic chart encoding, guided discovery through next-step recommendations, and the ability to export visual configurations as declarative specifications. The project integrates directly into pandas, overriding the default dataframe printout with interactive visualization components.
Automates the exploratory data analysis process by recommending optimal chart types and axis mappings based on dataset attributes.
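Lux's actual recommendation logic is not described here. The sketch below assumes a simplified approach. Each column is classified as temporal, quantitative, or nominal, and a chart type is picked from a fixed lookup table. The `recommend` helper and its rule table are hypothetical.

```python
from datetime import date


def attribute_type(values: list) -> str:
    """Classify a column as temporal, quantitative, or nominal (simplified)."""
    sample = [v for v in values if v is not None]
    if sample and all(isinstance(v, date) for v in sample):
        return "temporal"
    if sample and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in sample):
        return "quantitative"
    return "nominal"


def recommend(x_values: list, y_values: list) -> str:
    """Pick a chart type for one x/y pair from a fixed (hypothetical) rule table."""
    kinds = (attribute_type(x_values), attribute_type(y_values))
    rules = {
        ("temporal", "quantitative"): "line",
        ("nominal", "quantitative"): "bar",
        ("quantitative", "quantitative"): "scatter",
        ("nominal", "nominal"): "heatmap",
    }
    return rules.get(kinds, "table")  # fall back to a plain table


print(recommend([date(2024, 1, 1), date(2024, 1, 2)], [3, 5]))  # line
print(recommend(["north", "south"], [1.5, 2.5]))  # bar
```

A real recommender would also rank candidate charts by a statistical interest score, such as correlation strength, rather than returning a single rule match.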
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive…
Implements a complete suite for examining metrics and identifying patterns through data analysis.
Visual Insights is an automated exploratory data analysis platform and causal inference tool designed to discover patterns and cause-and-effect relationships in datasets. It works as an interactive data visualization library that uses a grammar-of-graphics approach to generate multidimensional charts and dashboards. The project stands out for its natural language interface, which uses a language model to translate plain-text questions into answers and data visualizations. It offers a specialized framework for causal discovery and inference, letting users identify links between variables through interactive causal graphs and run what-if analyses to validate hypotheses. The platform covers a broad range of capabilities, including visual data cleaning, statistical profiling, and automated dataset transformation. It supports integrating data from local files and remote databases and includes a high-performance processing engine for handling large datasets locally. In addition, the system allows interactive analysis components to be embedded in web applications and notebooks.
Discovers patterns and trends in unfamiliar datasets using automated agents to generate multi-dimensional visualizations.
iflow-cli is a command-line interface and suite of AI tools designed for software engineering, workflow orchestration, and multimodal data analysis. It functions as an LLM command line interface that enables users to execute AI workflows, analyze codebase structures, and interact with large language models directly from the terminal. The project features a plugin-based agent architecture that allows for the integration of specialized domain experts and custom instruction sets from an external marketplace. It distinguishes itself through a multimodal AI terminal capable of processing visual…
Extracts information from spreadsheets to merge data into tables or generate visual charts.