16 repositories
Frameworks that automatically generate statistical summaries and visual insights from raw datasets.
Distinct from Data Analysis: focuses on automating the exploratory phase rather than on strategic or manual analysis.
Explore 16 awesome GitHub repositories matching data & databases · Automated Exploratory Analysis.
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large
Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directl
Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.
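As a rough illustration of what automated profiling of a tabular dataset involves, the sketch below computes per-column counts, missing values, distinct values, and basic numeric statistics using only the Python standard library. The function names `profile_column` and `profile_table` are hypothetical and are not taken from any library listed here; rows are assumed to be plain dictionaries.

```python
import statistics


def profile_column(values):
    """Summarize one column: totals, missing values, distinct values, numeric stats."""
    # Treat None and empty strings as missing (an assumption of this sketch).
    present = [v for v in values if v is not None and v != ""]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Try to read every present value as a number; give up on the first failure.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            numeric = None
            break
    if numeric:
        summary.update(
            mean=statistics.fmean(numeric),
            min=min(numeric),
            max=max(numeric),
        )
    return summary


def profile_table(rows):
    """Profile every column that appears in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


if __name__ == "__main__":
    rows = [
        {"age": 30, "city": "Lima"},
        {"age": None, "city": "Quito"},
        {"age": 50, "city": "Lima"},
    ]
    for column, summary in profile_table(rows).items():
        print(column, summary)
```

Real profiling tools add histograms, correlations, and type inference on top of summaries like these; this sketch only shows the basic shape of the per-column report.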
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis
Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.
This project is a data profiling and exploratory data analysis tool designed to generate automated quality reports for Pandas and Spark dataframes. It serves as a system for computing descriptive statistics, identifying correlations, and analyzing univariate and multivariate data patterns. The tool provides specialized capabilities for comparing different versions of datasets to identify changes in data quality and distributions. It includes a dedicated profiler for time-dependent data to extract statistical information such as seasonality and auto-correlation. The software covers a broad an
Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.
This project is an exploratory data analysis library and profiling tool for Pandas and Spark DataFrames. It automates the initial investigation of datasets by generating comprehensive descriptive analysis reports, statistical summaries, and data quality warnings. The system functions as a data quality profiler to detect missing values, duplicate rows, and type inconsistencies. It includes a dataset comparison tool for identifying structural and content shifts between different versions of the same data, as well as specialized tools for time-series analysis to calculate auto-correlation and se
Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.
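Several of the profilers above mention comparing two versions of a dataset. A minimal sketch of that idea, assuming rows are plain dictionaries and using the hypothetical function name `compare_versions`, might report structural changes (added or removed columns, row count) and one simple content change (the shift in missing-value rate per shared column). It is not the comparison logic of any listed project.

```python
def compare_versions(old_rows, new_rows):
    """Report structural and missing-value differences between two dataset versions."""
    old_cols = {key for row in old_rows for key in row}
    new_cols = {key for row in new_rows for key in row}

    def missing_rate(rows, col):
        # Fraction of rows where the column is absent or None.
        if not rows:
            return 0.0
        return sum(1 for row in rows if row.get(col) is None) / len(rows)

    shared = sorted(old_cols & new_cols)
    return {
        "added_columns": sorted(new_cols - old_cols),
        "removed_columns": sorted(old_cols - new_cols),
        "row_count_change": len(new_rows) - len(old_rows),
        "missing_rate_change": {
            col: missing_rate(new_rows, col) - missing_rate(old_rows, col)
            for col in shared
        },
    }


if __name__ == "__main__":
    old = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
    new = [
        {"id": 1, "name": "a", "email": "x"},
        {"id": 2, "name": "b", "email": None},
        {"id": 3, "name": "c", "email": "y"},
    ]
    print(compare_versions(old, new))
```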
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Provides an automated framework for discovering data distributions, correlations, and quality issues within large datasets.
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Automatically generates a complete data analysis workflow, including notebook scaffolding and visualization code.
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
Generates automated drill-down analyses for a single metric across multiple dimensions.
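A drill-down of one metric across several dimensions can be sketched as one grouped total per dimension, sorted so the largest contributors come first. The example below is an illustrative assumption, not GrowthBook's implementation (the description says it runs SQL against a data warehouse); the function name `drill_down` and the sample field names are hypothetical.

```python
from collections import defaultdict


def drill_down(rows, metric, dimensions):
    """For each dimension, total the metric per distinct value of that dimension."""
    result = {}
    for dim in dimensions:
        totals = defaultdict(float)
        for row in rows:
            totals[row[dim]] += row[metric]
        # Largest contributors first, so the dominant segment is easy to spot.
        result[dim] = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return result


if __name__ == "__main__":
    rows = [
        {"country": "US", "device": "mobile", "revenue": 10.0},
        {"country": "US", "device": "desktop", "revenue": 5.0},
        {"country": "DE", "device": "mobile", "revenue": 2.0},
    ]
    for dimension, breakdown in drill_down(rows, "revenue", ["country", "device"]).items():
        print(dimension, breakdown)
```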
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Sends prompts to language models and exports generated analysis for use in subsequent workflow steps.
Lux is an automated exploratory data analysis tool designed to generate intelligent visual representations of pandas dataframes. It identifies patterns and trends by recommending optimal chart types and axis mappings based on a dataset's statistical attributes. The tool works as an interactive data profiling layer that lets users browse and query collections of charts using filters and wildcards. It also serves as a visualization code generator, translating automatically produced charts into programmatic code or HTML for manual refinement in external libraries. The system covers a wide range of exploratory analysis capabilities, including automated chart encoding, guided discovery through step recommendations, and the ability to export visual configurations as declarative specifications. The project integrates directly into pandas to override the default dataframe display with interactive visualization components.
Automates the exploratory data analysis process by recommending optimal chart types and axis mappings based on dataset attributes.
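To make the idea of recommending chart types from dataset attributes concrete, here is a deliberately crude rule-based sketch. It is not Lux's recommendation algorithm; the function names `infer_kind` and `recommend_chart`, the two-way quantitative/nominal split, and the specific chart choices are all assumptions made for illustration.

```python
def infer_kind(values):
    """Classify a column as 'quantitative' or 'nominal' using a simple type check."""
    non_null = [v for v in values if v is not None]
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    return "quantitative" if is_numeric else "nominal"


def recommend_chart(x_values, y_values=None):
    """Suggest a chart type for one column, or for a pair of columns."""
    x_kind = infer_kind(x_values)
    if y_values is None:
        # One column: show a numeric distribution or category counts.
        return "histogram" if x_kind == "quantitative" else "bar"
    kinds = {x_kind, infer_kind(y_values)}
    if kinds == {"quantitative"}:
        return "scatter"   # two numeric columns
    if kinds == {"nominal"}:
        return "heatmap"   # two categorical columns: co-occurrence counts
    return "bar"           # one categorical and one numeric column


if __name__ == "__main__":
    print(recommend_chart([1, 2, 3]))
    print(recommend_chart(["a", "b"], [1.5, 2.5]))
```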
Visual Insights is an automated exploratory data analysis platform and causal inference tool designed to discover patterns and cause-and-effect relationships within datasets. It works as an interactive data visualization library that uses a grammar-of-graphics approach to generate multidimensional charts and dashboards. The project distinguishes itself through a natural language interface that uses a language model to translate plain-text questions into answers and data visualizations. It provides a specialized framework for causal discovery and inference, letting users identify links between variables through interactive causal graphs and run what-if analyses to validate hypotheses. The platform covers a broad range of capabilities, including visual data cleaning, statistical profiling, and automated dataset transformation. It supports integrating diverse data from local files and remote databases, and includes a high-performance processing engine for handling large datasets locally. The system also allows interactive analysis components to be embedded in web applications and notebooks.
Discovers patterns and trends in unfamiliar datasets using automated agents to generate multi-dimensional visualizations.
EQGRP is a remote access trojan framework and post-exploitation toolkit. It provides a centralized command-and-control infrastructure for deploying persistent implants and managing remote agents across various operating systems. The project includes tools for evading digital forensics, such as modifying system logs and filesystem timestamps to remove traces of execution. It features a network interception system for capturing and reconstructing data streams via root-level system hooks, as well as exploits designed for kernel privilege escalation to raise process permissions to administrative root. The toolkit covers a wide range of capabilities, including remote code execution, shellcode packing for signature evasion, and the exfiltration and analysis of mobile device logs and telecommunications records. It also provides utilities for binding network ports and browsing decrypted files.
Parses telecommunications call detail records to extract structured data for analysis.
This project is a curated library of Python code examples, educational resources, and programming tutorials. It functions as an educational repository designed to teach Python language fundamentals through practical implementation tasks, real-world exercises, and functional code snippets. The collection covers a diverse range of implementation examples, including the development of interactive websites and message boards using web frameworks. It also features scripts for audio speech processing, automated media processing for images, and the extraction of data from web content. Additional ca
Parses call detail records from spreadsheets and computes total talk time per month.
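A monthly talk-time total of this kind can be sketched with the standard library alone. The example below assumes the spreadsheet has been exported to CSV and uses hypothetical column names, `start_time` (ISO 8601) and `duration_seconds`; neither the format nor the names come from the repository itself.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def talk_time_per_month(csv_text):
    """Return {'YYYY-MM': total_seconds} from CSV text of call detail records."""
    totals = defaultdict(int)
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Assumed columns: start_time (ISO 8601) and duration_seconds (integer).
        started = datetime.fromisoformat(record["start_time"])
        totals[started.strftime("%Y-%m")] += int(record["duration_seconds"])
    return dict(totals)


if __name__ == "__main__":
    sample = (
        "start_time,duration_seconds\n"
        "2024-01-03T09:15:00,120\n"
        "2024-01-20T18:00:00,60\n"
        "2024-02-01T08:00:00,300\n"
    )
    for month, seconds in sorted(talk_time_per_month(sample).items()):
        print(month, seconds)
```

For native `.xlsx` files, the rows would first need to be read with a spreadsheet library; the aggregation step would stay the same.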
Positron is a data science integrated development environment and AI-powered code editor designed for polyglot development, specifically supporting Python and R. It functions as a remote compute workspace that separates the user interface from the execution kernel via SSH or container integration. The environment features a deep integration of large language models that provide context-aware suggestions and automated data analysis by accessing real-time interpreter state, in-memory objects, and plot outputs. It distinguishes itself through a polyglot runtime bridge that enables cross-language
Automatically generates and executes statistical summaries and visualizations to uncover insights from datasets.
IronCalc is an XLSX spreadsheet engine and formula evaluator designed to compute numerical expressions and manage workbook structures. It utilizes a logic engine compatible with industry standards to evaluate formulas and manage cell dependencies. The project provides a comprehensive suite of specialized toolkits, including a financial calculation library for bond pricing and net present value, and an engineering math toolkit for complex number arithmetic and Bessel functions. It also features a web-based spreadsheet interface for creating and formatting workbooks. The engine covers a broad
Enables the creation of automated workflows to filter, sort, and aggregate large datasets using database-style criteria.
PromptX is an LLM agent orchestration framework designed to execute multi-step workflows using autonomous agents. It features a sandboxed tool execution environment for secure filesystem operations and external API integrations, alongside a persona management system that defines professional roles and domain expertise to control agent behavior. The system implements a semantic memory network for persistent knowledge storage, utilizing graph-based memory and engrams to retain information across sessions. This cognitive memory includes specialized tools for knowledge graph visualization, allowi
Processes Excel files to generate insights, automate reports, and create data visualizations.