27 repositories
Tools for examining metrics to identify patterns and inform strategy.
Distinguishing note: Focuses on strategic data analysis rather than raw data processing.
Explore 27 GitHub repositories in the Data & Databases · Data Analysis category.
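This is a community-curated catalog of open-source software intended for deployment on private servers and in home labs. It serves as a comprehensive resource for discovering self-hosted alternatives to mainstream cloud services, letting users keep full ownership and control of the data in their digital infrastructure. The catalog is organized by a hierarchical taxonomy that sorts its large collection of applications into logical categories, ranging from media management and data analysis to private communication and team productivity tools. It stands out for its collaborative peer-review process, in which community members vet each submission for quality and relevance so that the catalog stays accurate and reliable. The project spans a broad range of capability areas, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies on private hardware. The catalog is maintained as a version-controlled repository, so all updates and community-driven changes are traceable and transparent.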
Performs systematic computational analysis of data to discover and interpret meaningful patterns.
PrivateGPT is a private AI document assistant and local knowledge base manager designed for querying private files and documents using retrieval-augmented generation. It functions as a local language model application and API gateway, allowing users to obtain cited answers from unstructured data without sending information to external servers. The system differentiates itself by acting as a tool integrator that connects language models to external functions, including web search, tabular data analysis, and custom action extensions. It provides a standardized API layer that allows local infere
Extracts structured insights from CSV files using a local model to ensure sensitive data remains offline.
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large
Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to ma
Allows complex queries and comparisons across several data tables simultaneously to identify related insights.
SuperAGI is a comprehensive marketing automation platform and customer data system designed to orchestrate multi-channel engagement workflows. It functions as a no-code workflow orchestrator, allowing users to build complex, automated task sequences triggered by real-time user behavior, transactional data, or scheduled events. By centralizing customer profiles and interaction history, the platform enables businesses to manage end-to-end marketing operations from a single interface. The platform distinguishes itself through its deep integration with e-commerce storefronts and its ability to ex
Provides tools for examining campaign metrics to identify patterns and inform marketing strategy.
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
Offers a specialized toolkit of algorithms for processing and evaluating large-scale complex datasets.
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directl
Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis
Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.
This project is a data profiling and exploratory data analysis tool designed to generate automated quality reports for Pandas and Spark dataframes. It serves as a system for computing descriptive statistics, identifying correlations, and analyzing univariate and multivariate data patterns. The tool provides specialized capabilities for comparing different versions of datasets to identify changes in data quality and distributions. It includes a dedicated profiler for time-dependent data to extract statistical information such as seasonality and auto-correlation. The software covers a broad an
Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.
This project is an exploratory data analysis library and profiling tool for Pandas and Spark DataFrames. It automates the initial investigation of datasets by generating comprehensive descriptive analysis reports, statistical summaries, and data quality warnings. The system functions as a data quality profiler to detect missing values, duplicate rows, and type inconsistencies. It includes a dataset comparison tool for identifying structural and content shifts between different versions of the same data, as well as specialized tools for time-series analysis to calculate auto-correlation and se
Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Provides an automated framework for discovering data distributions, correlations, and quality issues within large datasets.
pgcli is an interactive command-line interface and database management tool for PostgreSQL. It functions as an interactive SQL shell and query editor that allows users to inspect schemas, manage connections, and run queries against PostgreSQL data sources. The tool is distinguished by its real-time, schema-aware autocompletion for keywords, tables, and columns, as well as dynamic SQL syntax highlighting. It provides safety mechanisms through transaction-aware guardrails that warn against or block destructive statements when no active transaction is detected. Broad capabilities include secure
Facilitates quick data inspection by formatting query results into readable tables for analysis.
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Automatically generates a complete data analysis workflow, including notebook scaffolding and visualization code.
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
Generates automated drill-down analyses for a single metric across multiple dimensions.
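This is a hands-on pandas data analysis cookbook and Python data science guide. It provides a collection of programming recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when running data processing scripts. It covers a broad range of data science functionality, including data ingestion from external sources, raw data cleaning, and exploratory data analysis. The recipes demonstrate structured data analysis through techniques such as filtering, aggregating grouped data, and handling text data.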
Provides tools for examining real-world datasets to identify patterns and extract meaningful insights.
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Sends prompts to language models and exports generated analysis for use in subsequent workflow steps.
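Lux is an automated exploratory data analysis tool designed to generate intelligent visual representations of pandas dataframes. It identifies patterns and trends by recommending optimal chart types and axis mappings based on the statistical properties of the dataset. The tool acts as an interactive data analysis layer that lets users browse and query collections of charts using filters and wildcards. It also serves as a visualization code generator, converting automatically generated charts into program code or HTML for manual refinement in external libraries. The system covers a broad range of exploratory analysis features, including automatic chart encoding, guided discovery through step recommendations, and the ability to export visual configurations as declarative specifications. The project integrates directly into pandas, overriding the default dataframe display with an interactive visualization widget.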
Automates the exploratory data analysis process by recommending optimal chart types and axis mappings based on dataset attributes.
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Implements a complete suite for examining metrics and identifying patterns through data analysis.
Visual Insights is an automated exploratory data analysis platform and causal inference tool designed to discover patterns and cause-and-effect relationships within datasets. It functions as an interactive data visualization library using a grammar-of-graphics approach to generate multi-dimensional charts and dashboards. The project distinguishes itself through a natural language interface that translates plain-text questions into data answers and visualizations via a language model. It provides a specialized framework for causal discovery and inference, allowing users to identify variable li
Discovers patterns and trends in unfamiliar datasets using automated agents to generate multi-dimensional visualizations.
iflow-cli is a command-line interface and suite of AI tools designed for software engineering, workflow orchestration, and multimodal data analysis. It functions as an LLM command line interface that enables users to execute AI workflows, analyze codebase structures, and interact with large language models directly from the terminal. The project features a plugin-based agent architecture that allows for the integration of specialized domain experts and custom instruction sets from an external marketplace. It distinguishes itself through a multimodal AI terminal capable of processing visual da
Extracts information from spreadsheets to merge data into tables or generate visual charts.