26 repositories
Tools for examining metrics to identify patterns and inform strategy.
Distinguishing note: Focuses on strategic data analysis rather than raw data processing.
Explore 26 awesome GitHub repositories matching data & databases · Data Analysis.
This project is a community-curated directory of open-source software designed for deployment on private servers and in home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, allowing users to retain full ownership of their data and control of their digital infrastructure. The directory is structured by a hierarchical taxonomy that organizes a large collection of applications into logical categories, ranging from media management and data analysis to private communication and team productivity tools. It distinguishes itself through a collaborative peer-review process in which community members validate the quality and relevance of each submission so that the directory stays accurate and reliable. The project covers a broad range of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies on private hardware. The directory is maintained as a version-controlled repository, ensuring that all updates and community-driven changes are tracked and transparent.
Performs systematic computational analysis of data to discover and interpret meaningful patterns.
PrivateGPT is a private AI document assistant and local knowledge base manager designed for querying private files and documents using retrieval-augmented generation. It functions as a local language model application and API gateway, allowing users to obtain cited answers from unstructured data without sending information to external servers. The system differentiates itself by acting as a tool integrator that connects language models to external functions, including web search, tabular data analysis, and custom action extensions. It provides a standardized API layer that allows local infere
Extracts structured insights from CSV files using a local model to ensure sensitive data remains offline.
Chat2DB is an AI-powered SQL client and multi-database GUI manager designed for managing various relational and NoSQL database systems. It serves as a visual database management tool and a natural language to SQL interface, allowing users to convert plain text descriptions into executable and optimized queries. The platform distinguishes itself through automated business intelligence capabilities, which include the generation of real-time data visualization dashboards and AI-driven data analysis from spreadsheets. To ensure data privacy, it supports secure local AI deployment, enabling large
Provides AI-driven analysis of spreadsheet files to extract patterns and insights using natural language processing.
Pandas AI is a data analysis library and natural language interface that uses large language models to perform conversational querying on structured datasets. It functions as a retrieval-augmented generation framework designed to translate plain text questions into executable code for extracting insights from dataframes and structured files. The system includes a dedicated sandbox execution environment that runs AI-generated analysis code within an isolated container to prevent security risks and system compromise. It employs a natural language translation layer and contextual retrieval to ma
Allows complex queries and comparisons across several data tables simultaneously to identify related insights.
SuperAGI is a comprehensive marketing automation platform and customer data system designed to orchestrate multi-channel engagement workflows. It functions as a no-code workflow orchestrator, allowing users to build complex, automated task sequences triggered by real-time user behavior, transactional data, or scheduled events. By centralizing customer profiles and interaction history, the platform enables businesses to manage end-to-end marketing operations from a single interface. The platform distinguishes itself through its deep integration with e-commerce storefronts and its ability to ex
Provides tools for examining campaign metrics to identify patterns and inform marketing strategy.
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
Offers a specialized toolkit of algorithms for processing and evaluating large-scale complex datasets.
This library provides a diagnostic toolkit for automated data profiling and exploratory analysis. It generates comprehensive statistical summaries and visual reports for tabular datasets, enabling users to identify distribution patterns, missing values, and quality anomalies through a unified interface. The project distinguishes itself by offering differential analysis, which allows for the comparison of two dataset versions to track structural and statistical changes over time. It supports large-scale data processing through lazy evaluation and provides interactive widgets that embed directl
Automates the statistical summary and visualization of tabular datasets to identify patterns and quality issues.
This project is an exploratory data analysis framework and profiling tool designed to generate comprehensive statistical reports from Pandas and Spark DataFrames. It functions as a data quality profiler that identifies missing values, duplicates, and high correlations within tabular datasets. The tool distinguishes itself through specialized capabilities for time-series analysis, extracting temporal statistics, seasonality, and auto-correlation plots. It also includes a dataset comparison utility to identify structural or content changes between different versions of a dataset. The analysis
Provides a framework that automatically generates statistical summaries and visual insights from tabular datasets.
This project is a data profiling and exploratory data analysis tool designed to generate automated quality reports for Pandas and Spark dataframes. It serves as a system for computing descriptive statistics, identifying correlations, and analyzing univariate and multivariate data patterns. The tool provides specialized capabilities for comparing different versions of datasets to identify changes in data quality and distributions. It includes a dedicated profiler for time-dependent data to extract statistical information such as seasonality and auto-correlation. The software covers a broad an
Automatically generates statistical summaries and visual insights to discover patterns and anomalies in new datasets.
This project is an exploratory data analysis library and profiling tool for Pandas and Spark DataFrames. It automates the initial investigation of datasets by generating comprehensive descriptive analysis reports, statistical summaries, and data quality warnings. The system functions as a data quality profiler to detect missing values, duplicate rows, and type inconsistencies. It includes a dataset comparison tool for identifying structural and content shifts between different versions of the same data, as well as specialized tools for time-series analysis to calculate auto-correlation and se
Automatically generates statistical summaries and visual insights to facilitate the initial investigation of datasets.
Ydata-profiling is an automated exploratory data analysis framework designed to generate comprehensive statistical reports and visual summaries from dataframes. It functions as a diagnostic tool for assessing data quality, identifying missing values, duplicates, and outliers, while providing a scalable engine for profiling massive datasets across distributed enterprise environments. The project distinguishes itself through its ability to handle large-scale data through distributed task orchestration and lazy stream processing, which minimizes memory overhead during complex computations. It in
Provides an automated framework for discovering data distributions, correlations, and quality issues within large datasets.
pgcli is an interactive command-line interface and database management tool for PostgreSQL. It functions as an interactive SQL shell and query editor that allows users to inspect schemas, manage connections, and run queries against PostgreSQL data sources. The tool is distinguished by its real-time, schema-aware autocompletion for keywords, tables, and columns, as well as dynamic SQL syntax highlighting. It provides safety mechanisms through transaction-aware guardrails that warn against or block destructive statements when no active transaction is detected. Broad capabilities include secure
Facilitates quick data inspection by formatting query results into readable tables for analysis.
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Automatically generates a complete data analysis workflow, including notebook scaffolding and visualization code.
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
Generates automated drill-down analyses for a single metric across multiple dimensions.
This project is a pandas data analysis cookbook and a Python data science guide. It provides a collection of programmatic recipes and examples for cleaning, manipulating, and analyzing structured data. The project focuses on providing a containerized analysis environment to ensure a consistent workspace and reproducible dependencies when running data processing scripts. It covers a wide range of data science capabilities, including ingesting data from external sources, cleaning raw data, and exploratory data analysis. The recipes demonstrate how to analyze structured data using techniques such as filtering, grouped aggregation, and text data processing.
Provides tools for examining real-world datasets to identify patterns and extract meaningful insights.
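The filtering, grouped aggregation, and text processing techniques named in the pandas cookbook entry above can be sketched roughly as follows. This is a minimal illustration and not a recipe taken from the cookbook itself; the sample data and the column names (`city`, `complaint`, `hours_open`) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon"],
        "complaint": [
            "Noise - Street",
            "Heating",
            "noise - park",
            "Water Leak",
            "Noise - Bar",
        ],
        "hours_open": [4.0, 12.5, 2.0, 30.0, 6.5],
    }
)

# Filtering: keep rows whose complaint text mentions "noise" (case-insensitive).
noise = df[df["complaint"].str.contains("noise", case=False)]

# Grouped aggregation: number of noise complaints and mean hours open per city.
per_city = noise.groupby("city")["hours_open"].agg(["count", "mean"])

# Text processing: normalize case, then split "category - detail" into two columns.
parts = df["complaint"].str.lower().str.split(" - ", n=1, expand=True)
df["category"] = parts[0]
df["detail"] = parts[1]  # missing where the complaint has no " - " separator

print(per_city)
print(df[["category", "detail"]])
```

Rows without a " - " separator end up with a missing `detail`, which is the kind of gap a later cleaning step would need to handle.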
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Sends prompts to language models and exports generated analysis for use in subsequent workflow steps.
Lux is an automated exploratory data analysis tool designed to generate intelligent visual representations of pandas dataframes. It identifies patterns and trends by recommending optimal chart types and axis mappings based on the statistical attributes of a dataset. The tool works as an interactive data profiling layer that lets users browse and query collections of charts using filters and wildcards. It also serves as a visualization code generator, translating the automatically produced charts into programmatic code or HTML for manual refinement in external libraries. The system covers a wide range of exploratory analysis capabilities, including automatic chart encoding, guided discovery through step recommendations, and the ability to export visual configurations as declarative specifications. The project integrates directly into pandas to replace the default dataframe display with interactive visualization components.
Automates the exploratory data analysis process by recommending optimal chart types and axis mappings based on dataset attributes.
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Implements a complete suite for examining metrics and identifying patterns through data analysis.
Visual Insights is an automated exploratory data analysis platform and causal inference tool designed to uncover patterns and cause-and-effect relationships within datasets. It works as an interactive data visualization library that uses a grammar-of-graphics approach to generate multidimensional charts and dashboards. The project distinguishes itself through a natural language interface that translates plain-text questions into data answers and visualizations via a language model. It provides a specialized framework for causal discovery and inference, allowing users to identify links between variables through interactive causal graphs and to run what-if analyses to validate hypotheses. The platform covers a wide range of capabilities, including visual data cleaning, statistical profiling, and automated dataset transformation. It supports integrating diverse data from local files and remote databases, and includes a high-performance processing engine for handling large datasets locally. In addition, the system allows interactive analysis components to be embedded in web applications and notebooks.
Discovers patterns and trends in unfamiliar datasets using automated agents to generate multi-dimensional visualizations.
iflow-cli is a command-line interface and suite of AI tools designed for software engineering, workflow orchestration, and multimodal data analysis. It functions as an LLM command line interface that enables users to execute AI workflows, analyze codebase structures, and interact with large language models directly from the terminal. The project features a plugin-based agent architecture that allows for the integration of specialized domain experts and custom instruction sets from an external marketplace. It distinguishes itself through a multimodal AI terminal capable of processing visual da
Extracts information from spreadsheets to merge data into tables or generate visual charts.