Analytics, dataframes and notebooks

Explore open-source tools for data manipulation, statistical analysis, and interactive computational notebook environments.


ambv/black
41,560 stars
Black is a deterministic Python code formatter and style guide enforcer. It automatically reformats source code and Jupyter notebook cells into a consistent style to eliminate manual debates over code layout and reduce noise in version control diffs. The tool uses abstract syntax tree analysis to restructure code layout while ensuring that the underlying functional logic remains unchanged. It employs a deterministic engine that produces a single consistent output for any given input, removing subjective styling choices. The system provides capabilities for in-place file mutation, automated style enforcement across entire projects, and the use of configuration files to define line lengths and excluded file patterns. It further verifies code integrity by comparing the abstract syntax trees of the original and reformatted code to ensure functional equivalence.
Python, Deterministic Formatters, AST Transformation Tools, AST-Based Formatters
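The equivalence check described for Black can be illustrated with Python's standard `ast` module. This is a simplified sketch of the idea, not Black's own implementation, which is more involved; the helper name `same_ast` is hypothetical.

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True when both sources parse to the same abstract syntax tree."""
    # ast.dump omits line and column attributes by default, so layout-only
    # differences (whitespace, line breaks, quote style) do not affect it.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))


before = "x = {  'a':1,\n 'b':2 }\n"
after = 'x = {"a": 1, "b": 2}\n'
changed = 'x = {"a": 1, "b": 3}\n'

print(same_ast(before, after))    # True: only the layout differs
print(same_ast(before, changed))  # False: a value changed
```

A formatter that runs a check of this kind after rewriting a file can refuse to save output whose tree differs from the input.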
grafana/grafana
74,456 stars
Grafana is an observability data platform designed to aggregate metrics, logs, and traces from diverse sources into a unified environment. It functions as a centralized interface for visualizing complex telemetry data, transforming raw streams into interactive dashboards that support real-time system health tracking and performance monitoring. The platform distinguishes itself through a plugin-based modular architecture that integrates disparate databases, cloud services, and monitoring tools via a standardized data abstraction layer. This framework allows for the dynamic loading of external components to support varied data sources and visualization types without requiring modifications to the core codebase. Additionally, the system incorporates a rule-based alerting engine that evaluates incoming data streams against defined thresholds to trigger automated notifications for incident response. Beyond its core visualization and alerting capabilities, the platform provides tools for infrastructure performance monitoring and operational data analysis. It utilizes a declarative, component-driven interface to manage dashboard states and a compiled backend to process high-throughput queries and API requests. The system maintains configuration persistence and state consistency across distributed instances through a centralized metadata storage layer.
TypeScript, Observability Data Platforms, Observability Dashboards, Telemetry Collection and Aggregation
psf/black
41,578 stars
This project is an uncompromising, deterministic code formatter for Python. It functions by parsing source code into an abstract syntax tree and regenerating it according to a rigid, opinionated set of style rules. By automating the formatting process, it eliminates manual style debates and configuration overhead, ensuring that code remains consistent across entire projects regardless of the original input. The tool distinguishes itself through its focus on speed and seamless integration into development workflows. It utilizes content-based file caching and parallel processing to maintain high performance on large codebases, while supporting version control hooks to enforce style consistency before code is committed. To preserve project history, it provides mechanisms to ignore specific commits in version control blame tracking, ensuring that automated style changes do not obscure original authorship. Beyond standard source files, the formatter extends its capabilities to include Jupyter notebooks, type stubs, and embedded code examples within documentation. It offers broad compatibility through plugins for major text editors and integrated development environments, as well as support for the language server protocol. Configuration is managed through project-level files that are automatically discovered within the directory hierarchy, allowing for consistent behavior across diverse development environments.
Python, Code Formatters, Python Development Tools, Automated Formatting Frameworks
microsoft/qlib
44,490 stars
This project is a comprehensive platform for quantitative investment research, machine learning, and algorithmic trading. It provides an end-to-end environment for developing, testing, and executing financial strategies, supporting the entire lifecycle from data ingestion and feature engineering to model training and backtesting. The system is distinguished by its configuration-driven workflow orchestration, which allows researchers to automate complex pipelines and manage experiments through declarative files. It features a high-performance data infrastructure that utilizes custom binary formats to optimize throughput for large-scale market datasets, while a dedicated temporal management layer enforces strict point-in-time data integrity to prevent information leakage during simulations. Furthermore, the platform includes a hierarchical simulation framework that coordinates multi-level trading interactions, such as the relationship between daily portfolio management and intraday order execution. Beyond its core research capabilities, the platform offers a specialized toolkit for financial machine learning, including support for reinforcement learning agents and meta-learning algorithms. Users can integrate custom models and trading strategies through standardized interfaces, ensuring flexibility in how predictive signals are generated and applied. The environment also provides robust utilities for experiment tracking, containerized deployment management, and performance reporting to facilitate reproducible research and strategy verification.
Python, Algorithmic Trading Frameworks, Algorithmic Trading Platforms, Algorithmic Trading Simulators
PrefectHQ/prefect
21,640 stars
Prefect is a workflow orchestration platform designed to define, schedule, and monitor complex data pipelines as Python code. It functions as a container-native engine that wraps individual tasks in isolated environments, ensuring consistent dependencies and resource allocation across diverse infrastructure. By utilizing a state-machine-based orchestration model, the system tracks execution progress through discrete transitions and persistent event logs to maintain reliable and observable task processing. The platform distinguishes itself through a decoupled worker-API architecture, which separates task scheduling from execution by allowing remote workers to poll a central API for pending work units. This design enables distributed task concurrency, allowing parallel workloads to scale horizontally across clusters or remote nodes. Furthermore, the system supports event-driven workflow triggering, enabling pipelines to initiate or resume automatically in response to system state changes or external signals. The project provides a comprehensive capability surface for managing the entire lifecycle of data operations. This includes modular block-based configuration for injecting credentials and infrastructure settings, result persistence caching for optimizing redundant computations, and extensive integration support for cloud services, databases, and version control systems. Users can also leverage built-in tools for infrastructure automation, data lineage tracking, and automated notification management. The software is distributed as a Python-based framework, with documentation and installation guides available to assist in configuring self-hosted deployments or connecting to managed orchestration services.
Python, Data Pipeline Orchestration, Workflow Orchestration, Container-Native Infrastructure
anuraghazra/github-readme-stats
79,661 stars
This project is a serverless service that generates dynamic, themeable visual summaries of software development activity. It functions as an automated metadata visualizer, transforming raw platform logs and repository metrics into resolution-independent vector graphics that can be embedded directly into markdown environments. The service distinguishes itself by offering highly configurable, query-parameter-driven rendering that allows users to customize the visual presentation of their coding patterns, language proficiency, and repository details. It supports both real-time generation via serverless functions and the creation of static image files through automated workflows, providing flexibility in how data is fetched and displayed. The platform aggregates disparate data points from multiple sources to provide comprehensive insights into development habits and project metadata. Users can deploy private instances of the service to maintain full control over caching strategies, authentication tokens, and rate limit management.
JavaScript, GitHub Stats Cards, Language Distribution Cards, Profile Personalization Suites
recommenders-team/recommenders
21,769 stars
This project is a recommendation system framework designed for building, evaluating, and operationalizing personalized item suggestion engines. It provides a comprehensive toolkit for implementing collaborative filtering and content-based algorithms, supported by an end-to-end machine learning pipeline for preparing datasets and deploying predictive models. The framework distinguishes itself through the integration of knowledge graphs to provide richer context for recommendations and the use of industry-specific patterns to accelerate system deployment. It also includes a specialized model evaluation toolkit for measuring recommendation quality through diversity analysis, novelty, and ranking metrics. The system covers the full development lifecycle, including data engineering for interaction datasets, hyperparameter tuning, and distributed model training across CPU and GPU clusters. It further provides tools for performance benchmarking, API load testing, and model effectiveness tracking via A/B testing and conversion rates. The project includes command-line utilities for parameterized notebook execution to validate system behavior.
Python, Recommender Systems, Collaborative Filtering Models, Collaborative Filtering Utilities
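As an illustration of the ranking metrics mentioned for recommenders, precision at k measures what fraction of the top k suggestions a user actually found relevant. The function below is a hypothetical, dependency-free sketch; it is not the library's own evaluation API, which operates on dataframes.

```python
def toy_precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set.

    Divides by k even when fewer than k items were recommended, which is
    one common convention; other definitions exist.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


ranked = ["a", "b", "c", "d"]   # items ordered by predicted score
liked = {"a", "c", "e"}         # items the user actually interacted with

print(toy_precision_at_k(ranked, liked, 1))  # 1.0
print(toy_precision_at_k(ranked, liked, 4))  # 0.5
```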
wshobson/agents
36,830 stars
This project is an automated trading and agentic workflow platform designed to orchestrate complex financial tasks through state-based graphs. It provides a comprehensive framework for building, deploying, and managing autonomous agents that execute multi-step analytical processes, monitor real-time market conditions, and perform high-speed trade execution. The platform distinguishes itself through a robust agentic plugin ecosystem that integrates directly with popular AI-powered development environments and command-line interfaces. It features a specialized financial analysis engine capable of multimodal data processing, which converts complex visual charts and diverse market datasets into structured formats for advanced decision-making models. By utilizing state-graph orchestration, the system ensures precise control over agent transitions, tool sequencing, and state persistence during automated operations. Beyond its core orchestration capabilities, the platform includes extensive tools for quantitative financial analysis, risk management, and portfolio optimization. It supports the definition of custom financial functions, automated technical indicator computation, and the generation of actionable trading insights based on real-time sentiment and trend analysis. The architecture also incorporates modular routing, tiered caching, and language-agnostic service exposure to facilitate scalable, reliable data retrieval and system operation. The platform is designed for integration into professional development workflows, offering native support for installation via standard package managers and CLI registries. It provides a structured environment for configuring behavioral rules and tool-calling templates, enabling users to deploy containerized analytical services across various infrastructure environments.
Python, Algorithmic Trading Engines, Automated Trading Platforms, Financial Analysis Tools
Kaggle/kaggle-cli
7,417 stars
The Kaggle API command line interface is a suite of utilities for managing datasets, machine learning models, and competition entries from a terminal. It functions as a command line wrapper that translates user input into API calls to control remote cloud resources. The project differentiates itself by providing specialized tools for automating the execution of notebook kernels and managing the lifecycle of machine learning models, including version iteration and performance tracking. It also includes a utility for executing evaluation tasks against large language models and downloading the resulting performance metrics. The tool covers several broad capability areas, including dataset management for uploading and downloading data collections, competition entry management for submitting and tracking contest results, and programmatic browsing of community discussion forums. User identity is managed through token-based client authentication using API keys stored in local configuration files or via a web-based authorization flow.
Python, Command Line Interfaces, Kaggle API Clients, Competition Management Systems
pathwaycom/llm-app
59,341 stars
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream processing to trigger computations only when source data updates. These capabilities are paired with a specialized vector search framework that maintains low-latency access to evolving knowledge bases for retrieval-augmented generation. The platform facilitates enterprise AI integration by connecting large language models to private data sources. It includes pre-built application templates to assist in the deployment of high-accuracy retrieval systems and scalable data pipelines.
Jupyter Notebook, Data Processing Frameworks, Differential Dataflow Engines, Distributed State Management
jupyter/notebook
13,204 stars
This project is a browser-based interactive computing environment and data science IDE. It serves as a literate programming tool that allows users to create documents combining live code, mathematical equations, visualizations, and narrative text. As a polyglot notebook interface, it connects to various language kernels to execute code and render output within a single interface. The application distinguishes itself by separating the frontend interface from a remote compute engine through a language-agnostic kernel interface. This allows it to support multiple programming languages while maintaining a consistent document editor for computational authoring and data exploration. The system covers a broad range of capabilities, including interactive code debugging, inline code completion, and execution history recall. It provides tools for document structure visualization and a scratchpad console for variable inspection. Additionally, the interface supports rich media embedding, diagram rendering, and integrated audio-visual playback. Users can manage their environment through global application configuration, visual theme management, and customizable keyboard shortcuts. The application also includes a navigable file management interface for browsing and organizing documents.
Jupyter Notebook, Execution Kernels, Interactive Data Science Environments, Block-Based Document Models
dbeaver/dbeaver
50,678 stars
DBeaver is a universal database client and administration environment designed for managing diverse relational and non-relational database systems. It provides a unified graphical interface that enables users to perform data manipulation, schema migration, and performance monitoring across multiple platforms. By utilizing a standardized driver abstraction layer, the application translates generic requests into database-specific commands, ensuring consistent interaction regardless of the underlying technology. The project distinguishes itself through an extensible, plugin-based architecture that allows for functional expansion and broad support for various database drivers. It integrates advanced workflow automation, enabling users to schedule repetitive tasks and execute complex sequences of operations as background processes. Additionally, the environment incorporates AI-driven assistance for generating SQL queries and executing natural language commands, alongside robust security features such as Kerberos authentication and cloud credential management. Beyond core connectivity, the application offers a comprehensive suite of tools for data analysis, including grid-based editing, schema comparison, and execution plan visualization. Users can manage large datasets efficiently through virtual data paging and customize their workspace with context-aware UI components. The platform also supports automated lifecycle management, allowing for the execution of custom shell commands during connection events to streamline administrative workflows.
Java, Database Management Clients, Database Management Systems, Database Administration Tools
sansan0/TrendRadar
59,513 stars
TrendRadar is a market intelligence tool designed to aggregate and analyze external information sources for monitoring shifts in consumer behavior and industry patterns. It functions as a visual data analytics dashboard, transforming raw market data into interactive charts and insights through a component-based interface. The platform utilizes a declarative state management system where application behavior is governed by a centralized configuration object. This architecture supports interactive dashboard development, allowing users to manipulate data sets and visualize emerging trends over time. Changes to the configuration state are handled through event-driven synchronization, ensuring that data representations remain consistent across the interface. The system incorporates a structured configuration management workflow, utilizing a schema-driven approach to validate user-defined settings and parameters. This environment includes a dedicated editor for adjusting the filters and metrics used to track information, supported by a build process that optimizes assets for browser delivery.
Python, Market Intelligence Platforms, Analytics Dashboards, Component Architectures
finos/perspective
10,967 stars
Perspective is a columnar data analytics library and streaming data visualization engine. It provides an interactive data grid component and notebook analytics widgets designed for processing high-volume data and rendering interactive charts and grids. The system utilizes a high-performance query engine to enable real-time data analysis and streaming dataset visualization. It supports the creation of customizable dashboards and reports that update automatically as new data arrives without requiring full dataset reloads. The project covers large-scale dataset analytics through a schema-driven data model and columnar memory storage. It includes capabilities for virtualized grid rendering and integration with notebook environments for exploratory data analysis. The engine includes a pluggable interface for querying external data sources and utilizes WebAssembly for executing queries in the browser.
C++, Columnar Data Processors, Real-Time Charting Engines, Client-Side Incremental State Updates
pathwaycom/pathway
62,959 stars
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features integrated vector-aware data ingestion, which automates the creation and maintenance of searchable document indexes that update instantly as new data arrives. Developers can connect language models directly into their pipelines, utilizing built-in capabilities for document chunking, embedding generation, and result reranking to maintain synchronized, context-aware information retrieval. Beyond its core processing capabilities, the platform provides a robust infrastructure for deploying data applications. It supports the transition from batch to streaming workflows by simply updating input connectors, while its containerized deployment model allows for scaling services across local and cloud environments. The system is designed to handle large-scale event-driven tasks, providing a consistent programming model for both analytics and automated content generation workflows.
Python, Data Processing Frameworks, Data Stream Processors, Declarative Pipeline Construction
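The incremental processing described for Pathway, where only changes move through the pipeline, can be pictured with a toy word count that applies insertions and retractions instead of recomputing from scratch. This is a standard-library sketch of the general idea only; the class name `IncrementalWordCount` and its update format are assumptions and do not reflect Pathway's engine or API.

```python
from collections import Counter


class IncrementalWordCount:
    """Keeps word counts current by applying deltas rather than recomputing."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, updates):
        """Apply (text, diff) updates: diff is +1 to insert, -1 to retract.

        Returns only the words whose count changed, as {word: new_count}.
        """
        changed = {}
        for text, diff in updates:
            for word in text.split():
                self.counts[word] += diff
                if self.counts[word] == 0:
                    del self.counts[word]  # drop words that no longer occur
                changed[word] = self.counts.get(word, 0)
        return changed


wc = IncrementalWordCount()
first = wc.apply([("a b a", +1)])
second = wc.apply([("b c", +1), ("a b a", -1)])

print(first)            # {'a': 2, 'b': 1}
print(second)           # {'b': 1, 'c': 1, 'a': 0}
print(dict(wc.counts))  # {'b': 1, 'c': 1}
```

The second call touches only the words in the two changed rows, which is the property that keeps the cost proportional to the size of the update rather than the size of the dataset.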
altair-viz/altair
10,410 stars
Altair is a declarative data visualization library for Python based on the Vega-Lite grammar. It allows users to create statistical visualizations by mapping data fields to visual properties rather than writing imperative drawing code. The library focuses on interactive charting through a system of linked selections and filters that update multiple visualizations based on user input. It renders charts as JSON and HTML for display in web browsers and interactive notebooks. The project covers statistical data analysis and interactive data exploration, providing capabilities to export visuals as standalone HTML, JSON, or image files. These assets can execute directly within a web runtime to render charts without requiring a Python backend.
Python, Declarative Visualization Languages, Declarative Visualization Grammars, Interactive Data Charting
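To make the declarative approach described for Altair concrete, the sketch below builds a small Vega-Lite specification by hand as a plain dictionary, mapping data fields to the x and y channels. The helper `bar_chart_spec` is hypothetical, and the JSON Altair itself emits contains more detail than this; the example assumes only the standard library.

```python
import json


def bar_chart_spec(rows, x_field, y_field):
    """Build a minimal Vega-Lite bar chart spec from a list of row dicts."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        # The encoding maps data fields to visual channels; nothing here
        # says how to draw, only what each channel represents.
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


rows = [
    {"category": "A", "value": 28},
    {"category": "B", "value": 55},
]
spec = bar_chart_spec(rows, "category", "value")
print(json.dumps(spec, indent=2))
```

In Altair, a roughly corresponding chart would be written as `alt.Chart(df).mark_bar().encode(x="category:N", y="value:Q")`, with the library generating the specification.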
aymericdamien/TensorFlow-Examples
43,749 stars
This repository serves as a structured educational resource for machine learning and deep learning, providing a library of executable scripts and notebooks. It is designed to help users master the practical application of data processing, model evaluation, and neural network construction through annotated code samples and guided tutorials. The collection focuses on translating theoretical mathematical concepts into functional code, offering proven patterns for common tasks such as classification and regression. By providing curated examples of layer construction and training loops, the repository enables users to prototype experimental models and implement fundamental algorithms using standard industry frameworks. The materials cover the core mechanics of tensor-based data flow, automatic differentiation, and computational graph execution. These examples illustrate how to manage model state and optimize mathematical structures for hardware acceleration, providing a practical guide for those learning to build and train models within the framework.
Jupyter Notebook, Automatic Differentiation Engines, Deep Learning Code Libraries, Tensor Processing Libraries
virattt/ai-hedge-fund
60,143 stars
This project is an algorithmic trading platform designed to automate financial market analysis and the execution of investment strategies. It provides an end-to-end environment for processing real-time market data through automated decision models, allowing for the triggering of financial transactions based on predefined quantitative signals and risk parameters without manual intervention. The platform distinguishes itself through a modular pipeline architecture that decouples data ingestion, signal generation, and trade execution, facilitating the iterative refinement of investment models. It incorporates a comprehensive backtesting engine that evaluates strategies against historical market datasets to calculate performance metrics and risk profiles. To ensure consistency and reliability, the entire research and execution workflow is containerized, providing isolated environments that manage complex dependencies and standardize software stacks across different machines. The system includes a suite of infrastructure automation tools that simplify the deployment and maintenance of financial software. These tools support declarative environment configuration and automated deployment pipelines, enabling users to manage complex financial analysis tasks and strategy simulations within a repeatable, standardized workspace.
Python, Algorithmic Trading Platforms, Algorithmic Trading, Backtesting Engines
marimo-team/marimo
21,468 stars
Marimo is a reactive Python notebook environment and data science integrated development environment. It functions as a scripting tool that maintains state consistency by automatically tracking variable dependencies and re-executing downstream code blocks whenever upstream inputs are modified. The platform distinguishes itself by storing notebooks as standard, portable Python scripts rather than proprietary formats, ensuring compatibility with version control systems. It integrates artificial intelligence to assist with code generation and debugging based on the current execution context, while also providing built-in support for direct SQL database queries and automated dependency management within the project files. The environment supports the transformation of analytical documents into standalone web applications or executable command-line tools. It manages the execution lifecycle through a reactive model that prevents stale variable errors and ensures that the interface remains synchronized with the underlying memory state.
Python, Notebook Environments, Interactive Data Science Environments, Reactive Execution Models
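The reactive model described for marimo, where editing a cell re-runs the cells that depend on it, can be sketched with the standard `ast` module: find which names each cell assigns and reads, then re-execute anything that reads a stale name. The functions below are hypothetical and much simpler than a real notebook runtime; in particular they assume cells are already listed in dependency order and that each name is defined by one cell.

```python
import ast


def defs_and_refs(source):
    """Return (names assigned, names only read) in a cell's source."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else refs).add(node.id)
    return defs, refs - defs


def run_reactively(cells, changed, namespace):
    """Re-run the changed cell and every cell that reads a stale name."""
    stale_names = set()
    executed = []
    for name, src in cells.items():
        defs, refs = defs_and_refs(src)
        if name == changed or refs & stale_names:
            exec(src, namespace)
            stale_names |= defs  # everything this cell defines is now new
            executed.append(name)
    return executed


cells = {"a": "x = 2", "b": "y = x * 10", "c": "z = 5", "d": "total = y + z"}
ns = {}
for src in cells.values():      # initial run of every cell
    exec(src, ns)

cells["a"] = "x = 3"            # the user edits cell "a"
executed = run_reactively(cells, "a", ns)
print(executed)                 # ['a', 'b', 'd']
print(ns["total"])              # 35
```

Cell "c" is skipped because it reads nothing that changed, while "d" is re-run because it depends on "y", which depends on "x".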
scikit-learn/scikit-learn
66,344 stars
Scikit-learn is a machine learning library for predictive data analysis that provides a collection of algorithms for supervised and unsupervised learning. It functions as a comprehensive toolkit for data preprocessing, dimensionality reduction, and model selection, allowing users to classify data objects, predict continuous values, and cluster similar items based on historical patterns. The project is defined by a unified interface design where objects either learn from data, transform data, or chain these operations into sequential workflows. To ensure performance on large or high-dimensional datasets, the library utilizes vectorized numerical operations, memory-efficient sparse matrix structures, and multi-core parallel execution. Performance-critical components are implemented using compiled extension modules to maintain execution speed while integrating with standard scientific computing tools. The framework includes systematic tools for model validation, such as automated cross-validation loops and parameter tuning, which help identify optimal configurations and prevent overfitting. These capabilities are supported by a suite of utilities for feature engineering and data normalization, ensuring that raw information is structured and compatible with various analytical models.
Python, Dimensionality Reduction Engines, Frameworks, Pipeline Patterns
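The unified interface described for scikit-learn, in which objects learn from data, transform data, or are chained into a workflow, can be mimicked in a few lines of plain Python. The classes below are hypothetical stand-ins that follow the `fit` / `transform` / `predict` naming convention; they are not scikit-learn classes, and they use flat lists where the library expects 2-D arrays.

```python
class MeanCenterer:
    """Transformer: learns the mean in fit(), subtracts it in transform()."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SignClassifier:
    """Estimator: predicts 1 for positive inputs and 0 otherwise."""

    def fit(self, X, y=None):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


class ToyPipeline:
    """Chains transformers and a final estimator into one object."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # fit, then pass data onward
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)  # reuse what was learned during fit
        return self.steps[-1].predict(X)


model = ToyPipeline([MeanCenterer(), SignClassifier()]).fit([1.0, 2.0, 3.0, 6.0])
print(model.predict([2.0, 4.0]))  # [0, 1]
```

Because every step exposes the same methods, the pipeline can treat them uniformly, and statistics such as the mean are learned only from the data passed to `fit`.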
