# stefan-jansen/machine-learning-for-trading

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/stefan-jansen-machine-learning-for-trading).**

16,552 stars · 4,974 forks · Jupyter Notebook

## Links

- GitHub: https://github.com/stefan-jansen/machine-learning-for-trading
- Homepage: https://ml4trading.io
- awesome-repositories: https://awesome-repositories.com/repository/stefan-jansen-machine-learning-for-trading.md

## Topics

`artificial-intelligence` `data-science` `deep-learning` `finance` `investment` `investment-strategies` `machine-learning` `ml4t-workflow` `synthetic-data` `trading` `trading-agent` `trading-strategies`

## Description

This project is a framework for building financial data pipelines that automate the collection, cleaning, and synchronization of large-scale market datasets. It serves as a quantitative trading data engine, providing the infrastructure to manage historical and real-time asset pricing data for research and machine learning workflows.

The system takes a configuration-driven approach to orchestration, letting users manage data acquisition tasks across multiple financial providers. Its middleware handles provider failover, rate limiting, and asynchronous batch requests, so retrieval stays reliable when sources differ in format and availability. By normalizing diverse data formats and applying automated quality checks, the framework keeps inputs consistent for downstream analytical models.
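
The failover behaviour described above can be illustrated with a minimal sketch. This is not the project's actual middleware: `ProviderError`, `fetch_with_failover`, and the two stand-in providers are hypothetical names chosen for illustration, and the retry policy is an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider raises when a request fails."""


Fetcher = Callable[[str], List[dict]]


def fetch_with_failover(
    symbol: str,
    providers: Sequence[Tuple[str, Fetcher]],
    max_retries: int = 2,
    backoff_seconds: float = 0.0,
) -> Tuple[str, List[dict]]:
    """Try each provider in order; retry a few times, then fall through."""
    errors = []
    for name, fetch in providers:
        for attempt in range(max_retries):
            try:
                return name, fetch(symbol)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                # Exponential backoff before the next attempt (0 disables it).
                time.sleep(backoff_seconds * (2 ** attempt))
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_primary(symbol: str) -> List[dict]:
    raise ProviderError("rate limited")


def steady_backup(symbol: str) -> List[dict]:
    return [{"symbol": symbol, "date": "2024-01-02", "close": 101.5}]


source, rows = fetch_with_failover(
    "AAPL", [("primary", flaky_primary), ("backup", steady_backup)]
)
```

Here the primary provider always fails, so the call returns rows from the backup along with the name of the provider that served them, which is useful for tracking provider performance.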

Beyond core acquisition, the project provides extensive capabilities for managing financial time series, including support for incremental updates, atomic file-based storage, and anomaly detection. It enables the construction of complex factor datasets and the definition of asset universes, while offering monitoring tools to track data health and provider performance over time. The repository is structured to support repeatable, automated workflows that can be easily integrated into broader quantitative research environments.
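
As a rough illustration of incremental updates combined with atomic file-based storage, the sketch below writes to a temporary file and renames it into place, and only appends rows newer than the last stored date. It uses JSON and the standard library for brevity; the function names, the row layout, and the storage format are assumptions, not the project's implementation.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def atomic_write_json(path: Path, rows: list[dict]) -> None:
    """Write rows to a temp file in the same directory, then rename over path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(rows, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def incremental_update(
    path: Path, fetch_since: Callable[[Optional[str]], list[dict]]
) -> int:
    """Append only rows dated after the last stored row; return how many."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    last_date = existing[-1]["date"] if existing else None
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    new_rows = [
        row for row in fetch_since(last_date)
        if last_date is None or row["date"] > last_date
    ]
    if new_rows:
        atomic_write_json(path, existing + new_rows)
    return len(new_rows)
```

Because readers only ever see the old file or the fully written new one, an interrupted update cannot leave a half-written dataset behind.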

## Tags

### Business & Productivity Software

- [Financial Analysis Tools](https://awesome-repositories.com/f/business-productivity-software/financial-operational-management/billing-financial-systems/financial-analysis-tools.md) — Builds and maintains automated workflows to collect and clean large-scale market datasets for quantitative analysis.
- [Data Engines](https://awesome-repositories.com/f/business-productivity-software/quantitative-trading-platforms/data-engines.md) — Manages historical and real-time asset pricing data with support for provider failover, normalization, and incremental updates.
- [Continuous Futures Loaders](https://awesome-repositories.com/f/business-productivity-software/futures-position-managers/continuous-futures-loaders.md) — Constructs continuous time series for futures contracts by rolling across delivery dates. ([source](https://ml4trading.io/docs/data/providers/databento/))
- [Portfolio Data Fetchers](https://awesome-repositories.com/f/business-productivity-software/portfolio-management/portfolio-data-fetchers.md) — Fetches sorted portfolio data based on size and book-to-market ratios for asset pricing research. ([source](https://ml4trading.io/docs/data/providers/fama_french/))
- [Pricing Trackers](https://awesome-repositories.com/f/business-productivity-software/price-list-management/pricing-trackers.md) — Queries the latest probability prices for prediction market outcomes to inform real-time decision making. ([source](https://ml4trading.io/docs/data/providers/polymarket/))

### Data & Databases

- [Data Pipeline Automation](https://awesome-repositories.com/f/data-databases/data-pipeline-automation.md) — Orchestrates recurring data acquisition, validation, and synchronization tasks for quantitative research.
- [Time Series](https://awesome-repositories.com/f/data-databases/database-management-systems/database-engines/time-series.md) — Provides specialized engines for ingesting, indexing, and managing high-frequency financial time-series data for research and analysis.
- [Financial Data Connectors](https://awesome-repositories.com/f/data-databases/financial-data-connectors.md) — Integrates with multiple financial exchanges to fetch real-time and historical market data for diverse asset classes. ([source](https://ml4trading.io/docs/data/getting-started/provider-selection/))
- [Data Pipeline Orchestrators](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestrators.md) — Orchestrates data collection and processing tasks using structured configuration files for repeatable workflows. ([source](https://ml4trading.io/docs/data/user-guide/))
- [Asset Universes](https://awesome-repositories.com/f/data-databases/file-asset-management/asset-universes.md) — Groups financial instruments into logical collections for batch processing and flexible workflow management. ([source](https://ml4trading.io/docs/data/book-guide/))
- [Financial Data Processing](https://awesome-repositories.com/f/data-databases/financial-data-processing.md) — Acts as a comprehensive toolkit for orchestrating, validating, and storing multi-source financial market data. ([source](https://ml4trading.io/docs/data/user-guide/))
- [Market Data Providers](https://awesome-repositories.com/f/data-databases/market-data-providers.md) — Fetches historical financial data from multiple providers for specific symbols to support machine learning pipelines. ([source](https://ml4trading.io/docs/data/))
- [Market Data Aggregators](https://awesome-repositories.com/f/data-databases/market-data-providers/market-data-aggregators.md) — Aggregates and standardizes financial information from diverse providers to ensure consistent inputs for analytical models.
- [Orchestration Middleware](https://awesome-repositories.com/f/data-databases/market-data-providers/orchestration-middleware.md) — Manages concurrent data acquisition, API rate limiting, and data quality verification across diverse financial providers.
- [Market Data Recorders](https://awesome-repositories.com/f/data-databases/market-data-recorders.md) — Persists processed financial data in partitioned formats with automated gap detection for high-speed retrieval. ([source](https://ml4trading.io/docs/data/user-guide/))
- [Time Series Data Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage/specialized-database-engines/time-series-data-storage.md) — Provides scalable storage for historical financial time-series data with metadata tracking and lazy evaluation. ([source](https://ml4trading.io/docs/data/user-guide/storage/))
- [Data Integrity and Validation](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-management-governance/data-integrity-validation.md) — Checks stored financial data for consistency and duplicates while performing anomaly detection. ([source](https://ml4trading.io/docs/data/user-guide/cli-reference/))
- [Atomic File Operations](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-management-governance/data-integrity-validation/data-integrity/atomic-file-operations.md) — Ensures filesystem consistency by writing to temporary files before atomic renaming during concurrent data operations.
- [Incremental Syncing](https://awesome-repositories.com/f/data-databases/data-modification-apis/incremental-syncing.md) — Updates local stores by fetching only new records since the last operation to minimize bandwidth usage.
- [Financial](https://awesome-repositories.com/f/data-databases/data-parsers/financial.md) — Normalizes data structures across multiple financial providers to ensure consistent input formats for research. ([source](https://ml4trading.io/docs/data/))
- [Data Quality Frameworks](https://awesome-repositories.com/f/data-databases/data-quality-frameworks.md) — Enforces accuracy and consistency standards on market data inputs using logical invariants and schema requirements. ([source](https://ml4trading.io/docs/data/tutorials/04_data_quality/))
- [Metric Calculators](https://awesome-repositories.com/f/data-databases/metric-calculators.md) — Computes essential financial indicators like returns and volatility from standardized market data structures. ([source](https://ml4trading.io/docs/data/tutorials/01_understanding_ohlcv/))
- [Time Series Data Utilities](https://awesome-repositories.com/f/data-databases/time-series-data-utilities.md) — Cleans and aligns market data by filling gaps and normalizing formats based on exchange calendars. ([source](https://ml4trading.io/docs/data/api/))
- [Anomaly Detection](https://awesome-repositories.com/f/data-databases/anomaly-detection.md) — Identifies statistical outliers and irregularities in time series data to maintain high-fidelity inputs. ([source](https://ml4trading.io/docs/data/user-guide/data-quality/))
- [Data Normalization](https://awesome-repositories.com/f/data-databases/data-normalization.md) — Normalizes historical price series by applying adjustments for stock splits and dividends. ([source](https://ml4trading.io/docs/data/providers/yahoo/))
- [Data Validation Tools](https://awesome-repositories.com/f/data-databases/data-validation-tools.md) — Compares results from multiple providers to verify data accuracy and consistency before analytical processing. ([source](https://ml4trading.io/docs/data/tutorials/05_multi_provider/))
- [Cross-Provider Validators](https://awesome-repositories.com/f/data-databases/data-validation/cross-provider-validators.md) — Ensures data reliability by comparing and reconciling datasets across multiple financial providers. ([source](https://ml4trading.io/docs/data/tutorials/04_data_quality/))
- [Market Data Access APIs](https://awesome-repositories.com/f/data-databases/market-data-access-apis.md) — Provides interfaces for retrieving historical currency pair and precious metal price data for quantitative analysis. ([source](https://ml4trading.io/docs/data/providers/oanda/))
- [Unified Data Provider Interfaces](https://awesome-repositories.com/f/data-databases/unified-data-provider-interfaces.md) — Normalizes disparate financial data sources into a unified interface for consistent downstream analytical pipelines.
- [Concurrency Control Mechanisms](https://awesome-repositories.com/f/data-databases/concurrency-control-mechanisms.md) — Manages simultaneous read and write access to data files using locking mechanisms to prevent corruption. ([source](https://ml4trading.io/docs/data/user-guide/incremental-updates/))
- [Data Parsing](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-parsing.md) — Parses raw tick-level data to extract order book events for microstructure research. ([source](https://ml4trading.io/docs/data/providers/nasdaq_itch/))
- [Local File Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage/file-based-storage/local-file-storage.md) — Persists financial datasets using partitioned file structures for efficient long-term management. ([source](https://ml4trading.io/docs/data/api/))
- [Batched Data Loading](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration/data-engineering-pipelines/batched-data-loading.md) — Downloads market information for multiple assets simultaneously to decrease total execution duration. ([source](https://ml4trading.io/docs/data/getting-started/quickstart/))
- [Dataset Aggregators](https://awesome-repositories.com/f/data-databases/dataset-aggregators.md) — Calculates and persists statistical summaries including null counts and distributions for data quality assessment. ([source](https://ml4trading.io/docs/data/user-guide/storage/))
- [In-Memory Caches](https://awesome-repositories.com/f/data-databases/in-memory-caches.md) — Groups data requests and caches results in memory to reduce network traffic and redundant lookups. ([source](https://ml4trading.io/docs/data/tutorials/02_rate_limiting/))
- [Lazy Evaluation Frameworks](https://awesome-repositories.com/f/data-databases/lazy-evaluation-frameworks.md) — Processes financial time series using deferred execution to optimize memory usage and performance.
- [Local Data Stores](https://awesome-repositories.com/f/data-databases/local-data-stores.md) — Synchronizes local data stores with remote sources using incremental, full refresh, or backfill strategies. ([source](https://ml4trading.io/docs/data/user-guide/cli-reference/))
- [Cryptocurrency](https://awesome-repositories.com/f/data-databases/market-data-providers/cryptocurrency.md) — Retrieves historical price and volume information for digital assets over specified date ranges. ([source](https://ml4trading.io/docs/data/providers/coingecko/))
- [Validation Integrations](https://awesome-repositories.com/f/data-databases/validation-integrations.md) — Provides automated integrity checks and anomaly detection to verify data quality within analytical pipelines. ([source](https://ml4trading.io/docs/data/user-guide/data-quality/))
- [Data Acquisition Optimizers](https://awesome-repositories.com/f/data-databases/data-acquisition-workflows/data-acquisition-optimizers.md) — Reduces network traffic and API usage by performing incremental updates and validating data quality. ([source](https://ml4trading.io/docs/data/tutorials/))
- [Data Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage.md) — Organizes financial data in local storage to support efficient retrieval and batch processing. ([source](https://ml4trading.io/docs/data/book-guide/))
- [Data Source Connectivity Tools](https://awesome-repositories.com/f/data-databases/data-source-connectivity-tools.md) — Manages connections to financial data sources by defining authentication, rate limits, and caching policies. ([source](https://ml4trading.io/docs/data/api/))
- [Data Storage Optimizers](https://awesome-repositories.com/f/data-databases/data-storage-optimizers.md) — Optimizes storage performance by partitioning large datasets into time-based chunks. ([source](https://ml4trading.io/docs/data/user-guide/incremental-updates/))
- [Dataset Downloaders](https://awesome-repositories.com/f/data-databases/dataset-downloaders.md) — Retrieves specific financial datasets such as historical futures or commitment of traders reports. ([source](https://ml4trading.io/docs/data/user-guide/cli-reference/))
- [Industry Classification Systems](https://awesome-repositories.com/f/data-databases/industry-classification-systems.md) — Downloads historical performance data for industry-specific portfolios to analyze sector-based market trends. ([source](https://ml4trading.io/docs/data/providers/fama_french/))
- [International Market Data Access](https://awesome-repositories.com/f/data-databases/market-data-providers/international-market-data-access.md) — Retrieves regional financial factor data for developed markets across Europe, Japan, and Asia-Pacific. ([source](https://ml4trading.io/docs/data/providers/fama_french/))
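
The data-quality entries above mention logical invariants and duplicate checks. A minimal sketch of such checks on OHLCV rows might look like the following; the `check_ohlcv` name, the row layout, and the specific rules are assumptions for illustration, not the project's validation API.

```python
from typing import List, Tuple


def check_ohlcv(rows: List[dict]) -> List[Tuple[int, str]]:
    """Return (row index, issue) pairs for rows that break basic OHLCV invariants."""
    issues = []
    seen_dates = set()
    for index, row in enumerate(rows):
        if row["date"] in seen_dates:
            issues.append((index, "duplicate date"))
        seen_dates.add(row["date"])
        # The high must be at or above both open and close.
        if row["high"] < max(row["open"], row["close"]):
            issues.append((index, "high below open/close"))
        # The low must be at or below both open and close.
        if row["low"] > min(row["open"], row["close"]):
            issues.append((index, "low above open/close"))
        if row["volume"] < 0:
            issues.append((index, "negative volume"))
    return issues


sample = [
    {"date": "2024-01-02", "open": 100, "high": 105, "low": 99, "close": 104, "volume": 1000},
    {"date": "2024-01-03", "open": 104, "high": 103, "low": 101, "close": 102, "volume": 500},
    {"date": "2024-01-03", "open": 102, "high": 106, "low": 103, "close": 105, "volume": -1},
]
problems = check_ohlcv(sample)
```

In this sample the first row passes, the second has a high below its open, and the third repeats a date, has a low above its open, and reports negative volume.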

### Development Tools & Productivity

- [Configuration-Driven Orchestrators](https://awesome-repositories.com/f/development-tools-productivity/configuration-driven-orchestrators.md) — Uses structured configuration files to define and execute repeatable data acquisition and processing workflows.
- [Data Engineering Pipelines](https://awesome-repositories.com/f/development-tools-productivity/quantitative-workflow-orchestrators/data-engineering-pipelines.md) — Builds robust pipelines to fetch, normalize, and store large-scale historical market data for quantitative research.
- [Workflow Schedulers](https://awesome-repositories.com/f/development-tools-productivity/workflow-schedulers.md) — Schedules recurring data acquisition and processing tasks using time-based expressions and market-aware offsets. ([source](https://ml4trading.io/docs/data/user-guide/configuration/))

### System Administration & Monitoring

- [API Rate Limit Management](https://awesome-repositories.com/f/system-administration-monitoring/api-rate-limit-management.md) — Enforces global rate limits and handles retries or circuit breaking for financial data providers. ([source](https://ml4trading.io/docs/data/tutorials/02_rate_limiting/))
- [Health Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/health-monitoring.md) — Monitors the freshness and integrity of stored market data to identify stale records or gaps. ([source](https://ml4trading.io/docs/data/user-guide/incremental-updates/))
- [Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/performance-monitoring.md) — Tracks success rates, latency, and error logs for data providers to optimize acquisition performance. ([source](https://ml4trading.io/docs/data/tutorials/05_multi_provider/))
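
Rate-limit enforcement of the kind listed above is commonly built on a token bucket. The sketch below shows that general technique under stated assumptions: the `TokenBucket` class and its parameters are hypothetical and do not reflect the project's own rate limiter.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` burst requests, refilled at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would check `try_acquire()` before each provider request and either sleep or switch to another provider when it returns `False`.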

### Scientific & Mathematical Computing

- [Multi-Factor Research Models](https://awesome-repositories.com/f/scientific-mathematical-computing/multi-factor-research-models.md) — Provides historical asset pricing factors including market risk, size, and momentum for quantitative financial research. ([source](https://ml4trading.io/docs/data/providers/fama_french/))
- [Algorithmic Trading](https://awesome-repositories.com/f/scientific-mathematical-computing/quantitative-finance/algorithmic-trading.md) — Provides foundational infrastructure for defining asset universes and preparing historical data for algorithmic trading strategies.
- [Factor Dataset Aggregators](https://awesome-repositories.com/f/scientific-mathematical-computing/multi-factor-research-models/factor-dataset-aggregators.md) — Aggregates distinct factor streams into unified datasets to construct complex multi-factor models. ([source](https://ml4trading.io/docs/data/providers/fama_french/))
- [Economic Models](https://awesome-repositories.com/f/scientific-mathematical-computing/research-analysis-workflows/economic-analysis-tools/economic-models.md) — Retrieves historical time series data for interest rates, unemployment, and inflation indicators. ([source](https://ml4trading.io/docs/data/providers/fred/))

### Artificial Intelligence & ML

- [Failover Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/request-routing-gateways/failover-strategies.md) — Implements automated failover logic to switch between data providers during network errors or rate limits. ([source](https://ml4trading.io/docs/data/tutorials/05_multi_provider/))

### DevOps & Infrastructure

- [Automated Data Workflows](https://awesome-repositories.com/f/devops-infrastructure/automated-data-workflows.md) — Orchestrates complex data workflows with support for parallel execution, retries, and batch operations. ([source](https://ml4trading.io/docs/data/api/))
- [Dataset Update Managers](https://awesome-repositories.com/f/devops-infrastructure/automated-update-managers/dataset-update-managers.md) — Updates multiple datasets simultaneously using centralized configuration files to maintain data accuracy. ([source](https://ml4trading.io/docs/data/user-guide/cli-reference/))
- [Automated Update Management](https://awesome-repositories.com/f/devops-infrastructure/automated-update-management.md) — Automates the synchronization of local market datasets with remote sources using configuration-driven workflows. ([source](https://ml4trading.io/docs/data/getting-started/quickstart/))
- [Configuration Validation](https://awesome-repositories.com/f/devops-infrastructure/configuration-management/configuration-validation.md) — Validates configuration integrity by detecting missing references and logical inconsistencies before execution. ([source](https://ml4trading.io/docs/data/user-guide/configuration/))
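
To make the idea of detecting missing references and logical inconsistencies concrete, here is a small sketch that validates a configuration before anything runs. The configuration schema (`providers`, `universes`, `datasets`) and the `validate_config` function are invented for this example and should not be read as the project's actual format.

```python
from typing import List


def validate_config(config: dict) -> List[str]:
    """Collect human-readable errors instead of failing on the first one."""
    errors = []
    providers = set(config.get("providers", {}))
    universes = set(config.get("universes", {}))
    for name, dataset in config.get("datasets", {}).items():
        provider = dataset.get("provider")
        universe = dataset.get("universe")
        if provider not in providers:
            errors.append(f"{name}: unknown provider {provider!r}")
        if universe not in universes:
            errors.append(f"{name}: unknown universe {universe!r}")
        start, end = dataset.get("start"), dataset.get("end")
        # ISO dates compare correctly as strings.
        if start and end and start > end:
            errors.append(f"{name}: start {start} is after end {end}")
    return errors


# Hypothetical configuration with one valid and one broken dataset.
example_config = {
    "providers": {"yahoo": {}},
    "universes": {"tech": ["AAPL", "MSFT"]},
    "datasets": {
        "daily_tech": {"provider": "yahoo", "universe": "tech",
                       "start": "2020-01-01", "end": "2024-01-01"},
        "broken": {"provider": "missing", "universe": "tech",
                   "start": "2024-01-01", "end": "2020-01-01"},
    },
}
config_errors = validate_config(example_config)
```

Returning the full list of problems lets a user fix every issue in one pass rather than rerunning after each failure.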

### Software Engineering & Architecture

- [Data Schema Validation](https://awesome-repositories.com/f/software-engineering-architecture/data-schema-validation.md) — Performs automated quality checks and anomaly detection against logical invariants to ensure data reliability.
- [Request Middleware](https://awesome-repositories.com/f/software-engineering-architecture/request-middleware.md) — Implements automated retries, circuit breakers, and provider failover logic to maintain reliable data retrieval.
- [Asynchronous Request Handlers](https://awesome-repositories.com/f/software-engineering-architecture/concurrent-execution-managers/asynchronous-concurrency-managers/asynchronous-request-handlers.md) — Downloads financial data for multiple assets simultaneously using asynchronous batch processing. ([source](https://ml4trading.io/docs/data/))
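
Asynchronous batch downloads like those described above can be sketched with `asyncio` and a semaphore that caps concurrency. The `fetch_batch` helper and the `fake_fetch` coroutine are hypothetical; a real client would perform network I/O where the stand-in simply yields control.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_fetch(symbol: str) -> dict:
    """Hypothetical stand-in for a provider call; yields to the event loop once."""
    await asyncio.sleep(0)
    return {"symbol": symbol, "rows": 0}


async def fetch_batch(
    symbols: List[str],
    fetch: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 5,
) -> Dict[str, object]:
    """Fetch all symbols concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(symbol: str) -> dict:
        async with semaphore:
            return await fetch(symbol)

    # return_exceptions=True keeps one failed symbol from cancelling the rest.
    results = await asyncio.gather(
        *(guarded(symbol) for symbol in symbols), return_exceptions=True
    )
    return dict(zip(symbols, results))


batch = asyncio.run(fetch_batch(["AAPL", "MSFT", "GOOG"], fake_fetch, max_concurrency=2))
```

Each value in the returned mapping is either the fetched payload or the exception raised for that symbol, so callers can retry failures individually.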

### Programming Languages & Runtimes

- [Asynchronous Request Execution](https://awesome-repositories.com/f/programming-languages-runtimes/language-features-paradigms/concurrency-models/asynchronous-processing/asynchronous-request-execution.md) — Performs non-blocking data retrieval operations to improve performance when fetching large datasets. ([source](https://ml4trading.io/docs/data/providers/okx/))
