30 open-source projects similar to blue-yonder/tsfresh, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best tsfresh alternative.
sktime is a machine learning framework for time series analysis. It provides a unified toolkit for implementing time series classification, forecasting, and anomaly detection using standardized machine learning interfaces. The library serves as a collection of tools for assigning categorical labels to temporal sequences, predicting future values based on historical patterns, and identifying outliers or unusual patterns within temporal data. The framework includes capabilities for panel-data handling and pipeline-based transformations. It utilizes a unified API wrapper and plugin-based model
Darts is a Python time series library designed for forecasting, anomaly detection, and the preprocessing of univariate and multivariate temporal data. It serves as a comprehensive framework for training and evaluating a wide range of statistical, machine learning, and deep learning models to predict future numerical values. The toolkit is distinguished by its support for global time series modeling, allowing a single model to be trained across multiple different series to leverage shared patterns. It also features a hierarchical time series manager to ensure consistency between aggregate and
PyCaret is a Python AutoML platform and MLOps lifecycle manager designed to automate machine learning workflows. It functions as a low-code environment that leverages a scikit-learn native engine to execute preprocessing, training, and evaluation for tabular data. The platform distinguishes itself as an LLM-powered ML copilot, using large language model agents to analyze datasets, design experiment configurations, and explain model results. It also serves as a Kubernetes ML orchestrator and model registry, enabling the versioning of trained pipelines and their promotion to production API endp
tsai is a deep learning library for time series classification, regression, and forecasting. Built on PyTorch and fastai, it provides a framework for assigning labels to sequential data, predicting future values in univariate or multivariate sequences, and training representations on unlabeled data through self-supervised learning. The library distinguishes itself with specialized temporal engineering and scaling capabilities. It includes tools for cyclical temporal encoding to capture seasonal patterns and online window slicing to process datasets larger than available memory. It also suppor
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
sktime is a machine learning framework designed for time series analysis. It provides a unified interface for performing time series forecasting, classification, and anomaly detection, integrating these capabilities into a standardized toolkit compatible with the scikit-learn API. The framework allows for the construction of complex analysis workflows through model pipelining and ensemble-based aggregation. It uses adapter-based integration to wrap external time series libraries, providing a single entry point for diverse algorithmic implementations. Its capabilities cover temporal data tran
GluonTS is a probabilistic time series library and deep learning forecasting framework. It provides a toolkit for building, training, and evaluating neural network architectures that predict future values as probability distributions to quantify uncertainty. The project distinguishes itself by supporting zero-shot forecasting and integrating diverse modeling approaches, including deep probabilistic neural networks and wrappers for external statistical libraries such as Prophet and R forecast. It implements specialized architectural primitives like causal convolutions and invertible residual n
PyTorch Forecasting is a deep learning framework designed for building and training neural network architectures specifically for time series forecasting. It serves as a comprehensive toolkit for implementing autoregressive models, multi-horizon forecasting, and probabilistic prediction intervals using PyTorch tensors. The library distinguishes itself through a probabilistic forecasting toolkit that generates prediction intervals and quantile forecasts using both parametric and non-parametric distributions. It further provides a neural network model optimizer for automated hyperparameter tuni
Featuretools is an automated feature engineering library and data transformation framework written in Python. It automatically generates machine learning feature vectors from multi-table datasets by applying synthesis patterns to relational and timestamped data. The system functions as a distributed feature synthesis engine, allowing the process of creating feature vectors to scale across multiple cores or clusters to handle large-scale datasets. The library supports the synthesis of multi-table datasets, time series feature generation, and the creation of custom machine learning primitives
Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors. The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
Vaex is a high-performance Apache Arrow DataFrame library and out-of-core data processing engine designed to handle billion-row tabular datasets in Python. It functions as a lazy evaluation framework that defers computations and transformations until results are required, enabling the processing of datasets that exceed available system RAM by mapping files directly from disk. The project distinguishes itself as a tool for big data visualization and exploration, specifically integrated for use within interactive notebooks. It provides specialized capabilities for machine learning feature engin
This PyTorch-based deep learning library provides a framework for analyzing and forecasting temporal data. It implements specialized architectures for time series forecasting, anomaly detection, data imputation, and classification. The project distinguishes itself through the inclusion of zero-shot inference capabilities, allowing large-scale temporal models to be evaluated on unseen datasets without requiring task-specific fine-tuning. The framework covers a broad range of analytical capabilities, including the recovery of missing values in incomplete datasets, the identification of irregul
Kats is a time series analysis framework and library providing tools for statistical characterization, anomaly detection, and trend forecasting. It functions as a toolkit for predicting future values based on historical data and identifying irregular patterns or structural change points within temporal sequences. The project includes a temporal feature extraction tool to calculate descriptive statistics and characteristics that summarize time series behavior. It also provides a system for model hyperparameter tuning using self-supervised learning to improve the scale and generalization of pre
TimesFM is a time series foundation model designed to generalize across diverse temporal datasets for forecasting and anomaly detection. It functions as a pretrained model for predicting future values in univariate time series data, eliminating the need for manual training from scratch. The project includes a framework for adapting pretrained weights to specific datasets using low-rank adaptation to improve accuracy. It also provides specialized capabilities for integrating time-series predictions as tools within autonomous AI agent architectures and automated workflows. The system supports
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Rayon is a data parallelism library for Rust that provides a framework for converting sequential computations into parallel operations. It enables the transformation of standard data structures and loops into parallel iterators, allowing workloads to be distributed across multiple processor cores. By utilizing a work-stealing scheduler, the library dynamically balances tasks to maximize throughput and minimize execution time. The library distinguishes itself through its focus on safe, scoped task synchronization, which ensures that all spawned operations complete before a scope exits to preve
This is a scikit-learn automated machine learning framework designed to optimize model selection and hyperparameters. It functions as an automated model selector and hyperparameter optimization tool for classification and regression tasks, utilizing an automated ensemble builder to combine high-performing models for increased predictive accuracy. The system features a distributed search engine that uses Dask for parallel machine learning optimization across CPU cores or clusters. It implements a budget-based evaluation strategy through successive halving to prioritize promising model configur
Statsmodels is a comprehensive Python library designed for statistical modeling, econometric research, and data analysis. It provides a robust framework for estimating and diagnosing a wide range of statistical models, enabling users to perform rigorous hypothesis testing, regression analysis, and complex data exploration within structured environments. The library distinguishes itself through its support for advanced statistical methodologies, including state space representation for dynamic systems and generalized linear frameworks that accommodate non-normal response variables. It offers s
Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models. The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encodin
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Featuretools is a Python data science library and automated feature engineering framework designed to create predictive features from multiple related datasets. It automates the data preparation and transformation steps required for machine learning models through deep feature synthesis. The library enables the automatic generation of comprehensive feature tables by applying recursive transformations to relational data. It supports the transformation of unstructured text into structured numeric features and allows users to define custom primitives to extend the synthesis process with specific
Optuna is a Python-based hyperparameter optimization framework designed to automate the search for optimal machine learning model configurations. It functions as a Bayesian optimization library that systematically tests parameter combinations to maximize or minimize objective functions, streamlining the model development process through iterative evaluation. The project distinguishes itself through a define-by-run dynamic construction model, which allows users to build complex, conditional search spaces using standard programming logic. Its architecture is highly modular, featuring a pluggabl
Prophet is a time series forecasting library and decomposition tool that uses an additive regression model to predict future values. It functions as an uncertainty estimation tool, calculating confidence intervals and error metrics to quantify the risk associated with future predictions. The project is distinguished by its ability to incorporate human-interpretable parameters for model tuning and its use of Bayesian inference for parameter estimation. It supports the integration of external regressors and special event modeling to account for the impact of holidays and specific dates on forec
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized