30 open-source projects similar to statsmodels/statsmodels, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Statsmodels alternative.
PyMC is a Bayesian probabilistic programming framework used for building probabilistic models and performing Bayesian inference. It provides a probabilistic graphical model library for specifying random variables, priors, and likelihood functions, supported by an MCMC sampling engine and variational inference tools to estimate posterior distributions. The framework features a GPU-accelerated inference backend that compiles models into machine code to increase execution speed. It utilizes a backend-agnostic tensor execution model and just-in-time graph compilation to optimize the computation o
This project is a scientific computing framework for the .NET ecosystem, providing a comprehensive suite of libraries for numerical analysis, statistics, and mathematical optimization. It serves as a foundational toolkit for developing applications in machine learning, digital signal processing, and computer vision. The framework provides specialized toolkits for training and deploying predictive models, including neural networks, support vector machines, and decision trees. It further distinguishes itself with deep integrations for real-time visual analysis, such as object tracking and facia
SymPy is a Python computer algebra system and symbolic mathematics library. It performs algebraic manipulations, calculus, and equation solving using symbolic representations to achieve exact computations rather than numerical approximations. The library includes a LaTeX expression parser that converts mathematical strings into symbolic representations for computation and formula manipulation. It also incorporates a mathematical benchmarking suite to measure execution speed and detect performance regressions across different software versions. The system provides capabilities for automated m
This project is a collection of foundational machine learning algorithms and data science tools implemented in Python. It focuses on building the logic of these tools using basic programming primitives rather than relying on specialized libraries. The implementation covers several core domains, including a linear algebra library for matrix and vector operations, a statistical analysis toolkit for probability and hypothesis testing, and a framework for map-reduce distributed processing. It also includes implementations for natural language processing, graph theory for network analysis, and var
Prophet is a predictive analytics framework and time series regression library designed for forecasting future values. It uses additive models to fit non-linear growth and periodic seasonal patterns, providing tools for producing forecasts with integrated error measurement. The project handles multiple seasonalities and holiday effects to improve accuracy for periodic data. It supports the integration of external regressors and manages data irregularities, such as missing data and outliers, to maintain prediction stability. The framework covers a broad range of analysis capabilities, includi
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
This project is a comprehensive library for numerical linear algebra and scientific computing, designed to provide optimized routines for matrix decomposition, statistical modeling, and high-performance data analysis. It serves as both a toolkit for solving complex linear systems and an educational resource for understanding the fundamental algorithms behind matrix factorizations and numerical solvers. The library distinguishes itself through a focus on randomized numerical linear algebra, utilizing probabilistic algorithms and approximate methods to perform dimensionality reduction and matri
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Multiple Pairwise Comparisons (Post Hoc) Tests in Python
ObsPy: A Python Toolbox for seismology/seismological observatories.
SciPy is a scientific computing library for Python that provides a comprehensive collection of mathematical algorithms and numerical tools for research and engineering. It functions as a high-performance numerical analysis framework, bridging high-level Python code with compiled C and Fortran routines to execute complex computations at hardware speeds. The library is built upon array-based data structures that utilize strided memory layouts to enable efficient data manipulation and slicing. By employing vectorized operation dispatch and linking to optimized hardware-specific linear algebra li
quant-wiki is a comprehensive knowledge base and structured reference for quantitative finance, financial engineering, and algorithmic trading. It serves as a centralized library of documentation covering mathematical models, financial instruments, and systematic trading strategies. The project integrates AI-driven capabilities through a modular retrieval-augmented generation framework that extracts structured data from research papers and news. It features a multi-agent workflow engine designed to discover and validate predictive alpha factors, alongside tools for local large language model
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
This project is a comprehensive collection of practical code examples and implementation libraries for machine learning. It provides a wide array of reference materials for building supervised, unsupervised, and reinforcement learning algorithms. The repository serves as a multi-domain resource, featuring specific implementation suites for financial AI, Bayesian statistical modeling, and deep learning architectures. It includes a framework for training intelligent agents using policy gradients and actor-critic models, as well as practical guides for fine-tuning transformers and utilizing larg
QuantResearch is a quantitative research framework and specialized toolkit for algorithmic simulation, financial time-series analysis, and systematic trading. It provides an event-driven backtesting environment for validating strategies against historical tick and bar data, alongside a dedicated portfolio optimization engine for calculating asset weights and risk metrics. The project distinguishes itself through a machine learning finance toolkit that implements recurrent neural networks for price prediction and reinforcement learning for derivative pricing. It also features advanced statisti
Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models. The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encodin
This repository provides a collection of Python implementations for causal inference, designed to estimate the impact of specific interventions using observational data. It serves as a statistical toolkit for researchers to isolate causal signals from complex confounding factors in data sets that lack experimental control. The framework enables the application of rigorous methodologies to study health determinants and evaluate policy interventions. By utilizing structural causal modeling and directed acyclic graphs, the library allows users to map causal dependencies and identify the necessar
Kats is a time series analysis framework and library providing tools for statistical characterization, anomaly detection, and trend forecasting. It functions as a toolkit for predicting future values based on historical data and identifying irregular patterns or structural change points within temporal sequences. The project includes a temporal feature extraction tool to calculate descriptive statistics and characteristics that summarize time series behavior. It also provides a system for model hyperparameter tuning using self-supervised learning to improve the scale and generalization of pre
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of interactive Jupyter Notebooks. It provides practical Python implementations for the end-to-end machine learning lifecycle, covering supervised and unsupervised learning, deep learning, and reinforcement learning. The resource distinguishes itself by providing detailed implementation guides for complex architectures, including transformers, generative adversarial networks, and convolutional neural networks. It also features specialized courseware for developing reinforcement l
tsfresh is an automated feature engineering tool and library designed to extract statistical characteristics from raw time series data. It transforms sequential data into tabular datasets, converting time series into a flat format where each row represents a unique entity and columns represent extracted features. The project distinguishes itself through a parallel data processing framework that distributes heavy computational workloads across multiple CPU cores. It also implements hypothesis-based feature selection to identify the most predictive characteristics and filter out irrelevant ones
Prophet is a time series forecasting library and decomposition tool that uses an additive regression model to predict future values. It functions as an uncertainty estimation tool, calculating confidence intervals and error metrics to quantify the risk associated with future predictions. The project is distinguished by its ability to incorporate human-interpretable parameters for model tuning and its use of Bayesian inference for parameter estimation. It supports the integration of external regressors and special event modeling to account for the impact of holidays and specific dates on forec
Ludwig is a multimodal machine learning platform and low-code framework designed for building, training, and deploying neural networks. It enables the construction of models that process text, images, audio, and tabular data through a unified interface using declarative configuration files rather than custom code. The system features a specialized low-code framework for large language models, supporting supervised fine-tuning, preference alignment, and a constrained decoding tool to force structured data output via logit extraction. It also includes an automated model architecture search to i
Vowpal Wabbit is an open-source machine learning system designed for online learning, where models update incrementally from streaming data without requiring full retraining. It provides a reduction-based learning framework that composes complex tasks from simpler algorithms, and includes a feature hashing trick that maps unbounded feature names into a fixed-size vector space to keep memory usage constant regardless of dataset size. The system supports distributed training across a cluster using an allreduce protocol for synchronized updates, and offers an active learning query strategy that s
ScottPlot is a cross-platform, high-performance charting library for .NET that renders interactive plots across desktop and web GUI frameworks including Windows Forms, WPF, MAUI, Avalonia, Blazor, and WinUI. It provides an optimized rendering engine capable of displaying millions of data points with interactive pan, zoom, and live data streaming, while also supporting image export to formats like PNG and SVG for file output, cloud applications, and notebooks. The library distinguishes itself through a comprehensive set of chart types including scatter, line, bar, pie, heatmap, financial, rada
This project is a synthetic data generator designed to create realistic tabular and time-series datasets for machine learning and testing workflows. It functions as a privacy-preserving platform that models the underlying statistical distributions of source data to produce new records that maintain the original statistical properties and structural integrity. The tool distinguishes itself by utilizing CPU-optimized statistical sampling, allowing for high-performance data generation on standard hardware without the need for specialized graphics processing units. It employs a configuration-driv
cuml is a GPU-accelerated machine learning library and framework that uses CUDA to accelerate tabular data preprocessing and model execution. It provides a suite of tools for training and deploying classification, regression, and clustering models on NVIDIA GPUs and GPU clusters. The library is designed for scalability, offering a distributed GPU machine learning environment that can spread computation and data across multiple hardware accelerators and nodes to handle datasets exceeding single-device memory. It mirrors standard estimator interfaces to allow the replacement of CPU-based models
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
statsforecast is a high-performance statistical time series forecasting library designed to generate point forecasts and prediction intervals. It functions as a distributed time series framework that utilizes a C-based forecasting engine and an automated model selector to identify and fit the optimal statistical model for every unique series in a dataset. The system also includes a time series anomaly detector to identify unusual data points by comparing observed values against probabilistic forecast intervals. The project is distinguished by its ability to handle massive-scale parallel forec