ML From Scratch

This project is an educational toolkit that provides implementations of fundamental machine learning algorithms built from scratch. By avoiding high-level library abstractions, it serves as a pedagogical reference for understanding the mathematical foundations and core mechanics of supervised learning, unsupervised learning, and reinforcement learning models.

The repository distinguishes itself through a modular approach to model construction, allowing users to build custom neural networks by chaining independent functional blocks. It covers a wide range of techniques, including gradient-based weight optimization, backpropagation through time for sequential data, and ensemble-based aggregation methods like boosting and bagging. These implementations rely on vectorized computation to perform linear algebra operations, providing a transparent view into how models learn from data.
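
The idea of chaining independent blocks can be illustrated with a minimal sketch. The class names below (Dense, ReLU, Sequential) and the train_step method are hypothetical and chosen for illustration only; they are not taken from the repository. The sketch uses plain Python lists instead of vectorized arrays so that it runs without any dependencies. Each block exposes a forward and a backward method, and the container passes gradients through the blocks in reverse order.

```python
import random


class Dense:
    """Fully connected block: y = W x + b (hypothetical name, for illustration)."""

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]

    def backward(self, grad_out, lr):
        # Gradient w.r.t. the input, computed before the weights are updated.
        grad_in = [sum(self.W[j][i] * grad_out[j] for j in range(len(grad_out)))
                   for i in range(len(self.x))]
        # Plain gradient descent step on weights and biases.
        for j, g in enumerate(grad_out):
            for i, xi in enumerate(self.x):
                self.W[j][i] -= lr * g * xi
            self.b[j] -= lr * g
        return grad_in


class ReLU:
    """Activation block with no parameters; lr is accepted but unused."""

    def forward(self, x):
        self.mask = [xi > 0 for xi in x]
        return [xi if m else 0.0 for xi, m in zip(x, self.mask)]

    def backward(self, grad_out, lr):
        return [g if m else 0.0 for g, m in zip(grad_out, self.mask)]


class Sequential:
    """Chains blocks: forward in order, backward in reverse order."""

    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def train_step(self, x, target, lr=0.05):
        pred = self.forward(x)
        # Squared-error loss L = 0.5 * sum((pred - target)^2), so dL/dpred = pred - target.
        grad = [p - t for p, t in zip(pred, target)]
        for layer in reversed(self.layers):
            grad = layer.backward(grad, lr)
        return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


rng = random.Random(0)
model = Sequential([Dense(2, 4, rng), ReLU(), Dense(4, 1, rng)])

# Made-up toy data: the target is the sum of the two inputs.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [2.0])]

first = sum(model.train_step(x, t) for x, t in data)  # loss over the first epoch
for _ in range(200):
    last = sum(model.train_step(x, t) for x, t in data)
```

Because every block only needs to know how to map inputs to outputs and output gradients to input gradients, new block types can be added without touching the container.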

The collection ranges from classic statistical methods and decision trees to deep learning architectures and clustering algorithms. It includes resources for training agents in dynamic environments, performing dimensionality reduction, and discovering patterns in unlabeled datasets. The project is structured as a reference, with documentation and installation instructions to help users set up a local environment for experimentation.

Features

Machine Learning Toolkits - Provides a collection of fundamental algorithms implemented from scratch to demonstrate core learning mechanics.
Supervised Learning - Trains models on labeled datasets to predict outcomes or classify observations.
Clustering Algorithms - Partitions datasets into clusters by iteratively assigning points to the nearest center and updating the center positions to minimize within-cluster variance.
Deep Learning Architectures - Builds custom deep learning models by stacking layers and activation functions.
Multilayer Perceptrons - Solves complex classification problems by defining hidden layers and backpropagation logic to learn non-linear mappings from input data.
Random Forest Ensembles - Improves predictive accuracy and reduces variance by combining the outputs of multiple decision trees trained on random data subsets.
Recurrent Neural Networks - Implements backpropagation through time to train sequential models by unrolling recurrent structures.
Reinforcement Learning - Develops agents that learn to make sequences of decisions to maximize cumulative rewards.
Unsupervised Learning - Discovers hidden structures and patterns in unlabeled data using clustering and dimensionality reduction.
Educational Examples - Provides a collection of machine learning models and algorithms built from scratch for educational purposes.
Machine Learning Tutorials - Provides fundamental machine learning algorithm implementations built from scratch for educational purposes.
Ensemble Learning - Combines multiple weak learners into robust models using boosting and bagging strategies.
Expectation-Maximization Models - Groups data points by iteratively calculating membership probabilities to optimize the fit of statistical distributions to the data.
Gradient Boosting - Minimizes prediction error by adding decision trees to an ensemble one at a time, each fit to the gradient of the loss with respect to the current ensemble's predictions.
Logistic Regression Models - Predicts the likelihood of binary outcomes by applying the sigmoid function and optimizing weights through gradient descent.
Neural Network Architectures - Constructs custom neural networks by defining specific layers, activation functions, and loss functions.
Neural Network Frameworks - Constructs complex neural network architectures by chaining independent functional blocks.
Clustering Suites - Provides a collection of density and centroid-based algorithms for grouping unlabeled data.
Computer Vision - Recognizes visual patterns in images by training convolutional neural networks to extract features.
Decision Trees - Builds decision-based models by repeatedly splitting datasets into subsets based on feature thresholds.
Deep Learning Frameworks - Constructs custom neural network architectures to demonstrate backpropagation and gradient descent mechanics.
Generative Adversarial Networks - Creates synthetic images of handwritten digits by training generative adversarial networks.
Naive Bayes Classifiers - Classifies data points by calculating the probability of class membership based on the statistical distribution of input features.
Optimization Algorithms - Updates model parameters iteratively by calculating partial derivatives of the loss function.
Sequential Learning - Processes time-series data by using recurrent neural networks to capture temporal dependencies.
General Machine Learning - Educational implementations of ML models.
Machine Learning - Implementations of machine learning algorithms from the ground up.
Autoencoders - Reduces data complexity by training encoder and decoder networks to reconstruct original inputs.
Density-Based Clustering - Identifies clusters of arbitrary shapes by analyzing local data density.
Ensemble Learning Libraries - Ships a suite of boosting and bagging implementations to improve predictive performance.
Evolutionary Algorithms - Optimizes neural network architectures and weights using evolutionary strategies.
Perceptron Classifiers - Creates simple linear models by iteratively adjusting weights based on prediction errors to separate labeled data points.
Algorithm References - Provides clean, readable code examples for standard statistical and neural network architectures.
Association Rule Learning - Discovers frequent patterns and relationships within transactional datasets using the Apriori algorithm.
Boltzmann Machines - Represents complex data structures by training Boltzmann machines to learn underlying feature patterns.
Dimensionality Reduction - Identifies linear combinations of features that maximize class separation to simplify datasets or improve classification performance.
Gaussian Mixture Models - Implements Gaussian mixture models to represent complex data distributions using expectation-maximization.
Genetic Algorithms - Solves optimization problems by simulating natural selection processes to evolve candidate solutions.
K-Medoids Clustering - Groups data points by minimizing total dissimilarity to representative medoids.
Numerical Computing Libraries - Performs mathematical operations on multidimensional arrays to accelerate linear algebra calculations.
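
As one concrete illustration of the list above, the logistic regression entry (sigmoid function plus gradient descent on the weights) can be sketched as follows. This is a minimal standard-library sketch under stated assumptions, not the repository's implementation: the function names fit_logistic and predict_proba, the learning rate, the epoch count, and the toy data are all made up for the example.

```python
import math


def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradients over the batch and take one descent step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(x, w, b):
    """Predicted probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy data: one feature, label 1 when the feature is large.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

Calling predict_proba([0.0], w, b) should give a probability below 0.5 and predict_proba([5.0], w, b) one above 0.5, since the learned weight on the single feature is positive.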

Name: eriklindernoren/ml-from-scratch
Author: eriklindernoren

31,918 stars, 5,345 forks, Python, MIT, 14 views

Open-source alternatives to ML From Scratch

Similar open-source projects, ranked by how many features they share with ML From Scratch.

ageron/handson-ml2
Jupyter Notebook, 29,938 stars
This project provides a collection of practical machine learning code examples, including implementations for supervised, unsupervised, and reinforcement learning algorithms. It features deep learning model implementations for convolutional, recurrent, and generative architectures, alongside specific examples of reinforcement learning agents that maximize rewards in simulated environments. The repository includes dedicated data preprocessing pipelines for sanitization, feature scaling, and dimensionality reduction. It also provides implementations for a wide range of specific models, such as…

rasbt/machine-learning-book
Jupyter Notebook, 5,239 stars
This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of interactive Jupyter Notebooks. It provides practical Python implementations for the end-to-end machine learning lifecycle, covering supervised and unsupervised learning, deep learning, and reinforcement learning. The resource distinguishes itself by providing detailed implementation guides for complex architectures, including transformers, generative adversarial networks, and convolutional neural networks. It also features specialized courseware for developing reinforcement l…

rasbt/python-machine-learning-book
Jupyter Notebook, 12,614 stars
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ…

Frequently asked questions

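What does eriklindernoren/ml-from-scratch do?

eriklindernoren/ml-from-scratch is an educational toolkit that provides implementations of fundamental machine learning algorithms built from scratch. It serves as a reference for the mathematical foundations and core mechanics of supervised learning, unsupervised learning, and reinforcement learning models.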

What are the main features of eriklindernoren/ml-from-scratch?

The main features of eriklindernoren/ml-from-scratch are: Machine Learning Toolkits, Supervised Learning, Clustering Algorithms, Deep Learning Architectures, Multilayer Perceptrons, Random Forest Ensembles, Recurrent Neural Networks, Reinforcement Learning.

What are some open-source alternatives to eriklindernoren/ml-from-scratch?

Open-source alternatives to eriklindernoren/ml-from-scratch include:
ageron/handson-ml2 - This project provides a collection of practical machine learning code examples, including implementations for…
rasbt/machine-learning-book - This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of…
rasbt/python-machine-learning-book - This project is an educational resource providing practical code examples and implementations of machine learning…
ljpzzz/machinelearning - This project is a machine learning implementation library featuring a collection of code examples that implement…
d2l-ai/d2l-en - This project is an educational platform and research toolkit designed to teach deep learning through a combination of…
wepe/machinelearning - This project is a machine learning library providing a collection of implementations for supervised and unsupervised…