30 open-source projects similar to linyiqun/dataminingalgorithm, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best DataMiningAlgorithm alternative.
This project is a machine learning implementation library featuring a collection of code examples that implement supervised, unsupervised, and reinforcement learning algorithms from scratch. It provides a comprehensive set of toolkits for core machine learning components, including a natural language processing toolkit, a reinforcement learning framework, and suites for data dimensionality reduction and pattern mining. The library includes specialized implementations for reinforcement learning, such as Q-Learning, Deep Q-Networks, and Actor-Critic agents. The natural language processing capab
This project provides a collection of practical machine learning code examples, including implementations for supervised, unsupervised, and reinforcement learning algorithms. It features deep learning model implementations for convolutional, recurrent, and generative architectures, alongside specific examples of reinforcement learning agents that maximize rewards in simulated environments. The repository includes dedicated data preprocessing pipelines for sanitization, feature scaling, and dimensionality reduction. It also provides implementations for a wide range of specific models, such as
This project is an educational toolkit that provides implementations of fundamental machine learning algorithms built from scratch. By avoiding high-level library abstractions, it serves as a pedagogical reference for understanding the mathematical foundations and core mechanics of supervised learning, unsupervised learning, and reinforcement learning models. The repository distinguishes itself through a modular approach to model construction, allowing users to build custom neural networks by chaining independent functional blocks. It covers a wide range of techniques, including gradient-base
This project is a collection of supervised and unsupervised machine learning algorithms implemented from scratch using Python. It serves as an educational resource for studying model training, parameter optimization, and the implementation of core predictive models. The library provides a variety of supervised learning tools, including linear and logistic regression, decision trees, and support vector machines. It also features unsupervised learning capabilities for discovering patterns in unlabeled datasets through clustering algorithms. Broad capability areas include ensemble learning thro
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ
This project serves as an educational and practical resource for mastering machine learning workflows using Python. It provides a comprehensive collection of code examples and exercises designed to guide users through the implementation of predictive systems, ranging from fundamental algorithms to deep learning architectures. The repository distinguishes itself by offering a structured approach to both classical machine learning and neural network training. It covers the full lifecycle of model development, including the orchestration of reusable data transformation pipelines, advanced ensemb
This project is a reference collection of statistical learning algorithms built from scratch using NumPy for linear algebra and matrix operations. It serves as an educational resource for studying the mathematical foundations and inner workings of machine learning models through manual implementations. The codebase provides hand-coded implementations of both supervised and unsupervised learning. This includes classification and regression models such as support vector machines, decision trees, and Naive Bayes, as well as data clustering and pattern discovery methods like k-means and hierarchi
This project is a collection of foundational machine learning algorithms and tools implemented from scratch in Python. It serves as a library of core implementations for regression, classification, and clustering models, designed to demonstrate the underlying mathematical structures of these algorithms without relying on high-level machine learning frameworks. The project focuses on the manual implementation of algorithmic logic, including neural networks with forward propagation and weight updates, as well as various supervised and unsupervised learning models. It utilizes NumPy for vectoriz
This project is a collection of foundational machine learning algorithms and data science tools implemented in Python. It focuses on building the logic of these tools using basic programming primitives rather than relying on specialized libraries. The implementation covers several core domains, including a linear algebra library for matrix and vector operations, a statistical analysis toolkit for probability and hypothesis testing, and a framework for map-reduce distributed processing. It also includes implementations for natural language processing, graph theory for network analysis, and var
This project is a machine learning educational resource and implementation guide for Python. It provides a collection of executable code and notebooks that demonstrate predictive modeling, data analysis workflows, and the implementation of various machine learning algorithms. The repository features practical examples of classification, regression, and clustering tasks using Scikit-Learn, alongside tutorials for building and training deep learning architectures with TensorFlow. These include implementations of convolutional and recurrent networks. The content covers a broad range of capabili
Orange3 is a visual data mining platform that provides an interactive canvas for building data analysis workflows without writing code. At its core, it offers a widget-based visual programming environment where users connect configurable components to perform data preprocessing, machine learning model training, statistical evaluation, and interactive visualization. The platform is built on NumPy-backed data tables with domain descriptors that define variable names, types, and roles, and includes a lazy SQL query proxy for working with database tables without loading all data into memory. The
This project is a Python data science curriculum and programming tutorial collection. It provides a structured set of educational notebooks and scripts designed to teach data analysis, machine learning, and deep learning. The repository serves as a learning path for building and tuning predictive models, including regression, decision trees, and neural networks. It includes a data visualization guide for creating financial time-series plots and a multiprocessing reference for implementing parallel task execution and shared memory synchronization. The curriculum covers broader capability area
This is a machine learning educational repository consisting of a collection of notebooks and code examples. It provides practical implementations of diverse machine learning algorithms and workflows, ranging from traditional scientific computing to deep learning. The project features specific implementations of Scikit-Learn models, such as decision trees, random forests, and support vector machines, as well as TensorFlow examples for building neural networks, convolutional layers, and recurrent architectures. It also includes tutorials on reinforcement learning development and the creation o
Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models. The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encodin
This project is an interactive data science environment that combines code execution, rich media visualization, and narrative documentation into a persistent, browser-based platform. It serves as a comprehensive educational resource for scientific computing, providing a framework for iterative data analysis and machine learning prototyping. The environment is distinguished by its focus on high-performance numerical computing, utilizing vectorized array operations and memory-mapped data structures to handle large-scale computations efficiently. It features a unified estimator interface that st
This project is a comprehensive collection of practical code examples and implementation libraries for machine learning. It provides a wide array of reference materials for building supervised, unsupervised, and reinforcement learning algorithms. The repository serves as a multi-domain resource, featuring specific implementation suites for financial AI, Bayesian statistical modeling, and deep learning architectures. It includes a framework for training intelligent agents using policy gradients and actor-critic models, as well as practical guides for fine-tuning transformers and utilizing larg
This project is a machine learning algorithm reference and implementation guide that provides theoretical foundations and code for supervised learning, deep learning, and natural language processing. It serves as a comprehensive toolkit for implementing predictive models and a technical reference for algorithm engineering. The project focuses on ensemble learning frameworks, including the construction of decision trees, random forests, and gradient boosting models. It also functions as a probabilistic graphical model library and an NLP algorithm reference, with specific implementations for se
This repository is a collection of practical machine learning implementations designed to demonstrate core predictive analytics, computer vision, and natural language processing techniques. It serves as a resource for applying standard machine learning frameworks to solve diverse data science problems, ranging from automated classification to complex pattern recognition. The project distinguishes itself by providing concrete examples across multiple domains, including the development of conversational interfaces, the analysis of geospatial data, and the implementation of deep learning archite
This repository is a collection of implementation references and solved notebooks covering supervised, unsupervised, and reinforcement learning techniques. It provides practical guides for building predictive models, clustering algorithms, and autonomous agents. The project includes specific implementations for neural network architectures, such as multi-layer perceptrons for digit recognition, and recommender systems using collaborative and content-based filtering. It also features reinforcement learning systems that utilize deep Q-learning to optimize decision-making policies. The codebase
mlxtend is a pure Python machine learning extension library that provides additional tools for association rule mining, ensemble learning, and feature selection. It is built on numpy and pandas, with all data operations accepting and returning pandas DataFrames, and custom estimators inherit from scikit-learn’s base classes to offer a uniform fit-predict interface compatible with grid search. The library implements the Apriori algorithm for mining frequent itemsets from transaction data and generating association rules with confidence and lift metrics. For classification, it combines multiple
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
This project provides a collection of machine learning algorithms implemented from scratch in Python. It serves as an educational resource using interactive notebooks that combine code with mathematical explanations to demonstrate the first principles of data science. The repository includes reference implementations for neural networks, such as multilayer perceptrons with backpropagation, and supervised learning models including linear and logistic regression. It also covers unsupervised learning through k-means clustering and Gaussian anomaly detection. The codebase covers a broad range of
Ai-Learn is an educational repository and technical reference designed to facilitate the mastery of artificial intelligence and data science workflows. It provides a structured curriculum that combines theoretical mathematical foundations with practical coding exercises, enabling users to build predictive models, neural networks, and analytical pipelines using Python. The project distinguishes itself by emphasizing a first-principles approach to machine learning. Rather than relying solely on high-level abstractions, it guides users through the reconstruction of core algorithms from scratch,
BERTopic is a topic modeling library used to extract interpretable themes from collections of text documents and images. It functions as a document clustering framework that transforms unstructured data into numerical vectors to group semantically similar content. The project distinguishes itself through a multimodal embedding tool that allows for joint clustering of text and images in a shared vector space. It also features a class-based TF-IDF representation engine to identify representative words for clusters and an integrated system for using large language models to generate natural lang
This project is a machine learning library providing a collection of implementations for supervised and unsupervised learning algorithms. It serves as a deep learning framework, a statistical classifier collection, and a suite of tools for unsupervised learning and dimensionality reduction. The library enables the construction of neural networks, including multi-layer perceptrons and convolutional networks for pattern recognition. It also provides tools for performing principal component analysis and manifold learning to visualize high-dimensional datasets, alongside a suite of clustering alg
This project provides a translated version of the scikit-learn machine learning library guides and API references for Chinese speakers. It serves as a localized knowledge base and technical reference for implementing predictive data analysis and statistical modeling using a Python-based toolkit. The resource covers the implementation of supervised learning, including classification and regression tasks, and unsupervised learning workflows for pattern discovery and anomaly detection. It also provides guidance on data science education, specifically focusing on the use of scikit-learn for machi
This is a comprehensive educational curriculum designed to teach machine learning fundamentals using the Python programming language. It provides a structured course covering the implementation and theory of supervised learning, unsupervised learning, and deep learning. The curriculum is delivered through interactive notebooks that combine executable code with technical tutorials. It includes dedicated guides for building neural network architectures, implementing classification and regression models, and utilizing clustering techniques for pattern discovery in unlabeled data. The materials
This project is a collection of reference implementations for algorithms, mathematics, cryptography, compression, and machine learning written in C#. It serves as an educational library providing standard implementations of sorting, searching, and graph theory algorithms. The repository covers a wide range of computational domains, including combinatorial optimization for constraint satisfaction and scheduling, as well as symmetric and classical cryptographic ciphers. It also provides reference code for lossless data compression techniques and fundamental machine learning primitives such as r
This project is a comprehensive educational curriculum for learning data science and predictive modeling using the Python programming language. It provides structured instructional material and guides covering supervised learning, unsupervised learning, and neural network design. The curriculum focuses on building, training, and evaluating machine learning models. It includes specific guides for implementing linear regression, decision trees, and support vector machines for predictive analysis, as well as tutorials on designing convolutional and recurrent neural network architectures. The co
Linfa is a classical machine learning framework and statistical learning suite implemented in Rust. It provides a collection of algorithms for supervised and unsupervised learning, focused on traditional statistical methods such as regression, clustering, and decision trees. The toolkit is distinguished by its ability to be compiled into WebAssembly, enabling analytical models to execute within browser environments. It employs a trait-based algorithm interface to standardize the process of training and prediction across its various models. The library covers a broad range of capabilities, in