30 open-source projects similar to donnemartin/data-science-ipython-notebooks, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Data Science IPython Notebooks alternative.
This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis. The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
This repository serves as a structured educational resource for machine learning and data science, providing a centralized collection of tutorials, lecture notes, and implementation guides. It is designed to support self-directed learning by organizing complex technical concepts into a clear, hierarchical path that spans from foundational statistical methods to advanced deep learning architectures. The project distinguishes itself through a comprehensive approach to skill development, bridging the gap between theoretical algorithmic foundations and functional software applications. It offers
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify discovery across the artificial intelligence ecosystem. The collection distinguishes itself by providing a cross-language development index that spans diverse programming environments, including C, C++, Rust, Clojure, and Python. It covers a wide range of specialized capabilities, fr
This project is a comprehensive collection of practical code examples and implementation libraries for machine learning. It provides a wide array of reference materials for building supervised, unsupervised, and reinforcement learning algorithms. The repository serves as a multi-domain resource, featuring specific implementation suites for financial AI, Bayesian statistical modeling, and deep learning architectures. It includes a framework for training intelligent agents using policy gradients and actor-critic models, as well as practical guides for fine-tuning transformers and utilizing larg
This project is a structured educational curriculum designed to guide developers through the fundamentals of machine learning. It functions as a technical skill builder, offering a curated roadmap of progressive coding challenges that cover core algorithms, statistical concepts, and essential data science libraries. The repository distinguishes itself through an iterative sequencing of content, organizing complex technical topics into a daily progression that facilitates incremental mastery. It integrates third-party academic lectures and educational resources to provide necessary theoretical
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It orchestrates computation by representing tasks and their dependencies as directed acyclic graphs (DAGs), which lets its schedulers distribute workloads across the available hardware while respecting the order in which intermediate results are needed. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
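The lazy, DAG-based model described above can be illustrated with a small stdlib-only sketch. The names deliberately echo `dask.delayed` and `.compute()`, but the `Task` class, `delayed` wrapper, and `compute` function here are hypothetical and are not Dask's implementation; a real scheduler would also run independent branches in parallel.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Task:
    """A deferred function call; nothing runs until compute() walks the graph."""
    func: Callable[..., Any]
    args: tuple = ()


def delayed(func: Callable[..., Any]) -> Callable[..., Task]:
    """Wrap func so that calling it builds a Task node instead of executing it."""
    def build(*args: Any) -> Task:
        return Task(func, args)
    return build


def compute(node: Any, cache: Optional[Dict[int, Any]] = None) -> Any:
    """Evaluate a task graph depth-first; shared nodes are memoized so each runs once."""
    if cache is None:
        cache = {}
    if not isinstance(node, Task):
        return node  # plain values are the leaves of the graph
    key = id(node)
    if key not in cache:
        # Dependencies (the arguments) are evaluated before the task that needs them.
        args = [compute(arg, cache) for arg in node.args]
        cache[key] = node.func(*args)
    return cache[key]


calls = []

def load(x):
    calls.append(x)  # record each real execution
    return x

inc = delayed(lambda x: x + 1)
add = delayed(lambda a, b: a + b)

src = delayed(load)(10)
graph = add(inc(src), inc(src))  # src is shared by both branches of the DAG
print(calls)           # []: building the graph executed nothing
print(compute(graph))  # 22
print(calls)           # [10]: the shared node ran once
```

Deferring execution until `compute` is called is what allows a scheduler to see the whole graph first, so it can deduplicate shared work and decide where each task should run.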
This project is a Python data science curriculum and programming tutorial collection. It provides a structured set of educational notebooks and scripts designed to teach data analysis, machine learning, and deep learning. The repository serves as a learning path for building and tuning predictive models, including regression, decision trees, and neural networks. It includes a data visualization guide for creating financial time-series plots and a multiprocessing reference for implementing parallel task execution and shared memory synchronization. The curriculum covers broader capability area
This project is an educational toolkit that provides implementations of fundamental machine learning algorithms built from scratch. By avoiding high-level library abstractions, it serves as a pedagogical reference for understanding the mathematical foundations and core mechanics of supervised learning, unsupervised learning, and reinforcement learning models. The repository distinguishes itself through a modular approach to model construction, allowing users to build custom neural networks by chaining independent functional blocks. It covers a wide range of techniques, including gradient-base
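To make "chaining independent functional blocks" concrete, here is a hypothetical stdlib-only sketch, not the repository's code. Each block exposes `forward` and `backward`, a network is just a list of blocks, and training is plain per-sample gradient descent on scalar inputs. The `Linear`/`Tanh` classes, the target function, and the learning rate are illustrative assumptions.

```python
import math
import random


class Linear:
    """Scalar affine block y = w * x + b; caches its input for the backward pass."""

    def __init__(self, rng: random.Random) -> None:
        self.w = rng.uniform(-1.0, 1.0)
        self.b = 0.0
        self.x = 0.0

    def forward(self, x: float) -> float:
        self.x = x
        return self.w * x + self.b

    def backward(self, grad_out: float, lr: float) -> float:
        grad_in = grad_out * self.w        # dL/dx uses w before the update
        self.w -= lr * grad_out * self.x   # dL/dw = dL/dy * x
        self.b -= lr * grad_out            # dL/db = dL/dy
        return grad_in


class Tanh:
    """Parameter-free activation block."""

    def forward(self, x: float) -> float:
        self.y = math.tanh(x)
        return self.y

    def backward(self, grad_out: float, lr: float) -> float:
        return grad_out * (1.0 - self.y ** 2)  # d tanh(x)/dx = 1 - tanh(x)^2


def predict(blocks, x: float) -> float:
    for block in blocks:  # forward pass: output of one block feeds the next
        x = block.forward(x)
    return x


def mse(blocks, data) -> float:
    return sum((predict(blocks, x) - y) ** 2 for x, y in data) / len(data)


def train_step(blocks, x: float, y: float, lr: float) -> None:
    """One SGD step on squared error: forward through the chain, then backward in reverse."""
    grad = 2.0 * (predict(blocks, x) - y)  # dL/dprediction for L = (pred - y)^2
    for block in reversed(blocks):
        grad = block.backward(grad, lr)


rng = random.Random(0)
net = [Linear(rng), Tanh(), Linear(rng)]                 # blocks chained in order
data = [(i / 10, 0.5 * i / 10) for i in range(-10, 11)]  # illustrative target: y = 0.5x
before = mse(net, data)
for _ in range(200):
    for x, y in data:
        train_step(net, x, y, lr=0.05)
after = mse(net, data)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Because every block only needs the incoming gradient and its own cached values, new block types can be added without changing the training loop.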
This is a machine learning educational repository consisting of a collection of notebooks and code examples. It provides practical implementations of diverse machine learning algorithms and workflows, ranging from traditional scientific computing to deep learning. The project features specific implementations of Scikit-Learn models, such as decision trees, random forests, and support vector machines, as well as TensorFlow examples for building neural networks, convolutional layers, and recurrent architectures. It also includes tutorials on reinforcement learning development and the creation o
This project provides a collection of practical machine learning code examples, including implementations for supervised, unsupervised, and reinforcement learning algorithms. It features deep learning model implementations for convolutional, recurrent, and generative architectures, alongside specific examples of reinforcement learning agents that maximize rewards in simulated environments. The repository includes dedicated data preprocessing pipelines for sanitization, feature scaling, and dimensionality reduction. It also provides implementations for a wide range of specific models, such as
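Of the preprocessing steps mentioned, feature scaling is the easiest to show in a few lines. The stdlib sketch below uses z-score standardization; which scaling method the repository actually uses is not stated, and the `standardize` helper is a hypothetical name. scikit-learn's `StandardScaler` performs the same transformation per feature column.

```python
import statistics
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score scaling: subtract the mean, then divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In a real pipeline, compute the mean and standard deviation on the training split only and reuse them for validation and test data. Otherwise information leaks from the held-out sets.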
Ai-Learn is an educational repository and technical reference designed to facilitate the mastery of artificial intelligence and data science workflows. It provides a structured curriculum that combines theoretical mathematical foundations with practical coding exercises, enabling users to build predictive models, neural networks, and analytical pipelines using Python. The project distinguishes itself by emphasizing a first-principles approach to machine learning. Rather than relying solely on high-level abstractions, it guides users through the reconstruction of core algorithms from scratch,
Grokking-Deep-Learning is a collection of educational resources and courseware designed to teach the construction of neural networks from scratch. It serves as a programming tutorial and implementation guide for understanding the internal mechanics of deep learning. The project focuses on building various network architectures, including convolutional, recurrent, and long short-term memory networks. It provides step-by-step implementations of fundamental mechanisms such as forward propagation, backpropagation, and gradient descent. The material covers a broad range of deep learning capabilit
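The three mechanisms named above fit in one small example. The sketch below is illustrative and not taken from the book. It runs forward propagation through a two-weight network, derives the gradients by backpropagation (the chain rule), checks them against central finite differences, and then applies gradient descent. All numeric values are arbitrary assumptions.

```python
import math


def forward(w1: float, w2: float, x: float):
    """Two-weight network: hidden h = tanh(w1 * x), prediction = w2 * h."""
    h = math.tanh(w1 * x)
    return w2 * h, h


def loss(w1: float, w2: float, x: float, y: float) -> float:
    pred, _ = forward(w1, w2, x)
    return 0.5 * (pred - y) ** 2


def backprop(w1: float, w2: float, x: float, y: float):
    """Analytic gradients of the loss via the chain rule."""
    pred, h = forward(w1, w2, x)
    delta = pred - y                      # dL/dpred for L = 0.5 * (pred - y)^2
    grad_w2 = delta * h                   # dL/dw2
    grad_h = delta * w2                   # dL/dh
    grad_w1 = grad_h * (1.0 - h * h) * x  # through tanh'(z) = 1 - tanh(z)^2, z = w1 * x
    return grad_w1, grad_w2


def numeric_grad(f, w: float, eps: float = 1e-6) -> float:
    """Central finite difference, used to check the analytic gradient."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w1, w2, x, y = 0.3, -0.7, 1.5, 0.2  # arbitrary illustrative values
g1, g2 = backprop(w1, w2, x, y)
print(abs(g1 - numeric_grad(lambda w: loss(w, w2, x, y), w1)) < 1e-6)  # True
print(abs(g2 - numeric_grad(lambda w: loss(w1, w, x, y), w2)) < 1e-6)  # True

# Gradient descent: step each weight against its gradient.
start = loss(w1, w2, x, y)
for _ in range(100):
    g1, g2 = backprop(w1, w2, x, y)
    w1 -= 0.1 * g1
    w2 -= 0.1 * g2
print(loss(w1, w2, x, y) < start)  # True
```

A gradient check like this is a common way to confirm that a hand-written backward pass is correct before trusting it in training.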
Numba is a just-in-time compiler that translates high-level Python functions into optimized machine code at runtime. By leveraging the LLVM compiler infrastructure, it provides a framework for accelerating numerical data processing and mathematical computations, enabling performance levels comparable to statically compiled languages. The project distinguishes itself through its ability to perform type-inference-based specialization, which generates machine instructions tailored to the specific data types used during execution. It employs a lazy compilation pipeline that defers translation unt
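With Numba itself, you get this behavior by decorating a function with `@numba.njit`. The stdlib-only sketch below, built around a hypothetical `LazyDispatcher` class, models only the bookkeeping described above. Nothing is specialized until a call arrives, and each new argument-type signature gets its own cached entry. It does not generate any machine code; `compile` is a placeholder for the JIT step.

```python
from typing import Any, Callable, Dict, Tuple


class LazyDispatcher:
    """Toy dispatcher that 'specializes' a function per argument-type signature on first use.

    No machine code is generated: compile() stands in for the JIT step and simply
    returns the original Python function.
    """

    def __init__(self, py_func: Callable[..., Any]) -> None:
        self.py_func = py_func
        self.specializations: Dict[Tuple[type, ...], Callable[..., Any]] = {}

    def compile(self, signature: Tuple[type, ...]) -> Callable[..., Any]:
        # A real JIT would infer types from `signature` and emit machine code here.
        return self.py_func

    def __call__(self, *args: Any) -> Any:
        signature = tuple(type(arg) for arg in args)
        impl = self.specializations.get(signature)
        if impl is None:  # compilation is deferred until a new signature appears
            impl = self.compile(signature)
            self.specializations[signature] = impl
        return impl(*args)


@LazyDispatcher
def add(a, b):
    return a + b


add(1, 2)      # first (int, int) call triggers a specialization
add(3, 4)      # reuses the cached (int, int) version
add(1.5, 2.0)  # a (float, float) call triggers a second one
print(len(add.specializations))  # 2
```

Keying the cache on argument types is what lets a JIT emit code tailored to, say, 64-bit integers versus floats while still presenting a single Python-callable function.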
This project is a collection of educational Jupyter Notebooks providing tutorials on neural network construction and tensor operations using the TensorFlow framework. It serves as a machine learning educational repository and implementation guide for deep learning students. The suite focuses on specific advanced architectures, including convolutional networks for image classification, residual networks with skip connections for training stability, and variational autoencoders for generative modeling and data synthesis. It also includes guides for building denoising and deep autoencoders to pe
This project is a curated collection of programming exercises designed to build proficiency in numerical computing and data manipulation. It provides a structured learning path for mastering multidimensional array operations, vectorized arithmetic, and statistical analysis. The repository focuses on developing practical expertise in array-based workflows, emphasizing techniques such as memory management, efficient data processing, and the replacement of explicit loops with vectorized operations. Users engage with hands-on challenges that cover the full lifecycle of numerical data, from initia
pyprobml is a collection of notebook-based implementations of probabilistic machine learning models and algorithms. It uses scientific computing and data analysis libraries to turn mathematical concepts and theory into runnable code for practical application and research. The project focuses on the programmatic generation of scientific figures and visualizations to recreate results from a technical text. It employs a system of branch-based asset storage to isolate these generated images from the source code. The repository covers a wide range of probabilistic modeling and machine learning tasks, including
This is the companion code repository for the third edition of the book Python Machine Learning. It delivers the entire learning path as a structured collection of Jupyter notebooks that progress from classical machine learning algorithms to advanced deep learning models, with every concept demonstrated through executable code and narrative text. What distinguishes this resource is its pedagogical design. Each notebook cell encapsulates a single conceptual step, letting readers run, inspect, and modify discrete units of learning. The code provides interchangeable implementations of deep lea
This repository serves as a comprehensive educational resource for mastering machine learning and deep learning through a series of interactive Jupyter Notebooks. It provides a structured collection of tutorials and code examples designed to guide users through the fundamental and advanced techniques of the Python data science ecosystem. The project distinguishes itself by offering hands-on exercises that demonstrate the full lifecycle of machine learning projects. Users can explore end-to-end data pipelines, ranging from initial data loading and preprocessing to the training and deployment o
This repository is an educational collection of deep learning implementations designed to demonstrate the fundamental principles of neural network architecture and optimization. It provides a comprehensive resource for understanding machine learning through hands-on code examples, ranging from basic multilayer perceptrons to complex generative models. The project distinguishes itself by emphasizing the manual construction of models, including the implementation of backpropagation from scratch to illustrate core mathematical mechanics. It covers a wide array of architectural design patterns, s
VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations. The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
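Braille plotting works because each Unicode Braille character (U+2800 to U+28FF) is a 2-wide by 4-tall grid of dots, and its code point is U+2800 plus an 8-bit mask of which dots are raised. This gives eight "pixels" per terminal cell. The sketch below shows that encoding; it is not VisiData's code, and `braille_canvas` is a hypothetical helper.

```python
from typing import Iterable, Tuple

BRAILLE_BASE = 0x2800
# Bit for each dot position (column, row) inside a 2-wide x 4-tall Braille cell.
DOT_BITS = {
    (0, 0): 0x01, (0, 1): 0x02, (0, 2): 0x04, (0, 3): 0x40,
    (1, 0): 0x08, (1, 1): 0x10, (1, 2): 0x20, (1, 3): 0x80,
}


def braille_canvas(points: Iterable[Tuple[int, int]], width: int, height: int) -> str:
    """Rasterize integer (x, y) pixel points onto a grid of Braille characters.

    width and height are in pixels; each character holds 2x4 pixels; y grows downward.
    """
    cols = (width + 1) // 2
    rows = (height + 3) // 4
    cells = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:  # silently clip out-of-range points
            cells[y // 4][x // 2] |= DOT_BITS[(x % 2, y % 4)]
    return "\n".join("".join(chr(BRAILLE_BASE + c) for c in row) for row in cells)


# A diagonal line of 8 points fits in a 4 x 2 block of characters.
print(braille_canvas([(i, i) for i in range(8)], width=8, height=8))
```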
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
This project is a comprehensive educational curriculum designed to teach Python programming through the lens of data science and financial analysis. It provides a structured guide for learning how to process complex numerical information, build data models, and perform scientific computing tasks using standard industry libraries. The materials focus on practical applications, enabling users to develop skills in financial data analysis and interactive exploration. By working through these resources, learners gain experience in executing high-performance mathematical operations, transforming ra
Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models. The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encodin
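Smile's tokenizer is part of a JVM library, so the following Python sketch only illustrates the Byte-Pair Encoding idea; it is not Smile's API. BPE starts from single characters and repeatedly merges the most frequent adjacent symbol pair. `learn_bpe` is a hypothetical helper, and ties are broken by first-seen order, which is an arbitrary choice.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def learn_bpe(words: Sequence[str], num_merges: int) -> List[Tuple[str, str]]:
    """Learn BPE merge rules from a word list; each word starts as a tuple of characters."""
    vocab = Counter(tuple(word) for word in words)
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties resolved by first-seen order
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["low", "low", "lower", "lowest"], 2))  # [('l', 'o'), ('lo', 'w')]
```

At inference time, a tokenizer replays the learned merges in order on new text. Frequent substrings then become single tokens, while rare words fall back to smaller pieces.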
This repository serves as an educational framework for building large language models from the ground up. It provides a structured curriculum that guides learners through the end-to-end lifecycle of model development, including data processing, architecture design, and optimization. By focusing on low-level implementation, the project enables users to master the fundamental mechanics of artificial intelligence without relying on high-level abstraction frameworks. The project distinguishes itself by constructing neural network components and gradient-based optimization logic from first princip
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
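Automatic alignment means arithmetic matches values by label, not by position. In pandas, `pd.Series([1, 2], index=["a", "b"]) + pd.Series([10, 20], index=["b", "c"])` gives 12 for "b" and NaN for "a" and "c". The stdlib sketch below mimics that rule with plain dictionaries. It is a hypothetical `aligned_add` helper, not how pandas implements alignment.

```python
import math
from typing import Dict


def aligned_add(left: Dict[str, float], right: Dict[str, float]) -> Dict[str, float]:
    """Add two label->value mappings after aligning them on the union of their labels.

    A label missing on either side contributes NaN, so the sum for that label is NaN.
    Labels are sorted here only to make the output deterministic.
    """
    labels = sorted(set(left) | set(right))
    return {label: left.get(label, math.nan) + right.get(label, math.nan) for label in labels}


result = aligned_add({"a": 1.0, "b": 2.0}, {"b": 10.0, "c": 20.0})
print(result)  # {'a': nan, 'b': 12.0, 'c': nan}
```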
r4ds is a data science curriculum and educational resource designed for mastering the R programming language. It provides a structured learning path for the end-to-end process of importing, tidying, transforming, and visualizing data. The project emphasizes a reproducible data science guide and a comprehensive curriculum for data wrangling. It includes specialized tutorials on the grammar of graphics for layered data visualization and technical publications created with Quarto that blend executable code with narrative prose. The material covers a broad range of analytical capabilities, inclu
Hadoop is a big data infrastructure suite and distributed data processing framework designed to store and process massive datasets across clusters of computers. It pairs the Hadoop Distributed File System (HDFS), which spreads large files across many nodes and replicates their blocks for fault tolerance and high throughput, with MapReduce, a programming model for processing those datasets in parallel across the cluster. It manages the underlying hardware and software environment required for distributed big dat
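The MapReduce programming model can be simulated in a single process to show its three phases. A mapper emits key/value pairs, the framework groups values by key (the shuffle), and a reducer combines each group. This is an illustrative Python sketch with hypothetical function names. Real Hadoop jobs are typically written against its Java MapReduce API or run through Hadoop Streaming, and the framework distributes each phase across nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(word_count(["the quick fox", "The lazy dog", "the end"]))
# {'the': 3, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

Mappers see one record at a time, and reducers see one key's values at a time. Because neither needs shared state, both phases can run in parallel on the nodes that already hold the data.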
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
This repository is a collection of implementation references and solved notebooks covering supervised, unsupervised, and reinforcement learning techniques. It provides practical guides for building predictive models, clustering algorithms, and autonomous agents. The project includes specific implementations for neural network architectures, such as multi-layer perceptrons for digit recognition, and recommender systems using collaborative and content-based filtering. It also features reinforcement learning systems that utilize deep Q-learning to optimize decision-making policies. The codebase
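User-based collaborative filtering, one of the recommender approaches mentioned above, predicts a user's rating for an item. It does this by averaging other users' ratings for that item, weighted by how similar their tastes are. The sketch below is illustrative and not the repository's code. It uses cosine similarity over each user's full rating vector, with unrated items treated as zero; that similarity measure, the helper names, and the ratings data are all assumptions.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rating(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for item; None if nobody rated it."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


ratings = {  # hypothetical data
    "ann": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "cy": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict_rating(ratings, "ann", "m3"))  # about 3.14, pulled toward bob's 4
```

Content-based filtering would instead compare item features with a profile of what the user liked, so it can score items that no other user has rated yet.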