# harvard-edge/cs249r_book

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/harvard-edge-cs249r-book).**

20,217 stars · 2,330 forks · JavaScript · other

## Links

- GitHub: https://github.com/harvard-edge/cs249r_book
- Homepage: http://mlsysbook.ai/
- awesome-repositories: https://awesome-repositories.com/repository/harvard-edge-cs249r-book.md

## Topics

`artificial-intelligence` `cloud-ml` `computer-systems` `courseware` `deep-learning` `edge-machine-learning` `embedded-ml` `machine-learning` `machine-learning-systems` `mobile-ml` `textbook` `tinyml`

## Description

This project is an educational framework for teaching the design, deployment, and performance optimization of machine learning systems. Its structured curriculum covers the full stack of artificial intelligence engineering, from building core framework components such as tensors and automatic differentiation engines to orchestrating large-scale distributed training clusters.

The project combines physics-grounded systems modeling with interactive simulation environments. Users can experiment with distributed training strategies, analyze communication overhead, and model the total cost of ownership, energy consumption, and reliability of hardware clusters. Together with hands-on embedded hardware kits and browser-based notebooks, these analytical tools help students connect theoretical architecture to practical deployment on resource-constrained edge devices.

Beyond training, the project includes tools for assessing inference latency, quantifying environmental impact, and optimizing production workloads across diverse environments. The curriculum is supported by pedagogical resources, including lecture materials, assessment banks, and interview preparation scenarios that focus on hardware selection and parallel scaling strategies.

The project is maintained as an open-source repository, providing version-controlled educational content and modular software components that allow for collaborative development and adaptation by the academic community.

## Tags

### Artificial Intelligence & ML

- [Machine Learning Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-systems.md) — Provides architectural guidance for end-to-end workflows covering data pipelines, training infrastructure, and deployment strategies for scalable artificial intelligence applications. ([source](http://mlsysbook.ai/vol1/))
- [Large Scale Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training.md) — Models communication overhead and scaling efficiency for parallel processing across large-scale compute clusters.
- [Machine Learning Operations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-operations.md) — Includes tools for calculating total cost of ownership and energy consumption for machine learning production workloads.
- [Edge AI Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment.md) — Executes machine learning models on resource-constrained devices while respecting strict memory and power limitations. ([source](http://mlsysbook.ai/kits/))
- [Distributed Training Scaling Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-scaling-utilities.md) — Provides tools for modeling scaling efficiency and communication overhead in distributed machine learning clusters.
- [Inference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization.md) — Manages the operational lifecycle of models by optimizing performance across the serving stack and edge environments. ([source](http://mlsysbook.ai/vol2/))
- [AI Security and Governance](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-security-and-governance.md) — Implements practices for security, robustness, and environmental sustainability in machine learning operations. ([source](http://mlsysbook.ai/vol2/))
- [Framework Construction Guides](https://awesome-repositories.com/f/artificial-intelligence-ml/automatic-differentiation-frameworks/framework-construction-guides.md) — Guides users through building core machine learning components from scratch to demystify internal mechanics.
- [Autoregressive Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/autoregressive-inference-engines.md) — Simulates pre-fill and decode phases of autoregressive inference to estimate latency and memory pressure. ([source](http://mlsysbook.ai/vol1/assets/downloads/Machine-Learning-Systems-Vol1.epub))
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Calculates scaling efficiency and communication overhead for parallel training strategies in distributed systems. ([source](http://mlsysbook.ai/vol1/assets/downloads/Machine-Learning-Systems-Vol1.epub))
- [Environmental Impact Assessment](https://awesome-repositories.com/f/artificial-intelligence-ml/environmental-impact-assessment.md) — Quantifies energy, carbon footprint, and water usage for machine learning workloads based on regional grid intensity. ([source](http://mlsysbook.ai/mlsysim/))
- [Inference Latency Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-latency-optimizers.md) — Models time-to-first-token and inter-token latency for large language model serving. ([source](http://mlsysbook.ai/mlsysim/))
- [Large-Scale Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-model-training.md) — Simulates large-scale training and inference workflows to analyze scaling efficiency and hardware performance.
- [Audio Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-processing.md) — Implements on-device keyword spotting and voice command recognition for edge applications. ([source](http://mlsysbook.ai/kits/))
- [Large Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models.md) — Enables the deployment of text and vision-language models on resource-constrained edge hardware. ([source](http://mlsysbook.ai/kits/))
- [Computer Vision](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/computer-vision.md) — Provides tools for deploying real-time image classification and object detection models on embedded hardware. ([source](http://mlsysbook.ai/kits/))
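
The pre-fill and decode phases mentioned in the inference tags above can be illustrated with a first-order estimate. The sketch below is not the project's own simulator or API. It assumes that pre-fill is compute-bound at roughly 2 FLOPs per parameter per prompt token and that each decode step is memory-bound because it streams every weight once. The function name `estimate_llm_latency` and all of its parameters are hypothetical.

```python
def estimate_llm_latency(n_params, prompt_tokens, peak_flops,
                         mem_bandwidth, bytes_per_param=2.0):
    """First-order latency estimate for autoregressive inference.

    Illustrative assumptions (not taken from the project):
      * pre-fill is compute-bound: ~2 FLOPs per parameter per prompt token
      * decode is memory-bound: every weight is read once per generated token
      * batch size 1, no KV-cache traffic, no kernel or network overhead
    Units: peak_flops in FLOP/s, mem_bandwidth in bytes/s.
    Returns (time_to_first_token_s, inter_token_latency_s).
    """
    inputs = (n_params, prompt_tokens, peak_flops, mem_bandwidth, bytes_per_param)
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")

    # Pre-fill phase: process the whole prompt in one compute-bound pass.
    ttft_s = 2.0 * n_params * prompt_tokens / peak_flops

    # Decode phase: one token per step, limited by streaming the weights.
    inter_token_s = n_params * bytes_per_param / mem_bandwidth
    return ttft_s, inter_token_s


if __name__ == "__main__":
    # Hypothetical 7B-parameter model in 16-bit weights on a hypothetical
    # accelerator with 100 TFLOP/s of compute and 1 TB/s of memory bandwidth.
    ttft, itl = estimate_llm_latency(7e9, 1000, 1e14, 1e12)
    print(f"TTFT ~ {ttft:.3f} s, inter-token latency ~ {itl * 1e3:.1f} ms")
```

Real hardware rarely sustains its peak rates, so a model like this gives an optimistic lower bound rather than a prediction.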

### Education & Learning Resources

- [Machine Learning Education](https://awesome-repositories.com/f/education-learning-resources/educational-resources/systems-applied-computing/machine-learning-education.md) — Offers a comprehensive curriculum with lecture materials and interactive notebooks for teaching machine learning systems.
- [Deep Learning Frameworks](https://awesome-repositories.com/f/education-learning-resources/educational-resources/systems-applied-computing/machine-learning-education/deep-learning-frameworks.md) — Provides educational frameworks for building core machine learning components like tensors and autograd engines from scratch.
- [Interactive Learning Tools](https://awesome-repositories.com/f/education-learning-resources/interactive-learning-tools.md) — Provides interactive notebooks that allow users to modify system parameters and observe behavior to develop intuition for machine learning principles. ([source](http://mlsysbook.ai/index.html))
- [Interactive Notebook Environments](https://awesome-repositories.com/f/education-learning-resources/interactive-notebook-environments.md) — Uses interactive notebook environments to simulate distributed training and hardware constraints.
- [System Design Interview Preparation](https://awesome-repositories.com/f/education-learning-resources/system-design-interview-preparation.md) — Offers physics-grounded interview questions and mock scenarios for practicing hardware selection and parallelism strategies. ([source](http://mlsysbook.ai/))
- [Machine Learning Roadmaps](https://awesome-repositories.com/f/education-learning-resources/curricula-instructional-design/curricula-roadmaps/ai-machine-learning-roadmaps/foundational-ml-data-science/machine-learning-roadmaps.md) — Implements and optimizes machine learning models on resource-constrained hardware while managing strict operational limitations.
- [Curriculum Guides](https://awesome-repositories.com/f/education-learning-resources/curriculum-guides.md) — Supplies comprehensive syllabi, pedagogical guides, and assessment rubrics for machine learning systems courses. ([source](http://mlsysbook.ai/index.html))
- [Curriculum Development](https://awesome-repositories.com/f/education-learning-resources/open-source-guides/curriculum-development.md) — Creates and distributes open-source educational materials to standardize the discipline of artificial intelligence engineering. ([source](http://mlsysbook.ai/about/people.html))
- [Open Source Textbooks](https://awesome-repositories.com/f/education-learning-resources/open-source-textbooks.md) — Updates open-source textbooks and software modules to allow collaborative contributions from the academic community. ([source](http://mlsysbook.ai/newsletter/))
- [Machine Learning Lectures](https://awesome-repositories.com/f/education-learning-resources/machine-learning-lectures.md) — Supplies structured slide decks and technical diagrams to support classroom instruction. ([source](http://mlsysbook.ai/index.html))

### Networking & Communication

- [Distributed Coordination Primitives](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-systems-coordination/distributed-coordination-primitives.md) — Orchestrates parallel processing across devices using collective communication primitives, fault tolerance mechanisms, and fleet management strategies. ([source](http://mlsysbook.ai/vol2/))
- [Parallel Scaling Analysis](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/model-parallelism-techniques/pipeline-parallelism-strategies/parallel-scaling-analysis.md) — Estimates efficiency and overhead for data, tensor, and pipeline parallelism across distributed hardware clusters. ([source](http://mlsysbook.ai/mlsysim/))
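
As a rough illustration of the scaling-efficiency estimates described above, the following sketch models data parallelism with a bandwidth-only ring all-reduce. It is not the project's implementation. The function names are hypothetical, and the model assumes that communication does not overlap with compute and that per-message latency is negligible.

```python
def ring_allreduce_seconds(grad_bytes, n_workers, link_bandwidth):
    """Bandwidth-only cost of a ring all-reduce (latency term ignored).

    Each worker sends and receives 2 * (N - 1) / N of the gradient buffer.
    link_bandwidth is in bytes/s per worker link (assumed uniform).
    """
    if n_workers < 1 or grad_bytes < 0 or link_bandwidth <= 0:
        raise ValueError("invalid arguments")
    if n_workers == 1:
        return 0.0  # a single worker has nothing to synchronize
    return 2.0 * (n_workers - 1) / n_workers * grad_bytes / link_bandwidth


def data_parallel_efficiency(compute_s, grad_bytes, n_workers, link_bandwidth):
    """Fraction of each step spent on useful compute.

    Assumes communication is fully exposed (no overlap with the backward
    pass), which makes this a pessimistic estimate.
    """
    if compute_s <= 0:
        raise ValueError("compute_s must be positive")
    comm_s = ring_allreduce_seconds(grad_bytes, n_workers, link_bandwidth)
    return compute_s / (compute_s + comm_s)


if __name__ == "__main__":
    # Hypothetical workload: 1 GB of gradients, 1.5 s of compute per step,
    # 1 GB/s links. Efficiency drops as workers are added.
    for n in (1, 2, 4, 8):
        eff = data_parallel_efficiency(1.5, 1e9, n, 1e9)
        print(f"{n} workers: efficiency {eff:.2f}")
```

Tensor and pipeline parallelism have different communication patterns and would need their own cost terms; this sketch covers only the data-parallel case.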

### Software Engineering & Architecture

- [Distributed Infrastructure Patterns](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/backend-enterprise-systems/infrastructure-design-patterns/distributed-infrastructure-patterns.md) — Provides design patterns for physical computer systems, network fabrics, and scalable data storage foundations required for distributed machine learning. ([source](http://mlsysbook.ai/vol2/))
- [System Performance Optimization](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-optimization/data-handling-throughput/system-performance-optimization.md) — Analyzes memory usage and compute efficiency to profile and accelerate models for production performance. ([source](http://mlsysbook.ai/vol1/))
- [Systems Modeling](https://awesome-repositories.com/f/software-engineering-architecture/system-reliability-principles/systems-modeling.md) — Uses mathematical models and first-principles calculations to estimate performance, cost, and reliability of hardware architectures.
- [System Reliability](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/system-reliability.md) — Provides methodologies for calculating fleet mean time between failures and managing checkpoint intervals for fault-tolerant machine learning systems. ([source](http://mlsysbook.ai/vol1/assets/downloads/Machine-Learning-Systems-Vol1.epub))
- [Curriculum Distributions](https://awesome-repositories.com/f/software-engineering-architecture/development-methodologies/engineering-best-practices/open-source-collaboration/open-source-methodologies/curriculum-distributions.md) — Maintains version-controlled educational materials to enable collaborative improvement by the academic community.
- [Hardware Abstraction Layers](https://awesome-repositories.com/f/software-engineering-architecture/hardware-abstraction-layers.md) — Provides consistent interfaces for interacting with diverse embedded hardware and edge devices.
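
The fleet mean time between failures and checkpoint-interval calculations mentioned under System Reliability can be sketched with two textbook approximations. This is an illustration rather than the project's own method: it assumes independent node failures at a constant rate and uses Young's first-order formula for the checkpoint interval. All function and parameter names are hypothetical.

```python
import math


def fleet_mtbf_hours(node_mtbf_hours, n_nodes):
    """MTBF of a fleet in which any single node failure interrupts the job.

    Assumes independent failures at a constant rate, so failure rates add
    and the fleet MTBF is the per-node MTBF divided by the node count.
    """
    if node_mtbf_hours <= 0 or n_nodes < 1:
        raise ValueError("invalid arguments")
    return node_mtbf_hours / n_nodes


def young_checkpoint_interval_hours(checkpoint_cost_hours, mtbf_hours):
    """Young's approximation: interval = sqrt(2 * checkpoint cost * MTBF).

    Reasonable only when the checkpoint cost is much smaller than the MTBF.
    """
    if checkpoint_cost_hours <= 0 or mtbf_hours <= 0:
        raise ValueError("invalid arguments")
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical cluster: 100 nodes, each with a 10,000-hour MTBF,
    # and a checkpoint that takes 30 minutes to write.
    mtbf = fleet_mtbf_hours(10_000, 100)
    interval = young_checkpoint_interval_hours(0.5, mtbf)
    print(f"fleet MTBF ~ {mtbf:.0f} h, checkpoint every ~ {interval:.1f} h")
```

The takeaway is that fleet MTBF shrinks linearly with node count, so the checkpoint interval that minimizes lost work shrinks with the square root of the fleet size.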

### Scientific & Mathematical Computing

- [Physics Simulations](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/physics-simulations.md) — Provides physics-based simulation environments for manipulating parameters to observe the impact of design choices on system performance. ([source](http://mlsysbook.ai/labs/))
- [Operational Cost Estimators](https://awesome-repositories.com/f/scientific-mathematical-computing/research-analysis-workflows/economic-analysis-tools/operational-cost-estimators.md) — Projects total cost of ownership including capital expenditure, electricity, and per-query costs. ([source](http://mlsysbook.ai/mlsysim/))
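
A total-cost-of-ownership projection of the kind described above can be approximated as capital expenditure plus lifetime electricity cost, divided by the number of queries served. The sketch below is a simplified illustration and not the project's estimator. It ignores maintenance, staffing, and depreciation schedules, and every name and default in it is hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def total_cost_of_ownership(capex_usd, it_power_kw, pue,
                            usd_per_kwh, lifetime_years):
    """Capex plus electricity over the hardware lifetime.

    pue (power usage effectiveness) scales IT power up to facility power.
    Assumes constant load and a flat electricity price (simplification).
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    if min(capex_usd, it_power_kw, usd_per_kwh, lifetime_years) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = it_power_kw * pue * HOURS_PER_YEAR * lifetime_years
    return capex_usd + energy_kwh * usd_per_kwh


def cost_per_query(tco_usd, total_queries):
    """Amortize the total cost over every query served in the lifetime."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return tco_usd / total_queries


if __name__ == "__main__":
    # Hypothetical deployment: $100k of hardware drawing 10 kW for one year
    # at PUE 1.5 and $0.10 per kWh, serving one million queries.
    tco = total_cost_of_ownership(100_000, 10, 1.5, 0.10, 1)
    print(f"TCO ~ ${tco:,.0f}, per query ~ ${cost_per_query(tco, 1_000_000):.4f}")
```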

### Testing & Quality Assurance

- [Performance Analysis](https://awesome-repositories.com/f/testing-quality-assurance/performance-testing-analysis/performance-analysis.md) — Predicts latency, throughput, and hardware bottlenecks for machine learning workloads across diverse environments. ([source](http://mlsysbook.ai/vol1/assets/downloads/Machine-Learning-Systems-Vol1.epub))

### Hardware & IoT

- [Educational Hardware Kits](https://awesome-repositories.com/f/hardware-iot/educational-hardware-kits.md) — Provides physical hardware kits to university educators to support hands-on instruction in embedded machine learning. ([source](http://mlsysbook.ai/newsletter/))

### DevOps & Infrastructure

- [Resource Cost Management](https://awesome-repositories.com/f/devops-infrastructure/resource-cost-management.md) — Calculates capital expenditures, energy consumption, and maintenance costs for hardware clusters to inform budget planning. ([source](http://mlsysbook.ai/vol1/assets/downloads/Machine-Learning-Systems-Vol1.epub))

### Security & Cryptography

- [Foundational Architecture Implementations](https://awesome-repositories.com/f/security-cryptography/cryptography/historical-methods/foundational-architecture-implementations.md) — Facilitates recreating foundational breakthroughs ranging from early perceptrons to modern transformer models to master the evolution of artificial intelligence. ([source](http://mlsysbook.ai/tinytorch/))
