# chiphuyen/dmls-book

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/chiphuyen-dmls-book).**

4,395 stars · 867 forks

## Links

- GitHub: https://github.com/chiphuyen/dmls-book
- Homepage: https://www.amazon.com/Designing-Machine-Learning-Systems-Production-Ready/dp/1098107969
- awesome-repositories: https://awesome-repositories.com/repository/chiphuyen-dmls-book.md

## Description

This is a reference guide for designing, deploying, and maintaining production-ready machine learning systems, grounded in MLOps best practices. It covers the complete machine learning lifecycle, from system design and workflow planning through to deployment and ongoing maintenance, with a focus on reliability, scalability, and maintainability as business requirements evolve.

The guide provides an architecture reference for establishing shared ML infrastructure, including model registries and feature stores that standardize asset reuse across teams. It details pipeline automation through configurable directed acyclic graphs with automated triggers and retry logic, and describes a production monitoring framework for detecting performance degradation, data drift, and algorithmic bias in real time. Responsible AI implementation is addressed through built-in fairness checks and bias detection mechanisms that validate model outputs against ethical guidelines.
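As a rough illustration of the data drift detection mentioned above, the sketch below computes a Population Stability Index (PSI) between a reference sample and a live sample of one numeric feature. PSI is one common drift statistic, chosen here for illustration; it is an assumption that it suits this context, and it is not confirmed as the guide's method. The function name, bin count, and the `eps` smoothing value are all hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread to bin over")
    width = (hi - lo) / bins

    def bucket_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the first or last bin.
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = bucket_fractions(expected)
    a = bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data only: a uniform reference sample and a shifted live sample.
reference = [i / 100 for i in range(1000)]
shifted = [x + 5 for x in reference]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but any alerting threshold should be tuned per feature rather than taken from this sketch.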

The material is organized around key architectural patterns such as DAG-based pipeline orchestration, infrastructure-as-code provisioning, and a pipeline-defined ML lifecycle with clear handoff points from data collection to production monitoring. It serves as a practical manual for planning end-to-end ML workflows and designing systems that stay reliable and maintainable over time.
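The DAG-based orchestration pattern with retry logic can be sketched in a few lines using only the Python standard library (`graphlib`, available from Python 3.9). This is a minimal sketch under stated assumptions, not the guide's implementation: `run_pipeline`, the stage names, and the backoff policy are hypothetical, and a production orchestrator would add scheduling, triggers, and persistent state.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_retries: int = 2,
    backoff_seconds: float = 0.0,
) -> List[str]:
    """Run tasks in dependency order, retrying a failing task up to max_retries times."""
    # Include tasks that have no declared dependencies.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    unknown = set().union(*graph.values()) - set(tasks)
    if unknown:
        raise ValueError(f"dependencies without a task: {sorted(unknown)}")

    completed: List[str] = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: fail the whole pipeline
                time.sleep(backoff_seconds * (2 ** attempt))
        completed.append(name)
    return completed


# Hypothetical three-stage lifecycle; "train" fails once, then succeeds on retry.
calls = {"train": 0}


def collect_data() -> None:
    pass


def train() -> None:
    calls["train"] += 1
    if calls["train"] < 2:
        raise RuntimeError("transient failure")


def monitor() -> None:
    pass


order = run_pipeline(
    {"collect_data": collect_data, "train": train, "monitor": monitor},
    {"train": {"collect_data"}, "monitor": {"train"}},
)
```

Each edge in `depends_on` plays the role of a handoff point: a stage starts only after every stage it depends on has completed.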

## Tags

### Artificial Intelligence & ML

- [Production Machine Learning Guides](https://awesome-repositories.com/f/artificial-intelligence-ml/production-machine-learning-guides.md) — Serves as a comprehensive architecture reference for designing, deploying, and maintaining production-ready ML systems.
- [ML Lifecycle Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/end-to-end-pipelines/ml-lifecycle-managers.md) — Plans and manages the complete machine learning lifecycle from system design to deployment and maintenance.
- [ML Workflow Planners](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/end-to-end-pipelines/ml-workflow-planners.md) — Maps the key decisions and components needed to develop, deploy, and update machine learning models in production. ([source](https://cdn.jsdelivr.net/gh/chiphuyen/dmls-book@main/README.md))
- [Pipeline Automation Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/end-to-end-training-pipelines/pipeline-automation-layers.md) — Describes orchestrating end-to-end ML workflows through configurable DAGs with automated triggers and retry logic.
- [Feature Stores](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-stores.md) — Covers feature store abstraction for standardizing feature computation, storage, and serving across ML teams.
- [Machine Learning Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-systems.md) — Designs reliable, scalable, and maintainable machine learning systems for production deployment and changing business requirements.
- [MLOps Best Practices](https://awesome-repositories.com/f/artificial-intelligence-ml/mlops-best-practices.md) — Documents proven MLOps strategies for automation, monitoring, and shared infrastructure in ML pipelines.
- [Model Registries](https://awesome-repositories.com/f/artificial-intelligence-ml/model-architecture-registries/model-registries.md) — Describes a centralized version control system for storing, tracking, and managing trained models across environments.
- [Model Versioning](https://awesome-repositories.com/f/artificial-intelligence-ml/model-versioning.md) — Centralizes version control for storing, tracking, and managing trained models across different deployment environments.
- [Production Monitoring Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/production-monitoring-frameworks.md) — Provides a structured framework for detecting performance degradation and bias in deployed ML models.
- [AI Ethics and Fairness](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-ethics-and-fairness.md) — Covers fairness checks and bias detection mechanisms for validating model outputs against ethical guidelines.
- [Fairness and Bias Detection Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-security-and-governance/responsible-ai-development-practices/fairness-and-bias-detection-mechanisms.md) — Describes fairness checks and bias detection mechanisms that validate model outputs against ethical guidelines.
- [Model Registries](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-stores/model-registries.md) — Provides guidance on establishing model registries and feature stores as shared ML infrastructure.

### Part of an Awesome List

- [Production Monitoring Stacks](https://awesome-repositories.com/f/awesome-lists/ai/observability-and-monitoring/production-model-decay-tracking/production-monitoring-stacks.md) — Describes a continuous observability system that tracks model performance metrics, data drift, and algorithmic bias in real time.
- [ML System Design Patterns](https://awesome-repositories.com/f/awesome-lists/devtools/system-design-references/ml-system-design-patterns.md) — Designs machine learning systems that stay reliable, scalable, and maintainable as business needs change over time. ([source](https://cdn.jsdelivr.net/gh/chiphuyen/dmls-book@main/README.md))
- [Bias Detection Guardrails](https://awesome-repositories.com/f/awesome-lists/ai/bias-and-fairness/bias-detection-loops/bias-detection-guardrails.md) — Validates model outputs against ethical guidelines using built-in fairness checks and bias detection mechanisms.
- [Infrastructure as Code](https://awesome-repositories.com/f/awesome-lists/devtools/infrastructure-as-code.md) — Covers declarative configuration for provisioning and managing ML infrastructure components.

### Development Tools & Productivity

- [ML Lifecycle Pipelines](https://awesome-repositories.com/f/development-tools-productivity/build-lifecycle-pipelines/ml-lifecycle-pipelines.md) — Organizes the ML process into distinct stages with clear handoff points from data collection to production monitoring.
- [DAG-Based Orchestration](https://awesome-repositories.com/f/development-tools-productivity/parallel-execution/custom-parallel-task-execution/dag-based-orchestration.md) — Describes DAG-based pipeline orchestration for end-to-end ML workflows with automated triggers and retry logic.

### DevOps & Infrastructure

- [MLOps Pipeline Automation](https://awesome-repositories.com/f/devops-infrastructure/cicd-pipeline-automation/cicd-pipeline-management/automation-workflows/mlops-pipeline-automation.md) — Automates model operations and establishes shared infrastructure like model registries and feature stores for reproducible workflows.
- [MLOps](https://awesome-repositories.com/f/devops-infrastructure/devops/mlops.md) — Organizes the ML process into distinct stages from data collection to production monitoring with clear handoff points.
- [ML Infrastructure Managers](https://awesome-repositories.com/f/devops-infrastructure/ml-infrastructure-managers.md) — Standardizes shared tools like model stores and feature stores so multiple teams can reuse ML assets across the organization. ([source](https://cdn.jsdelivr.net/gh/chiphuyen/dmls-book@main/README.md))
- [Declarative Infrastructure Provisioning](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure-deployment/managed-infrastructure-deployment/infrastructure-deployment-provisioning/declarative-infrastructure-provisioning.md) — Covers using declarative configuration files to provision and manage ML infrastructure components across cloud environments.

### Software Engineering & Architecture

- [ML System Design References](https://awesome-repositories.com/f/software-engineering-architecture/reference-architectures/ml-system-design-references.md) — Provides a reference for system design, workflow automation, and responsible AI implementation across the ML lifecycle.

### System Administration & Monitoring

- [ML Production Monitors](https://awesome-repositories.com/f/system-administration-monitoring/real-time-monitoring-systems/ml-production-monitors.md) — Continuously tracks model performance metrics, data drift, and algorithmic bias through an observability system.
