# linkedin/school-of-sre

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/linkedin-school-of-sre).**

8,093 stars · 738 forks · HTML · other

## Links

- GitHub: https://github.com/linkedin/school-of-sre
- Homepage: https://linkedin.github.io/school-of-sre/
- awesome-repositories: https://awesome-repositories.com/repository/linkedin-school-of-sre.md

## Topics

`git` `hadoop` `linux` `mysql` `networking` `nosql` `python` `security` `sre` `system-design`

## Description

This project is an educational curriculum from LinkedIn covering site reliability engineering (SRE), distributed systems, and infrastructure operations. It combines technical guides, a systems engineering course, and instructional material that teach the principles of managing large-scale computing environments.

The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the study of system design, resource estimation, and the elimination of single points of failure.

The material then turns to day-to-day operations, including container orchestration, continuous integration and delivery pipelines, observability built from metrics, logs, and tracing, and network routing. It also gives detailed instruction on Linux system administration, database management, security auditing, and the implementation of service level indicators and objectives.

## Tags

### Part of an Awesome List

- [System Design And Architecture](https://awesome-repositories.com/f/awesome-lists/learning/system-design-and-architecture.md) — Provides a comprehensive curriculum for learning how to design scalable and reliable distributed systems. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/conclusion/))
- [Chaos Engineering](https://awesome-repositories.com/f/awesome-lists/devops/chaos-engineering.md) — Teaches the use of controlled failure injection to test and improve system resilience. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/conclusion/))

### Education & Learning Resources

- [DevOps Training Programs](https://awesome-repositories.com/f/education-learning-resources/educational-resources/courses-training-certifications/courses-structured-learning/courses/devops-training-programs.md) — Offers a comprehensive structured learning path for implementing CI/CD pipelines and automated delivery.
- [Curricula](https://awesome-repositories.com/f/education-learning-resources/sre-guides/curricula.md) — Provides a comprehensive educational program covering site reliability engineering, systems design, and infrastructure operations.
- [Distributed Systems Study Guides](https://awesome-repositories.com/f/education-learning-resources/distributed-systems-study-guides.md) — Provides comprehensive instructional content and study guides for architecting scalable and fault-tolerant distributed systems.
- [Systems Engineering](https://awesome-repositories.com/f/education-learning-resources/educational-resources/courses-training-certifications/courses-structured-learning/systems-engineering.md) — Provides a structured learning path for mastering Linux, networking, containerization, and distributed systems management.
- [Service Level Indicators](https://awesome-repositories.com/f/education-learning-resources/service-level-indicators.md) — Establishes quantitative measures for specific service aspects to monitor overall system health. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/large-system-design/))
- [Service Level Objectives](https://awesome-repositories.com/f/education-learning-resources/service-level-objectives.md) — Provides resources on defining target value ranges for indicators to ensure optimal user experience. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/large-system-design/))
- [Technical Manuals](https://awesome-repositories.com/f/education-learning-resources/technical-manuals.md) — Provides technical reference material on managing Kubernetes, Docker, and cloud observability tools.
- [DNS Resolution Tutorials](https://awesome-repositories.com/f/education-learning-resources/dns-resolution-tutorials.md) — Provides instructional guides on the domain name system resolution process. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/))
- [Container Networking](https://awesome-repositories.com/f/education-learning-resources/educational-resources/systems-applied-computing/infrastructure-architecture/computer-networks/container-networking.md) — Offers implementation guides for managing networking and security within containerized software environments. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/))
- [Command-Line Text Processing](https://awesome-repositories.com/f/education-learning-resources/text-processing-tutorials/command-line-text-processing.md) — Offers educational resources on using shell utilities to filter, replace, and sort text data. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/command_line_basics/))
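
The SLI and SLO entries above describe measuring one aspect of a service and comparing it with a target range. The sketch below is a minimal Python illustration, assuming an SLI defined as good events divided by total events over one window, and an error budget equal to the number of failures the SLO permits in that same window. These are common conventions rather than the course's exact definitions, and `SLOReport` and `evaluate_slo` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SLOReport:
    sli: float               # fraction of good events (0.0 to 1.0)
    slo_target: float        # target fraction, e.g. 0.999 for "three nines"
    budget_remaining: float  # share of the error budget left; negative if overspent


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SLOReport:
    """Compute an SLI as good/total and compare it with an SLO target.

    Hypothetical helper: the error budget is the number of bad events the
    target allows over the same window, (1 - slo_target) * total_events.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / total_events
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return SLOReport(sli, slo_target, 1.0 - actual_bad / allowed_bad)


# Usage: 1,000,000 requests in the window, 999,500 of them served successfully.
report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%}, SLO met={report.sli >= report.slo_target}, "
      f"budget left={report.budget_remaining:.0%}")
```

In this example the 99.9% target allows 1,000 failed requests and 500 occurred, so about half of the error budget remains.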

### Data & Databases

- [Database Partitioning and Sharding](https://awesome-repositories.com/f/data-databases/database-partitioning-and-sharding.md) — Provides instruction on dividing large datasets by attribute to improve query performance and isolate incidents. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/scalability/))
- [Backup & Recovery](https://awesome-repositories.com/f/data-databases/backup-recovery.md) — Covers tools and mechanisms for creating, managing, and restoring database snapshots and backups. ([source](https://linkedin.github.io/school-of-sre/level102/linux_intermediate/archiving_backup/))
- [Concurrent Update Resolution](https://awesome-repositories.com/f/data-databases/concurrent-update-resolution.md) — Provides instruction on handling simultaneous updates using timestamps, optimistic locking, and vector clocks. ([source](https://linkedin.github.io/school-of-sre/level101/databases_nosql/key_concepts/))
- [Data Backup Solutions](https://awesome-repositories.com/f/data-databases/data-backup-solutions.md) — Provides strategies for automated data replication and backup across distributed environments. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/operations/))
- [Data Caching](https://awesome-repositories.com/f/data-databases/data-caching.md) — Teaches how to implement temporary data storage layers to reduce database load and improve response times. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling/))
- [Database Backup Restoration](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-management-governance/backup-recovery-systems/database-backup-restoration.md) — Instructs on restoring database metadata and snapshots to recover from system failures. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/backup_recovery/))
- [Data Integrity Constraints](https://awesome-repositories.com/f/data-databases/data-integrity-constraints.md) — Instructs on defining primary and foreign key constraints to maintain relational data integrity. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/concepts/))
- [Data Replication](https://awesome-repositories.com/f/data-databases/data-replication.md) — Explains mechanisms for synchronizing data across distributed database nodes to ensure consistency and availability. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/operations/))
- [Database Backups](https://awesome-repositories.com/f/data-databases/database-backups.md) — Covers the generation of SQL statements to create logical backups for database recovery. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/backup_recovery/))
- [Query Optimizers](https://awesome-repositories.com/f/data-databases/database-management-systems/database-systems-management/database-operations/sql-query-execution/query-optimizers.md) — Teaches the use of B+ tree structures to create indexes that speed up data retrieval. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/concepts/))
- [Database Performance Tuning](https://awesome-repositories.com/f/data-databases/database-performance-tuning.md) — Teaches techniques for adjusting database server and client parameters to optimize resource usage and stability. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/operations/))
- [Execution Plan Analysis](https://awesome-repositories.com/f/data-databases/database-query-execution/execution-plan-analysis.md) — Teaches how to generate and visualize execution plans to identify bottlenecks in table joins and index usage. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/query_performance/))
- [Database Query Joins](https://awesome-repositories.com/f/data-databases/database-query-joins.md) — Covers merging records from multiple tables using various SQL join types. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/select_query/))
- [Database Sharding](https://awesome-repositories.com/f/data-databases/database-sharding.md) — Provides educational material on architectural patterns for partitioning large datasets across multiple database nodes. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling/))
- [Distributed Data Processing](https://awesome-repositories.com/f/data-databases/distributed-data-processing.md) — Explains frameworks and utilities for scaling data operations and analyzing high-volume streams across multiple nodes. ([source](https://linkedin.github.io/school-of-sre/level101/big_data/intro/))
- [Data Partitioning](https://awesome-repositories.com/f/data-databases/distributed-sharding-architectures/process-sharding/data-partitioning.md) — Instructs on distributing data across nodes using sharding and clustering for scalability. ([source](https://linkedin.github.io/school-of-sre/level101/databases_nosql/key_concepts/))
- [Message Ordering Management Systems](https://awesome-repositories.com/f/data-databases/message-ordering-management-systems.md) — Covers the management of message sequencing and partitioning to ensure data consistency. ([source](https://linkedin.github.io/school-of-sre/level101/messagequeue/key_concepts/))
- [Node Name Key Mappings](https://awesome-repositories.com/f/data-databases/node-name-key-mappings.md) — Explains the use of hashing functions to assign data keys to specific servers within a cluster. ([source](https://linkedin.github.io/school-of-sre/level101/databases_nosql/key_concepts/))
- [Primary-Replica Replication](https://awesome-repositories.com/f/data-databases/primary-replica-replication.md) — Explains replication architectures where a primary node handles writes and propagates them to read-only replicas.
- [Query Performance Analyzers](https://awesome-repositories.com/f/data-databases/query-performance-analyzers.md) — Teaches how to evaluate query execution plans and runtime statistics to identify performance bottlenecks. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/operations/))
- [Redundant Storage Configurations](https://awesome-repositories.com/f/data-databases/storage-configuration/redundant-storage-configurations.md) — Covers settings for grouping physical disks into redundant sets via RAID to maintain availability during hardware failure.
- [Subquery Support](https://awesome-repositories.com/f/data-databases/subquery-support.md) — Explains the use of nested queries to narrow the result set of an outer query. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/select_query/))
- [Table Data Retrieval](https://awesome-repositories.com/f/data-databases/table-data-retrieval.md) — Teaches how to fetch records from tables using filters, sorting, and grouping. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/select_query/))
- [Index Optimizations](https://awesome-repositories.com/f/data-databases/table-data-retrieval/index-optimizations.md) — Provides guidance on using primary, secondary, and composite indexes to reduce row scans and query times. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/query_performance/))
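
Several entries above (key-to-node mappings, sharding, data partitioning) rely on hashing keys onto servers. One widely used approach is consistent hashing, sketched below under stated assumptions: SHA-256 truncated to 64 bits as the ring hash, 100 virtual points per node, a hypothetical class `ConsistentHashRing`, and made-up node names. The course material may describe a different hashing scheme.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Assign keys to nodes on a hash ring (illustrative sketch, not production code)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes  # virtual points per node, to even out the key distribution
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256, read as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self._vnodes):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:  # skip the (very unlikely) position collision
                continue
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        # Only this node's points disappear, so keys owned by other nodes stay put.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._points:
            raise LookupError("no nodes on the ring")
        # First ring position clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["db-a", "db-b", "db-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("db-c")
# Keys that were not on db-c should map to the same node as before.
moved = [k for k in keys if before[k] != "db-c" and ring.node_for(k) != before[k]]
print(f"{len(moved)} keys moved that were not on db-c")
```

The design choice here is that removing a node only remaps the keys that node owned. With plain `hash(key) % len(nodes)`, changing the node count would move most keys to a different server.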

### Development Tools & Productivity

- [CLI Automation Tools](https://awesome-repositories.com/f/development-tools-productivity/cli-automation-tools.md) — Teaches the creation of non-interactive command-line tools to automate infrastructure tasks and debug production issues. ([source](https://linkedin.github.io/school-of-sre/level101/python_web/sre-conclusion/))
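
The CLI entry above describes non-interactive tools for automating infrastructure tasks. A minimal sketch of that style, assuming a disk-usage check is the task: the script takes all input from flags, prints one parseable line, and reports its result through the exit code so cron or a CI job can act on it. The `--path` and `--max-percent` flags are hypothetical and not taken from the course.

```python
import argparse
import shutil
import sys


def main(argv=None) -> int:
    """Non-interactive check: exit 1 if a filesystem is fuller than a threshold."""
    parser = argparse.ArgumentParser(description="Fail when disk usage exceeds a threshold.")
    parser.add_argument("--path", default="/", help="any path on the filesystem to check")
    parser.add_argument("--max-percent", type=float, default=90.0,
                        help="alert threshold in percent")
    args = parser.parse_args(argv)

    usage = shutil.disk_usage(args.path)  # named tuple: total, used, free (bytes)
    percent = usage.used / usage.total * 100
    status = "OK" if percent <= args.max_percent else "ALERT"
    # A single key=value line is easy for other scripts or log pipelines to parse.
    print(f"{status} path={args.path} used={percent:.1f}% threshold={args.max_percent}%")
    return 0 if status == "OK" else 1


if __name__ == "__main__":
    sys.exit(main())
```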

### DevOps & Infrastructure

- [Automated Software Delivery](https://awesome-repositories.com/f/devops-infrastructure/automated-software-delivery.md) — Provides guides on executing automated build and test sequences to deliver stable software updates to non-production environments. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/conclusion/))
- [Automated Build Pipelines](https://awesome-repositories.com/f/devops-infrastructure/cicd-pipeline-automation/core-build-engines/build-tooling/automated-build-pipelines.md) — Teaches the setup of integrated systems for compiling, packaging, and testing software within automated pipelines. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline/))
- [Continuous Delivery Pipelines](https://awesome-repositories.com/f/devops-infrastructure/continuous-delivery-pipelines.md) — Provides a framework for automating the transition of code from integration to production through staged environments.
- [Continuous Deployment](https://awesome-repositories.com/f/devops-infrastructure/continuous-deployment.md) — Teaches the implementation of automated pipelines that trigger application updates via version control events and feature toggles. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline/))
- [Continuous Integration](https://awesome-repositories.com/f/devops-infrastructure/continuous-integration.md) — Provides a comprehensive guide to automating the integration and verification of code changes from multiple contributors. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/cicd_brief_history/))
- [Continuous Integration & Deployment](https://awesome-repositories.com/f/devops-infrastructure/continuous-integration-deployment.md) — Provides a curriculum for implementing automated testing and delivery pipelines to reduce release risk. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/cicd_brief_history/))
- [Horizontal Scaling Strategies](https://awesome-repositories.com/f/devops-infrastructure/distributed-systems/horizontal-scaling-strategies.md) — Explains architectural methods for expanding system throughput by distributing load across additional nodes. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/scalability/))
- [High Availability Clustering](https://awesome-repositories.com/f/devops-infrastructure/high-availability-clustering.md) — Teaches the deployment of clustering and replication solutions with failover to ensure continuous uptime. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/operations/))
- [High Availability Systems](https://awesome-repositories.com/f/devops-infrastructure/high-availability-systems.md) — Covers architectural patterns and configurations designed to ensure continuous service availability and fault tolerance. ([source](https://linkedin.github.io/school-of-sre/level102/networking/scale/))
- [Infrastructure as Code](https://awesome-repositories.com/f/devops-infrastructure/infrastructure-as-code.md) — Instructs on maintaining environment configurations as version-controlled code for repeatable and consistent provisioning. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/conclusion/))
- [Observability Stacks](https://awesome-repositories.com/f/devops-infrastructure/observability-stacks.md) — Teaches the deployment of integrated suites that combine metrics, logs, and tracing for full-stack observability.
- [Remote Command Execution](https://awesome-repositories.com/f/devops-infrastructure/remote-command-execution.md) — Instructs on running shell commands and configuration playbooks in parallel across multiple remote servers. ([source](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/important-tools/))
- [Zero-Downtime Deployments](https://awesome-repositories.com/f/devops-infrastructure/zero-downtime-deployments.md) — Explains deployment strategies that ensure continuous service availability by rotating traffic between environments. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline/))
- [CI/CD Pipeline Integrations](https://awesome-repositories.com/f/devops-infrastructure/ci-cd-pipeline-integrations.md) — Teaches the integration of static analysis, security checks, and privacy audits into delivery pipelines. ([source](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/conclusion/))
- [Cloud Backups](https://awesome-repositories.com/f/devops-infrastructure/cloud-backups.md) — Instructs on using services and utilities to automate data backups to cloud storage providers. ([source](https://linkedin.github.io/school-of-sre/level102/linux_intermediate/archiving_backup/))
- [Container Lifecycle Management](https://awesome-repositories.com/f/devops-infrastructure/container-lifecycle-management.md) — Teaches the operational pipeline for automating the build, deployment, and maintenance of container images. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/))
- [Container Image Distribution](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/image-management-tools/container-image-distribution.md) — Provides instructional content on serving container images and manifests over HTTP to enable application deployment. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/))
- [Cluster Extensibility](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/platforms/kubernetes-ecosystem/cluster-extensibility.md) — Teaches how to extend cluster functionality using custom resource definitions and operators for application-specific logic. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/))
- [Self-Healing Infrastructure](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/workload-scheduling-scaling/self-healing-infrastructure.md) — Instructs on implementing systems that automatically detect and recover from container or node failures. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/))
- [Container Storage Persistence](https://awesome-repositories.com/f/devops-infrastructure/container-storage-persistence.md) — Explains techniques for mapping host-level storage to containerized applications to ensure data durability. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/))
- [Containerized Application Deployments](https://awesome-repositories.com/f/devops-infrastructure/containerized-application-deployments.md) — Describes how to deploy full application environments using portable container images and manifest files. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/))
- [Containerized Packaging](https://awesome-repositories.com/f/devops-infrastructure/containerized-packaging.md) — Teaches technologies for bundling applications into isolated, portable environments for consistent deployment. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/))
- [Datacenter Infrastructure Design](https://awesome-repositories.com/f/devops-infrastructure/datacenter-infrastructure-design.md) — Provides guidance on planning application placement based on security, scaling, and latency targets. ([source](https://linkedin.github.io/school-of-sre/level102/networking/introduction/))
- [Distributed Job Execution](https://awesome-repositories.com/f/devops-infrastructure/distributed-job-execution.md) — Teaches the execution of processing jobs across multiple worker nodes using shared data stores for large-scale analytics. ([source](https://linkedin.github.io/school-of-sre/level101/big_data/tasks/))
- [System Package Manager Installations](https://awesome-repositories.com/f/devops-infrastructure/distribution-packaging/system-package-manager-installations.md) — Teaches the use of native OS package managers to install and upgrade software. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/conclusion/))
- [Helm Chart Management](https://awesome-repositories.com/f/devops-infrastructure/helm-chart-management.md) — Provides guidance on automating the installation and lifecycle of packaged Kubernetes applications using Helm charts. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/))
- [Infrastructure Scaling](https://awesome-repositories.com/f/devops-infrastructure/infrastructure-scaling.md) — Covers strategies for optimizing infrastructure performance and scaling applications to handle high concurrent loads. ([source](https://linkedin.github.io/school-of-sre/level101/python_web/sre-conclusion/))
- [Distributed Cluster Provisioners](https://awesome-repositories.com/f/devops-infrastructure/managed-cluster-orchestration/test-cluster-deployers/distributed-cluster-provisioners.md) — Provides instructions for automating the deployment of multi-node compute clusters for parallel processing. ([source](https://linkedin.github.io/school-of-sre/level101/big_data/tasks/))
- [Dead Letter Queues](https://awesome-repositories.com/f/devops-infrastructure/message-queues/dead-letter-queues.md) — Explains how to redirect problematic messages to a dead letter queue for isolated analysis. ([source](https://linkedin.github.io/school-of-sre/level101/messagequeue/key_concepts/))
- [Full-Stack Observability Strategies](https://awesome-repositories.com/f/devops-infrastructure/observability-stacks/full-stack-observability-strategies.md) — Teaches the combination of metrics, logs, and tracing to diagnose root causes of service failures. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/conclusion/))
- [Retry Strategies](https://awesome-repositories.com/f/devops-infrastructure/rate-limiters/retry-strategies.md) — Explains the use of exponential back-off strategies to manage retry frequency and prevent system saturation. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
- [Remote Administration](https://awesome-repositories.com/f/devops-infrastructure/remote-administration.md) — Provides instruction on using secure shells to log into and manage remote hosts. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/conclusion/))
- [Quorum-Based Elections](https://awesome-repositories.com/f/devops-infrastructure/remote-cluster-access/cluster-failover-managers/automated-master-failovers/quorum-based-elections.md) — Explains consensus-based processes to maintain cluster quorum and prevent split-brain scenarios. ([source](https://linkedin.github.io/school-of-sre/level101/databases_nosql/key_concepts/))
- [Budgetary Constraint Analysis](https://awesome-repositories.com/f/devops-infrastructure/resource-cost-management/cost-estimators/operational-cost-monitoring/budgetary-constraint-analysis.md) — Teaches how to balance technical requirements for scalability and availability against budget constraints. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/intro/))
- [Workload Orchestration](https://awesome-repositories.com/f/devops-infrastructure/workload-orchestration.md) — Covers the management of application lifecycles and resource configuration for isolated workloads across clusters. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro/))
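
The retry entry above mentions exponential back-off to avoid saturating a struggling dependency. The sketch below assumes the operation is a callable with no arguments, retries only `ConnectionError` and `TimeoutError` by default, and caps the delay. It also adds random jitter, which may not appear in the course, so that many clients do not retry in lockstep. `retry_with_backoff` and its parameters are hypothetical. `sleep` is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.1,   # upper bound in seconds for the first wait
    max_delay: float = 5.0,    # cap so waits stop growing after a few attempts
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying failures with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Upper bound doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random wait in [0, ceiling] spreads out retries from many clients.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


waits = []
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Errors outside `retry_on`, such as a `ValueError` caused by a bug, propagate immediately, since retrying them would only add load.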

### Networking & Communication

- [Load Balancers](https://awesome-repositories.com/f/networking-communication/load-balancers.md) — Teaches how to split traffic across identical server clusters or network links to maximize throughput. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/scalability/))
- [Publish-Subscribe Messaging](https://awesome-repositories.com/f/networking-communication/communication-platforms-services/messaging-notification-systems/messaging-services/message-broker-infrastructure/publish-subscribe-messaging.md) — Provides instruction on using a publish-subscribe model for low-latency communication between distributed services. ([source](https://linkedin.github.io/school-of-sre/level101/messagequeue/key_concepts/))
- [Request Timeout Management](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/communication-protocols/request-timeout-management.md) — Covers the definition and enforcement of request deadlines to prevent resource exhaustion. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
- [Consistency Models](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/data-synchronization-consistency/consistency-models.md) — Teaches the balance between consistency, availability, and partition tolerance using various consistency models. ([source](https://linkedin.github.io/school-of-sre/level101/databases_nosql/key_concepts/))
- [Geographic Traffic Routing](https://awesome-repositories.com/f/networking-communication/geographic-traffic-routing.md) — Teaches techniques for directing clients to the nearest server IP based on geographic location. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/))
- [HTTP Protocols](https://awesome-repositories.com/f/networking-communication/http-protocols.md) — Provides educational resources explaining the structure of HTTP requests, responses, verbs, and status codes. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/http/))
- [DNS-Based Load Balancing](https://awesome-repositories.com/f/networking-communication/load-balancers/dns-based-load-balancing.md) — Explains traffic distribution by returning different server IPs to clients during DNS resolution. ([source](https://linkedin.github.io/school-of-sre/level102/networking/scale/))
- [Sticky Session Configurations](https://awesome-repositories.com/f/networking-communication/load-balancing/sticky-session-configurations.md) — Instructs on pinning users to a specific server to ensure session consistency and manage propagation delays. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling-beyond-the-datacenter/))
- [Message Brokers](https://awesome-repositories.com/f/networking-communication/message-brokers.md) — Covers the use of middleware components to facilitate asynchronous communication and decoupling between distributed services.
- [Message Delivery Guarantees](https://awesome-repositories.com/f/networking-communication/message-delivery-guarantees.md) — Explains mechanisms for selecting delivery semantics to balance message loss and duplication. ([source](https://linkedin.github.io/school-of-sre/level101/messagequeue/key_concepts/))
- [Network Traffic Prioritization](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-infrastructure-configuration/network-management/network-traffic-prioritization.md) — Teaches how to assign priority to specific packets in forwarding queues to manage congestion. ([source](https://linkedin.github.io/school-of-sre/level102/networking/infrastructure-features/))
- [BGP and Spine-Leaf Architectures](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-routing-traffic-management/bgp-and-spine-leaf-architectures.md) — Instructs on implementing BGP and spine-leaf architectures to manage traffic flow and datacenter resiliency. ([source](https://linkedin.github.io/school-of-sre/level102/networking/introduction/))
- [Network Routing](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-routing-traffic-management/network-routing.md) — Explains how network packets are routed via routing tables and gateways to reach destinations. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/ipr/))
- [Packet Capture Utilities](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-routing-traffic-management/packet-capture-utilities.md) — Instructs on recording raw network traffic from live interfaces for offline analysis. ([source](https://linkedin.github.io/school-of-sre/level101/security/network_security/))
- [Network Optimization](https://awesome-repositories.com/f/networking-communication/network-optimization.md) — Provides techniques for analyzing round-trip time and TCP throughput to improve host communication. ([source](https://linkedin.github.io/school-of-sre/level102/networking/introduction/))
- [Network Protocol Analysis](https://awesome-repositories.com/f/networking-communication/network-protocol-analysis.md) — Covers decoding packet bits and protocol sequences to troubleshoot network traffic. ([source](https://linkedin.github.io/school-of-sre/level101/security/network_security/))
- [Anycast Routing](https://awesome-repositories.com/f/networking-communication/network-traffic-routing/anycast-routing.md) — Covers advertising shared virtual addresses to route network traffic to the nearest available server.
- [Point-to-Point Messaging](https://awesome-repositories.com/f/networking-communication/point-to-point-messaging.md) — Teaches point-to-point messaging, in which each message is delivered to a single consumer so that each task is handled by only one worker. ([source](https://linkedin.github.io/school-of-sre/level101/messagequeue/key_concepts/))
- [Service Discovery](https://awesome-repositories.com/f/networking-communication/service-discovery.md) — Teaches mechanisms for identifying and locating network services within distributed private environments. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/))
- [Traffic Flow Aggregators](https://awesome-repositories.com/f/networking-communication/traffic-flow-aggregators.md) — Instructs on aggregating IP traffic statistics into flow records to analyze congestion causes. ([source](https://linkedin.github.io/school-of-sre/level101/security/network_security/))
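
The load-balancer entries above describe spreading traffic across identical servers. A minimal round-robin sketch follows. It assumes health status reaches the balancer through `mark_down` and `mark_up` calls (hypothetical methods; real balancers typically run their own health checks) and uses made-up private IP addresses.

```python
import itertools


class RoundRobinBalancer:
    """Rotate requests across identical backends, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._healthy = set(self._backends)
        self._rotation = itertools.cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)  # e.g. after a failed health check

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call so an all-down pool fails fast.
        for _ in range(len(self._backends)):
            backend = next(self._rotation)
            if backend in self._healthy:
                return backend
        raise LookupError("no healthy backends")


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first_four = [lb.pick() for _ in range(4)]
lb.mark_down("10.0.0.2")
next_two = [lb.pick() for _ in range(2)]
print(first_four, next_two)
```

Round robin suits identical, stateless backends. The sticky-session entry above covers the case where a user must keep reaching the same server, which plain rotation does not guarantee.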

### Software Engineering & Architecture

- [Application Scaling Strategies](https://awesome-repositories.com/f/software-engineering-architecture/application-scaling-strategies.md) — Provides instructional material on scaling application throughput by adjusting pod replicas and distributing workloads. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/))
- [Capacity Planning](https://awesome-repositories.com/f/software-engineering-architecture/capacity-planning.md) — Provides a framework for translating functional architectural blocks into concrete hardware requirements. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/large-system-design/))
- [Failure Domain Redundancy](https://awesome-repositories.com/f/software-engineering-architecture/failure-domain-redundancy.md) — Teaches strategies for increasing availability by removing single points of failure and implementing redundancy across failure domains. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/availability/))
- [Fault Tolerance Implementation](https://awesome-repositories.com/f/software-engineering-architecture/fault-tolerance-strategies/fault-tolerance-implementation.md) — Provides architectural guidance on distributing resources across independent zones to ensure fault tolerance. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/fault-tolerance/))
- [Availability Modeling](https://awesome-repositories.com/f/software-engineering-architecture/availability-modeling.md) — Offers theoretical frameworks for calculating and optimizing system availability based on component dependency structures. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/availability/))
- [Cascading Failure Preventions](https://awesome-repositories.com/f/software-engineering-architecture/cascading-failure-preventions.md) — Explains architectural patterns to prevent cascading failures by isolating shared resources. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/fault-tolerance/))
- [Container Isolation](https://awesome-repositories.com/f/software-engineering-architecture/execution-control/namespace-isolation/namespace-provisioners/container-isolation.md) — Explains mechanisms for isolating operating system environments using kernel namespaces and cgroups. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/conclusion/))
- [Graceful Degradation](https://awesome-repositories.com/f/software-engineering-architecture/graceful-degradation.md) — Provides strategies for maintaining minimum viable functionality during partial component failures. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
- [Monolith Decompositions](https://awesome-repositories.com/f/software-engineering-architecture/monolithic-architectures/monolith-decompositions.md) — Provides a methodology for dividing a unified codebase into independent, loosely coupled services. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/scalability/))
- [Resource-Based Decompositions](https://awesome-repositories.com/f/software-engineering-architecture/monolithic-architectures/resource-based-decompositions.md) — Instructs on splitting monolithic applications into chunks to balance CPU and memory requirements. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling/))
- [Stream Aggregators](https://awesome-repositories.com/f/software-engineering-architecture/request-dispatchers/fan-out-dispatchers/stream-aggregators.md) — Explains fan-out and fan-in patterns for distributing and aggregating messages in distributed systems. ([source](https://linkedin.github.io/school-of-sre/level101/messagequeue/key_concepts/))
- [Resilience Patterns](https://awesome-repositories.com/f/software-engineering-architecture/resilience-patterns.md) — Instructs on implementing fault-tolerant patterns such as circuit breakers and timeouts. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/intro/))
- [Logic Decoupling](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/modular-decoupled-design/structural-design-paradigms/decoupled-logic-encapsulation/logic-decoupling.md) — Provides architectural patterns for separating business logic from state and transport layers to improve scalability. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling/))
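The Availability Modeling and Failure Domain Redundancy entries depend on a standard piece of arithmetic. Components a request needs in series multiply their availabilities. Redundant replicas in parallel fail only when every replica fails. The sketch below assumes that components fail independently, which is a simplification. The example topology and its figures are hypothetical.

```python
# Sketch of availability arithmetic for dependency structures, assuming
# components fail independently (a simplifying assumption).
from math import prod


def series(*availabilities: float) -> float:
    """Every component is required: the request fails if any one fails."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant replicas: the request fails only if every replica fails."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical topology: a load balancer (99.99%) in front of two redundant
    # app servers (99% each), all depending on a single database (99.9%).
    app_tier = parallel(0.99, 0.99)  # roughly 0.9999
    overall = series(0.9999, app_tier, 0.999)
    print(f"app tier {app_tier:.4%}, end to end {overall:.4%}")
```

In the example, two 99% servers together reach roughly 99.99%. The lone database still caps end-to-end availability below 99.9%, which illustrates why single points of failure dominate the result.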

### System Administration & Monitoring

- [Metric and Performance Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors.md) — Teaches how to analyze real-time CPU, memory, disk, and network metrics to identify system bottlenecks. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/introduction/))
- [Reliability Metrics](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/reliability-metrics.md) — Provides methods for measuring operational health and quantifying reliability using time-based averages. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/fault-tolerance/))
- [Operational Task Automation](https://awesome-repositories.com/f/system-administration-monitoring/operational-task-automation.md) — Provides training on writing scripts for automating system maintenance, monitoring, and recovery tasks. ([source](https://linkedin.github.io/school-of-sre/level102/linux_intermediate/introduction/))
- [Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/performance-monitoring.md) — Covers strategies for tracking real-time system health and resource utilization. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/introduction/))
- [Resource Estimation](https://awesome-repositories.com/f/system-administration-monitoring/resource-estimation.md) — Provides methodologies for calculating necessary storage, throughput, and CPU capacity based on user load. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/large-system-design/))
- [System Performance Monitors](https://awesome-repositories.com/f/system-administration-monitoring/system-performance-monitors.md) — Teaches command-line utilities for measuring hardware metrics and resource utilization to detect anomalies. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/command-line_tools/))
- [System Resource Monitors](https://awesome-repositories.com/f/system-administration-monitoring/system-resource-monitors.md) — Teaches how to track process activity and memory utilization to maintain system health and performance. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/conclusion/))
- [Alerting and Incident Management](https://awesome-repositories.com/f/system-administration-monitoring/alerting-and-incident-management.md) — Instructs on creating workflows to detect abnormal behavior and reduce incident resolution time. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/introduction/))
- [Alert Runbooks](https://awesome-repositories.com/f/system-administration-monitoring/alerting-and-incident-management/alerting-systems/alert-runbooks.md) — Provides guidance on documenting specific checks and actions required to resolve firing alerts. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/best_practices/))
- [Cache Management](https://awesome-repositories.com/f/system-administration-monitoring/cache-management.md) — Provides instruction on managing and cleaning local data caches to maintain operational efficiency. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling/))
- [Centralized Logging Systems](https://awesome-repositories.com/f/system-administration-monitoring/centralized-logging-systems.md) — Provides a framework for aggregating and storing logs from distributed components to troubleshoot failures. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/linux_server_administration/))
- [System Call Tracing](https://awesome-repositories.com/f/system-administration-monitoring/diagnostic-tools/diagnostics/execution-tracers/kernel-tracing-frameworks/system-call-tracing.md) — Provides guidance on monitoring kernel system calls to debug application interactions with the OS. ([source](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/))
- [DNS Resolvers](https://awesome-repositories.com/f/system-administration-monitoring/dns-resolvers.md) — Covers configuring authoritative DNS servers and managing resolver settings for network clients. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/conclusion/))
- [Auto-Recovery Systems](https://awesome-repositories.com/f/system-administration-monitoring/health-checks/auto-recovery-systems.md) — Teaches mechanisms that automatically restart services or containers upon detection of health check failures. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
- [SQL-Based Log Analysis](https://awesome-repositories.com/f/system-administration-monitoring/log-querying-interfaces/log-query-engines/sql-based-log-analysis.md) — Demonstrates how to use relational query languages for aggregations and analysis of log data. ([source](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/important-tools/))
- [Log Search Engines](https://awesome-repositories.com/f/system-administration-monitoring/logging-and-telemetry/log-search-engines.md) — Teaches how to extract specific information from log files using search and filtering techniques. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/conclusion/))
- [Distributed Tracing](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/distributed-tracing-execution-analysis/distributed-tracing.md) — Explains how to track a single user request across microservices using spans to identify latency. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/observability/))
- [Metric Type Selection](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/metric-type-selection.md) — Provides guidance on choosing appropriate metric types like gauges, timers, and counters for system monitoring. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/best_practices/))
- [Alert Management Systems](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/operational-health-alerting/alert-management-systems.md) — Teaches how to route and inhibit alerts to ensure only actionable notifications reach engineers. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/best_practices/))
- [Network Bottleneck Diagnosis](https://awesome-repositories.com/f/system-administration-monitoring/network-bottleneck-diagnosis.md) — Teaches the analysis of socket states and retransmission metrics to locate performance bottlenecks. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/tcp/))
- [Network Connectivity Diagnostics](https://awesome-repositories.com/f/system-administration-monitoring/network-connectivity-diagnostics.md) — Instructs on interpreting routing and ARP table errors to identify unreachable hosts and IP conflicts. ([source](https://linkedin.github.io/school-of-sre/level101/linux_networking/ipr/))
- [Network Latency Testing](https://awesome-repositories.com/f/system-administration-monitoring/network-latency-testing.md) — Teaches how to calculate round-trip time for packets to determine the impact of distance on response times. ([source](https://linkedin.github.io/school-of-sre/level102/networking/rtt/))
- [Network Traffic Analysis](https://awesome-repositories.com/f/system-administration-monitoring/network-traffic-analysis.md) — Teaches how to filter network packets and socket statistics to inspect bandwidth and connections. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/command-line_tools/))
- [Observability Dashboards](https://awesome-repositories.com/f/system-administration-monitoring/observability-dashboards.md) — Teaches how to aggregate key performance metrics into a single view for service health monitoring. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/conclusion/))
- [Performance Profiling Tools](https://awesome-repositories.com/f/system-administration-monitoring/performance-profiling-tools.md) — Teaches the use of instrumentation and sampling tools to identify frequent code paths and memory leaks. ([source](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/))
- [Kernel Resource Limiting](https://awesome-repositories.com/f/system-administration-monitoring/resource-usage-limiters/kernel-resource-limiting.md) — Instructs on restricting hardware resources like CPU and memory using Linux kernel control groups. ([source](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/))
- [Service Alerting Configurations](https://awesome-repositories.com/f/system-administration-monitoring/service-alerting-configurations.md) — Provides instruction on setting thresholds and notifications to identify service anomalies. ([source](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/conclusion/))
- [Service Capacity Benchmarking](https://awesome-repositories.com/f/system-administration-monitoring/service-capacity-benchmarking.md) — Teaches how to measure maximum requests per second and latency under synthetic load to detect regressions. ([source](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/))
- [Slow Query Tracking](https://awesome-repositories.com/f/system-administration-monitoring/slow-query-tracking.md) — Provides guidance on logging and identifying SQL statements that exceed execution time thresholds. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/lab/))
- [System Automation Scripts](https://awesome-repositories.com/f/system-administration-monitoring/system-automation-scripts.md) — Teaches the creation of executable Bash scripts to automate repetitive system administration workflows. ([source](https://linkedin.github.io/school-of-sre/level102/linux_intermediate/bashscripting/))
- [System Performance Analyzers](https://awesome-repositories.com/f/system-administration-monitoring/system-performance-monitors/system-performance-analyzers.md) — Provides techniques for measuring software stack behavior under controlled workloads to identify bottlenecks. ([source](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/introduction/))
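The Reliability Metrics entry refers to time-based averages. The sketch below computes two of them, mean time to repair (MTTR) and mean time between failures (MTBF), from a list of outages. It uses one common convention, in which MTBF is total uptime divided by the number of failures. The incident record format and the observation window are hypothetical assumptions.

```python
# Sketch of time-based reliability averages computed from an incident log.
# Assumptions: each hypothetical Incident records outage start/end in hours
# within a single observation window, and MTBF = total uptime / failure count.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Incident:
    start_h: float  # hour at which the outage began
    end_h: float    # hour at which service was restored


def reliability_summary(incidents: list[Incident], window_h: float) -> dict[str, float]:
    downtime = sum(i.end_h - i.start_h for i in incidents)
    uptime = window_h - downtime
    n = len(incidents)
    mttr = downtime / n if n else 0.0           # mean time to repair
    mtbf = uptime / n if n else float("inf")    # mean operating time between failures
    return {"mttr_h": mttr, "mtbf_h": mtbf, "availability": uptime / window_h}


if __name__ == "__main__":
    # Hypothetical 30-day window (720 h) with two outages of 2 h and 1 h.
    log = [Incident(100, 102), Incident(400, 401)]
    print(reliability_summary(log, window_h=720))
```

Under this convention, availability equals MTBF / (MTBF + MTTR). Shortening repairs therefore improves availability just as effectively as making failures rarer.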

### Content Management & Publishing

- [Static Content Distribution](https://awesome-repositories.com/f/content-management-publishing/static-site-document-generators/static-site-generators/content-delivery-publishing/static-content-distribution.md) — Instructs on caching bandwidth-intensive resources at distributed points of presence to reduce server load. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/scaling-beyond-the-datacenter/))

### Operating Systems & Systems Programming

- [Filesystem Crash Recovery](https://awesome-repositories.com/f/operating-systems-systems-programming/filesystem-crash-recovery.md) — Teaches the implementation of redo logs and atomic commits to restore consistency after system failures. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/innodb/))
- [Child Process Management Helpers](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/process-lifecycle-orchestrators/child-process-management-helpers.md) — Covers collecting termination statuses and removing zombie processes from the process table. ([source](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/))
- [Namespace-Based Isolation](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/process-isolation/namespace-based-isolation.md) — Explains how to use Linux kernel namespaces and cgroups to create isolated environments for processes.
- [Process Lifecycle Management](https://awesome-repositories.com/f/operating-systems-systems-programming/process-lifecycle-management.md) — Provides instruction on managing process states and signal handling within terminal environments. ([source](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/conclusion/))
- [Process Signal Management](https://awesome-repositories.com/f/operating-systems-systems-programming/process-signal-management.md) — Provides instruction on triggering software interrupts in target processes using system calls. ([source](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/))
- [Signal Masking](https://awesome-repositories.com/f/operating-systems-systems-programming/process-signal-management/signal-masking.md) — Explains how to add signals to a process mask to prevent interruptions of critical code segments. ([source](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/))
- [System Signal Handling](https://awesome-repositories.com/f/operating-systems-systems-programming/system-signal-handling.md) — Teaches how to define custom handler functions to override default OS responses to signals. ([source](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/))
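The Signal Masking and System Signal Handling entries can be sketched with Python's standard `signal` module on a POSIX system. A custom handler replaces the default action for `SIGUSR1`, which is terminate. Blocking the signal around a critical section leaves it pending until the mask is lifted. The choice of `SIGUSR1` and the helper names are illustrative assumptions, and `signal.signal` must be called from the main thread.

```python
# Sketch (POSIX only) of a custom signal handler and a masked critical section.
# SIGUSR1 and the function names are illustrative choices, not from the course.
from __future__ import annotations

import signal
import threading

received: list[int] = []


def on_usr1(signum, frame):
    """Custom handler overriding SIGUSR1's default action (terminate)."""
    received.append(signum)


def critical_section_demo() -> tuple[list[int], list[int]]:
    signal.signal(signal.SIGUSR1, on_usr1)
    # Add SIGUSR1 to this thread's mask so it cannot interrupt the section below.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    try:
        # Send the signal to this very thread; while blocked it stays pending.
        signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
        during = list(received)  # the handler has not run yet
    finally:
        # Unblocking delivers the pending signal, so the handler runs now.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
    after = list(received)
    return during, after


if __name__ == "__main__":
    print(critical_section_demo())
```

The signal is sent with `pthread_kill` to the current thread rather than with `os.kill` to the whole process. That way it cannot be delivered to some other thread that does not block it.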

### Scientific & Mathematical Computing

- [Cluster Resource Managers](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing/high-performance-computing/cluster-resource-managers.md) — Teaches how to provision and scale compute resources across diverse infrastructure to ensure efficient task execution. ([source](https://linkedin.github.io/school-of-sre/level101/big_data/evolution/))

### Security & Cryptography

- [DoS Attack Defenses](https://awesome-repositories.com/f/security-cryptography/dos-attack-defenses.md) — Instructs on blocking resource exhaustion and flooding attempts using intrusion prevention systems. ([source](https://linkedin.github.io/school-of-sre/level101/security/threats_attacks_defences/))
- [Network Address Translation](https://awesome-repositories.com/f/security-cryptography/firewalls/network-address-translation.md) — Instructs on mapping internal server IP addresses to public addresses via firewall NAT. ([source](https://linkedin.github.io/school-of-sre/level102/networking/infrastructure-features/))
- [Security and Threat Mitigations](https://awesome-repositories.com/f/security-cryptography/governance-policy-frameworks/compliance-governance/security-and-compliance/security-and-threat-mitigations.md) — Covers applying perimeter security and cluster ring-fencing to protect internal and external services. ([source](https://linkedin.github.io/school-of-sre/level102/networking/introduction/))
- [Database Access Controls](https://awesome-repositories.com/f/security-cryptography/granular-access-controls/database-access-controls.md) — Instructs on managing database accounts with fine-grained permissions to restrict administrative operations. ([source](https://linkedin.github.io/school-of-sre/level101/databases_sql/concepts/))
- [DDoS Protections](https://awesome-repositories.com/f/security-cryptography/network-infrastructure-security/web-network-security/network-security/ddos-protections.md) — Provides strategies for dropping anomalous traffic and bot-driven floods at the network edge. ([source](https://linkedin.github.io/school-of-sre/level102/networking/security/))
- [Client Request Quotas](https://awesome-repositories.com/f/security-cryptography/request-size-limiters/request-limiters/request-throttling/client-request-quotas.md) — Provides strategies for assigning request quotas per client to prevent a single consumer from overwhelming the system. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
- [Administrative Privilege Management](https://awesome-repositories.com/f/security-cryptography/role-based-access-control/administrative-privilege-management.md) — Covers the management and elevation of superuser privileges for restricted system operations. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/linux_server_administration/))
- [Security Auditing](https://awesome-repositories.com/f/security-cryptography/security-auditing.md) — Details how to perform periodic security reviews of infrastructure, software patches, and compliance standards. ([source](https://linkedin.github.io/school-of-sre/level102/networking/security/))
- [Resilience Drill Orchestrators](https://awesome-repositories.com/f/security-cryptography/security/operations-and-incident-response/resilience-drill-orchestrators.md) — Covers running mock failure drills and stress tests to validate system resilience. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/availability/))
- [Access Control](https://awesome-repositories.com/f/security-cryptography/security/policies/access-control.md) — Teaches how to assign specific permissions to authenticated users to restrict access to system functions. ([source](https://linkedin.github.io/school-of-sre/level101/security/fundamentals/))
- [File System Access Controls](https://awesome-repositories.com/f/security-cryptography/security/policies/host-resource-access/file-system-access-controls.md) — Provides guidance on restricting server access and controlling file ownership based on user roles. ([source](https://linkedin.github.io/school-of-sre/level101/linux_basics/conclusion/))
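The Client Request Quotas entry describes giving each client its own budget so that a single consumer cannot overwhelm the system. A token bucket per client is one common way to implement this. The course entry does not name a specific algorithm, so treat the following as a minimal sketch under that assumption. The rates, client IDs, and injectable clock are also illustrative.

```python
# Sketch of per-client request quotas using one token bucket per client ID.
# The token-bucket algorithm, rates, and client IDs are illustrative assumptions.
from __future__ import annotations

import time
from typing import Callable


class ClientQuota:
    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_s  # tokens refilled per second, per client
        self.burst = burst      # bucket capacity: the largest allowed burst
        self.clock = clock
        self.buckets: dict[str, tuple[float, float]] = {}  # client -> (tokens, last seen)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self.buckets.get(client_id, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return True
        self.buckets[client_id] = (tokens, now)
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    quota = ClientQuota(rate_per_s=1.0, burst=2, clock=lambda: fake_now[0])
    print([quota.allow("noisy") for _ in range(3)])  # burst of 2, then rejected
    print(quota.allow("quiet"))                      # other clients keep their own budget
    fake_now[0] = 1.0
    print(quota.allow("noisy"))                      # one token refilled after 1 s
```

Keeping a separate bucket per client is the key design choice. A noisy client drains only its own tokens, while other clients keep their full allowance.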

### Testing & Quality Assurance

- [Memory Leak Detection](https://awesome-repositories.com/f/testing-quality-assurance/debugging-diagnostics/memory-leak-detection.md) — Instructs on identifying memory mismanagement through allocation snapshots and object tracking. ([source](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting-example/))
- [Failure Isolation Mechanisms](https://awesome-repositories.com/f/testing-quality-assurance/general-testing-utilities/test-isolation/service-isolation-utilities/failure-isolation-mechanisms.md) — Teaches how to use circuit breakers to isolate unstable services and prevent downstream overloading. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
- [Production State Validation](https://awesome-repositories.com/f/testing-quality-assurance/production-state-validation.md) — Explains how to use staging environments that replay real production traffic to validate changes before deployment. ([source](https://linkedin.github.io/school-of-sre/level102/system_design/resiliency/))
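The Failure Isolation Mechanisms entry mentions circuit breakers. The sketch below is a minimal one. After a number of consecutive failures it "opens" and fails fast without calling the unstable dependency. Once a timeout passes, it lets a single trial call through in the half-open state. A success closes it again, and a failure reopens it. The class names, thresholds, and injectable clock are illustrative assumptions, not a specific library's API.

```python
# Sketch of a circuit breaker that isolates an unstable dependency.
# Class names, thresholds, and the injectable clock are illustrative assumptions.
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None  # None means the breaker is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("dependency isolated; failing fast")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the breaker is open keeps callers from queuing up behind a struggling service. That queuing is how one slow dependency can overload the services downstream of it.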

### User Interface & Experience

- [Failover Strategies](https://awesome-repositories.com/f/user-interface-experience/design-systems/failover-strategies.md) — Provides guidance on ensuring systems transition to redundant backups without introducing data corruption. ([source](https://linkedin.github.io/school-of-sre/level101/systems_design/availability/))
