awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Operational Reliability · Awesome GitHub Repositories

11 repos

Systems and practices designed to maintain service availability through automation, monitoring, and resilient orchestration.

Explore 11 awesome GitHub repositories matching DevOps & Infrastructure · Operational Reliability.

Awesome Operational Reliability GitHub Repositories

  • torvalds/linux

    217,986 · GitHub

    The Linux kernel is a monolithic operating system kernel that serves as the primary interface between computer hardware and software applications. It provides the foundational infrastructure for managing system resources, including memory allocation, process scheduling, and synchronization primitives. The project inclu

    C
  • kubernetes/kubernetes

    120,673 · GitHub

    Kubernetes is a distributed container orchestration platform that automates the deployment, scaling, and management of containerized applications across clusters of computing nodes. It functions as a declarative infrastructure controller, utilizing a control loop architecture that continuously monitors the current syst

    Go · cncf · containers · go
  • ripienaar/free-for-dev

    118,073 · GitHub

    This project is a community-maintained directory of technical resources, tools, and services that offer free tiers for developers. It serves as a centralized reference point for discovering infrastructure, software, and educational materials, helping individuals and teams minimize operational costs while building and s

    HTML · awesome-list · free-for-developers
  • goldbergyoni/nodebestpractices

    105,100 · GitHub

    This project provides a comprehensive collection of industry-standard guidelines for developing, testing, and deploying Node.js applications. It covers the entire software lifecycle, offering actionable advice on code style, architectural patterns, and security measures to ensure maintainability and consistency across

    Dockerfile · best-practices · es6 · eslint
  • spring-projects/spring-boot

    80,046 · GitHub

    Spring Boot is an opinionated application framework designed to streamline the creation of production-ready services. It functions as a comprehensive development platform that utilizes a centralized dependency injection container to manage object lifecycles and wiring. By employing convention-over-configuration, the fr

    Java · framework · java · spring
  • vitejs/vite

    78,295 · GitHub

    Vite is a frontend build toolchain that provides a unified development and production pipeline for modern web applications. It functions as a modular, environment-agnostic build engine that leverages native ES modules to serve source code directly to the browser, eliminating the need for expensive bundling during the d

    TypeScript · build-tool · dev-server · frontend
  • redis/redis

    73,096 · GitHub

    Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to pr

    C · cache · caching · database
  • xtekky/gpt4free

    65,720 · GitHub

    This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co

    Python · chatbot · chatbots · chatgpt
  • traefik/traefik

    61,814 · GitHub

    Traefik is a cloud-native edge router and API gateway designed to manage service communication and traffic flow across distributed infrastructure. It functions as a dynamic service proxy that automatically discovers backend services and configures routing rules in real time, eliminating the need for manual restarts or

    Go · consul · docker · etcd
  • commaai/openpilot

    60,104 · GitHub

    Openpilot is an open-source driver assistance system that integrates with vehicle control units to provide automated steering, acceleration, and braking. It functions as an automotive robotics middleware, utilizing a specialized runtime environment to process sensor data and execute real-time control commands that mana

    Python · advanced-driver-assistance-systems · driver-assistance-systems · robotics
  • etcd-io/etcd

    51,618 · GitHub

    etcd is a distributed, strongly consistent key-value store designed to provide reliable storage for critical system metadata and coordination primitives. It functions as a distributed consensus engine, utilizing a replicated log and leader-based state machine to ensure that all nodes in a cluster maintain a synchronize

    Go · cncf · consensus · database

Explore sub-tags

  • Application Monitoring: Tools and practices for tracking performance and health metrics in production.
  • Automated Failover Mechanisms: Systems that detect service degradation and automatically route traffic to secondary providers or endpoints.
  • Automated Service Reliability: Mechanisms for health monitoring and automated recovery of distributed services.
  • Capacity Planning (1 sub-tag): Tools for analyzing and forecasting resource requirements to ensure systems meet future demand.
  • Declarative Infrastructure Management: Managing system state via version-controlled configuration files.
  • Disaster Recovery Systems: Mechanisms for restoring system state from backups after failures.
  • Distributed Container Orchestration: Lifecycle management of containers across multi-node clusters.
  • Error Tracking and Exception Handling: Services for capturing and analyzing runtime application errors.
  • Performance Tuning (1 sub-tag): Utilities and configurations focused on optimizing the speed and responsiveness of software or system components.
  • Resource Utilization Optimization: Maximizing hardware efficiency through intelligent workload packing.
  • Stateful Workload Orchestration: Managing persistent data requirements for stateful applications.
  • Watchdog Monitors: Mechanisms that monitor system health and performance to trigger automatic resets or immediate interventions when failures occur.