# Cloud tooling and cost

> Search results for `Cloud tooling and cost` on awesome-repositories.com. 116 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/cloud-tooling-and-cost

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/cloud-tooling-and-cost).**

## Results

- [clickhouse/clickhouse](https://awesome-repositories.com/repository/clickhouse-clickhouse.md) (48,229 ⭐) — ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring.

The platform distinguishes itself through advanced storage and execution techniques, including vectorized query processing and a merge tree storage engine that maintains performance during massive insertions. It features adaptive subcolumn mapping for semi-structured data and supports native vector search for machine learning and generative AI applications. To facilitate efficient data movement, the engine utilizes zero-copy shared memory buffers, minimizing overhead when interacting with external analytical tools or processing diverse file formats like Parquet, JSON, and Arrow.

Beyond its core storage and processing capabilities, the project provides a comprehensive suite of tools for observability, security, and data integration. It includes built-in support for natural language querying, automated workflow orchestration for AI agents, and extensive diagnostic features for query plan inspection. The platform also offers robust cloud infrastructure management, including support for private networking, compliant deployment strategies, and integrated billing consolidation.
- [dagster-io/dagster](https://awesome-repositories.com/repository/dagster-io-dagster.md) (14,974 ⭐) — Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality.

The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows. Its architecture is built on a pluggable execution engine that decouples orchestration logic from the underlying compute, allowing tasks to run across diverse cloud-native, serverless, and containerized environments. Furthermore, it supports partition-aware scheduling, which enables incremental processing and efficient management of high-volume datasets.

Beyond core orchestration, the system provides a comprehensive suite of tools for data platform management, including automated quality governance, infrastructure cost optimization, and centralized asset cataloging. It integrates with enterprise identity providers for access control and offers robust observability features, such as streaming logs and visual lineage tracking, to ensure system health and compliance.

The platform supports a variety of deployment models, ranging from self-hosted and hybrid configurations to a fully managed control plane. It includes specialized utilities for migrating legacy pipelines and operationalizing interactive scripts into production-ready components.
- [aws/aws-cdk](https://awesome-repositories.com/repository/aws-aws-cdk.md) (12,817 ⭐) — The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane.

The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It employs a language-agnostic intermediate representation to synthesize these definitions into platform-specific configurations, while supporting aspect-oriented policy injection to apply security and compliance rules across infrastructure definitions during the synthesis phase.

Beyond core provisioning, the project provides a modular component registry for distributing and reusing pre-configured infrastructure building blocks. It supports multi-account orchestration, allowing for the deployment of consistent resource sets across different regions and accounts from a single template, and includes capabilities for detecting infrastructure drift to ensure deployed environments remain aligned with their defined state.

The project is distributed as a software development kit, providing programmatic interfaces to manage the full lifecycle of cloud resources and integrate infrastructure definitions directly into application codebases.
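The dependency-aware ordering mentioned in this entry can be illustrated with a small topological sort. This is a generic sketch, not the AWS CDK's own implementation: the resource names and the `provision_order` helper are hypothetical, and only Python's standard `graphlib` module is used.

```python
from graphlib import TopologicalSorter


def provision_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return resource names so that each resource comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical resources: each key depends on the names in its set.
resources = {
    "vpc": set(),
    "subnet": {"vpc"},
    "database": {"subnet"},
    "service": {"subnet", "database"},
}

order = provision_order(resources)
print(order)
```

A cycle in the graph (for example, two resources that depend on each other) makes `static_order()` raise `graphlib.CycleError`, which is one way such a tool could reject an invalid definition before provisioning anything.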
- [comfy-org/comfyui](https://awesome-repositories.com/repository/comfy-org-comfyui.md) (117,227 ⭐) — ComfyUI is a node-based generative AI orchestration engine designed for constructing, testing, and executing complex image and video synthesis pipelines. By utilizing a directed acyclic graph execution model, the platform allows users to build reproducible workflows through modular, interconnected processing blocks without requiring manual code implementation. It serves as both a local environment for high-performance model inference and a production-ready server for deploying generative capabilities.

The platform distinguishes itself through its focus on workflow portability and extensibility. Complex pipelines are persisted as structured JSON files, enabling version control and programmatic reconstruction. Users can extend the system’s core functionality by dynamically loading custom node extensions at runtime, while the engine’s lazy evaluation strategy ensures efficiency by computing only the necessary nodes for a given output. Real-time state synchronization via WebSockets provides immediate feedback during the generation process.

Beyond its core execution capabilities, the platform supports a broad range of operational needs, including local model orchestration, cloud-scale infrastructure management, and API integration. It provides tools for managing generative models, local software environments, and enterprise-grade infrastructure. The system exposes visual workflows as programmable endpoints, allowing developers to integrate advanced generative tasks into external software applications.
- [jasonwilbur/cloud-cost-mcp](https://awesome-repositories.com/repository/jasonwilbur-cloud-cost-mcp.md) (3 ⭐) — Model Context Protocol server for Cloud Infrastructure pricing information
- [anthropics/claude-code](https://awesome-repositories.com/repository/anthropics-claude-code.md) (132,728 ⭐) — Anthropic's terminal-native AI coding agent.
- [boto/boto3](https://awesome-repositories.com/repository/boto-boto3.md) (9,834 ⭐) — Boto3 is the AWS SDK for Python, providing a programmatic interface for managing and automating AWS cloud infrastructure and services. It serves as a cloud management API client and resource manager for provisioning, configuring, and scaling virtual servers, databases, and storage.

The library enables the implementation of infrastructure-as-code through declarative templates and scripts, allowing for the deployment of identical resource stacks across multiple accounts and geographic regions. It also provides a framework for coordinating distributed workflows, serverless functions, and containerized applications within the cloud ecosystem.

The toolkit covers a broad range of operational capabilities, including generative AI orchestration, identity and access control, and detailed cloud resource monitoring. It further extends to data lifecycle management, including automated backups and migrations, as well as comprehensive billing and cost optimization tools.
- [salesforce/cost](https://awesome-repositories.com/repository/salesforce-cost.md) (0 ⭐) — Figure 1. Overall CoST Architecture.
- [infracost/infracost](https://awesome-repositories.com/repository/infracost-infracost.md) (12,369 ⭐) — Infracost is an infrastructure-as-code financial governance platform that calculates the cost impact of cloud resource changes. By performing static analysis on configuration files, the tool identifies infrastructure resources and their properties to estimate spending changes before deployment occurs.

The platform distinguishes itself by integrating directly into development workflows, providing automated cost reporting and policy validation within pull request comments. It utilizes a modular architecture to map infrastructure definitions to real-time pricing data from cloud providers, allowing teams to receive immediate feedback on the financial implications of their code changes.

Beyond basic estimation, the tool includes a policy-as-code engine that enforces organizational budget constraints and compliance standards. This allows for the automated detection of potential spending violations or tagging requirement failures during the continuous integration process.
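The estimate-then-enforce flow described in this entry can be sketched as a cost diff between two resource lists followed by a budget check. This is a minimal illustration under stated assumptions, not Infracost's API: the `Resource` type, the price table, the resource kinds, and the budget value are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical monthly prices in USD. A real tool would look these up
# from cloud provider pricing data instead of a hard-coded table.
MONTHLY_PRICE_USD = {"vm.small": 15.0, "vm.large": 120.0, "db.medium": 60.0}


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str
    count: int = 1


def monthly_cost(resources: list[Resource]) -> float:
    """Sum the monthly price of every resource instance."""
    return sum(MONTHLY_PRICE_USD[r.kind] * r.count for r in resources)


def cost_diff(before: list[Resource], after: list[Resource]) -> float:
    """Positive result means the change increases monthly spend."""
    return monthly_cost(after) - monthly_cost(before)


def violates_budget(resources: list[Resource], budget_usd: float) -> bool:
    """Simple policy check: fail when the estimate exceeds the budget."""
    return monthly_cost(resources) > budget_usd


before = [Resource("web", "vm.small", 2), Resource("db", "db.medium")]
after = [Resource("web", "vm.large", 2), Resource("db", "db.medium")]

diff = cost_diff(before, after)
print(f"Monthly change: {diff:+.2f} USD")
print("Budget violated:", violates_budget(after, budget_usd=250.0))
```

In a continuous integration job, a check like `violates_budget` would typically decide whether the pipeline step passes, while the diff is what gets reported back to the reviewer.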
- [cloud-custodian/cloud-custodian](https://awesome-repositories.com/repository/cloud-custodian-cloud-custodian.md) (6,011 ⭐) — Rules engine for cloud security, cost optimization, and governance, with a YAML DSL for policies that query, filter, and take actions on resources
- [langchain-ai/langchainjs](https://awesome-repositories.com/repository/langchain-ai-langchainjs.md) (17,818 ⭐) — LangChain.js is a framework for building, executing, and monitoring stateful agentic applications. It provides an orchestration engine that models workflows as directed graphs, allowing developers to connect language models, data sources, and external tools into modular, multi-step processes.

The platform distinguishes itself through its focus on stateful execution and human-in-the-loop control. It manages agent lifecycles by persisting execution state across threads, enabling fault tolerance and the ability to pause workflows at designated breakpoints for manual review or modification. This architecture supports both autonomous agent orchestration and complex multi-agent systems, with built-in capabilities for streaming real-time execution updates and managing long-term memory.

Beyond core orchestration, the project offers a comprehensive suite of tools for the entire application lifecycle. This includes integrated observability for tracing and evaluating agent performance, schema-enforced data serialization for reliable communication, and extensive support for deployment, security, and infrastructure management.

The project provides a TypeScript-based software development kit and a command-line interface to facilitate local development, testing, and deployment of agentic workflows.
- [kananinirav/aws-certified-cloud-practitioner-notes](https://awesome-repositories.com/repository/kananinirav-aws-certified-cloud-practitioner-notes.md) (3,829 ⭐) — This project is a collection of structured study notes and conceptual breakdowns designed for the AWS Certified Cloud Practitioner exam. It serves as a technical reference and study guide, organizing cloud service details and architectural principles to assist in certification preparation.

The knowledge base is built using markdown files and includes curated cheat sheets and interactive mind-map visualizations. These tools map complex certification topics into visual hierarchies to enable drill-down study paths and rapid revision.

The materials cover a wide range of cloud capabilities, including core infrastructure, security governance, and the shared responsibility model. It provides detailed references for compute, storage, networking, and database services, as well as guidance on cloud economics and cost management.

The repository utilizes Git-based versioning to track updates to the study materials.
- [dragondrop-cloud/cloud-concierge](https://awesome-repositories.com/repository/dragondrop-cloud-cloud-concierge.md) (245 ⭐) — "Terraform best practices as a Pull Request." Codify resources outside of Terraform control, detect drift, estimate cloud costs, identify security risks, and more.
- [juspay/hyperswitch](https://awesome-repositories.com/repository/juspay-hyperswitch.md) (43,019 ⭐) — Hyperswitch is a payment orchestration platform designed to manage complex transaction lifecycles through a centralized control layer. It functions as a processor-agnostic integration hub that standardizes disparate external payment APIs, allowing businesses to route transactions across multiple providers to optimize for authorization rates and cost efficiency. The platform utilizes a state-machine-based architecture to track every payment from initial authentication to final settlement, ensuring consistent processing and reliable error recovery.

What distinguishes the platform is its intelligent, rule-based traffic routing engine, which dynamically selects the most performant or cost-effective processor in real time. It includes automated recovery mechanisms that execute background retries for failed payments and payouts without requiring additional customer interaction. Furthermore, the platform provides a secure tokenization vault that replaces sensitive card data with non-sensitive tokens, which minimizes regulatory compliance scope and simplifies security audits.

The platform offers a comprehensive suite of financial operations tools, including automated reconciliation pipelines that match transaction records across multiple banks and processors. It also provides centralized management for disputes, refunds, and global payouts, alongside detailed analytics for monitoring payment costs, interchange fees, and provider markups. Security is managed through adaptive authentication workflows and integrated fraud risk management modules that can be configured via a no-code interface.
- [voltagent/awesome-claude-code-subagents](https://awesome-repositories.com/repository/voltagent-awesome-claude-code-subagents.md) (21,906 ⭐) — This project provides a framework for managing multi-agent systems, designed to automate complex software development, infrastructure, and business workflows. It functions as a multi-agent workflow orchestrator that routes tasks to domain-specific workers while maintaining state persistence and infrastructure automation. By leveraging large language models, the system decomposes high-level objectives into actionable plans, ensuring that complex operations are executed with consistency and reliability.

The framework distinguishes itself through its hierarchical agent registry and policy-driven tool access, which enforce security boundaries by restricting agent operations based on defined functional roles. It utilizes context-aware task routing to match incoming requests with specific agent capabilities and model performance profiles, while implementing deterministic fallback mechanisms to maintain operational continuity when agents encounter errors or context limits. This architecture allows for modular capability expansion and reproducible environment configurations through version-controlled templates.

The system covers a broad capability surface, including automated technical documentation, cloud infrastructure management, and security auditing. It supports diverse domains such as API design, database optimization, and system reliability engineering, providing tools for incident response, performance monitoring, and compliance enforcement. These capabilities are integrated into a command-line interface that enables developers to search, fetch, and deploy specialized subagents directly from the repository.
- [springzfx/point-cloud-annotation-tool](https://awesome-repositories.com/repository/springzfx-point-cloud-annotation-tool.md) (0 ⭐) — A tool for annotating 3D boxes in point clouds. Point clouds in KITTI-bin format are supported. The annotation format is the same as the Apollo 3D format. Data examples can be found here.
- [sapph1re/agent-cost-guardrails](https://awesome-repositories.com/repository/sapph1re-agent-cost-guardrails.md) (0 ⭐) — Budget limits and cost guardrails for AI agent frameworks. Prevents runaway API spend with hard budget enforcement, circuit breakers, and per-agent cost tracking.
- [medusajs/medusa](https://awesome-repositories.com/repository/medusajs-medusa.md) (34,404 ⭐) — Medusa is a headless commerce engine designed as a modular, API-first platform for building custom digital storefronts and business applications. Its architecture is built on a decoupled system where core business logic is encapsulated into independent, swappable modules that communicate through defined interfaces, allowing developers to incrementally adopt or replace components to fit specific operational needs.

The platform distinguishes itself through a highly extensible design that supports complex commerce requirements, including multi-vendor marketplace operations, B2B purchasing workflows, and multi-location inventory management. It provides a service-oriented API layer and a flexible administrative interface that allows for the injection of custom views and tools, ensuring that the management experience can be tailored to unique business processes.

Beyond its core commerce capabilities, the platform includes a comprehensive suite of features for managing the entire order lifecycle, product catalogs, and dynamic pricing rules. It integrates with a wide range of third-party services for payments, logistics, and content management, while offering built-in support for transactional emails, API caching, and multi-tenant resource isolation.

Developers can accelerate project initialization using pre-built starters and managed cloud deployment pipelines. The platform also provides specialized command-line tooling and AI-assisted development agents to streamline infrastructure management, debugging, and deployment workflows.
- [antonbabenko/terraform-cost-estimation](https://awesome-repositories.com/repository/antonbabenko-terraform-cost-estimation.md) (730 ⭐) — Anonymized, secure, and free Terraform cost estimation based on Terraform plan (0.12+) or Terraform state (any version)
- [dragonflydb/dragonfly](https://awesome-repositories.com/repository/dragonflydb-dragonfly.md) (30,688 ⭐) — Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries.

What distinguishes Dragonfly is its focus on efficiency and scalability through advanced memory management and request processing. It employs a lock-free, cache-friendly hash table structure and zero-copy serialization to reduce overhead during high-throughput operations. For durability, the system utilizes asynchronous, snapshot-based persistence that captures the state of the dataset without blocking active requests. Furthermore, it provides built-in support for horizontal scaling and cluster management, allowing for the distribution of large datasets across multiple nodes to ensure high availability.

Beyond core storage, the platform includes a comprehensive suite of operational and analytical capabilities. It features integrated support for geospatial data management, real-time message brokering via publish-subscribe patterns, and full-text search. To handle massive datasets efficiently, the engine incorporates probabilistic data structures for cardinality estimation, frequency tracking, and membership testing. These features are complemented by robust administrative tools, including access control, request rate limiting, and detailed server monitoring.
- [stellar/stellar-core](https://awesome-repositories.com/repository/stellar-stellar-core.md) (3,269 ⭐) — Stellar Core is the primary software implementation of the Stellar blockchain network, serving as a distributed ledger and a Federated Byzantine Agreement system. It functions as a core node that maintains the shared state of the network and provides a runtime environment for executing WebAssembly smart contracts.

The project enables the creation and management of digital assets, including the implementation of decentralized exchanges through distributed orderbooks and automated liquidity pools. It facilitates cross-border payment settlement by routing assets via path payments and bridging digital assets with traditional banking rails through regulated anchors.

The system covers broad capabilities including cryptographic identity management, multi-signature authorization, and a comprehensive suite of smart contract tools for deployment and state persistence. It also provides infrastructure for validator node operation, historical ledger archiving, and real-time network monitoring.
- [coder/coder](https://awesome-repositories.com/repository/coder-coder.md) (12,272 ⭐) — Coder is a self-hosted platform for provisioning and managing isolated, containerized development environments. It provides a centralized infrastructure for teams to deploy ephemeral workspaces on private cloud or on-premises hardware, ensuring consistent toolchains and dependencies across distributed development environments.

The platform distinguishes itself through its focus on secure, infrastructure-as-code governance and autonomous agent integration. It allows organizations to define reusable, versioned environment templates that integrate with existing identity providers and role-based access controls. Beyond standard workspace management, it supports AI-assisted coding workflows by executing autonomous agents within secure, sandboxed environments, providing centralized oversight and planning enforcement for complex development tasks.

The system covers a broad range of operational capabilities, including automated lifecycle management, cost optimization through resource scaling, and bidirectional file synchronization between local machines and remote instances. It supports diverse access methods, ranging from browser-based terminals and remote graphical desktops to direct integration with local desktop editors.

The platform is designed for deployment across various infrastructure providers and supports operation within air-gapped or disconnected networks. Documentation and installation guides are provided to assist with the setup of server clusters and the configuration of environment templates.
- [spectacularai/point-cloud-tools](https://awesome-repositories.com/repository/spectacularai-point-cloud-tools.md) (0 ⭐)
- [grafana/grafana](https://awesome-repositories.com/repository/grafana-grafana.md) (74,456 ⭐) — Grafana is an observability data platform designed to aggregate metrics, logs, and traces from diverse sources into a unified environment. It functions as a centralized interface for visualizing complex telemetry data, transforming raw streams into interactive dashboards that support real-time system health tracking and performance monitoring.

The platform distinguishes itself through a plugin-based modular architecture that integrates disparate databases, cloud services, and monitoring tools via a standardized data abstraction layer. This framework allows for the dynamic loading of external components to support varied data sources and visualization types without requiring modifications to the core codebase. Additionally, the system incorporates a rule-based alerting engine that evaluates incoming data streams against defined thresholds to trigger automated notifications for incident response.

Beyond its core visualization and alerting capabilities, the platform provides tools for infrastructure performance monitoring and operational data analysis. It utilizes a declarative, component-driven interface to manage dashboard states and a compiled backend to process high-throughput queries and API requests. The system maintains configuration persistence and state consistency across distributed instances through a centralized metadata storage layer.
- [aschhoff/esp32-433mhz-receiver-and-tools](https://awesome-repositories.com/repository/aschhoff-esp32-433mhz-receiver-and-tools.md) (0 ⭐) — ESP32 433 MHz receiver written in MicroPython, plus tools for Windows
- [robusta-dev/krr](https://awesome-repositories.com/repository/robusta-dev-krr.md) (4,466 ⭐) — KRR is an open-source tool that analyzes Kubernetes resource requests and recommends adjustments. It evaluates how pods are currently configured and suggests CPU and memory allocations based on actual usage patterns.

The project focuses on helping teams right-size their Kubernetes workloads by identifying over-provisioned and under-provisioned resources. It scans clusters and generates reports that highlight where adjustments can reduce costs or improve performance without compromising reliability.

KRR is distributed as a Python command-line tool that can be run directly against a Kubernetes cluster. Its documentation covers installation, configuration, and interpretation of the generated recommendations.
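Right-sizing from usage history, as described in this entry, can be sketched with a percentile for CPU and a peak-plus-headroom rule for memory. The strategy, the 95th percentile, the 15% headroom, and the 10% tolerance are assumptions chosen for illustration, not KRR's documented defaults, and all function names are hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_cpu_request(usage_millicores: list[float], pct: float = 95.0) -> float:
    """Assumed strategy: request the given percentile of observed CPU usage."""
    return percentile(usage_millicores, pct)


def recommend_memory_request(usage_mib: list[float], headroom: float = 0.15) -> float:
    """Assumed strategy: peak observed memory plus a safety margin."""
    return max(usage_mib) * (1 + headroom)


def classify(current_request: float, recommended: float, tolerance: float = 0.10) -> str:
    """Label a request relative to the recommendation, within a tolerance band."""
    if current_request > recommended * (1 + tolerance):
        return "over-provisioned"
    if current_request < recommended * (1 - tolerance):
        return "under-provisioned"
    return "ok"


# Hypothetical usage history: 20 CPU samples from 10m to 200m.
cpu_usage = [float(v) for v in range(10, 210, 10)]
cpu_recommendation = recommend_cpu_request(cpu_usage)

print(cpu_recommendation, classify(500.0, cpu_recommendation))
```

Memory is treated more conservatively than CPU here because exceeding a memory limit usually terminates the container, whereas exceeding a CPU limit only throttles it.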
- [mastra-ai/mastra](https://awesome-repositories.com/repository/mastra-ai-mastra.md) (21,221 ⭐) — Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention.

The framework distinguishes itself through its focus on observability and secure, isolated execution. It features a built-in telemetry pipeline that captures structured execution traces, logs, and performance metrics, allowing for real-time debugging and evaluation of agent behavior. Furthermore, it utilizes sandboxed environments to isolate code execution and filesystem operations, ensuring that agent interactions remain secure and reproducible.

Mastra covers a broad capability surface, including multi-agent delegation hierarchies, schema-validated tool execution, and real-time voice interaction. It supports advanced orchestration patterns such as human-in-the-loop approvals, persistent state management for long-running workflows, and retrieval-augmented generation using vector-based semantic memory. These features are designed to work together to support the entire lifecycle of AI-powered applications, from initial development and testing to production deployment.

The project is built for TypeScript environments and provides a modular architecture that integrates with existing web stacks and infrastructure. It includes a client SDK for interacting with remote agents and supports various authentication providers to secure API endpoints and agent resources.
- [zankner/cloud](https://awesome-repositories.com/repository/zankner-cloud.md) (0 ⭐) — Critique-out-Loud Reward Models (CLoud)
- [coollabsio/coolify](https://awesome-repositories.com/repository/coollabsio-coolify.md) (57,055 ⭐) — This project is a self-hosted platform-as-a-service that provides a centralized management interface for deploying, configuring, and monitoring containerized applications and databases on private infrastructure. It functions as a visual control plane, automating the end-to-end lifecycle of services from source code to production. By managing container orchestration, networking, and resource allocation, it allows users to maintain full control over their own hardware while streamlining the delivery of software.

The platform distinguishes itself through its agentless architecture, which uses secure shell connections to execute administrative tasks and manage remote servers without requiring persistent local software. It integrates directly with version control systems to trigger automated build and deployment pipelines, including the creation of temporary, isolated preview environments for every pull request. This workflow is supported by a declarative engine that uses templates to standardize the deployment of complex multi-container architectures and persistent database engines.

Beyond core orchestration, the system handles the operational requirements of hosted services by managing dynamic reverse-proxy routing and automated SSL certificate lifecycles. It provides a comprehensive suite of infrastructure management tools, including browser-based terminal access for debugging, automated system dependency installation, and persistent state management via a central database. These capabilities ensure that infrastructure remains synchronized and consistent across multiple remote environments.
- [cloud-hypervisor/cloud-hypervisor](https://awesome-repositories.com/repository/cloud-hypervisor-cloud-hypervisor.md) (5,285 ⭐) — Cloud Hypervisor is a Rust-based hypervisor and KVM virtual machine monitor designed to execute 64-bit guest operating systems. It functions as a user-space virtual machine manager that employs a minimal emulation layer to reduce memory overhead and latency for cloud workloads.

The project distinguishes itself through the use of a memory-safe language to implement a virtio device emulator and a user-space device model. It provides a standardized web API for managing virtual machine lifecycles and resource configurations.

The platform covers broad virtualization capabilities, including the emulation of NVMe and block storage, network connectivity via host bridging, and hardware device passthrough. It supports high-availability operations such as live migration, state snapshotting, and the dynamic resizing of CPU and memory resources through hotplugging.

The system is managed via a REST-API control plane and provides secure communication channels and shared memory interfaces between the host and guest.
- [pulumi/pulumi](https://awesome-repositories.com/repository/pulumi-pulumi.md) (24,797 ⭐) — Pulumi is an infrastructure-as-code framework that enables the definition, deployment, and management of cloud resources using general-purpose programming languages. It functions as a cloud resource orchestrator that coordinates the lifecycle of heterogeneous infrastructure by executing code to construct dependency graphs and reconciling the desired state against actual cloud environments.

The platform distinguishes itself through a language-host runtime bridge that allows developers to use standard programming languages to define infrastructure, rather than relying solely on domain-specific configuration formats. It utilizes a provider-based plugin architecture to interface with cloud APIs and incorporates a policy-as-code engine that validates infrastructure definitions against security and compliance rules during the deployment preview phase.

The project covers a broad capability surface including multi-cloud orchestration, automated state management, and drift detection. It supports complex deployment workflows through stack-based environment isolation, programmatic secret injection, and integration with continuous delivery pipelines. These features allow for the governance of infrastructure across diverse environments while maintaining consistency through version-controlled code.

The platform provides extensive documentation and a command-line interface to facilitate project initialization, infrastructure import, and deployment monitoring. It supports a wide range of cloud providers and container orchestration platforms, enabling teams to build self-service infrastructure portals and automate resource provisioning through standardized, reusable components.
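The dependency-graph and reconciliation ideas in this description can be illustrated with a short sketch. This is not Pulumi's API or engine. The resource names, the `creation_order` helper, and the `plan` helper are hypothetical, and the standard-library `graphlib` module stands in for the real graph logic.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the resources it depends on.
dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},
    "database": {"subnet"},
    "app_server": {"subnet", "database"},
}


def creation_order(deps):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def plan(desired, actual):
    """Diff desired state against actual state (both: name -> config dict)."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(k for k in desired.keys() & actual.keys()
                         if desired[k] != actual[k]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


order = creation_order(dependencies)

desired = {"vpc": {"cidr": "10.0.0.0/16"}, "subnet": {"cidr": "10.0.1.0/24"}}
actual = {"vpc": {"cidr": "10.0.0.0/16"}, "old_bucket": {"versioned": False}}
changes = plan(desired, actual)
```

A preview phase in this toy model would simply print `changes` before anything is applied, which is also the point at which a policy check could reject the plan.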
- [opencost/opencost](https://awesome-repositories.com/repository/opencost-opencost.md) (6,605 ⭐) — OpenCost is an open-source tool for monitoring and allocating Kubernetes and cloud infrastructure costs. It provides real-time visibility into spending by distributing asset costs to workloads based on resource requests and usage, breaking down spend by namespace, deployment, pod, and label. The system functions as both a Kubernetes cost allocation engine and a multi-cloud cost analyzer, ingesting billing data from AWS, Azure, and GCP to present unified cost metrics alongside cluster costs.

The tool distinguishes itself through its allocation-based cost model, which compares requested versus used resources to distribute infrastructure costs to Kubernetes workloads. It integrates directly with cloud provider billing APIs to fetch dynamic pricing for accurate resource valuation, and supports custom pricing for on-premises environments through CSV imports. OpenCost also offers a Model Context Protocol server that exposes cost and allocation data for programmatic querying by AI agents and automation tools, alongside a REST API and kubectl plugin for traditional integration and command-line access.

The platform provides multiple ways to visualize and export cost data, including pre-built Grafana dashboards, an interactive web dashboard, and export pipelines to CSV and Parquet formats. It tracks historical cost trends, calculates idle costs, distributes shared costs across tenants, and reports estimated carbon footprints for cloud resources. Deployment is managed through a Helm chart with configurable storage, Prometheus, and cloud provider settings, and the system can connect to existing Prometheus-compatible stores for metrics ingestion.
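The request-versus-usage allocation idea described above can be sketched in a few lines. This is a minimal illustration and not OpenCost's code. It assumes each workload is billed for the larger of its requested and used CPU, and that node capacity left unallocated counts as idle cost. The `Workload` type, the `allocate_cpu_cost` function, and the hourly rate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request: float  # cores requested
    cpu_usage: float    # average cores actually used


def allocate_cpu_cost(workloads, node_cores, core_hour_rate, hours):
    """Split a node's CPU cost across workloads; return (costs, idle_cost)."""
    costs = {}
    for w in workloads:
        # Assumption: bill the larger of request and usage.
        billable_cores = max(w.cpu_request, w.cpu_usage)
        costs[w.name] = billable_cores * core_hour_rate * hours
    node_cost = node_cores * core_hour_rate * hours
    # Whatever the workloads do not cover is reported as idle.
    idle_cost = max(node_cost - sum(costs.values()), 0.0)
    return costs, idle_cost


workloads = [Workload("api", 2.0, 0.5), Workload("batch", 1.0, 1.5)]
costs, idle_cost = allocate_cpu_cost(
    workloads, node_cores=8, core_hour_rate=0.25, hours=24
)
```

Here `api` is charged for its 2-core request even though it uses less, while `batch` is charged for its 1.5 cores of usage because that exceeds its request.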
- [nnikolaou/cost-sensitive-boosting-tutorial](https://awesome-repositories.com/repository/nnikolaou-cost-sensitive-boosting-tutorial.md) (0 ⭐) — The tutorial 'CalibratedAdaMEC_ExtendedVersion.ipynb' introduces the concepts of asymmetric (cost-sensitive and/or imbalanced class) learning, decision theory and boosting. It briefly describes the results of the paper:
- [harvard-edge/cs249r_book](https://awesome-repositories.com/repository/harvard-edge-cs249r-book.md) (20,217 ⭐) — This project is a comprehensive educational framework designed to teach the design, deployment, and performance optimization of machine learning systems. It provides a structured curriculum that covers the full stack of artificial intelligence engineering, ranging from the construction of core framework components like tensors and automatic differentiation engines to the orchestration of large-scale distributed training clusters.

The platform distinguishes itself through its integration of physics-grounded systems modeling and interactive simulation environments. Users can experiment with distributed training strategies, analyze communication overhead, and perform economic modeling to estimate the total cost of ownership, energy consumption, and reliability of hardware clusters. By combining these analytical tools with hands-on embedded hardware kits and browser-based notebooks, the project enables students to bridge the gap between theoretical architecture and practical deployment on resource-constrained edge devices.

Beyond core training, the project offers a broad suite of capabilities for evaluating machine learning operations. This includes tools for assessing inference latency, quantifying environmental impact, and optimizing production workloads across diverse environments. The curriculum is supported by extensive pedagogical resources, including lecture materials, assessment banks, and interview preparation scenarios that focus on hardware selection and parallel scaling strategies.

The project is maintained as an open-source repository, providing version-controlled educational content and modular software components that allow for collaborative development and adaptation by the academic community.
- [j3ssie/osmedeus](https://awesome-repositories.com/repository/j3ssie-osmedeus.md) (6,425 ⭐) — Osmedeus is an LLM security orchestration engine and AI agent framework designed to automate security workflows. It functions as a declarative workflow automator that uses YAML definitions to coordinate AI agents, shell commands, and distributed scanning tools through a directed acyclic graph.

The system distinguishes itself by deploying autonomous AI agents that use tool-calling loops and conversation memory to plan and execute complex analysis tasks. It features a specialized Agent Communication Protocol to delegate tasks to external AI binaries and supports recursive sub-agent orchestration for delegated task handling.

The platform covers a broad range of capabilities, including distributed security scanning across cloud infrastructure and the management of large-scale attack surface discovery. It incorporates a hybrid runner model to execute tasks across local shells, Docker containers, and remote SSH hosts, while persisting artifacts in S3-compatible storage and tracking findings in a centralized database.

The engine can be embedded as a Go library or managed via a REST API and web interface.
- [cost-97/reinbot](https://awesome-repositories.com/repository/cost-97-reinbot.md) (0 ⭐) — This repo contains code for the paper:
- [ayrus/afterglow-cloud](https://awesome-repositories.com/repository/ayrus-afterglow-cloud.md) (16 ⭐) — AfterGlow Cloud is a security visualization tool which lets users upload data and visualize the data as graphs on-the-fly (part of Google Summer of Code 2012).
- [heyputer/puter](https://awesome-repositories.com/repository/heyputer-puter.md) (42,318 ⭐) — Puter is a browser-based desktop environment and cloud-native development platform that provides a virtualized graphical workspace. It enables developers to build and deploy full-stack web applications by integrating cloud storage, authentication, and serverless backend logic directly into the browser, eliminating the need for traditional server infrastructure.

The platform distinguishes itself through a unified cloud storage layer and a distributed network runtime that facilitates peer-to-peer communication and cross-origin resource fetching. It features a sophisticated cross-window orchestration framework that coordinates state, user actions, and lifecycle events between isolated browser windows, allowing for complex, multi-component application workflows.

Beyond its core desktop and storage capabilities, the system includes a comprehensive suite of artificial intelligence tools, including conversational response generation, image and video creation, and speech synthesis. It also provides a serverless backend platform that executes event-driven functions and manages persistent key-value storage, all accessible through a consistent programmatic interface.

The project offers extensive documentation and examples covering AI integration, authentication, and object management to assist developers in building scalable applications.
- [griptape-ai/griptape](https://awesome-repositories.com/repository/griptape-ai-griptape.md) (2,541 ⭐) — Griptape is a Python framework for building generative AI applications, autonomous agents, and complex AI workflows. It functions as both an AI agent orchestrator and a workflow engine, capable of managing sequential pipelines and directed acyclic graphs to ensure predictable execution of AI tasks.

The framework distinguishes itself through a focus on security and governance, utilizing a Docker-based environment to execute model-generated code and shell commands in isolation. It employs a driver-based abstraction layer that allows developers to swap language model providers and vector stores without altering core logic, while using rule-based steering to enforce agent personas and output formats.

The platform covers a broad range of capabilities, including retrieval-augmented generation pipelines, multi-level memory management for conversation persistence, and schema-validated tool integration. It also supports multimodal processing for audio, image, and video data, as well as integrated observability for tracking performance and inspecting rendered prompts.
- [frappe/erpnext](https://awesome-repositories.com/repository/frappe-erpnext.md) (35,726 ⭐) — ERPNext is a comprehensive enterprise resource planning suite designed to integrate core organizational functions, including accounting, inventory, human resources, and project management, into a single unified platform. It operates as a metadata-driven business application, where data structures and application logic are defined through configuration rather than hard-coded programming to facilitate rapid customization.

The system distinguishes itself through a robust security and governance framework that enforces granular, role-based access control across all document operations. It features a dedicated data privacy layer that performs field-level masking, intercepting and transforming sensitive information at the application level based on user authorization. This ensures that private data remains protected while maintaining full operational functionality for authorized staff.

The platform manages business processes through an event-driven workflow engine that triggers automated tasks and notifications based on document status changes. Its document-oriented persistence layer handles relationships and validation logic centrally, while server-side hooks allow for the injection of custom logic into the document lifecycle. The system is documented and distributed as a configurable framework for managing complex organizational data.
- [canonical/cloud-init](https://awesome-repositories.com/repository/canonical-cloud-init.md) (3,729 ⭐) — Official upstream for the cloud-init: cloud instance initialization
- [openfaas/openfaas-cloud](https://awesome-repositories.com/repository/openfaas-openfaas-cloud.md) (0 ⭐) — OpenFaaS Cloud
- [clearml/clearml](https://awesome-repositories.com/repository/clearml-clearml.md) (6,740 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts.

The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and priority scheduling across hybrid cloud environments. Additionally, it includes a dedicated serving framework for hosting large language models and agentic workflows through secure APIs with integrated autoscaling.

The system covers a broad range of operational capabilities, including real-time infrastructure cost tracking, multi-tenant resource isolation, and automated execution environment reproduction. It also provides observability tools for monitoring inference endpoints, auditing AI workflows, and analyzing system-level hardware utilization.

The orchestration engine can be deployed through containerized or cloud-image-based installations that host the platform's lifecycle infrastructure.
- [lensapp/lens](https://awesome-repositories.com/repository/lensapp-lens.md) (23,180 ⭐) — Lens is a multi-cluster management platform and desktop application for administering Kubernetes environments. It provides a graphical interface for deploying Helm charts, editing YAML manifests, and managing the lifecycle of pods and deployments.

The project features an AI-powered cluster assistant that enables users to query cluster state, perform autonomous troubleshooting, and translate natural language requests into system commands. It also supports collaborative team access through shared spaces, utilizing encrypted cluster sharing and role-based access control to manage credentials and permissions across organizations.

Its broader capabilities include native integration with managed Kubernetes services such as AWS EKS, Azure AKS, and Google GKE, alongside real-time observability tools for streaming container logs and visualizing Prometheus metrics. The platform also includes enterprise identity management via SSO and SCIM, and security analysis tools for scanning clusters for vulnerabilities.

The application supports silent installation via command-line parameters for non-interactive setup.
- [cockroachdb/cockroach](https://awesome-repositories.com/repository/cockroachdb-cockroach.md) (32,207 ⭐) — Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures.

The system distinguishes itself through a layered architecture that separates the relational SQL abstraction from a distributed key-value store. It achieves global consistency without requiring perfectly synchronized hardware clocks by employing a hybrid logical clock synchronization mechanism. To support high-concurrency environments, it utilizes multi-version concurrency control and lock-free transaction execution, which allow for consistent snapshots and efficient conflict resolution. Furthermore, the engine is built for compatibility, implementing the PostgreSQL wire protocol to support existing relational database drivers and tools.

Beyond its core transactional capabilities, the platform includes comprehensive tooling for cluster orchestration, security, and performance diagnostics. It supports a variety of deployment models, ranging from self-hosted on-premises configurations to fully managed cloud services. The system provides a command-line interface for session management and query execution, ensuring that administrators can monitor cluster health and manage workloads through standard relational interfaces.
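The hybrid logical clock mentioned above can be sketched as follows. This is the generic textbook algorithm, not CockroachDB's implementation. The `Timestamp` and `HybridLogicalClock` names are hypothetical, and the physical clock is passed in as a plain function so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    wall: int     # physical component, e.g. nanoseconds
    logical: int  # counter that breaks ties within one wall time


class HybridLogicalClock:
    def __init__(self, physical_clock):
        self._now = physical_clock  # callable returning the local wall time
        self._last = Timestamp(0, 0)

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        wall = max(self._last.wall, self._now())
        logical = self._last.logical + 1 if wall == self._last.wall else 0
        self._last = Timestamp(wall, logical)
        return self._last

    def update(self, remote):
        """Merge a timestamp received from another node."""
        wall = max(self._last.wall, remote.wall, self._now())
        if wall == self._last.wall and wall == remote.wall:
            logical = max(self._last.logical, remote.logical) + 1
        elif wall == self._last.wall:
            logical = self._last.logical + 1
        elif wall == remote.wall:
            logical = remote.logical + 1
        else:
            logical = 0
        self._last = Timestamp(wall, logical)
        return self._last


clock = HybridLogicalClock(lambda: 100)  # a local clock stuck at 100
a = clock.tick()
b = clock.tick()
c = clock.update(Timestamp(250, 3))      # message from a node that is ahead
d = clock.tick()
```

Even though the local physical clock never advances and lags the remote node, the timestamps `a`, `b`, `c`, `d` stay strictly increasing, which is the property that lets nodes order events without tightly synchronized hardware clocks.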
- [signoz/signoz](https://awesome-repositories.com/repository/signoz-signoz.md) (27,355 ⭐) — SigNoz is a full-stack observability platform designed to collect, store, and visualize metrics, logs, and distributed traces in a unified environment. It leverages OpenTelemetry-based data collection to ingest telemetry from diverse sources using vendor-neutral protocols, ensuring interoperability across complex microservices architectures. The platform utilizes a high-performance columnar storage engine to enable rapid aggregation and filtering, providing a centralized backend for monitoring application health and performance.

What distinguishes the platform is its focus on automated instrumentation and semantic correlation. It allows users to capture telemetry data across various programming languages and frameworks without manual code changes, often requiring only simple environment variable updates. Once ingested, the system automatically links logs, metrics, and traces through shared identifiers, enabling seamless navigation between different telemetry types during root cause analysis. The frontend further supports this by using virtualized rendering to efficiently display complex distributed traces containing millions of spans.

The platform provides a comprehensive suite of tools for infrastructure monitoring, application performance tracking, and log management. Users can define complex alert conditions and manage monitoring configurations as version-controlled resources, ensuring consistency across deployment environments. Additionally, the system includes specialized support for monitoring large language model applications and provides visual query pipelines that translate user-defined filters into optimized database queries for real-time dashboard generation.

The entire observability stack can be deployed using container orchestration tools, with built-in utilities for verifying service status and managing data retention.
- [tensult/cloud-reports](https://awesome-repositories.com/repository/tensult-cloud-reports.md) (280 ⭐) — Scans your AWS cloud resources and generates reports. Check out free hosted version:
- [elastic/elasticsearch](https://awesome-repositories.com/repository/elastic-elasticsearch.md) (77,012 ⭐) — Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintaining global cluster state through a distributed consensus mechanism.

The platform distinguishes itself through its integrated approach to observability, security, and advanced analytics. It combines full-text, vector, and hybrid search capabilities with machine learning-driven insights, allowing users to perform complex statistical aggregations, geospatial analysis, and automated anomaly detection. Its storage architecture supports multi-tier data lifecycles, enabling efficient data placement across hot, warm, and cold nodes to balance performance with long-term retention requirements.

Beyond core search and storage, the system provides comprehensive observability tools for centralized log analysis, application performance monitoring, and infrastructure health diagnostics. It includes built-in security operations for threat detection and endpoint protection, all managed through a unified RESTful API gateway.

The system is accessible via standardized REST APIs for cluster management, data ingestion, and query execution. Extensive documentation is available to guide users through API references for search, indexing, security, and cluster administration.
- [temporalio/temporal](https://awesome-repositories.com/repository/temporalio-temporal.md) (18,411 ⭐) — Temporal is a distributed workflow orchestration engine designed to manage fault-tolerant, stateful, and long-running background processes. It functions as a platform for coordinating complex cross-service operations, ensuring consistency and reliability in distributed environments by decoupling workflow orchestration from task execution.

The platform distinguishes itself through a deterministic, event-sourced execution model that reconstructs workflow state by re-executing code from an immutable event log. This approach isolates non-deterministic side effects into managed activities, allowing the system to handle failures, retries, and long-running processes with high availability. It supports version-aware evolution, enabling developers to update logic in active workflows without disrupting ongoing executions.

The system provides a comprehensive suite of tools for microservices coordination, distributed task scheduling, and resilient system integration. It includes capabilities for managing workflow lifecycles, complex state transitions, and cross-service communication through structured service contracts. The platform also offers extensive observability, security, and administrative features, including multi-cluster replication, granular access control, and detailed execution monitoring.

Developers can interact with the platform through language-specific software development kits and a command-line interface that supports infrastructure automation, local development, and cluster management.
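The event-sourced replay model described above can be illustrated with a toy example. This is a sketch of the general technique and does not use the Temporal SDK. `WorkflowContext`, `run_activity`, and `order_workflow` are hypothetical names, and the event log is a plain in-memory list.

```python
class WorkflowContext:
    """Replays recorded activity results, and runs only activities with no record."""

    def __init__(self, history=None):
        self.history = list(history or [])  # ordered (activity name, result) events
        self._cursor = 0
        self.executed = []  # activities actually run during this pass

    def run_activity(self, name, fn, *args):
        if self._cursor < len(self.history):
            # Replay: reuse the recorded result instead of repeating the side effect.
            recorded_name, result = self.history[self._cursor]
            if recorded_name != name:
                raise RuntimeError(
                    f"non-deterministic workflow: expected {recorded_name!r}, got {name!r}"
                )
        else:
            # New work: run the side effect and append it to the log.
            result = fn(*args)
            self.history.append((name, result))
            self.executed.append(name)
        self._cursor += 1
        return result


def order_workflow(ctx, order_id):
    # Workflow code must be deterministic; side effects live in activities.
    charge = ctx.run_activity("charge", lambda oid: f"charged:{oid}", order_id)
    shipment = ctx.run_activity("ship", lambda oid: f"shipped:{oid}", order_id)
    return charge, shipment


first = WorkflowContext()
result = order_workflow(first, 42)

# Simulate a crash after the first activity: re-run with only that event recorded.
replay = WorkflowContext(history=first.history[:1])
replayed_result = order_workflow(replay, 42)
```

On replay only `ship` actually runs. The `charge` result comes from the log, so re-executing the workflow code does not charge the customer twice.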
- [jasondavies/d3-cloud](https://awesome-repositories.com/repository/jasondavies-d3-cloud.md) (3,944 ⭐) — Create word clouds in JavaScript.
