45 repositories
Systems for storing and retrieving unstructured data objects.
Distinguishing note: Focuses on scalable storage for files and binary data.
Explore 45 GitHub repositories matching Data & Databases · Object Storage Services.
This project is an educational resource and comprehensive study guide focused on distributed systems architecture and backend infrastructure design. It provides a structured curriculum for mastering the principles of scalability, reliability, and performance required to design complex software systems. The repository distinguishes itself by offering a methodical approach to technical interview preparation, integrating design patterns, architectural trade-offs, and spaced-repetition tools to help users retain complex concepts. It emphasizes constraint-driven analysis, teaching users how to weigh competing requirements such as latency, consistency, and availability when developing architectural designs. The content covers a broad spectrum of system design capabilities, including strategies for database scaling, traffic management, and infrastructure optimization. It details techniques for horizontal scaling, multi-layer caching, asynchronous communication, and service discovery, while providing frameworks for resource estimation and capacity planning. The documentation is organized as a study guide, offering a systematic path through the fundamentals of backend engineering and large-scale system design.
Provides guidance on offloading static assets to object storage services to improve system scalability.
Dokploy is a self-hosted platform-as-a-service designed to simplify the deployment and management of containerized applications and databases. It provides a centralized control plane that decouples administrative management from application workloads, allowing users to oversee infrastructure across multiple server nodes through a unified web interface or a command-line tool. The platform distinguishes itself through an extensive library of pre-configured application templates, enabling the rapid deployment of databases, identity providers, and various productivity or development tools. It sup
Provides scalable storage for managing and retrieving unstructured data objects.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Streams log data into databases by staging batches in object storage.
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
Manages large binary objects by automatically chunking data for efficient storage and retrieval.
This project provides a comprehensive implementation of the AT Protocol, serving as a framework for building decentralized social networking applications. It enables the creation of distributed data repositories where users maintain cryptographic ownership of their identity and content, allowing for portable accounts that can be migrated between independent servers without central authority intervention. The platform distinguishes itself by decoupling content hosting from discovery through modular algorithmic curation. Users can select third-party services to filter and organize their feeds,
Redirects blob storage to scalable object storage services to improve production performance.
FoundationDB is an ACID-compliant distributed transactional key-value store. It functions as a scalable database engine that ensures strict serializability and data consistency across a cluster of servers using a shared-nothing architecture. The system is distinguished by its multi-region replication capabilities, allowing data to be synchronized across different datacenters for high availability and disaster recovery. It utilizes optimistic concurrency control to manage distributed transactions and employs a majority-based coordination system to maintain cluster state. The platform provides
Automatically breaks large binary objects into smaller chunks for compatible storage within the database.
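The chunking idea mentioned in this entry (and in the NATS Server entry above) can be illustrated with a small sketch. This is not FoundationDB's or NATS's actual API: the in-memory dictionary standing in for a key-value store, the function names `write_blob` and `read_blob`, and the `CHUNK_SIZE` value are all hypothetical, chosen only to show how a large binary object might be split into indexed pieces and reassembled.

```python
from typing import Dict, Tuple

# Hypothetical chunk size in bytes; real stores impose their own value-size limits.
CHUNK_SIZE = 10_000

# A plain dict stands in for a key-value store (assumption for illustration only).
Store = Dict[Tuple[str, int], bytes]


def write_blob(store: Store, name: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Split data into fixed-size chunks keyed by (name, index); return the chunk count."""
    count = 0
    for offset in range(0, len(data), chunk_size):
        store[(name, count)] = data[offset:offset + chunk_size]
        count += 1
    return count


def read_blob(store: Store, name: str) -> bytes:
    """Reassemble a blob by reading consecutive chunk indexes until one is missing."""
    parts = []
    index = 0
    while (name, index) in store:
        parts.append(store[(name, index)])
        index += 1
    return b"".join(parts)


if __name__ == "__main__":
    store: Store = {}
    payload = bytes(range(256)) * 100  # 25,600 bytes of sample data
    chunks = write_blob(store, "video", payload)
    print(chunks, read_blob(store, "video") == payload)
```

Keying each chunk by name and index keeps every stored value under the size limit while letting a reader fetch chunks in order; a production system would typically also record the total length or a checksum so that partial writes can be detected.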
VictoriaMetrics is a high-performance, scalable time series database and observability platform designed for long-term storage and analysis of metric, log, and trace data. It functions as a unified backend for monitoring ecosystems, offering full compatibility with industry-standard protocols and query languages. The system is built to handle massive data volumes through a distributed architecture that supports horizontal scaling and efficient data lifecycle management. The platform distinguishes itself through a storage engine that utilizes consistent hashing for data sharding and log-struct
Offloads long-term data to object storage while retaining local caching for high-performance query execution.
Ceph is a unified, software-defined storage platform designed to provide object, block, and file storage services from a single distributed cluster. By decoupling data management from physical hardware, it enables elastic scaling across commodity hardware, allowing organizations to build large-scale storage infrastructure without reliance on proprietary vendor equipment. The system distinguishes itself through a shared-nothing, distributed architecture that utilizes deterministic hashing for data placement. This approach eliminates centralized metadata bottlenecks, allowing the cluster to sca
Exposes object, block, and file storage interfaces from a single distributed cluster to support diverse application requirements.
Grav is a flat-file content management system that eliminates the need for a traditional database by storing site content and configuration in human-readable Markdown and YAML files. Built as a modular PHP web framework, it uses a hierarchical page routing system where the physical directory structure directly determines the site's URL paths. The platform is distinguished by its event-driven plugin architecture and a command-line interface that prioritizes system administration, deployment, and maintenance tasks. It utilizes a blueprint-driven system to generate administrative forms from stru
Provides systems for storing and retrieving unstructured data objects.
Dexie.js is a wrapper library for IndexedDB that provides a simplified interface for managing and querying structured data within the browser. It functions as a browser database manager used to maintain persistent application state and store binary blobs and records. The project serves as an offline-first data store that synchronizes browser data with remote servers to maintain consistency across sessions. It also acts as a reactive database store by monitoring data changes in real time to trigger automatic user interface updates, and functions as a client-side search engine for indexing and
Allows saving large binary objects and files directly alongside structured database records.
ABP is an opinionated architectural framework for building enterprise software solutions using .NET and ASP.NET Core. It serves as a structural toolkit for implementing domain-driven design and microservices patterns, providing a modular enterprise architecture where functionality is organized into independent, pluggable modules. The platform is specifically designed to support multi-tenant SaaS architectures, isolating data and configurations for multiple independent customers within a single application instance. It provides enterprise boilerplate infrastructure and pre-configured templates
Provides a standardized infrastructure layer for saving and retrieving large binary blobs and files.
Thanos is a distributed metrics query engine and monitoring scalability suite designed to provide a unified interface for aggregating data from multiple Prometheus servers and clusters. It functions as a high availability monitoring backend that eliminates single points of failure by deduplicating data from replicated instances. The system enables long-term retention by persisting time-series data to cloud-native object storage, allowing for unlimited historical archiving beyond the limits of local disks. It further optimizes this storage through a downsampling and retention manager that comp
Persists time-series data to cloud-native object stores to provide an unlimited historical metric archive.
Thanos is a CNCF cloud native monitoring tool that provides a highly available and scalable extension to the Prometheus ecosystem. It functions as a global query engine, a long-term storage system, and a metric downsampler. The project enables a unified interface to aggregate and query metrics across multiple distributed clusters from a single view. It maintains historical data beyond local retention limits by persisting time-series metrics in object storage and eliminates data gaps by merging metrics from redundant server pairs. The system includes capabilities for reducing the resolution o
Offloads long-term time-series data to remote cloud object storage to ensure durability and infinite retention.
Rook is a Kubernetes storage orchestrator and distributed storage operator that automates the deployment and management of storage clusters. It serves as a multi-protocol storage provider, offering block, file, and object storage capabilities to containerized workloads. The system focuses on providing a self-healing storage cluster that replicates data across hardware nodes to maintain availability and recover from failures. It uses an operator-led model to handle the installation, scaling, and upgrades of storage nodes and daemons. The orchestrator covers a broad range of provisioning servi
Deploys scalable object stores accessible via standard endpoints for data retrieval inside and outside the cluster.
Stalwart is a self-hosted email and collaboration infrastructure that provides an integrated mail server supporting SMTP, IMAP, POP3, and JMAP protocols. It functions as a comprehensive communication hub, combining email hosting with a collaboration server for shared calendars, contacts, and files. The system distinguishes itself through a distributed architecture that uses peer-to-peer cluster coordination to ensure high availability and fault tolerance. It features a built-in security suite that implements an S/MIME and OpenPGP email gateway alongside automated TLS certificate provisioning
Offloads large objects and email bodies to S3-compatible storage for scalable distributed deployments.
PredictionIO is a machine learning server designed for the deployment of predictive models to transform raw data into actionable predictions. It manages the full lifecycle of machine learning operations, from ingesting event data via APIs to hosting production-ready predictive services for real-time inference. The system supports distributed model training by spreading computational workloads across a cluster of nodes to increase processing speed. It enables the implementation of custom prediction engines using programming languages or the application of pre-built model templates for common t
Uses external object storage as a backend to persist and retrieve large serialized machine learning model files.
Phabricator is a software development suite consisting of a collection of integrated web applications designed to manage the full software engineering lifecycle. It serves as a project management platform, issue tracking system, and code review tool. The suite provides capabilities for bug tracking and coordination, allowing teams to report and manage software defects and feature requests. It also facilitates peer code review workflows to manage proposed changes before they are merged into a repository. The platform includes tools for project task organization and general software developmen
Provides a storage system for managing large binary objects and attachments via the local filesystem.
InsForge is a backend-as-a-service platform that provides an integrated suite of tools for managing relational databases, identity provision, object storage, and serverless compute. It functions as an open-source identity provider and a PostgreSQL database manager featuring integrated vector storage and row-level security. The platform serves as an LLM orchestration gateway, offering a unified endpoint to route requests across various AI providers through an OpenAI-compatible interface. It enables AI-driven application generation and connects AI agents to backend resources using a standardize
Retrieves the binary content of a specific object from a named storage bucket.
Quickwit is a cloud-native, distributed search engine designed for observability data such as logs, traces, and metrics. It functions as an observability backend that decouples compute from storage by persisting indices directly in S3-compatible cloud object stores. The system is distinguished by its compatibility with the Elasticsearch REST API, allowing it to integrate with existing clients and log shippers without reconfiguration. It also serves as an OpenTelemetry data indexer, ingesting technical data via the OpenTelemetry Protocol using gRPC and HTTP. The engine utilizes a hybrid of co
Persists index data and metadata directly in S3-compatible cloud object storage.
Azure Docs is the official technical documentation repository for Microsoft Azure, the cloud computing platform. It provides comprehensive guidance on the full spectrum of Azure services, covering everything from core infrastructure components like virtual machines, Kubernetes clusters, and serverless computing to platform services for AI, machine learning, data analytics, and storage. The documentation details how to provision, manage, and govern cloud resources at scale, including policy enforcement, identity management, and cost optimization. The documentation distinguishes Azure through i
Documents Azure's object storage service for storing and retrieving unstructured data at scale.