30 repositories
Storage solutions for unstructured or semi-structured data formats.
Explore 30 awesome GitHub repositories matching Data & Databases · Document Storage.
Payload is a headless content management system and application framework that uses a code-first approach to define data schemas and administrative interfaces. By utilizing a centralized, type-safe configuration object, it automatically generates database schemas, API endpoints, and a fully customizable admin panel. The system is built on a database-agnostic architecture, allowing it to interface with various storage engines while providing a unified, type-safe API for server-side operations, REST, and GraphQL. What distinguishes Payload is its deep extensibility and developer-centric design.
Stores raw JSON objects with integrated syntax highlighting and schema validation.
Supermemory is an artificial intelligence memory management platform designed to provide autonomous agents with persistent, long-term knowledge bases. It functions as a centralized repository that synchronizes multimodal data, enabling agents to maintain context and historical information across complex, multi-session workflows. By serving as a knowledge graph engine and vector database orchestrator, the platform ensures that information remains accessible and relevant for automated tasks. The system distinguishes itself through its hybrid indexing approach, which combines vector similarity s
Centralizes text-based documents and information into a database for efficient management and retrieval.
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
Tracks changes to specific documents by primary key and updates components whenever the document is modified.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Exports processed observability data to Azure Blob Storage with support for batching and compression.
This project is a multi-model database system designed to store and manage information as documents, graphs, and key-value pairs within a single engine. It functions as a graph database and knowledge graph platform, providing the infrastructure to build, query, and visualize structured data models. By integrating vector search capabilities, the system serves as a vector database that supports retrieval-augmented generation for artificial intelligence applications. The platform distinguishes itself through a unified query language that allows users to perform document lookups, graph traversals
Manages data as flexible JSON-like objects to allow schema-less persistence while maintaining high performance for complex retrieval operations.
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Transfers processed document data into specified cloud storage containers for downstream applications.
Peewee is a SQL object-relational mapper and query builder that provides an object-oriented interface for mapping application classes to relational database tables. It functions as a relational database toolkit for managing schemas, executing migrations, and handling complex table relationships. The project distinguishes itself by providing an asyncio database driver for non-blocking database operations, ensuring event loop responsiveness. It also supports semi-structured data storage, allowing the storage and querying of flexible JSON documents within traditional relational database systems.
Provides storage for flexible JSON documents within a traditional relational database system.
Quickwit is a cloud-native, distributed search engine designed for observability data such as logs, traces, and metrics. It functions as an observability backend that decouples compute from storage by persisting indices directly in S3-compatible cloud object stores. The system is distinguished by its compatibility with the Elasticsearch REST API, allowing it to integrate with existing clients and log shippers without reconfiguration. It also serves as an OpenTelemetry data indexer, ingesting technical data via the OpenTelemetry Protocol using gRPC and HTTP. The engine utilizes a hybrid of co
Calculates summary statistics and aggregations over large datasets to identify technical patterns and trends.
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Provides storage for semi-structured JSON documents to enable flexible schema management.
go-cloud is a toolkit of cloud-neutral libraries that provides portable Go interfaces for interacting with common cloud services. It enables multi-cloud application development by separating business logic from provider-specific API implementations. The project uses a driver-based system to map generic interface calls to vendor-specific requests. This lets applications switch between cloud backends for blob storage, relational databases, and asynchronous publish-subscribe messaging without changing the core application code. Beyond storage and messaging, the toolkit includes a manager for tracking and updating dynamic configuration variables at runtime without restarting the process. It also provides a standard observability layer for distributed tracing, request logging, and health checks.
Standardizes binary data operations and bucket management across multiple cloud blob storage providers.
FlutterFire is a collection of official plugins that integrate Firebase backend services into Flutter applications. It serves as a backend-as-a-service integration library, providing client-side wrappers for cloud authentication, databases, storage, and monitoring services. The project enables the integration of serverless backend logic and real-time data synchronization using NoSQL documents and state synchronization. It also provides capabilities for generative AI integration, including large language models, image generation, and local machine learning model management. The suite covers a
Retrieves and manages documents using pipelines, aggregated queries, and efficient document counting.
This project is a Node.js and Express backend application that provides a RESTful API for managing video content, channel subscriptions, and community engagements. It utilizes a MongoDB NoSQL database for document management and leverages a middleware-based request pipeline to handle business logic and network requests. The system implements a secure user authentication framework using password hashing and JSON Web Tokens to manage sessions and protect private API routes. It also integrates cloud-based blob storage to handle the uploading and distribution of images, documents, and video files
Integrates cloud-based blob storage to handle the uploading and distribution of large images and video files.
Flux is a Kubernetes GitOps delivery tool used to automate application deployments by synchronizing cluster state with configurations stored in Git, OCI, or Helm repositories. It functions as a set of controllers that monitor desired state in external sources and continuously reconcile the live cluster to match those definitions. The system distinguishes itself through a multi-cluster management plane that coordinates application delivery across fleets of remote clusters from a central hub. It provides a dedicated mechanism for automated image updates, which scans container registries for new
Pulls manifests from Azure blob storage containers and packages them as artifacts.
sccache is a compiler cache wrapper and distributed compilation cache designed to store and reuse compilation results. It functions as a specialized caching solution for the Rust compiler, as well as a general tool to avoid redundant build cycles and reduce total build time. The project distinguishes itself through a cloud-backed build cache and remote storage backends. It enables the synchronization of build artifacts across multiple machines or team members using distributed memory caches or cloud object storage. Supported storage backends include local file systems, WebDAV, and a wide arr
Persists build artifacts to remote Azure Blob Storage containers.
This project serves as a comprehensive educational repository and technical reference collection, documenting a wide range of software engineering practices and modern development technologies. It provides a structured learning path for developers, curating tutorials and practical examples that cover the full lifecycle of application development, from initial project scaffolding to deployment and maintenance. The repository distinguishes itself by offering deep technical insights into complex architectural patterns, including actor-based concurrency models for managing parallel tasks and cont
Provides functions for calculating metrics and counts on stored data without retrieving full records.
Fiora is a real-time communication suite and multimedia instant messenger designed as a self-hosted chat server. Built as a MERN stack messaging platform, it provides a networked environment for private and group conversations using a Socket.io based architecture. The platform is distinguished by its focus on self-hosting, allowing deployment on private Windows, Linux, or macOS servers for full control over user data. It features a highly customizable interface where users can apply themes, custom colors, and wallpapers to personalize their experience. The system covers a broad range of comm
Uses MongoDB for flexible, document-oriented storage of user profiles, chat histories, and group memberships.
CouchDB is a NoSQL document database that stores data as flexible documents and exposes a RESTful API for data management over HTTP. It functions as a distributed document store, synchronizing and replicating data across multiple nodes to ensure high availability and consistency. The system includes a full-text search engine that transforms database records into queryable documents, supporting sorting and pagination. Data synchronization is handled via multi-master replication, which exchanges revision histories to maintain consistency across distributed nodes. The database utilizes multi-ve
Persists unstructured or semi-structured data as flexible documents in a NoSQL store.
Kraken is a distributed blob store and peer-to-peer Docker registry designed for the high-throughput distribution of container images. It functions as a decentralized content delivery network that shares image layers across a network of nodes to prevent bottlenecks at a central registry. The system utilizes peer-to-peer blob distribution and distributed content addressing to maintain download speeds across large clusters. It implements asynchronous rule-based replication to synchronize image data between disparate geographical clusters. The project covers pluggable external blob storage inte
Integrates with external blob storage providers to manage the underlying data layer for container images.
OpenStack is an open-source cloud computing platform for building and managing public and private cloud infrastructure at scale. It provides a framework for deploying, configuring, and operating cloud services, orchestrating compute, storage, and networking resources across a datacenter through a unified management layer. The platform is built on a decoupled service architecture where individual cloud services are developed and versioned independently within their own repositories. This meta-repository tracks interoperable versions of all OpenStack components as verified submodules, with each
Provisions compute, storage, and networking resources across a datacenter through a unified dashboard and API.
Tortoise ORM is an asynchronous object-relational mapper for Python that mirrors Django's model and queryset API while running on asyncio. It defines database tables as Python classes with typed fields and supports foreign key, many-to-many, and one-to-one relations, providing a chainable query API for filtering, annotating, grouping, and prefetching related objects without blocking the event loop. The ORM includes a built-in migration engine that detects model changes, generates migration files, and applies or reverts schema changes through a command-line tool. It connects to PostgreSQL, MyS
Filters, annotates, groups, and aggregates records using a composable async query API.