What are the best Awesome Data Management GitHub Repositories?

Tools and utilities for maintaining, organizing, protecting, and migrating data throughout its operational lifecycle. Explore 311 awesome GitHub repositories matching data & databases · Data Management. Top picks: donnemartin/system-design-primer, chalarangelo/30-seconds-of-code, immich-app/immich, doocs/advanced-java, caddyserver/caddy, apache/superset, appflowy-io/appflowy, protocolbuffers/protobuf, leonardomso/33-js-concepts, nocodb/nocodb.

Why is donnemartin/system-design-primer a recommended Data Management repository?

Provides utilities for creating globally unique identifiers to ensure data consistency across distributed systems.

Why is chalarangelo/30-seconds-of-code a recommended Data Management repository?

Provides utilities for generating universally unique identifiers for data tracking.

Why is immich-app/immich a recommended Data Management repository?

Identifies duplicate files using checksum verification during the backup process to prevent redundant storage and optimize bandwidth usage.

Why is doocs/advanced-java a recommended Data Management repository?

Explains the database-per-service pattern, in which each service owns a private database or data store to avoid tight coupling.

Why is caddyserver/caddy a recommended Data Management repository?

Data portability utilities facilitate the safe export and import of storage contents, enabling seamless migration between different server environments.

Why is apache/superset a recommended Data Management repository?

Demonstration datasets and example dashboards are included to facilitate immediate exploration of core functionality upon initial installation.

Why is appflowy-io/appflowy a recommended Data Management repository?

Enables advanced filtering, sorting, and grouping of tabular data to help users visualize and analyze complex datasets.

Why is protocolbuffers/protobuf a recommended Data Management repository?

Facilitates the evolution of data structures over time while preserving backward and forward compatibility for distributed systems.

Why is leonardomso/33-js-concepts a recommended Data Management repository?

Covers local storage mechanisms such as IndexedDB and the Web Storage API for maintaining application state across browser sessions.

Why is nocodb/nocodb a recommended Data Management repository?

Empowers users to filter, sort, and group tabular data for clearer analysis and efficient information management.

311 repositories

Awesome GitHub Repositories: Data Management


donnemartin/system-design-primer
★ 353,387
This project is a comprehensive educational resource and study guide focused on distributed systems architecture and backend infrastructure design. It offers a structured curriculum for mastering the principles of scalability, reliability, and performance needed to design complex software systems. The repository stands out by offering a methodical approach to technical interview preparation, incorporating design patterns, architectural trade-offs, and spaced-repetition tools to help users retain complex concepts. It emphasizes constraint-based analysis, teaching users how to weigh competing requirements such as latency, consistency, and availability when sketching architectural designs. The content covers a broad range of system design topics, including strategies for database scaling, traffic management, and infrastructure optimization. It details techniques for horizontal scaling, multi-level caching, asynchronous communication, and service discovery, and provides frameworks for resource estimation and capacity planning. The documentation is organized as a study guide, offering a systematic path through the fundamentals of backend engineering and large-scale system design.
Provides utilities for creating globally unique identifiers to ensure data consistency across distributed systems.
Python · design · design-patterns · design-system
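The entry above credits the primer with material on globally unique identifiers. One common way to mint such IDs without a central coordinator is a Snowflake-style layout: a millisecond timestamp, a worker ID, and a per-millisecond sequence packed into 64 bits. The sketch below is a single-process illustration only. The 41/10/12 bit split, the custom epoch, and the class name `SnowflakeGenerator` are assumptions, not code from the repository.

```python
import threading
import time

# Assumed layout: 41 bits of timestamp, 10 bits of worker ID, 12 bits of sequence.
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
CUSTOM_EPOCH_MS = 1_288_834_974_657  # hypothetical epoch; any fixed past instant works


class SnowflakeGenerator:
    """Hypothetical helper producing unique, time-ordered 64-bit integer IDs."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to reuse IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp_part = (now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
            return timestamp_part | (self.worker_id << SEQUENCE_BITS) | self.sequence
```

Each node gets a distinct `worker_id`, so two nodes cannot produce the same ID within a millisecond. IDs from one generator also sort by creation time, which keeps them index-friendly.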
chalarangelo/30-seconds-of-code
★ 128,121
30-seconds-of-code is a comprehensive knowledge base and programming snippet library designed to support software engineering education and professional development. It provides a curated collection of reusable code units and technical guides that help developers master core language mechanics, design patterns, and architectural philosophies. The project distinguishes itself by offering a wide-ranging library of algorithmic solutions and web development patterns that are organized into modular, independently testable units. It emphasizes functional programming paradigms and declarative logic…
Provides utilities for generating universally unique identifiers for data tracking.
JavaScript · astro · awesome-list · css
immich-app/immich
★ 104,236
Immich is a self-hosted media management platform designed to provide a centralized, private repository for photos and videos. It functions as a comprehensive system for organizing, backing up, and viewing personal media collections across mobile devices, web browsers, and external storage locations. By maintaining full control over data ownership and storage infrastructure, the platform ensures that users retain sovereignty over their digital assets. The system distinguishes itself through a distributed architecture that coordinates background media synchronization, real-time filesystem monitoring…
Identifies duplicate files using checksum verification during the backup process to prevent redundant storage and optimize bandwidth usage.
TypeScript · backup-tool · flutter · google-photos
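Immich's blurb describes spotting duplicates by checksum before uploading. The general idea is to hash each file's bytes and skip any file whose digest has already been seen. Here is a minimal sketch of that idea, not Immich's implementation. The choice of SHA-1 and the helper names `file_checksum` and `find_duplicates` are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large videos never need to fit in memory."""
    digest = hashlib.sha1()  # assumed hash choice for this sketch
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Split paths into (to_upload, duplicates) by comparing content checksums."""
    seen: dict[str, Path] = {}
    to_upload: list[Path] = []
    duplicates: list[Path] = []
    for path in paths:
        checksum = file_checksum(path)
        if checksum in seen:
            duplicates.append(path)  # identical bytes already queued: skip re-upload
        else:
            seen[checksum] = path
            to_upload.append(path)
    return to_upload, duplicates
```

Because the comparison uses content rather than file names, a photo renamed on the device is still recognized as already backed up. Only the short checksum needs to cross the network to make that decision.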
doocs/advanced-java
★ 78,987
This project is a comprehensive Java backend engineering guide and technical reference focused on high-concurrency design, distributed systems, and microservices architecture. It provides detailed strategies for decomposing monolithic applications, managing service discovery, and implementing the architectural patterns required for scalable backend environments. The repository distinguishes itself through an extensive collection of big data algorithmic references and database scaling strategies. It covers memory-efficient techniques for analyzing massive datasets, such as Top-K element extraction…
Explains the database-per-service pattern, in which each service owns a private database or data store to avoid tight coupling.
Java · advanced-java · distributed-search-engine · distributed-systems
caddyserver/caddy
★ 73,492
Caddy is an extensible, modular web server platform designed for high-performance traffic management and automated security. At its core, it functions as a dynamic HTTP gateway that handles request routing, static asset delivery, and reverse proxying through a chain of configurable handler modules. The system is built on a modular architecture that allows developers to extend server functionality by registering custom components, all managed through a unified lifecycle and provisioning framework. What distinguishes Caddy is its focus on automated infrastructure and zero-downtime operations.
Data portability utilities facilitate the safe export and import of storage contents, enabling seamless migration between different server environments.
Go · acme · automatic-https · caddy
apache/superset
★ 73,451
Superset is a web-based business intelligence platform designed for data exploration, visualization, and interactive dashboarding. It functions as a query-driven analytics engine that connects to various SQL databases, allowing users to perform ad-hoc analysis, define virtual metrics, and build complex data visualizations through a centralized interface. The platform distinguishes itself through a robust semantic layer that transforms raw database schemas into calculated columns and virtual metrics, enabling consistent business logic across an organization. It features a plugin-based visualization…
Demonstration datasets and example dashboards are included to facilitate immediate exploration of core functionality upon initial installation.
TypeScript · analytics · apache · apache-superset
appflowy-io/appflowy
★ 72,474
AppFlowy is a local-first knowledge base and collaborative workspace platform designed for structured information management. It functions as a modular productivity suite where users organize content through a block-based document model, allowing for flexible nesting and granular manipulation of data. The system prioritizes data sovereignty by enabling self-hosted storage, ensuring that sensitive information remains under user control while maintaining offline accessibility. The platform distinguishes itself through a decoupled architecture that separates its high-performance, memory-safe core…
Enables advanced filtering, sorting, and grouping of tabular data to help users visualize and analyze complex datasets.
Dart · blog · confluence-alternative · content-management
protocolbuffers/protobuf
★ 71,359
Protocol Buffers is a language-neutral, platform-neutral mechanism for serializing structured data. It provides a schema-based toolchain that compiles declarative data definitions into type-safe source code, enabling consistent communication and strongly typed API contracts between services written in different programming languages. The project stands out for a highly efficient binary wire format that uses tag-based encoding and variable-width integer compression to minimize payload size and processing overhead. It supports robust management of evolving schemas, allowing developers to update data structures incrementally while maintaining backward and forward compatibility. This is further supported by a system of versioned editions that manages feature sets and serialization logic across distributed software components. Beyond core binary serialization, the project includes canonical JSON conversion with schema validation, fine-grained control over symbol visibility, and field presence tracking to distinguish default values from unset ones. It also offers specialized optimizations, such as arena-based memory management for C++ implementations, to improve performance when creating and cleaning up complex message trees.
Facilitates the evolution of data structures over time while preserving backward and forward compatibility for distributed systems.
C++ · marshalling · protobuf · protobuf-runtime
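The Protocol Buffers entry mentions tag-based encoding and variable-width integers. The two rules are short enough to write out by hand. A varint stores 7 bits per byte, least significant group first, and sets the high bit on every byte except the last. Each field is prefixed by a tag equal to `(field_number << 3) | wire_type`. The following is a minimal Python sketch of those rules for a single non-negative integer field. It is not the protobuf library's API, and the helper names are hypothetical.

```python
WIRETYPE_VARINT = 0  # wire type used for int32/int64/uint32/uint64/bool/enum fields


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag varint ((field_number << 3) | wire_type) followed by the value varint."""
    tag = (field_number << 3) | WIRETYPE_VARINT
    return encode_varint(tag) + encode_varint(value)
```

For example, `encode_int_field(1, 150).hex()` gives `"089601"`. The tag carries the wire type, so a reader that does not recognize a field number can still work out how many bytes to skip. This is one reason older and newer schema versions can read each other's messages.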
leonardomso/33-js-concepts
★ 66,467
This project is a comprehensive educational repository designed to help developers master the core mechanics, runtime behaviors, and browser-native capabilities of the JavaScript language. It provides a structured knowledge base that covers fundamental language features, such as prototype-based inheritance and event-loop-based concurrency, alongside advanced topics like JIT-compiled execution and memory management. The repository distinguishes itself by offering deep-dive technical guides that bridge the gap between abstract language concepts and practical browser implementation.
Covers local storage mechanisms such as IndexedDB and the Web Storage API for maintaining application state across browser sessions.
JavaScript · angular · concepts · es6
nocodb/nocodb
★ 63,466
NocoDB is a visual platform that transforms relational databases into collaborative, spreadsheet-style workspaces. By acting as a headless database backend, it provides a unified environment for designing database structures, managing record relationships, and interacting with data without requiring manual SQL queries. The platform normalizes interactions across various SQL and NoSQL data sources, allowing users to manage complex datasets through a centralized interface. The project distinguishes itself by automatically generating RESTful and GraphQL APIs from existing database schemas, enabling…
Empowers users to filter, sort, and group tabular data for clearer analysis and efficient information management.
TypeScript · airtable · airtable-alternative · automatic-api
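The filter, sort, and group features in the NocoDB and AppFlowy entries compose in a natural order: drop non-matching rows, sort what remains, then bucket rows by one column. Because grouping preserves insertion order, each group stays sorted. Below is a minimal in-memory Python sketch, assuming rows are plain dicts. `filter_sort_group` is a hypothetical helper, not either project's API.

```python
from collections import defaultdict
from typing import Any, Callable

Row = dict[str, Any]


def filter_sort_group(
    rows: list[Row],
    where: Callable[[Row], bool],
    sort_key: str,
    group_key: str,
    descending: bool = False,
) -> dict[Any, list[Row]]:
    """Filter rows, sort the survivors, then bucket them by one column."""
    kept = [row for row in rows if where(row)]
    kept.sort(key=lambda row: row[sort_key], reverse=descending)
    groups: dict[Any, list[Row]] = defaultdict(list)
    for row in kept:
        groups[row[group_key]].append(row)  # append order keeps each group sorted
    return dict(groups)
```

Filtering first keeps the sort cheap, and sorting before grouping means no group needs a second sort.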
xingshaocheng/architect-awesome
★ 60,821
This project serves as a comprehensive knowledge base and reference for distributed systems engineering and enterprise software architecture. It provides a structured collection of technical resources, design patterns, and methodologies intended to assist in the design, maintenance, and scaling of complex, high-performance software environments. The repository distinguishes itself by offering deep dives into core architectural concepts such as actor-based concurrency, aspect-oriented interception, and inversion-of-control containers. It emphasizes the practical application of distributed systems…
Generate globally unique identifiers to ensure data consistency and prevent collisions across distributed nodes.
minio/minio
★ 60,346
MinIO is a software-defined, cloud-native object storage server designed to manage large volumes of unstructured data. It functions as a distributed storage cluster that aggregates multiple independent nodes into a unified, scalable pool, providing a high-performance infrastructure compatible with standard cloud storage protocols and application programming interfaces. The system utilizes a shared-nothing architecture that eliminates central metadata servers, relying instead on a decentralized hash table to map objects across the cluster. Data availability and resilience are maintained through…
Splits data into fragments and distributes them across multiple drives to ensure availability during hardware or node failures.
Go · amazon-s3 · cloud · cloudnative
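MinIO's blurb describes splitting data into fragments across drives so that objects survive hardware or node failures. MinIO itself uses Reed-Solomon erasure coding for this. The sketch below swaps in a much simpler scheme: one XOR parity shard, RAID-5 style. This scheme can rebuild only a single lost shard, so it illustrates the idea of split-plus-redundancy, not MinIO's algorithm. The function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, data_shards: int) -> list[bytes]:
    """Split data into equal-size shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: list[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one shard (data or parity) is missing."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all shards is zero, so the lost one equals the XOR of the survivors.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return b"".join(shards[:-1])[:original_len]
```

In practice each shard would live on a different drive. Losing any one drive leaves enough information to recompute its shard. Reed-Solomon generalizes this to survive several simultaneous losses.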
solido/awesome-flutter
★ 60,327
This project is a community-curated directory of resources, libraries, and tools designed to support developers working with the Flutter framework. It functions as a centralized knowledge base, organizing high-quality external references into a structured, human-readable format to assist in the discovery of technical materials for cross-platform application development. The directory distinguishes itself through a comprehensive index of the global Flutter ecosystem, including local user groups, meetups, and communication channels that connect developers to international support networks.
Links to robust implementations for connecting applications to GraphQL services and managing remote data synchronization.
Dart · android · awesome · awesome-list
rails/rails
★ 58,690
This project is a full-stack web framework designed for building database-backed applications through a standardized architectural pattern. It provides a comprehensive suite of integrated libraries that manage the entire request-response lifecycle, from routing incoming web traffic to rendering dynamic server-side templates. By utilizing an object-relational mapping layer, the framework allows developers to define domain models that map database tables directly to application objects, simplifying data persistence, schema migrations, and complex relationship management.
Links binary assets directly to database records, managing the storage and retrieval of files through integrated model associations.
Ruby · activejob · activerecord · framework
pmndrs/zustand
★ 58,371
Zustand is a state management library that provides a centralized store for managing shared application data. It functions as a reactive container that connects application state to components, allowing them to subscribe to specific slices of data and trigger updates automatically. By utilizing selector-based data access and immutable state updates, the library ensures that components only re-render when their observed data changes, maintaining a predictable and efficient data flow. The library distinguishes itself through a pluggable, middleware-based architecture that allows for the extension…
Simplifies updates to complex nested data by automating the creation of new, immutable object structures.
TypeScript · hacktoberfest · hooks · react
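The Zustand entry mentions automating the creation of new immutable structures when updating nested data. The underlying technique is path copying with structural sharing. Only the containers along the updated path are copied, and every untouched branch is reused by reference. The following Python sketch illustrates the idea. `set_in` is a hypothetical helper, not Zustand's API, which is JavaScript.

```python
from typing import Any, Mapping, Sequence


def set_in(state: Mapping[str, Any], path: Sequence[str], value: Any) -> dict[str, Any]:
    """Return a new mapping with `value` at `path`, copying only the dicts on that path.

    Untouched branches are shared by reference, so an identity check (`is`)
    is enough to tell whether a given slice of state changed.
    """
    if not path:
        raise ValueError("path must not be empty")
    head, *rest = path
    updated = dict(state)  # shallow copy of this level only
    if rest:
        updated[head] = set_in(state.get(head, {}), rest, value)
    else:
        updated[head] = value
    return updated
```

This cheap identity comparison is what lets a selector-based store skip re-rendering subscribers whose slice was not touched.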
rclone/rclone
★ 57,877
This project is a command-line storage manager that provides a unified interface for performing file operations across local filesystems and diverse cloud storage providers. It functions as a cross-platform storage abstraction, utilizing a modular backend architecture to map heterogeneous cloud storage APIs into a standard set of file system operations. This allows for consistent data management and movement regardless of the underlying storage service. The tool serves as a network data transfer engine designed for automated data migration and cloud storage synchronization.
Maintains consistency between disparate data stores by automatically propagating file changes and metadata.
Go · azure-blob · azure-blob-storage · azure-files
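The rclone entry describes keeping stores consistent by propagating changes. A one-way sync can be planned from two directory listings. Copy every source file that is missing at the destination or differs in size or modification time, then delete destination files that no longer exist in the source. Below is a simplified sketch in that spirit. `FileInfo`, `plan_sync`, and the 1-second mtime tolerance are assumptions, and rclone itself offers further options such as checksum-based comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: float  # seconds since the epoch


def plan_sync(
    source: dict[str, FileInfo],
    dest: dict[str, FileInfo],
    tolerance: float = 1.0,  # assumed slack for backends with coarse timestamps
) -> tuple[list[str], list[str]]:
    """Return (to_copy, to_delete) so that dest mirrors source once both are applied."""
    to_copy = sorted(
        path
        for path, info in source.items()
        if path not in dest
        or dest[path].size != info.size
        or abs(dest[path].mtime - info.mtime) > tolerance
    )
    to_delete = sorted(path for path in dest if path not in source)
    return to_copy, to_delete
```

Separating planning from execution makes a dry run trivial: print the two lists instead of acting on them.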
zylon-ai/private-gpt
★ 57,278
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to provide context-aware responses for chat and completion requests. The system distinguishes itself through a database-agnostic abstraction layer that supports various storage backends, ranging from local disk storage to enterprise-grade vector databases. It offers flexible deployment…
Exposes metadata and identifiers for all stored documents to allow precise filtering and context selection during retrieval tasks.
Python
deepfakes/faceswap
★ 55,289
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated processing…
Rebuilds alignment files by scanning directories of previously extracted faces to recover lost or corrupted metadata.
Python · deep-face-swap · deep-learning · deep-neural-networks
ngosang/trackerslist
★ 54,183
This project is a curated, community-driven registry of public BitTorrent trackers designed to facilitate peer-to-peer file sharing. It serves as a centralized resource for network endpoints that coordinate connections between distributed clients, helping users discover and maintain reliable infrastructure for decentralized communication protocols. The repository distinguishes itself through a fully automated orchestration pipeline that ensures the lists remain current and accurate. Every day, background tasks perform distributed health monitoring to verify connectivity and filter out unresponsive…
Ensures list accuracy through continuous automated health monitoring and synchronization of network resources.
bittorrent · bittorrent-tracker · bittorrent-trackers
maybe-finance/maybe
★ 53,999
Maybe is a self-hosted financial platform designed for private deployment, providing a centralized interface to track investments, budgets, and net worth. By running the application on your own infrastructure, you maintain full control over your sensitive financial data and privacy. The platform is delivered as a containerized application suite, utilizing a declarative configuration framework to manage service lifecycles. It distinguishes itself through a structured approach to version control, allowing users to pin specific release tags to ensure environment consistency and perform controlled…
Manages persistent data storage volumes during setup and maintenance to ensure state remains intact across service restarts.
Ruby · finance · hotwire · personal-finance