# qovery/replibyte

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/qovery-replibyte).**

4,381 stars · 137 forks · Rust · gpl-3.0

## Links

- GitHub: https://github.com/Qovery/Replibyte
- Homepage: https://www.replibyte.com
- awesome-repositories: https://awesome-repositories.com/repository/qovery-replibyte.md

## Topics

`aws` `backup` `cloud` `cloudnative` `database` `mongodb` `mysql` `postgres` `postgresql` `rust` `rust-lang` `s3`

## Description

Replibyte is a tool that automates the lifecycle of database snapshots for non-production environments, handling the export, anonymization, subsetting, and restoration of data. It is designed to support privacy-compliant development workflows by replacing sensitive production data with synthetic values and extracting consistent subsets of rows while preserving referential integrity.
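As a rough illustration of the subsetting idea, the sketch below keeps a percentage of rows from a child table and then pulls in every parent row those rows reference, so no foreign key is left dangling. It is not Replibyte's implementation: the function `subset_with_parents`, the `customers` and `orders` tables, and the seeded random sampling are all assumptions made for this example.

```python
import random


def subset_with_parents(child_rows, parent_rows, fk, pk, percent, seed=0):
    """Keep about `percent` of child rows plus every parent row they reference.

    Hypothetical helper for illustration only; not part of Replibyte.
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    keep_count = round(len(child_rows) * percent / 100)
    kept_children = rng.sample(child_rows, keep_count)

    # Collect the foreign-key values that the kept child rows point at.
    needed_parent_ids = {row[fk] for row in kept_children}

    # Keep only the parents that are actually referenced.
    kept_parents = [row for row in parent_rows if row[pk] in needed_parent_ids]
    return kept_children, kept_parents


# Hypothetical data: 10 customers, 100 orders spread across them.
customers = [{"id": i, "name": f"customer-{i}"} for i in range(1, 11)]
orders = [{"id": i, "customer_id": (i % 10) + 1} for i in range(1, 101)]

kept_orders, kept_customers = subset_with_parents(
    orders, customers, fk="customer_id", pk="id", percent=10
)
```

Every order in `kept_orders` has a matching row in `kept_customers`, which is the property referred to as referential integrity. A real schema usually has chains of foreign keys, so the same step would be repeated for each relationship.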

The tool operates through a configurable pipeline defined in a YAML file, orchestrating stages such as dump, anonymize, subset, and restore. Each operation runs as an isolated, ephemeral container job, and snapshots are stored as encrypted files in remote object storage services like S3 or GCS. Replibyte also manages snapshot retention by automatically removing dumps based on age or count, and it can seed development databases with realistic, anonymized production data.
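The retention rule described above (remove dumps by age or by count) can be sketched as follows. This is a minimal illustration under assumed names, not Replibyte's code: `dumps_to_delete`, `max_age`, and `keep_last` are hypothetical, and dumps are modelled as simple `(name, created_at)` pairs.

```python
from datetime import datetime, timedelta


def dumps_to_delete(dumps, now, max_age=None, keep_last=None):
    """Return the names of dumps that a retention policy would remove.

    dumps:     list of (name, created_at) tuples
    max_age:   timedelta; dumps older than this are removed
    keep_last: int; only this many of the newest dumps are kept
    Hypothetical helper for illustration only.
    """
    newest_first = sorted(dumps, key=lambda d: d[1], reverse=True)
    doomed = set()

    if max_age is not None:
        # Age rule: anything created more than `max_age` ago goes.
        doomed |= {name for name, created in newest_first if now - created > max_age}

    if keep_last is not None:
        # Count rule: everything after the first `keep_last` entries goes.
        doomed |= {name for name, _ in newest_first[keep_last:]}

    return sorted(doomed)


now = datetime(2024, 1, 10)
dumps = [
    ("dump-a", datetime(2024, 1, 1)),
    ("dump-b", datetime(2024, 1, 5)),
    ("dump-c", datetime(2024, 1, 9)),
]

by_age = dumps_to_delete(dumps, now, max_age=timedelta(days=7))
by_count = dumps_to_delete(dumps, now, keep_last=1)
```

Here `by_age` contains only `dump-a`, the one dump older than seven days, while `by_count` contains `dump-a` and `dump-b`, leaving just the newest dump in place.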

The project provides a command-line interface for configuring and triggering these operations, with support for running as a lifecycle job within deployment environments.

## Tags

### Artificial Intelligence & ML

- [Relational Database Subsetting](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-subset-extractions/relational-database-subsetting.md) — Extracts a configurable percentage of rows while following foreign-key relationships to maintain referential integrity.

### Data & Databases

- [Encrypted Snapshots](https://awesome-repositories.com/f/data-databases/compressed-database-dumps/encrypted-snapshots.md) — Stores database dumps as encrypted snapshots in remote object storage for secure data at rest.
- [Encrypted Snapshot Dumpers](https://awesome-repositories.com/f/data-databases/containerized-database-administration/encrypted-snapshot-dumpers.md) — Creates encrypted snapshots of databases and stores them in remote cloud object storage.
- [Non-Production Data Pipelines](https://awesome-repositories.com/f/data-databases/data-pipeline-automation/non-production-data-pipelines.md) — Automates export, anonymization, and restoration of database content for non-production environments.
- [Data Seeding Utilities](https://awesome-repositories.com/f/data-databases/data-seeding-utilities.md) — Restores production database snapshots into development environments for realistic testing. ([source](https://www.replibyte.com))
- [Production Snapshot Seeding](https://awesome-repositories.com/f/data-databases/database-seeding-tools/production-snapshot-seeding.md) — Restores anonymized production snapshots into non-production environments for realistic development testing.
- [Production Data Subsetting](https://awesome-repositories.com/f/data-databases/production-data-subsetting.md) — Extracts a configurable percentage of rows from database tables while preserving referential integrity.
- [Remote Object Storage Integrations](https://awesome-repositories.com/f/data-databases/object-storage/remote-object-storage-integrations.md) — Transfers and retrieves database snapshots directly from cloud object storage services like S3 or GCS.

### DevOps & Infrastructure

- [Database Snapshots](https://awesome-repositories.com/f/devops-infrastructure/cloud-backups/database-snapshots.md) — Automates dumping, encrypted storage, and lifecycle cleanup of database snapshots in cloud object storage.

### Security & Cryptography

- [Data Anonymization](https://awesome-repositories.com/f/security-cryptography/data-anonymization.md) — Replaces sensitive production data with synthetic values to meet privacy regulations before moving data to non-production environments.
- [Column-Level Anonymization](https://awesome-repositories.com/f/security-cryptography/data-anonymization/column-level-anonymization.md) — Replaces the values of specified columns with anonymized ones as the dump is created, so private information is transformed before the snapshot is stored and restored. ([source](https://www.replibyte.com))
- [Database Anonymization Tools](https://awesome-repositories.com/f/security-cryptography/privacy-and-anonymity-tools/database-anonymization-tools.md) — Replaces sensitive production data with synthetic values before it reaches non-production databases, supporting privacy compliance.
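
Column-level anonymization of the kind listed above can be pictured as a mapping from column names to replacement functions that is applied to every row. The sketch below is an assumption-laden illustration, not Replibyte's transformer API: `anonymize_rows`, `fake_email`, `redact`, and the `users` table are hypothetical names.

```python
import hashlib


def fake_email(value):
    """Derive a stable placeholder address from the original value.

    Note: an unsalted hash is only a stand-in here; a real tool would
    use random or keyed values so originals cannot be guessed back.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"user-{digest}@example.com"


def redact(_value):
    """Replace any value with a fixed marker."""
    return "REDACTED"


def anonymize_rows(rows, rules):
    """Return new rows with each column named in `rules` replaced.

    Columns without a rule pass through unchanged; input rows are not mutated.
    """
    return [
        {col: rules[col](val) if col in rules else val for col, val in row.items()}
        for row in rows
    ]


# Hypothetical table contents and rule set.
users = [
    {"id": 1, "email": "alice@corp.test", "ssn": "123-45-6789"},
    {"id": 2, "email": "bob@corp.test", "ssn": "987-65-4321"},
]
rules = {"email": fake_email, "ssn": redact}

safe_users = anonymize_rows(users, rules)
```

Keys such as `id` are left untouched so that relationships between tables still line up after anonymization.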

### Software Engineering & Architecture

- [Configuration Workflows](https://awesome-repositories.com/f/software-engineering-architecture/configuration-workflows.md) — Defines the entire data lifecycle pipeline through a YAML configuration file for repeatable operations.
- [Database Lifecycle Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/data-orchestration-pipelines/database-lifecycle-pipelines.md) — Orchestrates database operations as a sequence of dump, anonymize, subset, and restore stages.

### Testing & Quality Assurance

- [Production Data Seeding](https://awesome-repositories.com/f/testing-quality-assurance/data-snapshotting-for-testing/production-data-seeding.md) — Restores realistic, anonymized production snapshots into development or staging databases for accurate testing.

### Web Development

- [Anonymization Rules](https://awesome-repositories.com/f/web-development/data-validation/column-level-validation/anonymization-rules.md) — Replaces sensitive database column values with synthetic data before the snapshot is restored, supporting privacy compliance.

### Development Tools & Productivity

- [Ephemeral Job Runners](https://awesome-repositories.com/f/development-tools-productivity/job-scheduling-resources/containerized-batch-job-schedulers/ephemeral-job-runners.md) — Executes each database lifecycle stage as an isolated, ephemeral container job.

### System Administration & Monitoring

- [Backup Lifecycle Automation](https://awesome-repositories.com/f/system-administration-monitoring/backup-management/backup-lifecycle-automation.md) — Automates retention policies by removing old or excess database dumps based on age or count.
