1 Repo
Data processing frameworks that execute pipeline steps inside isolated containers for reproducibility and portability.
Distinct from Data Processing Frameworks: emphasizes container-based execution and language-agnostic pipeline steps rather than general data transformation libraries.
Pachyderm is a containerized, versioned, and lineage-tracked data pipeline platform that runs natively on Kubernetes. It combines a distributed file system backend with immutable data versioning, so every commit to a data repository creates an auditable snapshot, and every pipeline step executes as an isolated container. The platform is defined by a data-centric pipeline model where pipelines are specified by their input and output data repositories rather than explicit task sequences, and provenance is recorded as a directed acyclic graph of commits linking output data to its input sources an
Runs language-agnostic data pipelines inside containers for reproducible and portable data transformations.
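The provenance idea in the description above (output commits linked back to the input commits they were derived from) can be illustrated with a small toy model. This is only a sketch of the concept, not Pachyderm's API: the `Commit` class, the `make_commit` and `lineage` helpers, and the repository names are all hypothetical, and the hashing scheme is an assumption chosen for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical immutable snapshot of a data repository."""
    repo: str
    commit_id: str
    provenance: tuple = ()  # upstream commits this one was derived from


def make_commit(repo, content, provenance=()):
    """Create a commit whose ID depends on its content and its inputs."""
    h = hashlib.sha256()
    h.update(repo.encode())
    h.update(content)
    for parent in provenance:
        h.update(parent.commit_id.encode())
    return Commit(repo, h.hexdigest()[:12], tuple(provenance))


def lineage(commit):
    """Walk the provenance DAG and return every upstream commit once."""
    seen = {}
    stack = [commit]
    while stack:
        current = stack.pop()
        for parent in current.provenance:
            if parent.commit_id not in seen:
                seen[parent.commit_id] = parent
                stack.append(parent)
    return sorted(seen.values(), key=lambda c: (c.repo, c.commit_id))


# Illustrative three-stage pipeline: raw -> cleaned -> report.
raw = make_commit("raw", b"sensor readings v1")
cleaned = make_commit("cleaned", b"deduplicated rows", [raw])
report = make_commit("report", b"summary table", [cleaned])

upstream_repos = [c.repo for c in lineage(report)]
print(upstream_repos)
```

Because each commit ID is derived from both the content and the IDs of its inputs, changing any upstream data yields a different downstream commit, which is one simple way to make lineage auditable in a model like this.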