Kedro is a Python framework for building and orchestrating reproducible, modular data science and data engineering pipelines. It doubles as an MLOps project template, enforcing software engineering best practices so that projects can move from prototype to production.
Its distinguishing feature is a centralized data catalog that abstracts data access and versioning across file formats and cloud storage systems. Datasets are registered by name and loaded lazily, which keeps processing logic separate from data access. A standardized project structure keeps projects consistent and maintainable across teams.
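The separation between processing logic and data access can be illustrated with a minimal sketch. This is plain Python, not Kedro's actual API: `LazyCatalog`, `load_sales`, and `total_amount` are hypothetical names chosen for illustration, and the in-memory loader stands in for a real file or cloud storage read.

```python
from typing import Any, Callable, Dict, List


class LazyCatalog:
    """Hypothetical stand-in for a data catalog: maps dataset names to loaders."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._cache: Dict[str, Any] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        # Registering stores the loader only; no data is read yet.
        self._loaders[name] = loader

    def load(self, name: str) -> Any:
        # The loader runs on first access, and the result is cached afterwards.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


load_calls: List[str] = []  # records each time the loader actually runs


def load_sales() -> List[dict]:
    # Assumption: a real loader would read a file or a cloud object here.
    load_calls.append("sales")
    return [{"amount": 10}, {"amount": 32}]


def total_amount(rows: List[dict]) -> int:
    # Processing logic: knows nothing about where the rows came from.
    return sum(row["amount"] for row in rows)


catalog = LazyCatalog()
catalog.register("sales", load_sales)
total = total_amount(catalog.load("sales"))
```

Because `total_amount` receives plain data, the storage backend behind the name `"sales"` could be swapped without touching the processing function.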
The framework handles pipeline orchestration by resolving dependencies between steps automatically and visualizing the resulting pipeline, and it manages configuration for environment-specific settings. It supports deployment to multiple platforms, from local machines to distributed clusters, and integrates with interactive notebooks for data exploration.
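Automatic dependency resolution can be sketched as a topological sort over named inputs and outputs. The example below is a conceptual illustration using the standard library's `graphlib` (Python 3.9+), not Kedro's implementation; `Node`, `resolve_order`, and `run` are hypothetical names.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """Hypothetical pipeline step: a function plus named inputs and one output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def resolve_order(nodes: List[Node]) -> List[Node]:
    # Map each dataset name to the node that produces it.
    producers = {n.output: n for n in nodes}
    # A node depends on every input that another node produces.
    graph = {n.output: {i for i in n.inputs if i in producers} for n in nodes}
    return [producers[name] for name in TopologicalSorter(graph).static_order()]


def run(nodes: List[Node], data: Dict[str, Any]) -> Dict[str, Any]:
    data = dict(data)  # copy so the caller's inputs are not mutated
    for n in resolve_order(nodes):
        data[n.output] = n.func(*(data[i] for i in n.inputs))
    return data


# Nodes are declared out of order on purpose; the order is derived from names.
nodes = [
    Node(lambda cleaned: sum(cleaned), ["cleaned"], "total"),
    Node(lambda raw: [x for x in raw if x is not None], ["raw"], "cleaned"),
]
result = run(nodes, {"raw": [1, None, 2]})
```

The declaration order of the nodes does not matter here: the execution order follows from which node produces the dataset that another node consumes.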
The toolkit provides a command line interface for running workflows and includes utilities for benchmarking performance across commits and detecting regressions.