What are the best Declarative Pipeline Construction GitHub repositories?

Defining data workflows as static graphs optimized before execution. Explore 10 awesome GitHub repositories matching data & databases · Declarative Pipeline Construction. Refine with filters or upvote what's useful. Top picks: pathwaycom/pathway, ffmpeg/ffmpeg, mikefarah/yq, taskflow/taskflow, benthosdev/benthos, fluent-ffmpeg/node-fluent-ffmpeg, fastai/course-v3, ucbepic/docetl, fastai/course22, astronomer/dag-factory.

Why is pathwaycom/pathway a recommended Declarative Pipeline Construction repository?

Defines complex data transformation workflows as static, optimized graphs before execution.

Why is ffmpeg/ffmpeg a recommended Declarative Pipeline Construction repository?

Constructs non-linear processing pipelines that support multiple inputs and outputs to perform advanced tasks like video overlaying or audio mixing.

Why is mikefarah/yq a recommended Declarative Pipeline Construction repository?

Chains multiple data operations through standard input and output streams to enable complex transformations via shell piping.

Why is taskflow/taskflow a recommended Declarative Pipeline Construction repository?

Builds multi-stage data processing pipelines where stages execute either serially or in parallel to transform data.

Why is benthosdev/benthos a recommended Declarative Pipeline Construction repository?

Defines data workflows as static graphs via a single configuration file that is optimized before execution.

Why is fluent-ffmpeg/node-fluent-ffmpeg a recommended Declarative Pipeline Construction repository?

Enables the construction of non-linear processing pipelines using complex filtergraphs for media mixing and overlays.

Why is fastai/course-v3 a recommended Declarative Pipeline Construction repository?

Utilizes structured data block blueprints to declaratively define how raw data is assembled into model-ready batches.

Why is ucbepic/docetl a recommended Declarative Pipeline Construction repository?

Implements a declarative interface for defining complex data operations and workflows to transform unstructured datasets into tables.

Why is fastai/course22 a recommended Declarative Pipeline Construction repository?

Constructs custom data processing pipelines using a declarative block API.

Why is astronomer/dag-factory a recommended Declarative Pipeline Construction repository?

Constructs data pipelines by parsing configuration files, allowing users to define workflow structures without manual procedural code.
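
As a rough illustration of the idea in the last answer (a workflow described in a configuration file and turned into a graph before anything runs), here is a minimal Python sketch. The config schema, the `OPS` registry, and the function names `build_plan` and `run_plan` are all hypothetical and invented for this example; this is not dag-factory's schema or API. It assumes Python 3.9+ for `graphlib`.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical config format, invented for this sketch (not dag-factory's schema).
CONFIG = """
{
  "tasks": {
    "extract": {"op": "constant", "value": [3, 1, 2], "depends_on": []},
    "sort":    {"op": "sorted",   "depends_on": ["extract"]},
    "total":   {"op": "sum",      "depends_on": ["extract"]},
    "report":  {"op": "collect",  "depends_on": ["sort", "total"]}
  }
}
"""

# Registry mapping config "op" names to plain Python callables.
OPS = {
    "constant": lambda spec, inputs: spec["value"],
    "sorted": lambda spec, inputs: sorted(inputs[0]),
    "sum": lambda spec, inputs: sum(inputs[0]),
    "collect": lambda spec, inputs: list(inputs),
}


def build_plan(config_text):
    """Parse the config and return (tasks, execution order) without running anything."""
    tasks = json.loads(config_text)["tasks"]
    graph = {name: set(spec["depends_on"]) for name, spec in tasks.items()}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    order = list(TopologicalSorter(graph).static_order())
    return tasks, order


def run_plan(tasks, order):
    """Execute tasks in dependency order, feeding each task its dependencies' results."""
    results = {}
    for name in order:
        spec = tasks[name]
        inputs = [results[dep] for dep in spec["depends_on"]]
        results[name] = OPS[spec["op"]](spec, inputs)
    return results


tasks, order = build_plan(CONFIG)
results = run_plan(tasks, order)
print(results["report"])
```

Because the whole graph is known after `build_plan`, problems such as dependency cycles can be reported before any task executes, which is one practical benefit of the declarative style.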

10 repositories

pathwaycom/pathway
62,959 · View on GitHub
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Defines complex data transformation workflows as static, optimized graphs before execution.
Python · batch-processing · data-analytics · data-pipelines
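
To make "static, optimized graphs before execution" concrete, the following is a generic Python sketch of a pipeline that only records its steps, rewrites the recorded plan, and then runs it. The `Pipeline`, `Map`, and `Filter` names are hypothetical and invented here; this is not Pathway's API or engine, and the only optimization shown (fusing adjacent map steps) is a deliberately simple stand-in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Map:
    fn: Callable


@dataclass(frozen=True)
class Filter:
    pred: Callable


class Pipeline:
    """Records operations as a static plan; nothing runs until run() is called."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def map(self, fn):
        return Pipeline(self.steps + (Map(fn),))

    def filter(self, pred):
        return Pipeline(self.steps + (Filter(pred),))

    def optimize(self):
        """Fuse runs of adjacent Map steps into a single Map (one pass over the data)."""
        fused = []
        for step in self.steps:
            if isinstance(step, Map) and fused and isinstance(fused[-1], Map):
                prev = fused.pop().fn
                # Bind both functions as defaults so each fused lambda keeps its own pair.
                fused.append(Map(lambda x, f=prev, g=step.fn: g(f(x))))
            else:
                fused.append(step)
        return Pipeline(fused)

    def run(self, rows):
        for step in self.steps:
            if isinstance(step, Map):
                rows = [step.fn(r) for r in rows]
            else:
                rows = [r for r in rows if step.pred(r)]
        return rows


plan = Pipeline().map(lambda x: x + 1).map(lambda x: x * 2).filter(lambda x: x > 4)
optimized = plan.optimize()
print(len(plan.steps), len(optimized.steps))
print(optimized.run([0, 1, 2, 3]))
```

The optimized plan has fewer steps but produces the same rows as the original, which is the property any pre-execution rewrite of this kind has to preserve.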
FFmpeg/FFmpeg
61,176 · View on GitHub
FFmpeg is a cross-platform multimedia framework designed for the recording, conversion, and streaming of audio and video content. It functions as a comprehensive toolkit that provides both a command-line utility for direct media manipulation and a collection of low-level libraries for integration into custom applications. At its core, the project utilizes a packet-based stream engine and a format-agnostic abstraction layer to handle diverse media standards, containers, and network protocols. The framework distinguishes itself through a modular, graph-based filter execution model that allows f
Constructs non-linear processing pipelines that support multiple inputs and outputs to perform advanced tasks like video overlaying or audio mixing.
C · audio · c · ffmpeg
mikefarah/yq
14,913 · View on GitHub
This tool is a command-line processor designed for querying, updating, and transforming structured data files. It functions as a versatile engine for manipulating YAML, JSON, TOML, and XML documents, allowing users to perform complex operations directly from the terminal. By utilizing a path-based expression language, it enables precise navigation and modification of data structures within configuration files and infrastructure-as-code workflows. What distinguishes this tool is its ability to perform in-place document mutations while preserving original formatting, comments, and metadata. It
Chains multiple data operations through standard input and output streams to enable complex transformations via shell piping.
Go · bash · cli · csv
taskflow/taskflow
12,013 · View on GitHub
Taskflow is a C++ task-parallel framework designed to build high-performance parallel workflows and complex dependency graphs. It provides a programming model that organizes computational work into directed acyclic graphs, enabling developers to manage concurrency, resource scheduling, and task dependencies across multi-core CPUs and GPU accelerators. The framework distinguishes itself through its ability to orchestrate heterogeneous systems, allowing for the integration of hardware-accelerated kernels and memory operations into unified execution pipelines. It supports dynamic runtime subflow
Builds multi-stage data processing pipelines where stages execute either serially or in parallel to transform data.
C++ · concurrent-programming · cuda-programming · gpu-programming
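
The serial-versus-parallel stage distinction mentioned above can be sketched in a few lines of Python. This is an illustration only, not Taskflow's C++ API: the `run_pipeline` function and the `(mode, fn)` stage format are hypothetical, and unlike a true pipeline this simplified version finishes each stage for all items before starting the next one.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(items, stages, max_workers=4):
    """Run items through stages in order.

    Each stage is a (mode, fn) pair. A "parallel" stage applies fn to all items
    concurrently; a "serial" stage applies fn to one item at a time, in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for mode, fn in stages:
            if mode == "parallel":
                items = list(pool.map(fn, items))  # map() preserves input order
            elif mode == "serial":
                items = [fn(item) for item in items]
            else:
                raise ValueError(f"unknown stage mode: {mode!r}")
    return items


seen = []  # touched only by the serial stage, so no locking is needed


def record(x):
    seen.append(x)
    return x


stages = [
    ("parallel", lambda x: x * x),
    ("serial", record),
    ("parallel", lambda x: x + 1),
]
out = run_pipeline([1, 2, 3], stages)
print(out)
```

Marking a stage as serial is useful when it touches shared state or must observe items in order, while independent per-item work can be marked parallel.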
benthosdev/benthos
8,681 · View on GitHub
Benthos is a stream processing engine and data integration pipeline used for routing, transforming, and connecting data streams between diverse sources and sinks. It functions as event routing middleware and a change data capture tool, streaming real-time database modifications as discrete events for downstream processing. The system utilizes a declarative pipeline configuration, where data flow and processing logic are defined in a single static file. It features a specialized domain-specific language for mapping, filtering, and enriching data payloads, allowing for complex transformations w
Defines data workflows as static graphs via a single configuration file that is optimized before execution.
Go
fluent-ffmpeg/node-fluent-ffmpeg
8,251 · View on GitHub
node-fluent-ffmpeg is a Node.js wrapper for FFmpeg that provides a fluent interface for running multimedia commands and processing files. It acts as a process manager that handles the lifecycle of external FFmpeg binaries, enabling programmatic media transcoding, video thumbnail generation, and metadata extraction through ffprobe. The library is distinguished by a command builder that translates JavaScript method calls into command-line arguments. It offers event-based progress monitoring to track processed frames and throughput, as well as the ability to route processed media data directly to writable streams for real-time handling. The project covers broad media processing capabilities, including encoding configuration for audio and video properties, complex filtergraph definitions for visual and audio effects, and input management for concatenating multiple sources. It also includes tools for probing media containers and streams to retrieve technical metadata.
Enables the construction of non-linear processing pipelines using complex filtergraphs for media mixing and overlays.
JavaScript
fastai/course-v3
4,914 · View on GitHub
This repository is a comprehensive educational program and deep learning framework designed to teach practical deep learning using PyTorch through notebooks and code examples. It serves as a high-level library for building, training, and deploying neural networks, acting as a model training orchestrator that coordinates PyTorch models, optimizers, and loss functions. The project provides specialized toolkits for computer vision, natural language processing, and tabular data preprocessing. It is distinguished by advanced training controls such as discriminative learning rates, a two-way callback system for customizing training logic, and a high-level learner abstraction that automates device placement and training loops. The framework covers a broad surface of capabilities, including automated data pipeline construction, model architecture analysis, and performance evaluation on classification, regression, and segmentation tasks. It also includes utilities for distributed training across multiple GPUs, mixed-precision training for memory optimization, and specialized support for medical imaging data. The project is delivered as a series of Jupyter Notebooks.
Utilizes structured data block blueprints to declaratively define how raw data is assembled into model-ready batches.
Jupyter Notebook · data-science · deep-learning · fastai
ucbepic/docetl
3,597 · View on GitHub
docetl is an AI-powered document ETL tool and map-reduce orchestrator designed to transform large collections of unstructured documents into structured, queryable tables using language models. It provides a declarative pipeline framework for extracting, cleaning, and transforming data from sources such as PDFs and text files into predefined schemas. The project distinguishes itself through a semantic data integration suite that enables joining datasets and resolving duplicate entities based on embedding-based similarity. It includes an interactive prompt playground for developing and optimizi
Implements a declarative interface for defining complex data operations and workflows to transform unstructured datasets into tables.
Python · agents · data · data-pipelines
fastai/course22
3,398 · View on GitHub
This is a structured deep learning curriculum for programmers, delivered as a collection of Jupyter notebooks. It teaches the fundamentals of training neural networks for computer vision, natural language processing, tabular data analysis, and collaborative filtering using PyTorch and the fastai library. The course is designed to be hands-on, guiding learners from building a training loop from scratch to fine-tuning pretrained models for a variety of practical tasks. The curriculum distinguishes itself by covering the full lifecycle of a deep learning project, from data preparation and augmen
Constructs custom data processing pipelines using a declarative block API.
Jupyter Notebook · deep-learning · fastai · jupyter-notebooks
astronomer/dag-factory
1,440 · View on GitHub
Dag-factory is a framework for building and managing Apache Airflow data pipelines through declarative configuration files. By replacing manual procedural code with structured YAML definitions, it enables programmatic generation of complex workflow structures, task dependencies, and execution schedules. The project stands out for mapping configuration keys directly to Python class and operator constructors, allowing dynamic object instantiation and custom logic. It supports hierarchical configuration inheritance to standardize settings across environments and provides mechanisms for injecting Kubernetes pod specifications directly into task definitions to ensure isolated, scalable execution. The framework covers the full pipeline lifecycle, including automated file discovery, dynamic task-level mapping for parallel processing, and metadata attachment for integration with external systems. It also includes command-line utilities for validating configurations, triggering runs, and managing environment migrations.
Constructs data pipelines by parsing configuration files, allowing users to define workflow structures without manual procedural code.
Python · airflow · apache-airflow · dags
