Why is alibaba/datax a recommended repository for Plugin-Based ETL Frameworks?

Uses a plugin-based connector architecture to decouple reader and writer logic, allowing extensions for new heterogeneous data sources.

Why is apache/flink-cdc a recommended repository for Plugin-Based ETL Frameworks?

Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.

Why is apache/pinot a recommended repository for Plugin-Based ETL Frameworks?

Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.

Why is dlt-hub/dlt a recommended repository for Plugin-Based ETL Frameworks?

Provides a pluggable framework that automates schema evolution, incremental loading, and normalization for ETL workflows.

5 repositories

Awesome GitHub Repositories: Plugin-Based ETL Frameworks

ETL systems that use a plugin architecture for readers and writers to extend connectivity to new data sources.
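As a rough illustration of the reader/writer plugin idea described above, here is a minimal Python sketch. The registries `READERS` and `WRITERS`, the decorators, and the `run_job` function are hypothetical names invented for this example; they are not the API of any repository listed on this page.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]

# Hypothetical registries mapping a plugin name to its implementation.
READERS: Dict[str, Callable[..., Iterator[Record]]] = {}
WRITERS: Dict[str, Callable[..., int]] = {}


def reader(name: str):
    """Register a function that yields records from some source."""
    def decorate(func):
        READERS[name] = func
        return func
    return decorate


def writer(name: str):
    """Register a function that consumes records and returns a count."""
    def decorate(func):
        WRITERS[name] = func
        return func
    return decorate


@reader("memory")
def read_memory(rows: List[Record]) -> Iterator[Record]:
    # Stand-in source: yields rows from an in-memory list.
    yield from rows


@writer("list")
def write_list(records: Iterable[Record], target: List[Record]) -> int:
    # Stand-in sink: appends every record to a Python list.
    count = 0
    for record in records:
        target.append(record)
        count += 1
    return count


def run_job(reader_name: str, reader_args: dict,
            writer_name: str, writer_args: dict) -> int:
    """Look up both plugins by name and stream records between them."""
    records = READERS[reader_name](**reader_args)
    return WRITERS[writer_name](records, **writer_args)


sink: List[Record] = []
copied = run_job("memory", {"rows": [{"id": 1}, {"id": 2}]},
                 "list", {"target": sink})
```

Because the job runner only knows plugin names, supporting a new source or destination in this sketch means registering one more function rather than changing `run_job`.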

Distinct from ETL Workflows: this category focuses on the plugin-based extensibility of the ETL process, whereas ETL Workflows focuses on specific ETL types such as Reverse ETL or Vector ETL.

Explore 5 awesome GitHub repositories matching data & databases · Plugin-Based ETL Frameworks.


alibaba/DataX
17,241 · View on GitHub
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system utilizes a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data
Uses a plugin-based connector architecture to decouple reader and writer logic, allowing extensions for new heterogeneous data sources.
Java
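The description above mentions mapping data types across storage engines through a standardized internal representation. The sketch below shows that general idea in Python, assuming a small hypothetical `InternalType` enum and hand-written lookup tables; the tables are illustrative only and are not DataX's actual type mappings or code.

```python
from enum import Enum


class InternalType(Enum):
    """Hypothetical shared types that every plugin maps to or from."""
    LONG = "long"
    DOUBLE = "double"
    STRING = "string"
    DATE = "date"


# Assumed reader-side mapping: source column type -> internal type.
SOURCE_TO_INTERNAL = {
    "mysql": {
        "BIGINT": InternalType.LONG,
        "DOUBLE": InternalType.DOUBLE,
        "VARCHAR": InternalType.STRING,
        "DATE": InternalType.DATE,
    },
}

# Assumed writer-side mapping: internal type -> target column type.
TARGET_FROM_INTERNAL = {
    "postgres": {
        InternalType.LONG: "bigint",
        InternalType.DOUBLE: "double precision",
        InternalType.STRING: "text",
        InternalType.DATE: "date",
    },
}


def map_column_type(source: str, source_type: str, target: str) -> str:
    """Translate a source column type to a target type via the internal type."""
    internal = SOURCE_TO_INTERNAL[source][source_type.upper()]
    return TARGET_FROM_INTERNAL[target][internal]


mapped = map_column_type("mysql", "varchar", "postgres")
```

With a shared intermediate type, each reader and each writer needs only one mapping table, instead of one table per source and destination pair.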
pentaho/pentaho-kettle
8,353 · View on GitHub
Pentaho Kettle is an enterprise ETL data integration platform designed to extract, transform, and load data between disparate sources and target databases. It works as a metadata-driven orchestrator that uses a visual workflow designer to create and manage complex sequences of data tasks and transformation pipelines. The system is distinguished by its distributed data processing engine, which runs workloads across clusters of server nodes to increase throughput. It uses a plugin-based architecture, allowing the platform to be extended through external JAR files that provide connectivity to various databases and cloud services. The platform covers a wide range of data integration capabilities, including bulk loading, remote file management, and data structure transformation. It offers tools for data quality validation, pipeline automation, and job lifecycle management, along with monitoring utilities for tracking server health and real-time execution status.
Provides an ETL system using a plugin architecture for readers and writers to extend connectivity to new data sources.
Java
apache/flink-cdc
6,430 · View on GitHub
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.
Java, batch, cdc, change-data-capture
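To make "filtering, transforming, and routing data in flight" concrete, here is a small Python sketch over a stream of change events. It is not Flink CDC's API (the description above says its data flows are defined in SQL or YAML); the event shape with `table`, `op`, and `row` keys and the `process` function are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Iterator

ChangeEvent = Dict[str, object]


def process(events: Iterable[ChangeEvent],
            keep: Callable[[ChangeEvent], bool],
            transform: Callable[[dict], dict],
            routes: Dict[str, str]) -> Iterator[ChangeEvent]:
    """Filter, transform, and route events one at a time as they arrive."""
    for event in events:
        if not keep(event):
            continue  # filter: drop events that fail the predicate
        # transform: rebuild the row without mutating the input event
        changed = dict(event, row=transform(event["row"]))
        # route: rename the destination table when a rule exists
        changed["table"] = routes.get(event["table"], event["table"])
        yield changed


events = [
    {"table": "app.orders", "op": "insert", "row": {"id": 1, "amount": 5}},
    {"table": "app.orders", "op": "insert", "row": {"id": 2, "amount": 0}},
    {"table": "app.users", "op": "update", "row": {"id": 7, "amount": 3}},
]

result = list(process(
    events,
    keep=lambda e: e["row"]["amount"] > 0,
    transform=lambda row: {**row, "amount_cents": row["amount"] * 100},
    routes={"app.orders": "warehouse.orders"},
))
```

The generator handles each event as it is read, so nothing in this sketch requires the whole stream to be buffered first.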
apache/pinot
6,098 · View on GitHub
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.
Java
dlt-hub/dlt
5,472 · View on GitHub
dlt is a Python data ingestion tool and ETL pipeline framework designed to pull data from diverse sources and persist it in structured destinations. It works as a schema inference engine that automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources to lakehouses, data warehouses, or vector databases. The project is distinguished by AI-based pipeline generation, using large language models to create extraction code and connectors for REST APIs. It also supports multimodal vector storage and specialized population of vector databases to support AI and machine learning applications. The framework covers a wide range of capabilities, including automatic schema evolution, incremental data loading through state tracking, and data quality validation by enforcing data contracts. It offers tools for relational data normalization, pre- and post-load transformations, and a variety of destination adapters for SQL databases and cloud object storage. Observability is handled through pipeline execution dashboards, column lineage tracking, and schema version verification using content-based hashes.
Provides a pluggable framework that automates schema evolution, incremental loading, and normalization for ETL workflows.
Python, data, data-engineering, data-lake
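The description above says nested JSON is flattened into relational tables. The plain-Python sketch below shows one way that kind of normalization can work. It does not use the dlt library, and the `normalize` function, the `_key` and `_parent_key` columns, and the double-underscore naming are assumptions chosen for this example rather than a statement of dlt's exact output.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def normalize(record: Dict[str, Any], table: str,
              tables: Dict[str, List[Row]],
              parent_key: Optional[str] = None) -> None:
    """Add one record to `tables`, splitting nested lists into child tables."""
    rows = tables.setdefault(table, [])
    row_key = f"{table}:{len(rows)}"  # surrogate key: table name plus position
    row: Row = {"_key": row_key}
    if parent_key is not None:
        row["_parent_key"] = parent_key  # link a child row back to its parent
    rows.append(row)
    _flatten_into(row, record, "", table, tables, row_key)


def _flatten_into(row: Row, value: Dict[str, Any], prefix: str, table: str,
                  tables: Dict[str, List[Row]], row_key: str) -> None:
    for name, item in value.items():
        column = f"{prefix}{name}"
        if isinstance(item, dict):
            # Nested object: fold its fields into prefixed columns.
            _flatten_into(row, item, f"{column}__", table, tables, row_key)
        elif isinstance(item, list):
            # Nested list: each element becomes a row in a child table.
            child_table = f"{table}__{column}"
            for element in item:
                child = element if isinstance(element, dict) else {"value": element}
                normalize(child, child_table, tables, parent_key=row_key)
        else:
            row[column] = item


tables: Dict[str, List[Row]] = {}
normalize(
    {"id": 1, "user": {"name": "Ada", "geo": {"city": "London"}}, "tags": ["a", "b"]},
    "events",
    tables,
)
```

In this sketch the nested `user` object becomes extra columns on `events`, while the `tags` list becomes rows in a separate `events__tags` table that point back to their parent row.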

