Why is alibaba/DataX a recommended plugin-based ETL framework repository?

Uses a plugin-based connector architecture to decouple reader and writer logic, allowing extensions for new heterogeneous data sources.

Why is apache/flink-cdc a recommended plugin-based ETL framework repository?

Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.

Why is apache/pinot a recommended plugin-based ETL framework repository?

Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.

Why is dlt-hub/dlt a recommended plugin-based ETL framework repository?

Provides a pluggable framework that automates schema evolution, incremental loading, and normalization for ETL workflows.

5 repositories

Awesome GitHub Repositories: Plugin-Based ETL Frameworks

ETL systems that use a plugin architecture for readers and writers to extend connectivity to new data sources.

Distinct from ETL Workflows: this category focuses on the plugin-based extensibility of the ETL process, whereas related categories focus on specific ETL types such as Reverse ETL or Vector ETL.
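The reader/writer split that defines this category can be illustrated with a small name-based registry. The sketch below is a minimal, hypothetical Python illustration and is not taken from any of the listed projects. The `register_reader`/`register_writer` decorators, the `ListReader`/`MemoryWriter` plugins, and the job dictionary with `name`/`parameter` keys are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


# Name -> plugin factory registries (hypothetical; real frameworks usually
# discover plugins from packages or configuration files instead).
READERS: Dict[str, Callable[..., Reader]] = {}
WRITERS: Dict[str, Callable[..., Writer]] = {}


def register_reader(name: str):
    def decorator(cls):
        READERS[name] = cls
        return cls
    return decorator


def register_writer(name: str):
    def decorator(cls):
        WRITERS[name] = cls
        return cls
    return decorator


@register_reader("list")
class ListReader:
    """Reads records from an in-memory list."""

    def __init__(self, rows: List[Record]):
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


@register_writer("memory")
class MemoryWriter:
    """Collects records in memory and reports how many were written."""

    def __init__(self) -> None:
        self.sink: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.sink.append(record)
            count += 1
        return count


def run_job(job: dict):
    """Instantiate the reader and writer named in `job` and stream records between them."""
    r_spec, w_spec = job["reader"], job["writer"]
    reader = READERS[r_spec["name"]](**r_spec.get("parameter", {}))
    writer = WRITERS[w_spec["name"]](**w_spec.get("parameter", {}))
    return writer, writer.write(reader.read())
```

The core loop in `run_job` looks plugins up only by name, so adding a connector means registering one new class without changing the engine.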

Explore 5 awesome GitHub repositories matching data & databases · Plugin-Based ETL Frameworks.


alibaba/DataX
17,241 stars
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system utilizes a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data…
Uses a plugin-based connector architecture to decouple reader and writer logic, allowing extensions for new heterogeneous data sources.
Java
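DataX's description mentions mapping data types across storage engines through a standardized internal representation. Below is a minimal sketch of that idea, assuming a hypothetical four-type internal model (`ColumnType`) and made-up source and target type tables. DataX's actual column classes and type mappings are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, Optional


class ColumnType(Enum):
    """Hypothetical engine-neutral column types."""
    LONG = "long"
    DOUBLE = "double"
    STRING = "string"
    DATE = "date"


@dataclass(frozen=True)
class Column:
    type: ColumnType
    value: Optional[object]


# Reader side (assumed): native source type name -> internal type.
SOURCE_TO_INTERNAL: Dict[str, ColumnType] = {
    "int": ColumnType.LONG,
    "bigint": ColumnType.LONG,
    "double": ColumnType.DOUBLE,
    "varchar": ColumnType.STRING,
    "date": ColumnType.DATE,
}

# Writer side (assumed): internal type -> target engine type name.
INTERNAL_TO_TARGET: Dict[ColumnType, str] = {
    ColumnType.LONG: "INT64",
    ColumnType.DOUBLE: "FLOAT64",
    ColumnType.STRING: "STRING",
    ColumnType.DATE: "DATE",
}


def to_internal(source_type: str, value: object) -> Column:
    """Convert one source value into the internal representation."""
    ctype = SOURCE_TO_INTERNAL[source_type.lower()]
    if value is None:
        return Column(ctype, None)
    if ctype is ColumnType.LONG:
        return Column(ctype, int(value))
    if ctype is ColumnType.DOUBLE:
        return Column(ctype, float(value))
    if ctype is ColumnType.DATE:
        if isinstance(value, date):
            return Column(ctype, value)
        return Column(ctype, date.fromisoformat(str(value)))
    return Column(ctype, str(value))


def target_schema(source_schema: Dict[str, str]) -> Dict[str, str]:
    """Derive target column types by routing every source type through the internal model."""
    return {
        name: INTERNAL_TO_TARGET[SOURCE_TO_INTERNAL[t.lower()]]
        for name, t in source_schema.items()
    }
```

The internal model means each reader and writer only needs one mapping table each, rather than one conversion per source/target pair.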
pentaho/pentaho-kettle
8,353 stars
Pentaho Kettle is an enterprise data integration (ETL) platform designed to extract, transform, and load data between heterogeneous sources and target databases. It operates as a metadata-driven orchestrator that uses a visual workflow designer to create and manage complex sequences of data jobs and transformation pipelines. The system features a distributed data processing engine that executes workloads across clusters of server nodes to increase throughput. It uses a plugin-based architecture, allowing the platform to be extended through external JAR files that provide connectivity to various databases and cloud services. The platform covers a wide range of data integration capabilities, including bulk loading, remote file management, and data structure transformation. It provides tools for data quality validation, pipeline automation, and job lifecycle management, along with monitoring tools that track server health and execution status in real time.
Provides an ETL system using a plugin architecture for readers and writers to extend connectivity to new data sources.
Java
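Kettle is described as a metadata-driven orchestrator. As a rough illustration only, the sketch below treats a transformation as plain data (a list of step descriptors) that a step registry interprets. The step types `filter`, `rename`, and `constant` and their parameter names are hypothetical, not Kettle's own step names. Kettle itself is a Java platform.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
StepFn = Callable[[Iterable[Row], dict], Iterator[Row]]


def filter_rows(rows: Iterable[Row], params: dict) -> Iterator[Row]:
    """Keep rows whose `field` equals `equals`."""
    field, value = params["field"], params["equals"]
    return (r for r in rows if r.get(field) == value)


def rename_field(rows: Iterable[Row], params: dict) -> Iterator[Row]:
    """Rename column `from` to `to`, copying each row so inputs are not mutated."""
    old, new = params["from"], params["to"]
    for r in rows:
        out = dict(r)
        if old in out:
            out[new] = out.pop(old)
        yield out


def add_constant(rows: Iterable[Row], params: dict) -> Iterator[Row]:
    """Add a constant-valued column to every row."""
    for r in rows:
        yield {**r, params["field"]: params["value"]}


# Hypothetical step registry: the metadata refers to steps only by type name.
STEP_TYPES: Dict[str, StepFn] = {
    "filter": filter_rows,
    "rename": rename_field,
    "constant": add_constant,
}


def run_transformation(metadata: List[dict], rows: Iterable[Row]) -> List[Row]:
    """Chain the steps described in `metadata` lazily, then materialize the result."""
    stream: Iterable[Row] = rows
    for step in metadata:
        stream = STEP_TYPES[step["type"]](stream, step.get("params", {}))
    return list(stream)
```

The pipeline definition is data, so it can be stored, versioned, or edited in a visual designer without changing the execution code.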
apache/flink-cdc
6,430 stars
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processing…
Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.
Java · batch, cdc, change-data-capture
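Here is a minimal sketch of how row-level changes and schema evolution interact at a sink. It assumes a hypothetical `ChangeEvent` format in which every insert or update carries the full row image (the "after" image) and deletes carry only the key. The sketch handles only added columns, and it does not describe how Flink CDC achieves exactly-once guarantees.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChangeEvent:
    op: str  # "insert", "update" or "delete" (hypothetical encoding)
    key: object  # primary key of the affected row
    after: Optional[Dict[str, object]] = None  # full row image; None for deletes


@dataclass
class SinkTable:
    columns: List[str] = field(default_factory=list)
    rows: Dict[object, Dict[str, object]] = field(default_factory=dict)

    def _evolve_schema(self, after: Dict[str, object]) -> None:
        """Add any column not seen before and backfill existing rows with None."""
        for col in after:
            if col not in self.columns:
                self.columns.append(col)
                for row in self.rows.values():
                    row[col] = None

    def apply(self, event: ChangeEvent) -> None:
        """Apply one change event; inserts and updates are treated as upserts."""
        if event.op == "delete":
            self.rows.pop(event.key, None)
            return
        if event.op not in ("insert", "update") or event.after is None:
            raise ValueError(f"unsupported event: {event!r}")
        self._evolve_schema(event.after)
        self.rows[event.key] = {col: event.after.get(col) for col in self.columns}
```

Each insert or update overwrites the row with its full after-image, so replaying the same event leaves the table unchanged. This property makes full-image upserts convenient for sinks.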
apache/pinot
6,098 stars
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single query…
Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.
Java
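The hybrid ingestion idea can be sketched as a query that reads historical rows from a batch source and only newer rows from a real-time source. This simplified sketch assumes a single time boundary equal to the latest batch timestamp. Rows at or before the boundary come from batch data and later rows come from the stream. The function names are hypothetical, and this is not Pinot's implementation.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def time_boundary(batch_rows: List[Row], time_col: str) -> Optional[int]:
    """Latest timestamp covered by batch data, or None if there is none (simplified)."""
    return max((r[time_col] for r in batch_rows), default=None)


def hybrid_query(
    batch_rows: List[Row],
    realtime_rows: List[Row],
    time_col: str,
    predicate: Callable[[Row], bool],
) -> List[Row]:
    """Answer one query over both sources without double-counting the overlap."""
    boundary = time_boundary(batch_rows, time_col)
    # Rows at or before the boundary are served from batch data...
    results = [r for r in batch_rows if predicate(r)]
    # ...and only strictly newer rows are taken from the real-time stream.
    results.extend(
        r for r in realtime_rows
        if (boundary is None or r[time_col] > boundary) and predicate(r)
    )
    return results
```

The boundary prevents rows that appear in both sources from being counted twice while batch data catches up with the stream.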
dlt-hub/dlt
5,472 stars
dlt is a Python data ingestion tool and ETL pipeline framework designed to fetch data from diverse sources and persist it to structured destinations. It acts as a schema inference engine that automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources into data lakes, warehouses, or vector databases. The project features AI-assisted pipeline generation, using large language models to scaffold extraction code and connectors for REST APIs. It also supports multimodal vector storage and specialized loading into vector databases to serve AI and machine learning applications. The framework covers a wide range of capabilities, including automated schema evolution, incremental data loading via state tracking, and data quality validation through data contract enforcement. It provides tools for relational data normalization, pre- and post-load transformations, and a variety of destination adapters for SQL databases and cloud object stores. Monitoring is handled through pipeline execution dashboards, column-level lineage tracking, and schema version verification using content-based hashes.
Provides a pluggable framework that automates schema evolution, incremental loading, and normalization for ETL workflows.
Python · data, data-engineering, data-lake
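dlt's description mentions flattening nested JSON into relational tables. Below is a minimal sketch of that normalization step. It assumes nested objects become `__`-joined columns and lists become child tables linked back to their parent row. The `_row_id` and `_parent_id` column names and the synthetic key format are hypothetical; dlt's own normalizer uses its own naming conventions.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Tables = Dict[str, List[Row]]


def flatten(
    record: Row,
    table: str,
    tables: Tables,
    parent_id: Optional[str] = None,
    sep: str = "__",
) -> Tables:
    """Normalize one nested record into `tables` (parent row plus child-table rows)."""
    row: Row = {}
    children: List[Tuple[str, list]] = []

    def walk(obj: Row, prefix: str) -> None:
        # Nested objects become prefixed columns; lists are deferred to child tables.
        for key, value in obj.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, name)
            elif isinstance(value, list):
                children.append((name, value))
            else:
                row[name] = value

    walk(record, "")
    # Hypothetical synthetic key: table name plus position within that table.
    row_id = f"{table}:{len(tables.get(table, []))}"
    row["_row_id"] = row_id
    if parent_id is not None:
        row["_parent_id"] = parent_id
    tables.setdefault(table, []).append(row)

    for name, items in children:
        child_table = f"{table}{sep}{name}"
        for item in items:
            # Scalars in a list are wrapped so every child row is a dict.
            child = item if isinstance(item, dict) else {"value": item}
            flatten(child, child_table, tables, row_id, sep)
    return tables
```

Splitting lists into child tables keeps every table flat and relational, while the parent key lets the original nesting be reconstructed with a join.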
