Why is alibaba/datax recommended as a plugin-based ETL framework?

Uses a plugin-based connector architecture to decouple reader and writer logic, allowing extensions for new heterogeneous data sources.

Why is apache/flink-cdc recommended as a plugin-based ETL framework?

Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.

Why is apache/pinot recommended as a plugin-based ETL framework?

Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.

Why is dlt-hub/dlt recommended as a plugin-based ETL framework?

Provides a pluggable framework that automates schema evolution, incremental loading, and normalization for ETL workflows.

5 repositories

Awesome GitHub Repositories: Plugin-Based ETL Frameworks

ETL systems that use a plugin architecture for readers and writers to extend connectivity to new data sources.
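To make the reader/writer plugin idea concrete, here is a minimal sketch of how such an architecture can be organized. The `Reader`/`Writer` interfaces, the `register_reader`/`register_writer` decorators, and the `run_job` engine are hypothetical names invented for illustration, not the API of any repository listed below. The engine only looks plugins up by name, so adding a new data source means adding one registered class rather than changing the engine.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple, Type

Record = Dict[str, Any]


class Reader(ABC):
    """Hypothetical source plugin interface: yields records from a data source."""

    @abstractmethod
    def read(self) -> Iterator[Record]: ...


class Writer(ABC):
    """Hypothetical sink plugin interface: consumes records, returns how many were written."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int: ...


# Registries mapping plugin names to classes; the engine only ever sees these names.
READERS: Dict[str, Type[Reader]] = {}
WRITERS: Dict[str, Type[Writer]] = {}


def register_reader(name: str) -> Callable[[Type[Reader]], Type[Reader]]:
    def decorator(cls: Type[Reader]) -> Type[Reader]:
        READERS[name] = cls
        return cls
    return decorator


def register_writer(name: str) -> Callable[[Type[Writer]], Type[Writer]]:
    def decorator(cls: Type[Writer]) -> Type[Writer]:
        WRITERS[name] = cls
        return cls
    return decorator


@register_reader("memory")
class MemoryReader(Reader):
    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


@register_writer("list")
class ListWriter(Writer):
    def __init__(self) -> None:
        self.sink: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.sink.append(record)
            count += 1
        return count


def run_job(reader_name: str, reader_args: Dict[str, Any],
            writer_name: str, writer_args: Dict[str, Any]) -> Tuple[Writer, int]:
    """Look up both plugins by name and stream records from reader to writer."""
    reader = READERS[reader_name](**reader_args)
    writer = WRITERS[writer_name](**writer_args)
    return writer, writer.write(reader.read())
```

For example, `run_job("memory", {"rows": [{"id": 1}, {"id": 2}]}, "list", {})` moves two records from the in-memory reader into the list writer. Because records stream through a generator, the reader and writer never need to know about each other.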

Distinct from ETL Workflows: this category focuses on the plugin-based extensibility of the ETL process, whereas related categories focus on specific ETL types such as Reverse ETL or Vector ETL.

Explore 5 GitHub repositories in the data & databases › Plugin-Based ETL Frameworks category.


alibaba/DataX (17,241 GitHub stars)
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system uses a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data…
Uses a plugin-based connector architecture to decouple reader and writer logic, allowing extensions for new heterogeneous data sources.
Java
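The "standardized internal representation" mentioned for DataX can be illustrated with a small type-translation sketch. The `InternalType` set and both mapping tables below are simplified assumptions for illustration; DataX's actual internal column classes and per-plugin type rules may differ. The design point is that each reader maps source types into the shared set and each writer maps out of it, so N readers and M writers need N + M mappings instead of N × M pairwise converters.

```python
from enum import Enum
from typing import Dict, List, Tuple


class InternalType(Enum):
    """Simplified, assumed internal type set shared by all plugins."""
    LONG = "long"
    DOUBLE = "double"
    STRING = "string"
    DATE = "date"
    BOOL = "bool"
    BYTES = "bytes"


# Hypothetical reader-side mapping: MySQL-like column types -> internal types.
MYSQL_TO_INTERNAL: Dict[str, InternalType] = {
    "int": InternalType.LONG,
    "bigint": InternalType.LONG,
    "decimal": InternalType.DOUBLE,
    "varchar": InternalType.STRING,
    "datetime": InternalType.DATE,
    "tinyint(1)": InternalType.BOOL,
    "blob": InternalType.BYTES,
}

# Hypothetical writer-side mapping: internal types -> PostgreSQL-like column types.
INTERNAL_TO_POSTGRES: Dict[InternalType, str] = {
    InternalType.LONG: "bigint",
    InternalType.DOUBLE: "double precision",
    InternalType.STRING: "text",
    InternalType.DATE: "timestamp",
    InternalType.BOOL: "boolean",
    InternalType.BYTES: "bytea",
}


def translate_schema(columns: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Translate (name, source_type) pairs into (name, target_type) via the internal types."""
    translated = []
    for name, source_type in columns:
        internal = MYSQL_TO_INTERNAL.get(source_type.lower())
        if internal is None:
            raise ValueError(f"unsupported source type {source_type!r} for column {name!r}")
        translated.append((name, INTERNAL_TO_POSTGRES[internal]))
    return translated
```

For instance, `translate_schema([("id", "BIGINT"), ("name", "varchar")])` yields `[("id", "bigint"), ("name", "text")]`, while an unmapped source type raises an error instead of being silently coerced.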
pentaho/pentaho-kettle (8,353 GitHub stars)
Pentaho Kettle is an enterprise ETL data integration platform designed to extract, transform, and load data between diverse sources and target databases. It acts as a metadata-driven orchestrator that uses a visual workflow designer to build and manage complex sequences of data tasks and transformation pipelines. The system is distinguished by its distributed data processing engine, which executes workloads across clusters of server nodes to increase throughput. It uses a plugin-based architecture, so the platform can be extended through external JAR files to provide connectivity to diverse databases and cloud services. The platform covers a wide range of data integration capabilities, including bulk loading, remote file management, and data structure transformation. It provides monitoring utilities for tracking server health and real-time execution status, along with tools for data quality validation, pipeline automation, and job lifecycle management.
Provides an ETL system using a plugin architecture for readers and writers to extend connectivity to new data sources.
Java
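A "metadata-driven orchestrator" means the pipeline is described as data and interpreted by an engine. The sketch below illustrates that idea under stated assumptions: the `PIPELINE` list, the `STEP_TYPES` registry, and the step names `filter_rows` and `add_constant` are hypothetical, loosely standing in for step plugins that Kettle would load from JAR files, and are not Kettle's actual file format or API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
StepFn = Callable[[Iterable[Row], Dict[str, Any]], Iterator[Row]]

# Hypothetical registry of step types; an external plugin would add entries here.
STEP_TYPES: Dict[str, StepFn] = {}


def step_type(name: str) -> Callable[[StepFn], StepFn]:
    def register(fn: StepFn) -> StepFn:
        STEP_TYPES[name] = fn
        return fn
    return register


@step_type("filter_rows")
def filter_rows(rows: Iterable[Row], config: Dict[str, Any]) -> Iterator[Row]:
    # Keep only rows whose configured field is at least the configured minimum.
    for row in rows:
        if row[config["field"]] >= config["min"]:
            yield row


@step_type("add_constant")
def add_constant(rows: Iterable[Row], config: Dict[str, Any]) -> Iterator[Row]:
    # Add a constant-valued column to every row without mutating the input row.
    for row in rows:
        yield {**row, config["field"]: config["value"]}


def run_pipeline(metadata: List[Dict[str, Any]], rows: Iterable[Row]) -> List[Row]:
    """Chain the steps described by the metadata into one lazy stream, then drain it."""
    stream: Iterable[Row] = rows
    for step in metadata:
        stream = STEP_TYPES[step["type"]](stream, step.get("config", {}))
    return list(stream)


# The pipeline itself is plain data (hypothetical format), not code.
PIPELINE = [
    {"type": "filter_rows", "config": {"field": "amount", "min": 100}},
    {"type": "add_constant", "config": {"field": "region", "value": "EU"}},
]
```

Running `run_pipeline(PIPELINE, [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}])` keeps only the second row and tags it with `region = "EU"`. Changing the workflow means editing `PIPELINE`, which is what lets a visual designer generate pipelines without generating code.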
apache/flink-cdc (6,430 GitHub stars)
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL engine and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and by ensuring exactly-once delivery and processing…
Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.
Java · batch · cdc · change-data-capture
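The idea of propagating row-level changes while evolving the schema can be sketched with an in-memory sink. This is an illustration, not Flink CDC's API: the `ChangeEvent` shape, its `op` values, and the offset field are hypothetical. Skipping events whose offset was already applied makes replays idempotent. This is a simplified stand-in for exactly-once processing, which Flink itself achieves through its checkpointing mechanism.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    offset: int                   # hypothetical position in the source change log
    op: str                       # "insert", "update", "delete" or "add_column"
    key: Optional[int] = None     # primary key of the affected row
    row: Dict[str, Any] = field(default_factory=dict)  # full row image after the change
    column: Optional[str] = None  # new column name for "add_column"
    default: Any = None           # value backfilled into existing rows on "add_column"


@dataclass
class KeyedSink:
    columns: List[str]
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    last_offset: int = -1

    def apply(self, event: ChangeEvent) -> None:
        # Events at or before the last applied offset are replays: ignore them.
        if event.offset <= self.last_offset:
            return
        if event.op == "add_column":
            # Schema change: extend the sink schema and backfill existing rows.
            if event.column is None:
                raise ValueError("add_column event needs a column name")
            self.columns.append(event.column)
            for existing in self.rows.values():
                existing.setdefault(event.column, event.default)
        elif event.op in ("insert", "update"):
            # Upsert the full row image, projected onto the current schema.
            self.rows[event.key] = {c: event.row.get(c) for c in self.columns}
        elif event.op == "delete":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown op {event.op!r}")
        self.last_offset = event.offset
```

Feeding the sink an insert, an `add_column` event, and then an update shows the schema change flowing through: rows written before the new column appears are backfilled with the default, and rows written after it carry the new value.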
apache/pinot (6,098 GitHub stars)
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed, segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single…
Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.
Java
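Unifying streaming and batch data in one queryable view requires avoiding double counting where the two overlap. As we understand Pinot's hybrid tables, the broker splits a query at a time boundary: the offline (batch) side answers up to the boundary and the realtime side answers after it. The sketch below shows that idea for a per-key sum. The `Event` type and `hybrid_sum_by_key` function are hypothetical, and the boundary rule used here (the newest batch timestamp) is a simplification of how Pinot actually computes it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    ts: int      # event time, e.g. hours since epoch
    key: str
    value: int


def hybrid_sum_by_key(offline: List[Event], realtime: List[Event]) -> Dict[str, int]:
    """Sum values per key across batch and streaming data without double counting."""
    # Simplified time boundary: the newest timestamp already covered by batch data.
    boundary: Optional[int] = max((e.ts for e in offline), default=None)
    totals: Dict[str, int] = {}
    # The batch side answers everything up to and including the boundary.
    for e in offline:
        totals[e.key] = totals.get(e.key, 0) + e.value
    # The streaming side answers only what is newer than the boundary.
    for e in realtime:
        if boundary is None or e.ts > boundary:
            totals[e.key] = totals.get(e.key, 0) + e.value
    return totals
```

If the realtime side still holds events at a timestamp the batch data already covers, those events are counted once (from the batch side) rather than twice.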
dlt-hub/dlt (5,472 GitHub stars)
dlt is a Python data ingestion tool and ETL pipeline framework designed to load data from diverse sources and persist it in structured destinations. It acts as a schema inference engine that automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources into lakehouses, warehouses, or vector databases. The project distinguishes itself through AI-assisted pipeline creation, which uses large language models to scaffold extraction code and connectors for REST APIs. It also supports multimodal vector storage and specialized loading of vector databases for AI and machine learning applications. The framework covers a wide range of capabilities, including automatic schema evolution, incremental data loading through state tracking, and data quality validation through the enforcement of data contracts. It provides tools for relational data normalization, pre- and post-load transformations, and destination adapters for SQL databases and cloud object stores. Observability is handled through pipeline execution dashboards, column lineage tracking, and schema version verification using content-based hashes.
Provides a pluggable framework that automates schema evolution, incremental loading, and normalization for ETL workflows.
Python · data · data-engineering · data-lake
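Flattening nested JSON into relational tables generally works like this: nested objects become extra columns on the same row, and lists of objects become child tables linked back to their parent row. The sketch below is a stdlib-only illustration of that normalization, not dlt's implementation or API. The `__` separator, the `_row_id`/`_parent_id`/`_list_idx` column names, and the `normalize` function are assumptions chosen for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Tables = Dict[str, List[Row]]


def _flatten(obj: Dict[str, Any], table: str, row: Row, prefix: str,
             row_id: str, tables: Tables) -> None:
    for key, value in obj.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: becomes extra columns on the same row.
            _flatten(value, table, row, f"{column}__", row_id, tables)
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # List of objects: becomes a child table linked back to this row.
            child_table = f"{table}__{column}"
            for idx, item in enumerate(value):
                child_id = f"{row_id}.{column}.{idx}"
                child: Row = {"_row_id": child_id, "_parent_id": row_id, "_list_idx": idx}
                _flatten(item, child_table, child, "", child_id, tables)
                tables.setdefault(child_table, []).append(child)
        else:
            # Scalars (and, in this simplified version, lists of scalars) are stored as-is.
            row[column] = value


def normalize(records: List[Dict[str, Any]], table: str) -> Tables:
    """Flatten nested records into a root table plus child tables (hypothetical naming)."""
    tables: Tables = {table: []}
    for i, record in enumerate(records):
        row_id = str(i)
        row: Row = {"_row_id": row_id}
        _flatten(record, table, row, "", row_id, tables)
        tables[table].append(row)
    return tables
```

A customer record with a nested `user.address.city` and an `orders` list turns into one `customers` row with a `user__address__city` column plus a `customers__orders` table whose rows point back to it through `_parent_id`, which is the shape a relational destination can load directly.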

Awesome Plugin-Based ETL Frameworks GitHub Repositories

alibaba/DataX

pentaho/pentaho-kettle

apache/flink-cdc

apache/pinot

dlt-hub/dlt
