Why is plausible/analytics a recommended Data Warehouse Integrations repository?

Pipes analytics information into external data storage systems to support long-term data warehousing and complex analytical pipelines.

Why is beekeeper-studio/beekeeper-studio a recommended Data Warehouse Integrations repository?

Establishes secure connections to managed cloud database services and data warehouses using enterprise authentication.

Why is prefecthq/prefect a recommended Data Warehouse Integrations repository?

Provides secure connectivity for executing SQL queries and managing datasets in cloud data warehouses.

Why is cube-js/cube a recommended Data Warehouse Integrations repository?

Provides secure connectivity to enterprise data warehouses for consistent analytics.

Why is plotly/plotly.py a recommended Data Warehouse Integrations repository?

Links analytical applications to external data warehouses to enable automated processing and reporting across infrastructure.

Why is prestodb/presto a recommended Data Warehouse Integrations repository?

Provides secure, real-time interactive connectivity to cloud-hosted data warehouse clusters for SQL analysis.

Why is quarkusio/quarkus a recommended Data Warehouse Integrations repository?

Integrates managed relational database services like MySQL and PostgreSQL for persistent data storage.

Why is unstructured-io/unstructured a recommended Data Warehouse Integrations repository?

Transfers processed document data into specified databases, schemas, and tables within cloud-based data warehouses.

Why is dbt-labs/dbt-core a recommended Data Warehouse Integrations repository?

Establishes secure connectivity between the transformation engine and various cloud-hosted data warehouse platforms.

Why is heroiclabs/nakama a recommended Data Warehouse Integrations repository?

Streams raw player and system event data to external data warehouses for analytics.

38 Repos

Awesome GitHub Repositories · Data Warehouse Integrations

Tools for synchronizing analytics metrics into centralized data warehouses.

Distinguishing note: Focuses on long-term storage and warehouse synchronization rather than real-time reporting.

Explore 38 awesome GitHub repositories matching Data & Databases · Data Warehouse Integrations.


plausible/analytics
24,245 stars
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Pipes analytics information into external data storage systems to support long-term data warehousing and complex analytical pipelines.
Elixir · analytics · charts · clickhouse
beekeeper-studio/beekeeper-studio
22,030 stars
Beekeeper Studio is a cross-platform desktop application designed for database management and SQL development. It provides a unified graphical interface to connect to, query, and modify data across a wide range of relational and NoSQL database systems. The application functions as a comprehensive workspace, integrating tools for schema design, record editing, and data visualization. The project distinguishes itself through a focus on secure, flexible connectivity and AI-assisted workflows. It supports advanced authentication methods, including enterprise single sign-on, multi-factor authentic
Establishes secure connections to managed cloud database services and data warehouses using enterprise authentication.
TypeScript · bigquery · cassandra · cockroachdb
PrefectHQ/prefect
21,640 stars
Prefect is a workflow orchestration platform designed to define, schedule, and monitor complex data pipelines as Python code. It functions as a container-native engine that wraps individual tasks in isolated environments, ensuring consistent dependencies and resource allocation across diverse infrastructure. By utilizing a state-machine-based orchestration model, the system tracks execution progress through discrete transitions and persistent event logs to maintain reliable and observable task processing. The platform distinguishes itself through a decoupled worker-API architecture, which sep
Provides secure connectivity for executing SQL queries and managing datasets in cloud data warehouses.
Python · automation · data · data-engineering
cube-js/cube
20,251 stars
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Provides secure connectivity to enterprise data warehouses for consistent analytics.
Rust · agentic-analytics · agents · ai
plotly/plotly.py
18,270 stars
Plotly.py is a comprehensive framework for building production-ready data applications and interactive dashboards directly from Python code. It functions as both a high-performance visualization library for browser-based charts and a full-stack tool for transforming analytical scripts into responsive, web-based interfaces. By abstracting away the need for manual HTML or JavaScript, it allows developers to define complex layouts and functional logic using modular, reusable components. The framework distinguishes itself through a robust architecture that handles event orchestration and state sy
Links analytical applications to external data warehouses to enable automated processing and reporting across infrastructure.
Python · d3 · dashboard · declarative
prestodb/presto
16,711 stars
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Provides secure, real-time interactive connectivity to cloud-hosted data warehouse clusters for SQL analysis.
Java · big-data · data · hadoop
quarkusio/quarkus
15,479 stars
Quarkus is a Kubernetes-native Java framework designed for building high-performance, memory-efficient applications. It utilizes ahead-of-time native compilation to transform Java code into standalone, optimized binaries that eliminate the need for a virtual machine, enabling rapid startup and reduced memory consumption. By performing code augmentation during the build phase, it shifts heavy processing tasks away from runtime, ensuring that applications are optimized for cloud-native environments. The framework distinguishes itself through a unified approach to reactive and imperative program
Integrates managed relational database services like MySQL and PostgreSQL for persistent data storage.
Java · cloud-native · hacktoberfest · java
Unstructured-IO/unstructured
14,019 stars
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Transfers processed document data into specified databases, schemas, and tables within cloud-based data warehouses.
HTML · data-pipelines · deep-learning · document-image-analysis
dbt-labs/dbt-core
13,051 stars
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Establishes secure connectivity between the transformation engine and various cloud-hosted data warehouse platforms.
Rust · analytics · business-intelligence · data-modeling
heroiclabs/nakama
12,754 stars
Nakama is a distributed server framework designed for real-time multiplayer games and social applications. It provides an authoritative runtime environment for executing game logic, ensuring consistent state and cheat-resistant gameplay across diverse client platforms. The system acts as a centralized backend, managing persistent player identities, social graphs, and real-time communication channels to support complex multiplayer interactions. The platform distinguishes itself through an integrated suite of LiveOps tools that allow developers to manage game economies, schedule time-bound even
Streams raw player and system event data to external data warehouses for analytics.
Go · backend · backend-as-a-service · chat-server
debezium/debezium
12,421 stars
Debezium is a distributed change data capture platform that streams row-level database modifications as real-time events. By parsing database transaction logs, the system broadcasts structural and data changes to message brokers, enabling reactive processing and data integration across distributed architectures. The platform utilizes log-based capture to extract modifications directly from transaction logs, ensuring minimal impact on source system performance while maintaining the original commit order of operations. It employs database-specific connector adapters to translate proprietary bin
Maintains analytical stores by streaming live database updates into data warehouses for real-time intelligence.
Java · apache-kafka · cdc · change-data-capture
datahub-project/datahub
12,141 stars
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
Executes generated queries across multiple warehouse types using standard drivers while maintaining a unified interface.
Python · data-catalog · data-discovery · data-governance
google/go-cloud
9,891 stars
go-cloud is a toolkit of cloud-agnostic libraries that provide portable Go interfaces for interacting with common cloud services. It enables multi-cloud application development by decoupling business logic from specific provider API implementations. The project utilizes a driver-based system to map generic interface calls to vendor-specific requests. This allows applications to switch between different cloud backends for blob storage, relational databases, and asynchronous publish-subscribe messaging without changing the core application code. Beyond storage and messaging, the toolkit includ
Integrates relational database services using portable connectors that prevent vendor lock-in.
Go
pycaret/pycaret
9,811 stars
PyCaret is a Python AutoML platform and MLOps lifecycle manager designed to automate machine learning workflows. It functions as a low-code environment that leverages a scikit-learn native engine to execute preprocessing, training, and evaluation for tabular data. The platform distinguishes itself as an LLM-powered ML copilot, using large language model agents to analyze datasets, design experiment configurations, and explain model results. It also serves as a Kubernetes ML orchestrator and model registry, enabling the versioning of trained pipelines and their promotion to production API endp
Provides secure connectivity to cloud-hosted data warehouses and managed database services for importing experimentation data.
Python · anomaly-detection · automl · classification
Netflix/metaflow
9,764 stars
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Writes predictions and computation outputs to data warehouses or caches to power downstream systems.
Python · agents · ai · aws
jeecgboot/jimureport
8,059 stars
JimuReport is an open-source reporting and dashboard engine designed to be embedded directly into Spring Boot applications. Its core identity centers on generating data reports and full-screen dashboards from natural language descriptions, eliminating the need for manual design. The platform also provides a conversational query interface that translates plain-language questions into database queries, returning results as tables and charts without requiring SQL knowledge. What distinguishes JimuReport is its integration of AI skills that can be installed with a single command, enabling report
Connects to Apache Doris data warehouse as a data source for reports and dashboards.
Java · ai · bi · bigscreen
anthropics/knowledge-work-plugins
7,583 stars
This project is a plugin framework and agentic workflow library designed to connect large language models to professional toolstacks. It provides a system for integrating language models with external data warehouses, CRMs, and other enterprise software to retrieve and manipulate real-time business data. The framework enables the automation of specialized professional tasks through a file-based plugin definition system. It allows for the customization of domain expertise and plugin behavior to align with internal company processes, supported by an enterprise data connector that links models t
Provides secure connectivity modules for linking language models to cloud-hosted data warehouses and BI tools.
Python
supabase/realtime
7,488 stars
Realtime is a real-time data distribution and synchronization engine that enables applications to stream database changes and coordinate state between clients. It functions as a synchronization layer that monitors database write-ahead logs to provide change data capture and pushes updates to authorized clients via WebSockets. The project features a real-time presence server for tracking the online status of active users and a broadcast service for sending ephemeral messages without database persistence. It organizes communication through channel-based message routing and uses a structured JSO
Streams database changes to external data warehouses in real time without manual pipelines.
Elixir · cdc · change-data-capture · crdt
growthbook/growthbook
7,351 stars
GrowthBook is a feature flagging and experimentation platform that utilizes a warehouse-native approach to data analysis. It serves as a system for managing feature rollouts and conducting A/B tests by executing SQL queries directly against existing data warehouses to calculate experiment results. The platform is distinguished by its integration of a Model Context Protocol server, which allows AI coding assistants and IDEs to manage flags and query analytics using natural language. It also provides specialized capabilities for AI model optimization, enabling the testing of prompts and models
Provides secure connectivity to external data warehouses to enable warehouse-native analysis and experimentation.
TypeScript · ab-testing · abtest · abtesting
feast-dev/feast
6,727 stars
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Converts retrieved feature data into dataframes, Arrow tables, SQL, data lakes, or data warehouses for downstream use.
Python · big-data · data-engineering · data-quality