8 repositories
Unified SQL interfaces for querying data stored across distributed storage systems like HDFS and HBase.
Distinct from Distributed Storage: focuses on the SQL query layer over distributed storage, not the storage architecture itself.
Explore 8 GitHub repositories matching Data & Databases · SQL Query Interfaces.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Provides a SQL interface to query aggregated event data, enabling unified views across distributed microservice architectures.
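One simple way to picture a hybrid table that serves batch and streaming data through a single query is a time boundary: batch rows answer for everything up to the boundary, and streaming rows answer only for what comes after it, so overlapping events are not counted twice. The sketch below illustrates that idea only. It is not a description of Pinot's internals, and the names `batch_rows`, `stream_rows`, and `unified_rows` are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: (event_time, country) tuples.
batch_rows = [(1, "US"), (2, "DE"), (3, "US")]
stream_rows = [
    (3, "US"),  # already covered by the batch data
    (4, "US"),
    (5, "DE"),
]

def unified_rows(batch, stream):
    """Yield batch rows, then only the stream rows newer than the batch data."""
    # Assumed rule: the boundary is the latest event time present in the batch.
    boundary = max((t for t, _ in batch), default=float("-inf"))
    yield from batch
    yield from (row for row in stream if row[0] > boundary)

# Roughly: SELECT country, COUNT(*) FROM events GROUP BY country
counts = Counter(country for _, country in unified_rows(batch_rows, stream_rows))
print(dict(counts))
```

The overlapping event at time 3 is counted once because the stream side is filtered by the boundary before the aggregation runs.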
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Queries files stored in HDFS, HBase, or other storage systems through a unified SQL interface.
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
Stores and queries metrics, logs, and traces together in a single columnar database with SQL and PromQL.
Helicone is an AI gateway and observability platform designed to intercept, manage, and monitor interactions with large language models. By acting as a reverse proxy, it provides a centralized layer for routing requests across multiple AI providers, allowing developers to maintain consistent application logic while gaining deep visibility into model performance, usage, and costs. The platform distinguishes itself through a robust suite of traffic management and prompt engineering tools. It enables policy-driven control, including automatic failover between providers, rate limiting, and edge-b
Stores and retrieves frequently used SQL queries for data analysis and reporting.
Uptrace is an OpenTelemetry-based observability platform designed to collect, store, and analyze distributed traces, metrics, and logs. It functions as a centralized logging backend, a distributed tracing system, and a metrics engine to monitor application performance and system health. The platform is distinguished by AI-powered operational capabilities, allowing users to query telemetry data and manage monitoring dashboards using natural language. It specifically includes specialized monitoring for generative AI pipelines, tracking token usage and response quality for LLM interactions and r
Filters and aggregates spans, logs, and events using a unified query language for performance analysis.
Logfire is an OpenTelemetry observability platform and Python application monitoring tool. It provides a suite of tools for collecting, storing, and querying spans, logs, and metrics to monitor application performance and execution. The platform features a specialized monitor for Pydantic data validation, tracking data flow and validation outcomes in real time. It also includes a telemetry analysis tool that uses standard SQL to query observability data and connect to business intelligence tools. The system provides automatic instrumentation for Python libraries and frameworks, allowing for
Allows querying metrics, logs, and traces together using standard SQL for deep telemetry analysis and business intelligence.
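As a rough illustration of what querying telemetry with standard SQL can look like, the sketch below loads a few records into an in-memory SQLite database and aggregates span durations per service. SQLite is only a local stand-in here, and the `telemetry` table and its columns are hypothetical, not Logfire's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table holding both spans and logs.
conn.execute(
    "CREATE TABLE telemetry (kind TEXT, service TEXT, name TEXT, duration_ms REAL)"
)
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
    [
        ("span", "api", "GET /orders", 120.0),
        ("span", "api", "GET /orders", 80.0),
        ("log", "api", "validation failed", None),
        ("span", "worker", "send_email", 300.0),
    ],
)

# Count spans and average their duration per service, ignoring log rows.
summary = conn.execute(
    """
    SELECT service, COUNT(*) AS spans, AVG(duration_ms) AS avg_ms
    FROM telemetry
    WHERE kind = 'span'
    GROUP BY service
    ORDER BY service
    """
).fetchall()
print(summary)
```

Because the query is plain SQL, the same statement shape could be issued from a business intelligence tool that speaks SQL, which is the use case the description points at.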
s5cmd is a command-line interface for high-performance data transfers and management tasks across S3-compatible storage services. It functions as a parallel data transfer tool and bucket synchronization utility, designed to accelerate the uploading and downloading of large volumes of files using concurrent workers. The tool acts as a batch command processor capable of executing thousands of object management operations in parallel from command files or piped streams. It also serves as an S3 Select query client, allowing the execution of SQL expressions against stored JSON records to retrieve
Provides a SQL interface to filter and retrieve specific JSON records directly from distributed object storage.
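To make the idea of running a SQL expression against stored JSON records concrete, here is a minimal local sketch of what such a query does: scan newline-delimited JSON, keep the records that match a condition, and project a few fields. It does not call s5cmd or any storage service. `OBJECT_BODY` and `select_records` are hypothetical names, and the SQL in the comment follows S3 Select's `S3Object` convention purely for comparison.

```python
import json

# Hypothetical JSON Lines payload standing in for a stored object.
OBJECT_BODY = """\
{"id": 1, "service": "checkout", "status": 500}
{"id": 2, "service": "search", "status": 200}
{"id": 3, "service": "checkout", "status": 200}
"""

def select_records(body, predicate, fields):
    """Yield the chosen fields of each JSON record that satisfies predicate."""
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if predicate(record):
            yield {name: record[name] for name in fields}

# Roughly: SELECT s.id, s.status FROM S3Object s WHERE s.service = 'checkout'
rows = list(
    select_records(OBJECT_BODY, lambda r: r["service"] == "checkout", ["id", "status"])
)
print(rows)
```

In the real setting the filtering runs on the storage side, so only the matching records travel over the network. This sketch performs the same filter and projection in process.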
This is an open-source framework for building stateful, durable AI agents that run on Cloudflare Workers. It provides a runtime for long-lived agents that maintain a persistent identity, local SQL storage, and real-time connections, utilizing a lifecycle where agents hibernate when idle and wake on demand. The project distinguishes itself through its multi-channel orchestration, allowing a single agent to be deployed across voice, email, and chat interfaces with unified state. It implements the Model Context Protocol for standardized tool and data exchange and includes a dedicated framework f
Runs SQL queries against durable storage for custom data retrieval and state access.