14 repositories
Tools for handling, timestamping, and normalizing incoming event streams for analytics.
Distinguishing note: Focuses on the ingestion and temporal normalization of event data rather than general database management.
Explore 14 GitHub repositories matching Data & Databases · Event Data Processing.
Umami is a self-hosted, privacy-focused web analytics platform designed to provide full control over infrastructure and user data. It captures website traffic and visitor behavior through anonymous tracking methods that avoid cookies, browser fingerprinting, and the storage of personally identifiable information. The platform distinguishes itself through a comprehensive suite of behavioral analysis tools, including session replays, heatmaps, and cohort-based retention reporting. It features a multi-tenant architecture that allows teams to manage multiple websites within a single, collaborativ
Allows overriding default event timestamps to ensure accurate historical data reporting.
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Validates event ingestion by inspecting request headers and debug responses to ensure accurate data processing.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Records the precise time of occurrence for each metric event to maintain accurate temporal ordering and historical analysis.
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Specifies occurrence times for data records to enable incremental processing and advanced dataset comparison.
Any-rule is a multi-platform regular expression tool that provides a curated catalog of over 70 ready-to-use patterns for validating and extracting common data formats. The project separates its static regex collection from editor-specific plugins, allowing the same pattern library to be accessed through VS Code, IntelliJ IDEA, Alfred Workflow, and a web interface. The tool enables keyword-based pattern retrieval, letting users search for the correct regex by typing descriptive terms rather than remembering exact syntax. It covers a broad range of validation needs including email addresses, U
Provides a regex pattern to validate timestamps in YYYYMMDD HH:mm:ss format.
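As an illustration of what validating the YYYYMMDD HH:mm:ss format can involve, here is a minimal Python sketch. The pattern and the helper name `is_valid_timestamp` are hypothetical and are not copied from any-rule's catalog; the regex only checks the shape and field ranges, so a calendar check is added for dates such as 30 February.

```python
import re
from datetime import datetime

# Hypothetical pattern for "YYYYMMDD HH:mm:ss"; not taken from any-rule.
TIMESTAMP_RE = re.compile(
    r"\d{4}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01]) "
    r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d"
)


def is_valid_timestamp(text: str) -> bool:
    """Return True if text is a real date and time in YYYYMMDD HH:mm:ss form."""
    # fullmatch rejects leading or trailing characters, including newlines.
    if TIMESTAMP_RE.fullmatch(text) is None:
        return False
    # The regex accepts day 31 for every month, so let datetime confirm
    # that the date actually exists on the calendar.
    try:
        datetime.strptime(text, "%Y%m%d %H:%M:%S")
    except ValueError:
        return False
    return True
```

A regex alone cannot express leap-year rules compactly, which is why this sketch pairs it with a parse step.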
Falco is an eBPF runtime security monitor and cloud native detection engine that identifies abnormal behavior and security threats across hosts and containers. It functions as a Linux kernel event auditor, capturing system calls and kernel events in real-time to detect malicious activity. The system distinguishes itself through a rule-based threat detection model that evaluates system activity against a library of community-maintained rules and custom security definitions. It enriches raw kernel events with container and Kubernetes metadata to provide observability into isolated environments
Processes raw event content prior to field extraction to prepare data for the detection engine.
ElastAlert is an alerting framework and query monitor for Elasticsearch. It functions as a real-time log monitoring tool and event notification engine that scans indices for specific patterns to trigger automated alerts when predefined rules are matched. The system distinguishes itself through specialized detection logic, including event spike detection, event frequency monitoring, field change tracking, and the identification of new terms within data fields. It handles notification noise via stateful alert suppression to prevent redundant messages and provides time-windowed aggregation to gr
Enables specifying the field and format for event timing to adjust query delays for non-real-time data.
Sui is a blockchain platform featuring an object-centric state model and resource-oriented smart contracts. It utilizes parallel transaction execution to increase network throughput and supports programmable transaction blocks that bundle multiple operations into single atomic units. The platform distinguishes itself with a capability-based access control system and zero-knowledge login mechanisms, enabling users to authenticate via identity providers without seed phrases. It also implements deterministic object addressing to allow predictable state lookups and supports the creation of soulbo
Converts raw binary blockchain event data into strongly-typed Rust structures for processing.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Defines event occurrence times using source timestamps or ingestion time to manage temporal processing.
Countly is a self-hosted product analytics and engagement platform that tracks user behavior across mobile, web, and desktop applications. It collects and analyzes device properties, user actions, and session lifecycle data to understand engagement patterns, while also providing crash reporting, push notification delivery, and A/B testing capabilities. The platform is designed for privacy-first deployment, with built-in consent management and the ability to run entirely on private infrastructure. The platform distinguishes itself through its comprehensive feature set that combines analytics w
Attaches a unique millisecond timestamp, local hour, day of week, and timezone offset to each event.
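The time fields described above could be derived roughly as follows. This is a sketch under assumptions, not Countly's SDK code: the class name `EventClock`, the field names `timestamp`, `hour`, `dow`, and `tz`, the Sunday-is-zero weekday convention, and the bump-by-one-millisecond approach to uniqueness are all illustrative choices.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class EventClock:
    """Hypothetical helper that produces per-event time fields."""

    def __init__(self) -> None:
        self._last_ms = 0

    def stamp(self, moment: datetime) -> dict:
        offset = moment.utcoffset()
        if offset is None:
            raise ValueError("moment must be timezone-aware")
        # Integer division by a timedelta avoids float rounding.
        ms = (moment - EPOCH) // timedelta(milliseconds=1)
        # Keep timestamps unique by nudging duplicates forward by 1 ms.
        if ms <= self._last_ms:
            ms = self._last_ms + 1
        self._last_ms = ms
        return {
            "timestamp": ms,                          # milliseconds since the Unix epoch
            "hour": moment.hour,                      # local hour, 0-23
            "dow": moment.isoweekday() % 7,           # assumed: 0 = Sunday
            "tz": int(offset.total_seconds() // 60),  # offset from UTC in minutes
        }
```

Sending the local hour and weekday alongside the absolute timestamp lets a server group events by the user's local time without knowing their timezone rules.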
CloudEvents is an open specification for describing event data in a common format across cloud platforms and services. It defines a standard structure and set of metadata attributes for events, enabling interoperability across different systems so producers and consumers can exchange events without custom translation. The specification provides a protocol-agnostic serialization framework that maps CloudEvents attributes and payloads to multiple serialization formats including JSON, Avro, and Protobuf, and defines transport bindings for mapping events onto protocols like HTTP, AMQP, Kafka, MQTT
Provides the core capability to construct and validate CloudEvent objects against the specification.
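To make "construct and validate" concrete, the sketch below builds an event as a plain dictionary and checks the four attributes the CloudEvents 1.0 specification marks as required (`specversion`, `id`, `source`, `type`). It deliberately avoids any SDK: `make_event` and `validate_event` are hypothetical helpers, and the `time` check uses `datetime.fromisoformat` only as an approximation of RFC 3339 parsing.

```python
import uuid
from datetime import datetime, timezone

REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_event(source: str, event_type: str, data=None) -> dict:
    """Build a minimal CloudEvents-style dictionary (hypothetical helper)."""
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
    }
    if data is not None:
        event["data"] = data
    return event


def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for name in REQUIRED_ATTRIBUTES:
        value = event.get(name)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty required attribute: {name}")
    if "time" in event:
        try:
            # Approximate check only; not a full RFC 3339 validator.
            datetime.fromisoformat(event["time"])
        except (TypeError, ValueError):
            problems.append("time is not a parseable timestamp")
    return problems
```

For example, `validate_event(make_event("/sensors/demo", "com.example.reading"))` returns an empty list, while removing `id` adds one problem to the result.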
Cortex is an open-source, horizontally scalable metrics platform that ingests, stores, and queries Prometheus-compatible time-series data with multi-tenant isolation. It accepts metrics via Prometheus remote write and OpenTelemetry, executes PromQL queries against both recent and historical data, and provides a Prometheus-compatible alerting and recording rule engine with an integrated Alertmanager. The system is built as a set of independently scalable microservices that use hash-ring-based sharding, gossip-based cluster membership, and tenant-aware object storage to distribute workloads acro
Cortex rejects samples with timestamps too far in the past or future based on configurable age and grace period limits.
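The acceptance window described above can be pictured with a small Python sketch. The function and parameter names (`check_sample`, `max_age_ms`, `future_grace_ms`) are hypothetical and are not Cortex configuration keys; the sketch also assumes that a `max_age_ms` of zero disables the age check.

```python
def check_sample(ts_ms: int, now_ms: int, max_age_ms: int, future_grace_ms: int):
    """Return (accepted, reason) for a sample timestamp in milliseconds."""
    # Assumption: max_age_ms == 0 means "do not reject old samples".
    if max_age_ms > 0 and ts_ms < now_ms - max_age_ms:
        return False, "timestamp too old"
    if ts_ms > now_ms + future_grace_ms:
        return False, "timestamp too far in the future"
    return True, ""
```

With `now_ms = 1_000_000`, a one-minute maximum age, and a ten-second grace period, a sample stamped `930_000` is rejected as too old and one stamped `1_020_000` is rejected as too far ahead, while anything in between is accepted.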
CRI-O is an open-source container runtime that implements the Kubernetes Container Runtime Interface (CRI) to manage container images, pods, and containers on cluster nodes using OCI-compatible runtimes. It serves as a node-level container manager that handles image pulling, container lifecycle, and resource monitoring for Kubernetes clusters, running containers according to the Open Container Initiative specifications. The runtime distinguishes itself through live configuration reloading that applies changes to runtime definitions, registry mirrors, and TLS certificates without restarting th
Reports pod sandbox status timestamps in nanosecond resolution for evented PLEG compatibility.
This project provides an observability data pipeline designed to collect, transform, and route logs, metrics, and traces from various sources into standardized formats for analysis. It operates as a plugin-based component architecture that uses modular receivers, processors, and exporters to move telemetry data through sequential processing chains. The system uses an interface-driven component model that allows interchangeable connectors and community-contributed extensions. It distinguishes itself through a domain-specific language for telemetry filtering, metadata-based resource attribution for infrastructure detection, and dynamic secret resolution from external cloud managers. The collector covers a broad range of capabilities, including telemetry ingestion from cloud providers and databases, data transformation and re-aggregation, and secure export to third-party storage backends. It incorporates traffic management features such as round-robin routing and message partitioning, as well as security primitives for identity and access management via OAuth2 and OIDC. The project includes a quality assurance framework for synthetic data simulation, end-to-end performance testing, and data integrity verification.
Sets the start timestamp of cumulative metric points based on specific reset strategies.
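One possible reset strategy is sketched below in Python: treat any drop in a cumulative value as a counter reset and restart the series at that point's timestamp. This is an assumption made for illustration, not the project's implementation, and `StartTimeTracker` is a hypothetical name.

```python
class StartTimeTracker:
    """Tracks a start timestamp per cumulative series (illustrative only)."""

    def __init__(self) -> None:
        # series key -> (start timestamp, last observed value)
        self._state = {}

    def observe(self, series: str, ts: int, value: float) -> int:
        """Record a point and return the start timestamp to attach to it."""
        previous = self._state.get(series)
        if previous is None or value < previous[1]:
            # First point, or the value dropped: assume the counter reset.
            start = ts
        else:
            start = previous[0]
        self._state[series] = (start, value)
        return start
```

Downstream rate calculations depend on the start timestamp to know which interval a cumulative value covers, so a stale start time after a restart would inflate or deflate computed rates.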