5 repositories
Methods for filtering and grouping raw analytics data into actionable subsets.
Distinguishing note: Focuses on the logical partitioning of analytics data rather than database sharding or physical storage.
Explore 5 GitHub repositories matching data & databases · Data Segmentation.
Umami is a privacy-focused web analytics platform and open-source visitor tracking tool. It functions as a self-hosted alternative to commercial tracking services, allowing the operation of a private analytics platform on independent infrastructure to maintain ownership and control over visitor data. The platform focuses on monitoring website traffic and analyzing user behavior without using invasive data collection methods. It provides capabilities for mapping user journeys and performing audience segmentation to compare how different visitor cohorts interact with site content.
Provides mechanisms to filter and group raw analytics data into specific user cohorts for behavioral comparison.
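As a rough illustration of what grouping raw analytics data into cohorts for comparison can look like, here is a minimal Python sketch. It is not Umami's code or API; the event fields (`visitor`, `referrer`, `page`) and the function names are hypothetical and chosen only for the example.

```python
from collections import defaultdict

def group_into_cohorts(events, key):
    """Group raw event dicts by the value of `key` (hypothetical field name)."""
    cohorts = defaultdict(list)
    for event in events:
        cohorts[event.get(key, "unknown")].append(event)
    return dict(cohorts)

def pages_per_visitor(cohort_events):
    """Average number of page views per distinct visitor in one cohort."""
    visitors = {e["visitor"] for e in cohort_events}
    return len(cohort_events) / len(visitors) if visitors else 0.0

# Made-up sample data, not taken from any real site.
events = [
    {"visitor": "a", "referrer": "search", "page": "/"},
    {"visitor": "a", "referrer": "search", "page": "/pricing"},
    {"visitor": "b", "referrer": "social", "page": "/"},
    {"visitor": "c", "referrer": "search", "page": "/"},
]

cohorts = group_into_cohorts(events, "referrer")
summary = {name: pages_per_visitor(evts) for name, evts in cohorts.items()}
print(summary)
```

Comparing the per-cohort values in `summary` is the kind of behavioral comparison the description refers to; a real platform would compute this in its database rather than in memory.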
Umami is a self-hosted, privacy-focused web analytics platform designed to provide full control over infrastructure and user data. It captures website traffic and visitor behavior through anonymous tracking methods that avoid cookies, browser fingerprinting, and the storage of personally identifiable information. The platform distinguishes itself through a comprehensive suite of behavioral analysis tools, including session replays, heatmaps, and cohort-based retention reporting. It features a multi-tenant architecture that allows teams to manage multiple websites within a single, collaborativ
Applies filters to traffic data to create custom reports for deeper analysis.
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Filters and drills down into specific traffic segments by applying multiple criteria such as source or location to isolate insights.
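Drilling down by several criteria at once amounts to keeping only the records that satisfy every condition. The sketch below shows that idea under assumed field names (`source`, `country`, `path`); it does not reflect this project's actual query interface.

```python
def matches(record, criteria):
    """True when the record equals the expected value for every criterion."""
    return all(record.get(field) == value for field, value in criteria.items())

def filter_segment(records, **criteria):
    """Return the records that satisfy all criteria (logical AND)."""
    return [r for r in records if matches(r, criteria)]

# Made-up sample page views with hypothetical fields.
pageviews = [
    {"source": "newsletter", "country": "DE", "path": "/"},
    {"source": "newsletter", "country": "FR", "path": "/blog"},
    {"source": "search", "country": "DE", "path": "/"},
]

segment = filter_segment(pageviews, source="newsletter", country="DE")
print(segment)
```

Adding a criterion narrows the segment further, and calling `filter_segment` with no criteria returns every record.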
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Merges small segments into larger ones, rolls up data at coarser granularity, and converts real-time segments into optimized offline segments.
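The merge and roll-up idea can be sketched conceptually as follows. This is not Pinot's implementation or task configuration: the row layout `(timestamp_seconds, page, views)` and the function name `merge_and_rollup` are assumptions made only for illustration.

```python
from collections import Counter
from itertools import chain

def merge_and_rollup(segments, bucket_seconds=3600):
    """Merge several small segments and sum views per (time bucket, page).

    Each segment is a list of (timestamp_seconds, page, views) rows
    (a hypothetical layout). Timestamps are truncated to the start of
    their bucket, which coarsens the time granularity.
    """
    totals = Counter()
    for ts, page, views in chain.from_iterable(segments):
        bucket = ts - ts % bucket_seconds
        totals[(bucket, page)] += views
    return sorted((bucket, page, views) for (bucket, page), views in totals.items())

# Two small made-up segments at second-level granularity.
small_segments = [
    [(0, "/", 2), (1800, "/", 3)],
    [(3600, "/", 1), (3700, "/docs", 4)],
]

merged = merge_and_rollup(small_segments)
print(merged)
```

Four fine-grained rows collapse into three hourly rows here; with a larger `bucket_seconds` the output shrinks further, which is the trade of detail for size that roll-up makes.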
BilibiliSponsorBlock is a content filtering system and API server designed to identify and remove sponsored segments and filler content from Bilibili video playback. It utilizes a crowdsourced segment database where users contribute and vote on timestamps to create a shared repository of skippable video sections. The project features a video metadata synchronizer that links equivalent videos across different platforms, allowing skip markers and timing data to be shared between mirrored content. It implements a reputation-based permission system to manage submissions and edits, alongside a pri
Allows users to submit manual timestamps for video segments to a shared database to help others skip unwanted content.