Why is umami-software/umami a recommended Data Processing repository?

Aggregates raw event logs into meaningful insights on the server to minimize client-side overhead.

Why is vectordotdev/vector a recommended Data Processing repository?

Performs local data aggregation to reduce network traffic and compute load before forwarding to global nodes.

Why is zhanymkanov/fastapi-best-practices a recommended Data Processing repository?

Performs complex data joins and aggregations directly within the database engine for native performance.

Why is emqx/emqx a recommended Data Processing repository?

Filters, aggregates, and transforms data streams locally to reduce bandwidth consumption and enable low-latency responses.

Why is boto/boto3 a recommended Data Processing repository?

Runs custom serverless code during object requests to filter or modify data in real-time.

Why is treeverse/lakeFS a recommended Data Processing repository?

Updates embeddings by processing only the added, removed, or modified data between two commits.

Why is ShujiaHuang/Cpp-Primer-Plus-6th a recommended Data Processing repository?

Implements logic to aggregate and calculate totals from multidimensional grid-based data structures.

7 repositories

Awesome GitHub Repositories: Data Processing

Utilities for transforming, aggregating, and analyzing raw data streams.

Distinguishing note: Focuses on server-side computation, distinct from client-side event collection.

Explore 7 awesome GitHub repositories in Data & Databases · Data Processing.


umami-software/umami
37,285 · View on GitHub
Umami is a self-hosted, privacy-focused web analytics platform designed to provide full control over infrastructure and user data. It captures website traffic and visitor behavior through anonymous tracking methods that avoid cookies, browser fingerprinting, and the storage of personally identifiable information. The platform distinguishes itself through a comprehensive suite of behavioral analysis tools, including session replays, heatmaps, and cohort-based retention reporting. It features a multi-tenant architecture that allows teams to manage multiple websites within a single, collaborativ
Aggregates raw event logs into meaningful insights on the server to minimize client-side overhead.
TypeScript · analytics · audience-segmentation · charts
vectordotdev/vector
22,071 · View on GitHub
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Performs local data aggregation to reduce network traffic and compute load before forwarding to global nodes.
Rust · events · forwarder · hacktoberfest
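The idea of aggregating locally before forwarding can be illustrated with a minimal Python sketch. This is not Vector's configuration or API; the event fields (`host`, `status`) and the function name `aggregate_locally` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable


def aggregate_locally(events: Iterable[dict]) -> list[dict]:
    """Collapse raw events into one summary record per (host, status) pair."""
    counts = Counter((e["host"], e["status"]) for e in events)
    return [
        {"host": host, "status": status, "count": n}
        for (host, status), n in sorted(counts.items())
    ]


# Hypothetical raw events collected on one node.
raw_events = [
    {"host": "web-1", "status": 200},
    {"host": "web-1", "status": 200},
    {"host": "web-1", "status": 500},
    {"host": "web-2", "status": 200},
]

# Only the summaries would be forwarded downstream, not the raw events.
summaries = aggregate_locally(raw_events)
```

Here four raw events collapse into three summary records, which is the kind of reduction that saves network traffic when event volumes are large.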
zhanymkanov/fastapi-best-practices
16,515 · View on GitHub
This project provides a comprehensive guide to architectural patterns and best practices for building scalable, maintainable, and performant web applications using FastAPI. It focuses on standardizing development approaches for Python web services, emphasizing robust request validation, dependency injection, and automated documentation standards to ensure consistent API design. The guide distinguishes itself by promoting domain-driven modular packaging, which organizes application logic into isolated, feature-based directories to support long-term codebase scalability. It also details strateg
Performs complex data joins and aggregations directly within the database engine for native performance.
best-practices · fastapi
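Pushing joins and aggregations into the database engine, rather than looping over rows in application code, might look like the sketch below. It uses Python's standard `sqlite3` module with an in-memory database; the `users` and `orders` tables are hypothetical and are not taken from the guide.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0);
    """
)

# The join and the aggregation both run inside the database engine;
# Python only receives the final, already-summarized rows.
rows = conn.execute(
    """
    SELECT u.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
    """
).fetchall()
conn.close()
```

The application receives one row per user instead of every order row, so less data crosses the database boundary.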
emqx/emqx
16,422 · View on GitHub
This project is a high-performance MQTT broker and IoT data platform designed to manage millions of concurrent device connections. It provides a scalable infrastructure for ingesting, processing, and routing telemetry data across distributed systems, utilizing an actor-based concurrency model to maintain high availability and state synchronization across cluster nodes. The platform distinguishes itself through integrated stream processing and edge computing capabilities. It allows users to execute declarative SQL-based rules directly against incoming message streams for real-time filtering, t
Filters, aggregates, and transforms data streams locally to reduce bandwidth consumption and enable low-latency responses.
Erlang · aiot · broker · coap
boto/boto3
9,834 · View on GitHub
Boto3 is the AWS SDK for Python, providing a programmatic interface for managing and automating AWS cloud infrastructure and services. It serves as a cloud management API client and resource manager for provisioning, configuring, and scaling virtual servers, databases, and storage. The library enables the implementation of infrastructure-as-code through declarative templates and scripts, allowing for the deployment of identical resource stacks across multiple accounts and geographic regions. It also provides a framework for coordinating distributed workflows, serverless functions, and contain
Runs custom serverless code during object requests to filter or modify data in real-time.
Python · aws · aws-sdk · cloud
treeverse/lakeFS
5,406 · View on GitHub
lakeFS is a versioning system for data lakes that provides Git-like branching and commits for large datasets stored in object storage. It acts as a version control layer, enabling immutable snapshots, atomic commits, and zero-copy branching to create isolated environments for data experiments without duplicating physical files. The system works as an S3-compatible storage gateway and an Iceberg REST catalog, allowing standard cloud storage protocols and compatible clients to manage versioned tables. It serves as a data quality gatekeeper, using an event-driven hooks system to validate datasets against governance policies before changes are merged into production. The platform covers a broad range of data governance capabilities, including collaboration through pull requests, role-based access control, and data lineage tracking. It provides integrations for workflow orchestration, machine learning pipelines, and various big data compute engines, and supports multi-cloud storage connectivity and identity synchronization via SSO and SCIM. The software can be installed using binaries, containers, or Helm charts for Kubernetes deployment.
Updates embeddings by processing only the added, removed, or modified data between two commits.
Go
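Updating embeddings from only the changes between two commits can be sketched as follows. This is an illustration of the pattern, not lakeFS code: the diff format (a list of `(change, path)` pairs), the `apply_diff` function, and the toy `read` and `embed` callables are all hypothetical stand-ins.

```python
from typing import Callable


def apply_diff(
    index: dict[str, list[float]],
    diff: list[tuple[str, str]],
    read: Callable[[str], str],
    embed: Callable[[str], list[float]],
) -> int:
    """Update `index` in place from a commit diff; return how many objects were embedded."""
    embedded = 0
    for change, path in diff:
        if change == "removed":
            index.pop(path, None)  # drop stale vectors
        elif change in ("added", "modified"):
            index[path] = embed(read(path))  # (re)compute only what changed
            embedded += 1
        else:
            raise ValueError(f"unknown change type: {change!r}")
    return embedded


# Toy stand-ins: an in-memory object store and a length-based "embedding".
store = {"a.txt": "hello", "b.txt": "data lake!"}


def toy_embed(text: str) -> list[float]:
    return [float(len(text))]


index = {"a.txt": [3.0], "old.txt": [9.0], "keep.txt": [1.0]}
diff = [("modified", "a.txt"), ("added", "b.txt"), ("removed", "old.txt")]
embedded_count = apply_diff(index, diff, store.__getitem__, toy_embed)
```

The untouched entry `keep.txt` is never read or re-embedded, which is the point of working from the diff rather than rescanning the whole dataset.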
ShujiaHuang/Cpp-Primer-Plus-6th
3,106 · View on GitHub
This project is a C++ learning resource and study guide consisting of structured notes and programming examples. It provides practical implementations and exercise solutions covering core language syntax, data types, and control flow. The repository features specialized samples for object-oriented design, including class inheritance, polymorphism, and abstract classes. It includes demonstrations of memory management techniques such as dynamic allocation, move semantics, and placement new, as well as template programming examples for creating generic functions and data structures. The codebas
Implements logic to aggregate and calculate totals from multidimensional grid-based data structures.
C++ · cpp · programming
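Aggregating totals from a grid-like structure can be shown with a short sketch. The repository itself is written in C++; the Python below is only an illustration of the same idea on a hypothetical 3x3 grid and is not taken from the repository.

```python
# Hypothetical 3x3 grid of values (rows of columns).
grid = [
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
]

# Total of each row.
row_totals = [sum(row) for row in grid]

# zip(*grid) transposes the grid, so each tuple is one column.
column_totals = [sum(col) for col in zip(*grid)]

# The grand total can be derived from either set of partial totals.
grand_total = sum(row_totals)
```

Summing the row totals and summing the column totals must give the same grand total, which makes a handy consistency check.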
