9 repositories
Mechanisms that identify and reject duplicate API requests to keep data consistent and avoid redundant processing.
Distinct from Request Deduplication, which focuses on collapsing concurrent network requests at the client/browser level, whereas this topic covers server-side idempotency checks.
9 GitHub repositories matching web development · API Request Deduplication.
Parse Server is a backend-as-a-service solution and Node.js framework that provides a ready-to-use REST and GraphQL API for mobile and web applications. It functions as a core backend infrastructure for managing database schemas, user authentication, and API routing. The system distinguishes itself with a real-time data engine that pushes database updates to clients via WebSockets and a GraphQL server that automatically generates schemas based on application data models. It also features an adapter-based storage layer that abstracts interactions with various cloud and local backends. The pla
Provides a mechanism to prevent duplicate object creation or updates by identifying identical requests via unique headers.
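The header-based check described above can be sketched roughly as follows. This is a minimal illustration of server-side idempotency in general, not Parse Server's actual implementation: the `IdempotencyStore` class, the `handle_create` handler, the `X-Request-Id` header name, the TTL, and the status codes are all assumptions made for the example.

```python
import time


class IdempotencyStore:
    """Remembers request IDs for a limited time (TTL in seconds)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # request_id -> expiry timestamp

    def check_and_record(self, request_id):
        """Return True if the request ID is new, False if it is a duplicate."""
        now = self._clock()
        # Drop expired entries so the store does not grow without bound.
        self._seen = {rid: exp for rid, exp in self._seen.items() if exp > now}
        if request_id in self._seen:
            return False
        self._seen[request_id] = now + self._ttl
        return True


def handle_create(store, headers, body):
    """Hypothetical handler: rejects a request whose ID was already seen."""
    request_id = headers.get("X-Request-Id")  # hypothetical header name
    if request_id is not None and not store.check_and_record(request_id):
        return {"status": 400, "error": "duplicate request"}
    return {"status": 201, "created": body}
```

A client that retries after a timeout would resend the same header value, so the second attempt is rejected instead of creating a second object. Requests without the header pass through unchecked in this sketch.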
Cortex is an open-source, horizontally scalable metrics platform that ingests, stores, and queries Prometheus-compatible time-series data with multi-tenant isolation. It accepts metrics via Prometheus remote write and OpenTelemetry, executes PromQL queries against both recent and historical data, and provides a Prometheus-compatible alerting and recording rule engine with an integrated Alertmanager. The system is built as a set of independently scalable microservices that use hash-ring-based sharding, gossip-based cluster membership, and tenant-aware object storage to distribute workloads acro
Deduplicates rule group state from multiple replicas for consistent API responses during resharding.
Scrapy-Redis is a library that transforms Scrapy into a distributed web crawling framework by replacing its in-memory scheduler with a Redis-backed component. This allows multiple Scrapy spider workers to coordinate through a shared request queue, enabling them to consume URLs concurrently while a Redis set tracks seen URLs across all workers to prevent duplicate crawls. The system persists crawl state—including pending requests and already-crawled URLs—in Redis, so a paused or crashed spider can resume from where it left off without losing progress. The library provides a Redis-based duplica
Uses a Redis set to filter duplicate URLs across all running spiders, preventing the same page from being crawled twice.
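A shared-set duplicate filter of this kind might look like the sketch below. It relies on the fact that Redis `SADD` returns the number of members actually added, so a return value of 0 means the URL was already seen. To keep the example runnable without a Redis server, `InMemorySetClient` is a hypothetical stand-in that mimics only that return value; the `SeenUrlFilter` class, the key name, and the SHA-1 fingerprint are likewise assumptions, not the library's code.

```python
import hashlib


class InMemorySetClient:
    """Stand-in for a Redis client; mimics only SADD's return value."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0  # nothing added: member already present
        members.add(member)
        return 1  # one new member added


class SeenUrlFilter:
    """Flags URLs that any worker sharing the same set has already seen."""

    def __init__(self, client, key="crawler:seen"):  # hypothetical key name
        self._client = client
        self._key = key

    def is_duplicate(self, url):
        # Store a fixed-size fingerprint rather than the raw URL.
        fingerprint = hashlib.sha1(url.encode("utf-8")).hexdigest()
        # The add and the membership check happen in one call,
        # so two workers cannot both claim the same URL.
        return self._client.sadd(self._key, fingerprint) == 0
```

Because the check and the insert are a single operation, no separate lock is needed when several workers share the set.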
This project is a distributed web scraping framework that enables horizontal scaling of scraping jobs. It uses Redis as a central request queue manager and state store to coordinate scraping progress and request metadata across multiple server instances. The system distributes scraping workloads by sharing a single request queue and uses a distributed duplicate filter to prevent multiple workers from visiting the same page. It keeps complex request state and metadata as JSON strings in the shared remote store. The framework also provides distributed data processing by pushing scraped items to a shared queue for parallel consumption by separate processing workers.
Prevents redundant crawling of the same page by tracking visited URLs in a shared Redis set.
Blackbird is an open-source OSINT investigation tool that searches across hundreds of online platforms to discover accounts linked to a given username or email address. It functions as a username and email search engine, consolidating discovered profiles into a single list with low false positives for investigative analysis. The tool incorporates an AI-enhanced profile analyzer that uses a built-in AI API to generate behavioral and technical summaries of discovered online profiles. It also provides a documentation query interface that accepts natural-language questions via HTTP GET requests t
Holds incoming profile records in a set-based buffer keyed by platform and identifier to eliminate duplicates.
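One way to picture a set-based buffer keyed by platform and identifier is the sketch below. The `ProfileBuffer` class, the `platform` and `identifier` field names, and the case-insensitive comparison are assumptions for illustration and are not taken from Blackbird's source.

```python
class ProfileBuffer:
    """Collects profile records, keeping the first one per (platform, identifier)."""

    def __init__(self):
        self._keys = set()
        self._records = []

    def add(self, record):
        """Return True if the record was stored, False if it was a duplicate."""
        # Assumed field names; lowercasing treats "GitHub" and "github" as equal.
        key = (record["platform"].lower(), record["identifier"].lower())
        if key in self._keys:
            return False
        self._keys.add(key)
        self._records.append(record)
        return True

    def records(self):
        """Return the deduplicated records in arrival order."""
        return list(self._records)
```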
Libpostal is a C library designed for parsing and normalizing international addresses. It uses statistical natural language processing (NLP) and a language classifier to break unstructured address strings from around the world into structured components, and it standardizes street addresses by expanding abbreviations and resolving regional naming variations across multiple languages. The project provides tools for transliteration, converting text in different scripts to standard Latin-ASCII or NFD forms. It also includes address deduplication capabilities, using symmetric fuzzy matching to determine whether different address records refer to the same physical location. The library covers broader text processing needs such as UTF-8 normalization and converting spelled-out numbers and Roman numerals into standard numeric representations. It can be extended to recognize additional addresses through external configuration files that add new languages and synonyms.
Identifies and merges address records that refer to the same real-world physical location using fuzzy matching.
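The general idea of normalizing addresses and then comparing them approximately can be illustrated with the standard library alone. This sketch does not use or reproduce libpostal: it swaps in `difflib.SequenceMatcher` as a simple similarity measure, and the abbreviation table, the `normalize` and `is_probable_duplicate` helpers, and the 0.9 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Tiny illustrative table; a real system would need far broader coverage.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(address):
    """Lowercase, strip basic punctuation, and expand known abbreviations."""
    tokens = address.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def is_probable_duplicate(a, b, threshold=0.9):
    """Return True if the normalized addresses are at least `threshold` similar."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

Normalizing before comparing matters more than the choice of similarity measure here: "123 Main St." and "123 main street" become identical strings once the abbreviation is expanded.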
Dedupe is a machine learning tool for entity resolution that identifies and merges duplicate records in structured datasets. It uses active learning to train a matching model from human-labeled examples, learning which field-level similarities are most important for detecting duplicates without requiring manual rule writing. The system combines fingerprint-based blocking to reduce pairwise comparisons, enabling efficient matching on large datasets, and groups scored record pairs into clusters using a configurable similarity threshold. The tool provides multiple interfaces for different workfl
Identifies and merges entries that refer to the same real-world entity, even when names or addresses differ slightly.
CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to final data storage. The project provides specialized guidance on anti-bot bypass techniques and web API reverse engineering. It includes methods for evading browser detection through identity masking and proxy rotation, as well as techniques for identifying hidden API endpoints by analyzing network
Prevents redundant crawling by filtering and deduplicating extracted URLs using a tracking system.
ClawRouter is an AI model router and API gateway designed to classify query complexity and assign prompts to the most efficient model tier. It operates as a multi-model AI proxy that orchestrates traffic between various large language models and AI media generators through a unified interface. The project distinguishes itself by integrating a non-custodial micropayment processor using the x402 protocol. This allows for per-request API access and USDC settlement on Base and Solana chains, replacing static API keys with wallet-based authentication and real-time budget enforcement. The system c
Prevents duplicate billing by hashing request bodies to identify and replay cached responses.
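Hash-and-replay deduplication can be sketched as follows. This is a generic illustration under stated assumptions, not ClawRouter's code: the `ResponseReplayCache` class, the canonical-JSON fingerprint, and the `upstream` callable (standing in for the billed model call) are hypothetical, and the sketch keeps entries forever where a real gateway would presumably expire them.

```python
import hashlib
import json


class ResponseReplayCache:
    """Replays the stored response when an identical request body is seen again."""

    def __init__(self):
        self._responses = {}  # body fingerprint -> cached response

    @staticmethod
    def fingerprint(body):
        # Serialize with sorted keys so key order does not change the hash.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, body, upstream):
        """Invoke `upstream` only for bodies that have not been seen before."""
        key = self.fingerprint(body)
        if key in self._responses:
            return self._responses[key]  # replay: upstream is not charged again
        response = upstream(body)
        self._responses[key] = response
        return response
```

A retried request with the same body therefore reaches the paid upstream only once, which is what keeps it from being billed twice.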