9 Repos
Mechanisms to identify and prevent duplicate API requests to ensure data consistency and prevent redundant processing.
Distinct from Request Deduplication, which focuses on collapsing concurrent network requests at the client/browser level, whereas this topic covers server-side idempotency checks.
9 GitHub repositories matching web development · API Request Deduplication.
Parse Server is a backend-as-a-service solution and Node.js framework that provides a ready-to-use REST and GraphQL API for mobile and web applications. It functions as a core backend infrastructure for managing database schemas, user authentication, and API routing. The system distinguishes itself with a real-time data engine that pushes database updates to clients via WebSockets and a GraphQL server that automatically generates schemas based on application data models. It also features an adapter-based storage layer that abstracts interactions with various cloud and local backends. The pla
Provides a mechanism to prevent duplicate object creation or updates by identifying identical requests via unique headers.
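A header-based idempotency check of this kind can be sketched roughly as follows. This is a minimal illustration, not Parse Server's implementation: the header name `X-Request-Id`, the `IdempotencyStore` class, the `handle_create` function, the 409 status code, and the in-memory storage are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class IdempotencyStore:
    """Hypothetical in-memory record of request IDs, each kept for a TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}  # request ID -> expiry time

    def first_time(self, request_id: str) -> bool:
        """Return True if the ID is new, and remember it until it expires."""
        now = self._clock()
        # Forget IDs whose TTL has elapsed so they can be reused.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if request_id in self._seen:
            return False
        self._seen[request_id] = now + self._ttl
        return True


def handle_create(headers: Dict[str, str], body: dict,
                  store: IdempotencyStore, db: List[dict]) -> Tuple[int, dict]:
    """Create an object unless the same request ID was already processed."""
    request_id = headers.get("X-Request-Id")  # assumed header name
    if request_id is not None and not store.first_time(request_id):
        return 409, {"error": "duplicate request"}
    db.append(body)
    return 201, {"created": body}
```

Requests without the header are processed normally in this sketch, so the check only protects clients that opt in by sending a unique ID per logical operation.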
Cortex is an open-source, horizontally scalable metrics platform that ingests, stores, and queries Prometheus-compatible time-series data with multi-tenant isolation. It accepts metrics via Prometheus remote write and OpenTelemetry, executes PromQL queries against both recent and historical data, and provides a Prometheus-compatible alerting and recording rule engine with an integrated Alertmanager. The system is built as a set of independently scalable microservices that use hash-ring-based sharding, gossip-based cluster membership, and tenant-aware object storage to distribute workloads acro
Deduplicates rule group state from multiple replicas for consistent API responses during resharding.
Scrapy-Redis is a library that transforms Scrapy into a distributed web crawling framework by replacing its in-memory scheduler with a Redis-backed component. This allows multiple Scrapy spider workers to coordinate through a shared request queue, enabling them to consume URLs concurrently while a Redis set tracks seen URLs across all workers to prevent duplicate crawls. The system persists crawl state—including pending requests and already-crawled URLs—in Redis, so a paused or crashed spider can resume from where it left off without losing progress. The library provides a Redis-based duplica
Uses a Redis set to filter duplicate URLs across all running spiders, preventing the same page from being crawled twice.
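The shared-set idea can be sketched as below. To keep the example self-contained, a hypothetical `InMemorySetStore` stands in for Redis; it mimics only the behavior of the Redis `SADD` command, which returns 1 when a member is newly added and 0 when it was already present. The `fingerprint` function, the `DupeFilter` class, and the key name are illustrative assumptions, not Scrapy-Redis code.

```python
import hashlib
from typing import Dict, Set


class InMemorySetStore:
    """Stand-in for a Redis connection; only mimics SADD's return value."""

    def __init__(self) -> None:
        self._sets: Dict[str, Set[str]] = {}

    def sadd(self, key: str, member: str) -> int:
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0  # already present, nothing added
        members.add(member)
        return 1  # newly added


def fingerprint(method: str, url: str) -> str:
    """Hash the method and URL into a fixed-size key (illustrative scheme)."""
    return hashlib.sha1(f"{method.upper()} {url}".encode("utf-8")).hexdigest()


class DupeFilter:
    """Reports whether any worker sharing the store has seen a request."""

    def __init__(self, store: InMemorySetStore, key: str = "dupefilter") -> None:
        self.store = store
        self.key = key

    def request_seen(self, method: str, url: str) -> bool:
        # Add and check in one step, so two workers cannot both claim a URL.
        added = self.store.sadd(self.key, fingerprint(method, url))
        return added == 0
```

Because every worker's filter points at the same store and key, a URL claimed by one worker is reported as seen by all the others.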
This project is a distributed web crawling framework that enables the horizontal scaling of scraping tasks. It uses Redis as a centralized request queue manager and state store to coordinate crawl progress and request metadata across multiple server instances. The system distributes crawling workloads by sharing a single request queue and utilizes a distributed duplicate filter to prevent multiple workers from visiting the same page. It persists complex request state and metadata as JSON strings within the shared remote store. The framework also provides capabilities for distributed data pro
Prevents redundant crawling of the same page by tracking visited URLs in a shared Redis set.
Blackbird is an open-source OSINT investigation tool that searches across hundreds of online platforms to discover accounts linked to a given username or email address. It functions as a username and email search engine, consolidating discovered profiles into a single list with low false positives for investigative analysis. The tool incorporates an AI-enhanced profile analyzer that uses a built-in AI API to generate behavioral and technical summaries of discovered online profiles. It also provides a documentation query interface that accepts natural-language questions via HTTP GET requests t
Holds incoming profile records in a set-based buffer keyed by platform and identifier to eliminate duplicates.
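A set-based buffer keyed by platform and identifier might look like the following sketch. The `ProfileBuffer` class, the record field names, and the choice to compare keys case-insensitively are assumptions for illustration, not Blackbird's actual code.

```python
from typing import Dict, List, Set, Tuple


class ProfileBuffer:
    """Keeps the first record seen for each (platform, identifier) pair."""

    def __init__(self) -> None:
        self._keys: Set[Tuple[str, str]] = set()
        self._records: List[Dict[str, str]] = []

    def add(self, record: Dict[str, str]) -> bool:
        """Store the record and return True, or return False for a duplicate."""
        # Assumption: platform and identifier are compared case-insensitively.
        key = (record["platform"].lower(), record["identifier"].lower())
        if key in self._keys:
            return False
        self._keys.add(key)
        self._records.append(record)
        return True

    def records(self) -> List[Dict[str, str]]:
        """Return the deduplicated records in arrival order."""
        return list(self._records)
```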
Libpostal is a C library for parsing and normalizing international addresses. It uses statistical NLP and a language classifier to break unstructured global address strings into structured components and to standardize street addresses by resolving abbreviations and regional name variations across multiple languages. The project provides text transliteration tools that convert various writing systems into standardized Latin-ASCII or NFD forms. It also includes address deduplication functions, which use symmetric fuzzy matching to identify whether different address records refer to the same physical location. The library covers broader text-processing needs, such as UTF-8 normalization and the conversion of spelled-out numbers and Roman numerals into numeric representations. It supports extending address recognition through external configuration files to add new languages and synonyms.
Identifies and merges address records that refer to the same real-world physical location using fuzzy matching.
Dedupe is a machine learning tool for entity resolution that identifies and merges duplicate records in structured datasets. It uses active learning to train a matching model from human-labeled examples, learning which field-level similarities are most important for detecting duplicates without requiring manual rule writing. The system combines fingerprint-based blocking to reduce pairwise comparisons, enabling efficient matching on large datasets, and groups scored record pairs into clusters using a configurable similarity threshold. The tool provides multiple interfaces for different workfl
Identifies and merges entries that refer to the same real-world entity, even when names or addresses differ slightly.
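As a rough illustration of threshold-based grouping, the sketch below clusters strings whose similarity score meets a cutoff. It deliberately swaps in plain string similarity from Python's standard `difflib` module in place of a trained model or address-aware matching, so it is not how Dedupe or libpostal score pairs; the function names and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Score two strings from 0.0 to 1.0 after basic normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def merge_duplicates(records: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedily group each record with the first cluster it resembles."""
    clusters: List[List[str]] = []
    for record in records:
        for cluster in clusters:
            # Compare against the cluster's first member as its representative.
            if similarity(record, cluster[0]) >= threshold:
                cluster.append(record)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([record])
    return clusters
```

Comparing every record against every cluster grows quadratically with the input size, which is the cost that blocking techniques are meant to reduce on large datasets.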
CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to final data storage. The project provides specialized guidance on anti-bot bypass techniques and web API reverse engineering. It includes methods for evading browser detection through identity masking and proxy rotation, as well as techniques for identifying hidden API endpoints by analyzing network
Prevents redundant crawling by filtering and deduplicating extracted URLs using a tracking system.
ClawRouter is an AI model router and API gateway designed to classify query complexity and assign prompts to the most efficient model tier. It operates as a multi-model AI proxy that orchestrates traffic between various large language models and AI media generators through a unified interface. The project distinguishes itself by integrating a non-custodial micropayment processor using the x402 protocol. This allows for per-request API access and USDC settlement on Base and Solana chains, replacing static API keys with wallet-based authentication and real-time budget enforcement. The system c
Prevents duplicate billing by hashing request bodies to identify and replay cached responses.
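Hash-and-replay deduplication can be sketched as follows. This is a minimal illustration under stated assumptions: the `ReplayCache` class, the `billed_calls` counter, the canonical-JSON hashing scheme, and the absence of any expiry are all hypothetical and not taken from ClawRouter.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def body_hash(body: Dict[str, Any]) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayCache:
    """Forwards each distinct body upstream once and replays the response."""

    def __init__(self, upstream: Callable[[Dict[str, Any]], Any]) -> None:
        self._upstream = upstream
        self._responses: Dict[str, Any] = {}
        self.billed_calls = 0  # counts only requests that reached upstream

    def handle(self, body: Dict[str, Any]) -> Any:
        key = body_hash(body)
        if key in self._responses:
            # Identical body seen before: replay without calling upstream.
            return self._responses[key]
        response = self._upstream(body)
        self.billed_calls += 1
        self._responses[key] = response
        return response
```

A production version would normally bound the cache with a TTL or size limit, since this sketch keeps every response indefinitely.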