# apify/crawlee

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/apify-crawlee).**

21,762 stars · 1,210 forks · TypeScript · Apache-2.0

## Links

- GitHub: https://github.com/apify/crawlee
- Homepage: https://crawlee.dev
- awesome-repositories: https://awesome-repositories.com/repository/apify-crawlee.md

## Topics

`apify` `automation` `crawler` `crawling` `headless` `headless-chrome` `javascript` `nodejs` `npm` `playwright` `puppeteer` `scraper` `scraping` `typescript` `web-crawler` `web-crawling` `web-scraping`

## Description

Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture.

Crawlee's resource-aware concurrency controller scales task execution up or down based on real-time CPU and memory usage, so a crawl does not exhaust the host machine. Its session-based fingerprint isolation gives each session its own browser context, TLS fingerprint, and proxy, which makes traffic resemble human browsing and helps it get past anti-bot protections. A persistent request queue supports both capabilities, so crawl operations survive process restarts and resume from their last saved state.
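
The resource-aware scaling described above can be pictured as a small feedback rule: grow concurrency while the host is healthy, and back off when CPU or memory crosses a threshold. The sketch below is only an illustration under that assumption. `SystemSnapshot`, `ScalingLimits`, and `nextConcurrency` are hypothetical names, and the thresholds are made up; none of this is Crawlee's actual autoscaling code.

```typescript
// Hypothetical types and names; this is not Crawlee's AutoscaledPool API.
interface SystemSnapshot {
  cpuUsage: number;    // fraction of CPU in use, 0..1
  memoryUsage: number; // fraction of memory in use, 0..1
}

interface ScalingLimits {
  minConcurrency: number;
  maxConcurrency: number;
  overloadThreshold: number; // above this ratio the host counts as overloaded
  scaleUpStep: number;       // tasks added after a healthy snapshot
  scaleDownRatio: number;    // multiplicative backoff after an overloaded snapshot
}

// Decide how many tasks may run at once after looking at the latest snapshot.
function nextConcurrency(
  current: number,
  snapshot: SystemSnapshot,
  limits: ScalingLimits,
): number {
  // The busier of the two resources decides whether the host is overloaded.
  const load = Math.max(snapshot.cpuUsage, snapshot.memoryUsage);
  const desired = load > limits.overloadThreshold
    ? Math.floor(current * limits.scaleDownRatio)
    : current + limits.scaleUpStep;
  // Never leave the configured bounds.
  return Math.min(limits.maxConcurrency, Math.max(limits.minConcurrency, desired));
}

const demoLimits: ScalingLimits = {
  minConcurrency: 1,
  maxConcurrency: 20,
  overloadThreshold: 0.8,
  scaleUpStep: 1,
  scaleDownRatio: 0.5,
};

// Healthy host: concurrency grows from 10 to 11.
const afterHealthyTick = nextConcurrency(10, { cpuUsage: 0.3, memoryUsage: 0.4 }, demoLimits);
// CPU at 95%: concurrency is halved from 10 to 5.
const afterOverloadedTick = nextConcurrency(10, { cpuUsage: 0.95, memoryUsage: 0.4 }, demoLimits);
```

Additive growth with multiplicative backoff is one common choice for this kind of controller, since it recovers quickly from overload while probing upward cautiously.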

The framework covers the full scraping lifecycle: event-driven lifecycle hooks for custom logic, a middleware-based request pipeline for authentication and data transformation, and a pluggable storage backend interface that decouples data persistence from application logic. It also supports AI-driven navigation, sitemap discovery, and multi-engine browser orchestration, and it exposes performance metrics, error snapshots, and configurable logging for observability.
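
As a rough picture of how lifecycle hooks wrap a request, the sketch below runs a list of pre-hooks, then the handler, then a list of post-hooks over one shared context object. It is a simplified, synchronous illustration: `HookContext`, `LifecycleHook`, and `processRequest` are hypothetical names, and real hooks in a crawler would typically be asynchronous and receive a richer context.

```typescript
// Hypothetical names; a simplified, synchronous stand-in for crawler hooks.
interface HookContext {
  url: string;
  headers: Record<string, string>;
  log: string[];
}

type LifecycleHook = (ctx: HookContext) => void;

// Run pre-hooks, the handler, then post-hooks over one shared context.
function processRequest(
  url: string,
  preHooks: LifecycleHook[],
  handler: (ctx: HookContext) => void,
  postHooks: LifecycleHook[],
): HookContext {
  const ctx: HookContext = { url, headers: {}, log: [] };
  for (const hook of preHooks) hook(ctx);  // e.g. inject authentication headers
  handler(ctx);                            // stands in for fetching and extraction
  for (const hook of postHooks) hook(ctx); // e.g. record metrics or clean up
  return ctx;
}

const addAuthHeader: LifecycleHook = (ctx) => {
  ctx.headers["authorization"] = "Bearer <token>"; // placeholder value
};

const recordVisit: LifecycleHook = (ctx) => {
  ctx.log.push(`done ${ctx.url}`);
};

const hookResult = processRequest(
  "https://example.com/",
  [addAuthHeader],
  (ctx) => {
    ctx.log.push(`handled with ${Object.keys(ctx.headers).length} header(s)`);
  },
  [recordVisit],
);
```

Because every hook sees the same context, a pre-hook can prepare state (here, a header) that the handler and post-hooks then rely on.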

The project is implemented in TypeScript and provides a command-line interface for scaffolding, managing, and deploying scraping projects to cloud or serverless environments.

## Tags

### Web Development

- [Web Crawling](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling.md) — Provides a systematic framework for discovering, navigating, and extracting data from web pages at scale. ([source](https://crawlee.dev/js/api/browser-crawler/class/BrowserCrawler.md))
- [Web Scraping Frameworks](https://awesome-repositories.com/f/web-development/web-scraping-frameworks.md) — Provides a comprehensive framework for building scalable web crawlers that support both lightweight HTTP requests and headless browser automation.
- [Browser Automation](https://awesome-repositories.com/f/web-development/browser-automation.md) — Controls headless browsers to navigate pages, scroll, and interact with dynamic elements for data extraction. ([source](https://crawlee.dev/index.md))
- [Headless Rendering Engines](https://awesome-repositories.com/f/web-development/headless-browsers/headless-rendering-engines.md) — Executes client-side scripts using headless browsers to scrape dynamic content. ([source](https://crawlee.dev/blog/scrapy-vs-crawlee.md))
- [Browser Session Managers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/browser-automation/browser-session-managers.md) — Manages isolated browser sessions, cookies, and proxy rotation to maintain state across scraping requests. ([source](https://crawlee.dev/js/api/core/interface/CreateSession.md))
- [Large-Scale Domain Crawlers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling/large-scale-domain-crawlers.md) — Builds and manages high-performance, distributed web crawlers that extract structured data from thousands of pages.
- [Concurrent Crawling Engines](https://awesome-repositories.com/f/web-development/concurrent-crawling-engines.md) — Dynamically scales concurrency and resource usage based on system health to maximize throughput. ([source](https://crawlee.dev/js/api/core.md))
- [Headless Browser Orchestrators](https://awesome-repositories.com/f/web-development/web-automation-scraping/browser-orchestration-systems/headless-browser-orchestrators.md) — Orchestrates headless browser instances to render dynamic JavaScript and interact with web elements during scraping. ([source](https://crawlee.dev/blog/how-to-scrape-amazon.md))
- [Browser Automation](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/browser-automation.md) — Provides a unified interface for managing and scaling headless browser automation instances. ([source](https://crawlee.dev/js/api/browser-pool/class/BrowserPlugin.md))
- [Crawling Optimization](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/crawling-optimization.md) — Manages concurrency, request timeouts, browser types, and proxy settings to optimize performance and minimize the risk of being blocked by target servers. ([source](https://crawlee.dev/blog/scrape-tiktok-python.md))
- [Crawler Configuration Managers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/web-crawlers/crawler-configuration-managers.md) — Manages concurrency, retries, and browser impersonation to minimize blocking during web scraping. ([source](https://crawlee.dev/blog/scrape-google-search.md))
- [Web Crawling Orchestrators](https://awesome-repositories.com/f/web-development/web-crawling-orchestrators.md) — Orchestrates recursive crawling by processing URLs from queues to discover and visit linked pages automatically. ([source](https://crawlee.dev/js/api/browser-crawler.md))
- [Web Scraping Engines](https://awesome-repositories.com/f/web-development/web-scraping-engines.md) — Integrates multiple scraping and browser automation tools through a unified interface. ([source](https://crawlee.dev/js.md))
- [Isolated Browser Contexts](https://awesome-repositories.com/f/web-development/browser-integration-utilities/browser-infrastructure/isolated-browser-contexts.md) — Creates isolated browser contexts with unique cookies, proxies, and fingerprints to mimic human behavior and bypass anti-bot protections.
- [Browser Session Persistence](https://awesome-repositories.com/f/web-development/browser-session-persistence.md) — Stores cookies, local storage, and cache to maintain user state across multiple scraping sessions. ([source](https://crawlee.dev/js/api/browser-pool/interface/LaunchContextOptions.md))
- [Custom Page Frameworks](https://awesome-repositories.com/f/web-development/custom-page-frameworks.md) — Executes custom logic on each visited page to extract data and perform navigation tasks. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserCrawlerOptions.md))
- [Request Handling](https://awesome-repositories.com/f/web-development/request-handling.md) — Tracks and manages the lifecycle of web requests to ensure all target pages are processed. ([source](https://crawlee.dev/js/api/core/interface/IRequestManager.md))
- [Request Interception Middleware](https://awesome-repositories.com/f/web-development/request-interception-middleware.md) — Provides middleware for intercepting and modifying network traffic during the crawling process. ([source](https://crawlee.dev/js/api/cheerio-crawler.md))
- [Crawl Task Managers](https://awesome-repositories.com/f/web-development/task-execution-engines/crawl-task-managers.md) — Fetches pending URLs for execution, tracks completion status, and allows for the reclamation of failed tasks. ([source](https://crawlee.dev/js/api/core/class/RequestProvider.md))
- [Browser Isolation Strategies](https://awesome-repositories.com/f/web-development/web-automation-scraping/browser-environment-configurations/browser-isolation-strategies.md) — Creates isolated, ephemeral browser sessions to ensure clean states and prevent data leakage between scraping tasks. ([source](https://crawlee.dev/js/api/browser-pool/interface/LaunchContextOptions.md))
- [Browser Lifecycle Managers](https://awesome-repositories.com/f/web-development/web-automation-scraping/browser-orchestration-systems/browser-lifecycle-managers.md) — Orchestrates the launching, retirement, and teardown of browser instances for efficient resource management. ([source](https://crawlee.dev/js/api/browser-pool.md))
- [Crawler Health Monitoring](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/crawler-health-monitoring.md) — Logs runtime metrics like throughput and success rates to monitor the health of data extraction tasks. ([source](https://crawlee.dev/js/api/core/class/Statistics.md))
- [Browser Automation Engines](https://awesome-repositories.com/f/web-development/browser-automation-engines.md) — Manages multiple browser engines through a unified interface to switch between rendering environments. ([source](https://crawlee.dev/js/api/browser-pool/interface/BrowserPoolOptions.md))
- [Browser Cookie Management](https://awesome-repositories.com/f/web-development/browser-integration-utilities/browser-apis/browser-storage/browser-cookie-management.md) — Retrieves and injects session cookies to maintain authentication states across automated scraping tasks. ([source](https://crawlee.dev/js/api/browser-pool/class/PuppeteerController.md))
- [Navigation Hooks](https://awesome-repositories.com/f/web-development/browser-navigation-utilities/navigation-hooks.md) — Enables custom logic execution before or after page navigation to handle anti-bot challenges or state modification. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserCrawlerOptions.md))
- [Pagination Crawlers](https://awesome-repositories.com/f/web-development/custom-page-frameworks/page-content-injections/pagination-navigators/pagination-crawlers.md) — Extracts links from web pages using selectors to traverse multi-page search results and site structures. ([source](https://crawlee.dev/blog/scrape-google-search.md))
- [DOM Element Selectors](https://awesome-repositories.com/f/web-development/dom-element-selectors.md) — Selects and traverses elements within a document using CSS-style selectors to extract data or manipulate the DOM. ([source](https://crawlee.dev/js/api/basic-crawler/interface/CheerioAPI.md))
- [Rendering Strategies](https://awesome-repositories.com/f/web-development/rendering-templating/rendering-strategies.md) — Optimizes rendering strategies by switching between network requests and headless browsers to improve load speeds. ([source](https://crawlee.dev/js.md))
- [Request Routing](https://awesome-repositories.com/f/web-development/request-routing.md) — Directs crawling requests to specific processing logic based on page type to handle multi-step workflows. ([source](https://crawlee.dev/blog/launching-crawlee-python.md))
- [Adaptive Crawling Engines](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling/adaptive-crawling-engines.md) — Dynamically adjusts crawling strategies based on website structure and content requirements to improve navigation effectiveness. ([source](https://crawlee.dev/blog/crawlee-blog-launch.md))
- [State Persistence](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/state-persistence.md) — Maintains persistent crawl state to allow continuous monitoring and task injection even after the initial queue is exhausted. ([source](https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlerOptions.md))
- [Crawler Identity Masking](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/web-crawlers/crawler-configuration-managers/crawler-identity-masking.md) — Randomizes browser fingerprints and HTTP headers to simulate human behavior and bypass anti-bot detection mechanisms during web scraping sessions. ([source](https://crawlee.dev/blog/crawlee-for-python-v06.md))
- [API Servers](https://awesome-repositories.com/f/web-development/api-servers.md) — Transforms scraping tasks into persistent server processes that expose extracted data via HTTP endpoints. ([source](https://crawlee.dev/blog/superscraper-with-crawlee.md))
- [Remote Browser Infrastructure Management](https://awesome-repositories.com/f/web-development/browser-integration-utilities/browser-infrastructure/remote-browser-infrastructure-management.md) — Scales the number of parallel browser instances based on system resources to optimize performance. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserCrawlerOptions.md))
- [Identity Customization](https://awesome-repositories.com/f/web-development/browser-session-persistence/identity-customization.md) — Sets custom user agent strings and persistent user data directories to mimic human browsing behavior and maintain state across multiple scraping sessions. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserLaunchContext.md))
- [Crawl Request Metadata Trackers](https://awesome-repositories.com/f/web-development/request-metadata/crawl-request-metadata-trackers.md) — Analyzes and summarizes failed requests to help identify and resolve issues with target websites. ([source](https://crawlee.dev/blog/crawlee-blog-launch.md))
- [Route Organization Patterns](https://awesome-repositories.com/f/web-development/route-organization-patterns.md) — Maps specific URL patterns or labels to dedicated handler functions for modular and maintainable data extraction. ([source](https://crawlee.dev/blog/scraping-dynamic-websites-using-python.md))
- [Pattern-Matching Routers](https://awesome-repositories.com/f/web-development/routing-systems/pattern-matching-routers.md) — Excludes specific URLs from the crawl queue by matching them against patterns and triggering custom skip logic. ([source](https://crawlee.dev/js/api/core/function/filterRequestsByPatterns.md))
- [Browser Environment Configurations](https://awesome-repositories.com/f/web-development/web-automation-scraping/browser-environment-configurations.md) — Configures browser initialization parameters including proxy routing and stealth headers for automated sessions. ([source](https://crawlee.dev/js/api/browser-pool/interface/BrowserPluginOptions.md))
- [Crawler Lifecycle Controllers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/crawler-middleware/crawler-lifecycle-controllers.md) — Provides programmatic control to shut down crawling processes immediately upon encountering critical failures. ([source](https://crawlee.dev/js/api/core/class/CriticalError.md))
- [Sitemap Generators](https://awesome-repositories.com/f/web-development/web-standards/search-engine-optimization/sitemap-generators.md) — Parses website sitemaps to automatically discover and queue target pages for large-scale extraction. ([source](https://crawlee.dev/blog/crawlee-for-python-v1.md))
- [Remote Data Fetching](https://awesome-repositories.com/f/web-development/remote-data-fetching.md) — Provides utilities for retrieving and parsing data from remote network resources. ([source](https://crawlee.dev/js/api/core/class/GotScrapingHttpClient.md))
- [Request Lifecycle Hooks](https://awesome-repositories.com/f/web-development/request-lifecycle-hooks.md) — Monitors the progress of individual web requests through various stages to ensure reliable data extraction. ([source](https://crawlee.dev/js/api/core/enum/RequestState.md))
- [Robots Exclusion Compliance](https://awesome-repositories.com/f/web-development/robots-exclusion-compliance.md) — Checks and adheres to site-specific crawling rules defined in robots.txt files to ensure ethical and compliant automated data collection. ([source](https://crawlee.dev/blog/crawlee-for-python-v1.md))
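
Several of the tags above describe routing requests to handlers by label or page type. A minimal way to picture that is a map from label to handler with a default fallback. The `LabelRouter` class below is a hypothetical sketch written for this page and should not be read as Crawlee's router implementation.

```typescript
// Hypothetical sketch of label-based request routing.
interface RoutedRequest {
  url: string;
  label?: string;
}

type RouteHandler = (req: RoutedRequest) => string;

class LabelRouter {
  private handlers = new Map<string, RouteHandler>();
  private defaultHandler: RouteHandler | undefined;

  // Register the handler for one page type, identified by its label.
  addHandler(label: string, handler: RouteHandler): void {
    this.handlers.set(label, handler);
  }

  // Register the handler used for unlabeled or unknown requests.
  addDefaultHandler(handler: RouteHandler): void {
    this.defaultHandler = handler;
  }

  // Pick the handler registered for the request's label, else the default.
  dispatch(req: RoutedRequest): string {
    const labelled = req.label !== undefined ? this.handlers.get(req.label) : undefined;
    const handler = labelled ?? this.defaultHandler;
    if (handler === undefined) {
      throw new Error(`No handler registered for ${req.url}`);
    }
    return handler(req);
  }
}

const router = new LabelRouter();
router.addHandler("DETAIL", (req) => `detail:${req.url}`);
router.addDefaultHandler((req) => `list:${req.url}`);

const detailResult = router.dispatch({ url: "https://example.com/item/1", label: "DETAIL" });
const listResult = router.dispatch({ url: "https://example.com/" });
```

Keeping one handler per page type is what makes multi-step workflows (listing page, then detail page) easy to read and extend.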

### Data & Databases

- [Resource-Aware Scaling Controllers](https://awesome-repositories.com/f/data-databases/horizontal-database-scaling/resource-scaling-strategies/resource-aware-scaling-controllers.md) — Dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. ([source](https://cdn.jsdelivr.net/gh/apify/crawlee@master/README.md))
- [Web Data Extraction](https://awesome-repositories.com/f/data-databases/web-data-extraction.md) — Automates the parsing and collection of structured data from websites into standardized formats. ([source](https://crawlee.dev/blog/how-to-scrape-amazon.md))
- [Content Extraction](https://awesome-repositories.com/f/data-databases/content-extraction.md) — Extracts structured data from HTML pages using CSS selectors to isolate specific content. ([source](https://crawlee.dev/blog/netflix-show-recommender.md))
- [Web Content Scrapers](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/web-extraction-engines/web-content-scrapers.md) — Extracts information from web pages and converts retrieved content into structured formats. ([source](https://crawlee.dev/blog/crawlee-blog-launch.md))
- [Data Pipeline Orchestration](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration.md) — Orchestrates modular, scalable workflows that discover, queue, process, and export web content into structured datasets.
- [Distributed Crawling Systems](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/distributed-crawling-systems.md) — Persists crawl progress to allow resuming interrupted jobs from the last processed state. ([source](https://crawlee.dev/js/api/core/interface/RequestListState.md))
- [Persistent Storage Backends](https://awesome-repositories.com/f/data-databases/persistent-storage-backends.md) — Saves extracted information into structured formats and storage backends to ensure reliable data capture. ([source](https://crawlee.dev/blog/netflix-show-recommender.md))
- [State Persistence](https://awesome-repositories.com/f/data-databases/state-persistence.md) — Maintains mutable state across crawler executions to track progress and share information. ([source](https://crawlee.dev/js/api/core/interface/CrawlingContext.md))
- [Structured Data Extraction](https://awesome-repositories.com/f/data-databases/structured-data-extraction.md) — Parses raw HTML or JSON responses using selectors to transform unstructured content into clean data. ([source](https://crawlee.dev/blog/scrape-crunchbase-python.md))
- [Caching and Performance](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/caching-performance.md) — Switches between lightweight requests and browser rendering to minimize resource consumption during data collection. ([source](https://crawlee.dev/blog/crawlee-for-python-v06.md))
- [Data Exporters](https://awesome-repositories.com/f/data-databases/data-exporters.md) — Saves collected information into structured formats like JSON or CSV for external analysis. ([source](https://crawlee.dev/js/api/basic-crawler/class/BasicCrawler.md))
- [Persistent Application State](https://awesome-repositories.com/f/data-databases/persistent-application-state.md) — Maintains and automatically saves data across crawler executions by storing values in a persistent key-value store. ([source](https://crawlee.dev/js/api/core/function/useState.md))
- [Persistent Storage Management](https://awesome-repositories.com/f/data-databases/persistent-storage-management.md) — Defines storage identifiers and persistence intervals to ensure scraped data is saved reliably. ([source](https://crawlee.dev/js/api/basic-crawler/interface/BasicCrawlingContext.md))
- [Collection Lifecycle Management](https://awesome-repositories.com/f/data-databases/data-collections-datasets/collection-lifecycle-management.md) — Provides utilities for opening, inspecting, and managing the lifecycle of data collections. ([source](https://crawlee.dev/js/api/core/class/Dataset.md))
- [Data Persistence and Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage.md) — Saves arbitrary data, files, or crawler states to local or cloud storage using unique keys. ([source](https://crawlee.dev/js/api/core/class/KeyValueStore.md))
- [Storage Adapters](https://awesome-repositories.com/f/data-databases/data-scraping-tools/storage-adapters.md) — Persists extracted tabular data and binary files to local disk or cloud storage backends through a unified interface. ([source](https://cdn.jsdelivr.net/gh/apify/crawlee@master/README.md))
- [Shared State Persisters](https://awesome-repositories.com/f/data-databases/key-value-persistence-stores/shared-state-persisters.md) — Maintains mutable data across multiple crawler executions by storing it in a persistent key-value store. ([source](https://crawlee.dev/js/api/basic-crawler/interface/BasicCrawlingContext.md))
- [Key-Value Stores](https://awesome-repositories.com/f/data-databases/key-value-stores.md) — Manages persistent key-value storage for configuration and state associated with crawling tasks. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserCrawlingContext.md))
- [Pluggable Storage Drivers](https://awesome-repositories.com/f/data-databases/pluggable-storage-drivers.md) — Decouples data persistence from application logic to allow swapping between local, memory, or cloud-based storage backends.
- [Task Result Storage](https://awesome-repositories.com/f/data-databases/task-result-storage.md) — Saves extracted data to internal storage during execution and exports the final collection to standard file formats. ([source](https://crawlee.dev/blog/scrape-crunchbase-python.md))
- [Collection Iterators](https://awesome-repositories.com/f/data-databases/collection-iterators.md) — Supports asynchronous iteration over large datasets to process records efficiently without memory exhaustion. ([source](https://crawlee.dev/blog.md))
- [Sequential Iterators](https://awesome-repositories.com/f/data-databases/collection-iterators/sequential-iterators.md) — Provides sequential iteration methods for processing stored records with mapping and reduction support. ([source](https://crawlee.dev/js/api/core/class/Dataset.md))
- [Data Collections & Datasets](https://awesome-repositories.com/f/data-databases/data-collections-datasets.md) — Organizes extracted information into structured collections that support separate storage for different data types. ([source](https://crawlee.dev/blog/scrape-bluesky-using-python.md))
- [Dataset Processors](https://awesome-repositories.com/f/data-databases/data-collections-datasets/dataset-processors.md) — Executes a custom function for every item in a collection, providing access to the data and its index. ([source](https://crawlee.dev/js/api/core/interface/DatasetConsumer.md))
- [Request Source Integrators](https://awesome-repositories.com/f/data-databases/data-sources/request-source-integrators.md) — Integrates external data sources with internal queues to control how URLs are accessed and processed during a crawl. ([source](https://crawlee.dev/blog/crawlee-for-python-v05.md))
- [Storage Backend Adapters](https://awesome-repositories.com/f/data-databases/storage-backend-adapters.md) — Provides a consistent interface for datasets and queues, allowing data to be stored in memory, local files, or databases. ([source](https://crawlee.dev/blog/crawlee-for-python-v1.md))
- [Storage Lifecycle Management](https://awesome-repositories.com/f/data-databases/storage-lifecycle-management.md) — Provides lifecycle management for data stores to maintain clean persistence for crawler runs. ([source](https://crawlee.dev/js/api/core/class/KeyValueStore.md))
- [Browser-Simulated Parsers](https://awesome-repositories.com/f/data-databases/web-data-extraction/browser-simulated-parsers.md) — Simulates a browser environment using a lightweight DOM implementation to extract data from web pages. ([source](https://crawlee.dev/blog/scrape-using-jsdom.md))
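
The storage-related tags above share one idea: scraping code writes to an interface, and the backend behind it can be swapped. The sketch below shows that separation with an in-memory backend and a CSV export. `StorageBackend`, `MemoryBackend`, and `SimpleDataset` are hypothetical names chosen for illustration, not Crawlee's storage classes.

```typescript
// Hypothetical sketch of a dataset decoupled from its storage backend.
type DataRecord = Record<string, string | number>;

interface StorageBackend {
  append(item: DataRecord): void;
  readAll(): DataRecord[];
}

// One possible backend; a file- or cloud-based one would implement the same interface.
class MemoryBackend implements StorageBackend {
  private items: DataRecord[] = [];

  append(item: DataRecord): void {
    this.items.push({ ...item });
  }

  readAll(): DataRecord[] {
    return this.items.map((item) => ({ ...item }));
  }
}

class SimpleDataset {
  private backend: StorageBackend;

  constructor(backend: StorageBackend) {
    this.backend = backend;
  }

  pushData(item: DataRecord): void {
    this.backend.append(item);
  }

  // Export all rows as CSV, using the first row's keys as the header.
  toCsv(): string {
    const rows = this.backend.readAll();
    const [first] = rows;
    if (first === undefined) return "";
    const columns = Object.keys(first);
    const escapeCell = (value: string | number | undefined): string => {
      const text = String(value ?? "");
      // Quote cells that contain a comma, quote, or newline.
      return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
    };
    const lines = rows.map((row) => columns.map((c) => escapeCell(row[c])).join(","));
    return [columns.join(","), ...lines].join("\n");
  }
}

const products = new SimpleDataset(new MemoryBackend());
products.pushData({ title: "Widget, large", price: 10 });
products.pushData({ title: 'The "best" one', price: 5 });
const productsCsv = products.toCsv();
```

Swapping `MemoryBackend` for another implementation of `StorageBackend` would not require any change to the code that calls `pushData`.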

### Development Tools & Productivity

- [Headless Browser Automation](https://awesome-repositories.com/f/development-tools-productivity/headless-browser-automation.md) — Controls automated browser instances to render dynamic JavaScript content and interact with complex web interfaces.
- [AI-Driven Interaction Agents](https://awesome-repositories.com/f/development-tools-productivity/browser-automation/ai-driven-interaction-agents.md) — Performs page actions and extracts structured data using AI-driven navigation without manual selector maintenance. ([source](https://crawlee.dev/blog/crawlee-v3-16.md))
- [Browser Impersonators](https://awesome-repositories.com/f/development-tools-productivity/browser-capability-configuration/browser-impersonators.md) — Mimics browser behavior and headers to reduce the likelihood of being blocked by anti-bot systems. ([source](https://crawlee.dev/blog/crawlee-for-python-v1.md))
- [Lifecycle Event Hooks](https://awesome-repositories.com/f/development-tools-productivity/lifecycle-event-hooks.md) — Executes custom logic at specific stages of the browser and page lifecycle to manage initialization and cleanup. ([source](https://crawlee.dev/js/api/browser-pool/interface/BrowserPoolHooks.md))
- [Project Scaffolding and Configuration](https://awesome-repositories.com/f/development-tools-productivity/project-scaffolding-config-code-generation/project-scaffolding-configuration.md) — Provides a command-line interface to initialize, scaffold, and execute crawling projects, simplifying the development workflow. ([source](https://crawlee.dev/blog/scrapy-vs-crawlee.md))
- [Targeting Utilities](https://awesome-repositories.com/f/development-tools-productivity/targeting-utilities.md) — Configures target URLs with custom HTTP methods, headers, and payloads for specific scraping tasks. ([source](https://crawlee.dev/js/api/core/class/Request.md))
- [Task Scheduling](https://awesome-repositories.com/f/development-tools-productivity/task-scheduling.md) — Configures automated execution intervals for scripts to perform periodic data collection. ([source](https://crawlee.dev/blog/crawlee-python-price-tracker.md))

### DevOps & Infrastructure

- [Asynchronous Crawl Queues](https://awesome-repositories.com/f/devops-infrastructure/scheduling/asynchronous-crawl-queues.md) — Provides a persistent, asynchronous queueing system to manage and process large-scale web crawling tasks. ([source](https://crawlee.dev/index.md))
- [Cloud Deployment Platforms](https://awesome-repositories.com/f/devops-infrastructure/cloud-deployment-platforms.md) — Publishes automation scripts to a managed platform to execute tasks remotely and scale data collection. ([source](https://crawlee.dev/blog/crawlee-python-price-tracker.md))
- [Durable Crawl Queues](https://awesome-repositories.com/f/devops-infrastructure/scheduling/asynchronous-crawl-queues/durable-crawl-queues.md) — Maintains a durable record of URLs to be crawled, allowing the process to resume or scale without losing progress. ([source](https://crawlee.dev/blog/launching-crawlee-python.md))
- [Parallel Execution Strategies](https://awesome-repositories.com/f/devops-infrastructure/automation-orchestration/task-execution-frameworks/automation-frameworks/action-execution/parallel-execution-strategies.md) — Scales concurrent task execution dynamically based on CPU, memory, and event loop health to maximize throughput. ([source](https://crawlee.dev/js/api/core/class/AutoscaledPool.md))
- [Web Interaction Agents](https://awesome-repositories.com/f/devops-infrastructure/automation-orchestration/task-execution-frameworks/automation-frameworks/ai-agent-control/web-interaction-agents.md) — Interprets page elements using AI to navigate complex interfaces and extract data like a human user. ([source](https://crawlee.dev/blog/archive.md))
- [Crawling](https://awesome-repositories.com/f/devops-infrastructure/infrastructure/distributed-data-platforms/crawling.md) — Saves the progress of a crawl to storage so that interrupted tasks can resume from the last processed URL. ([source](https://crawlee.dev/js/api/core/interface/RequestListOptions.md))
- [Serverless Deployment](https://awesome-repositories.com/f/devops-infrastructure/serverless-deployment.md) — Packages scraping logic into isolated, cloud-ready units with managed infrastructure, storage, and proxy support. ([source](https://crawlee.dev/js.md))
- [Task Queues](https://awesome-repositories.com/f/devops-infrastructure/task-queues.md) — Organizes URLs into dynamic queues to facilitate systematic site traversal and prevent duplicate processing. ([source](https://crawlee.dev/js/api/basic-crawler.md))
- [Request Retries](https://awesome-repositories.com/f/devops-infrastructure/api-service-management/api-resilience/request-retries.md) — Marks crawl tasks as handled or failed to ensure reliable retries during subsequent processing cycles. ([source](https://crawlee.dev/js/api/core/class/RequestQueueV1.md))
- [Cloud Deployment](https://awesome-repositories.com/f/devops-infrastructure/cloud-deployment.md) — Packages and uploads local automation scripts to a managed infrastructure for remote execution and scheduling. ([source](https://crawlee.dev/blog/scrape-bluesky-using-python.md))
- [Containerized Deployments](https://awesome-repositories.com/f/devops-infrastructure/containerized-deployments.md) — Includes pre-configured container settings to simplify the packaging and deployment of crawling tasks. ([source](https://crawlee.dev/blog/scrapy-vs-crawlee.md))
- [Execution Environment Configurations](https://awesome-repositories.com/f/devops-infrastructure/execution-environments/execution-environment-configurations.md) — Adjusts resource limits, logging verbosity, and browser automation settings through configuration objects to control how scraping tasks run. ([source](https://crawlee.dev/js/api/core/class/Configuration.md))
- [Execution Flow Controls](https://awesome-repositories.com/f/devops-infrastructure/automation-orchestration/task-execution-frameworks/execution-flow-controls.md) — Provides controls to start, pause, resume, or abort task processing during long-running scraping operations. ([source](https://crawlee.dev/js/api/core/class/AutoscaledPool.md))
- [Queue State Configurations](https://awesome-repositories.com/f/devops-infrastructure/task-queue-management/queue-state-configurations.md) — Allows configuring whether to clear request history or resume from previous states. ([source](https://crawlee.dev/js/api/basic-crawler/interface/CrawlerRunOptions.md))
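
The queue-related tags above combine four behaviours: deduplicating URLs, handing out pending requests, reclaiming failed ones for retry, and persisting state so a crawl can resume. The sketch below shows how those pieces might fit together in one small class. `DurableQueue` and its methods are hypothetical names, and serializing to a JSON string stands in for whatever storage a real queue would use.

```typescript
// Hypothetical sketch; names are illustrative, not Crawlee's RequestQueue API.
interface QueueEntry {
  url: string;
  retryCount: number;
}

interface QueueSnapshot {
  pending: QueueEntry[];
  seen: string[];
  handled: string[];
}

class DurableQueue {
  private pending: QueueEntry[] = [];
  private inProgress = new Map<string, QueueEntry>();
  private seen = new Set<string>();
  private handled = new Set<string>();
  private maxRetries: number;

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  // Returns false when the URL was already enqueued (deduplication).
  addRequest(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push({ url, retryCount: 0 });
    return true;
  }

  fetchNext(): QueueEntry | undefined {
    const entry = this.pending.shift();
    if (entry !== undefined) this.inProgress.set(entry.url, entry);
    return entry;
  }

  markHandled(url: string): void {
    if (this.inProgress.delete(url)) this.handled.add(url);
  }

  // Puts a failed request back in the queue unless its retries are used up.
  reclaim(url: string): boolean {
    const entry = this.inProgress.get(url);
    if (entry === undefined) return false;
    this.inProgress.delete(url);
    if (entry.retryCount >= this.maxRetries) return false;
    this.pending.push({ url, retryCount: entry.retryCount + 1 });
    return true;
  }

  isFinished(): boolean {
    return this.pending.length === 0 && this.inProgress.size === 0;
  }

  // In-progress entries are saved as pending so a restart picks them up again.
  serialize(): string {
    const snapshot: QueueSnapshot = {
      pending: Array.from(this.inProgress.values()).concat(this.pending),
      seen: Array.from(this.seen),
      handled: Array.from(this.handled),
    };
    return JSON.stringify(snapshot);
  }

  static restore(json: string, maxRetries: number): DurableQueue {
    const snapshot = JSON.parse(json) as QueueSnapshot;
    const queue = new DurableQueue(maxRetries);
    queue.pending = snapshot.pending;
    queue.seen = new Set(snapshot.seen);
    queue.handled = new Set(snapshot.handled);
    return queue;
  }
}

const crawlQueue = new DurableQueue(2);
crawlQueue.addRequest("https://example.com/a");
crawlQueue.addRequest("https://example.com/b");
crawlQueue.fetchNext(); // "a" is now in progress
const savedState = crawlQueue.serialize(); // imagine the process stops here
const resumedQueue = DurableQueue.restore(savedState, 2);
```

After the restore, the request that was in progress when the state was saved is offered again, while already-seen URLs are still rejected as duplicates.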

### Software Engineering & Architecture

- [Distributed Crawling Engines](https://awesome-repositories.com/f/software-engineering-architecture/distributed-systems/distributed-crawling-engines.md) — Manages large-scale data extraction tasks with automatic request queuing, proxy rotation, and persistent state management.
- [Rendering Strategy Automation](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-optimization/frontend-rendering-loading/rendering-optimizations/rendering-strategy-automation.md) — Switches dynamically between lightweight HTTP requests and full browser rendering based on page content to optimize speed and resource usage. ([source](https://crawlee.dev/blog/crawlee-for-python-v1.md))
- [Queue Injection Utilities](https://awesome-repositories.com/f/software-engineering-architecture/queues/queue-injection-utilities.md) — Adds discovered URLs to a request queue for processing, supporting filtering by patterns or selectors. ([source](https://crawlee.dev/js/api/core/function/enqueueLinks.md))
- [Retry Policies](https://awesome-repositories.com/f/software-engineering-architecture/retry-policies.md) — Implements automated retry policies to handle transient network or server failures during data extraction. ([source](https://crawlee.dev/blog/scrapy-vs-crawlee.md))
- [Cross-Browser Abstractions](https://awesome-repositories.com/f/software-engineering-architecture/application-frameworks/general-purpose-frameworks/cross-browser-abstractions.md) — Provides a consistent interface for managing multiple headless browser engines to enable seamless switching between rendering environments.
- [Crawler Lifecycle Hooks](https://awesome-repositories.com/f/software-engineering-architecture/application-lifecycle-management/lifecycle-event-systems/crawler-lifecycle-hooks.md) — Provides event-driven hooks to manage crawler state changes and lifecycle events. ([source](https://crawlee.dev/js/api/core/enum/EventType.md))
- [Request Context Managers](https://awesome-repositories.com/f/software-engineering-architecture/architectural-design-patterns/state-management/request-context-managers.md) — Maintains state and metadata across the request lifecycle to facilitate navigation and data parsing. ([source](https://crawlee.dev/blog/scraping-dynamic-websites-using-python.md))
- [Automated Retry Strategies](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/reliability-patterns/automated-retry-strategies.md) — Lets handler code force a retry of a failed request by throwing a dedicated error type, so transient errors do not end extraction prematurely. ([source](https://crawlee.dev/js/api/core/class/RetryRequestError.md))
- [Crawling Request Throttlers](https://awesome-repositories.com/f/software-engineering-architecture/request-throttling/crawling-request-throttlers.md) — Limits the number of concurrent tasks and requests per minute to ensure stable data collection and prevent server overloading. ([source](https://crawlee.dev/blog/scrape-bluesky-using-python.md))
- [Browser Task Limiters](https://awesome-repositories.com/f/software-engineering-architecture/concurrent-task-runners/concurrent-task-limiters/browser-task-limiters.md) — Scales the number of active browser pages based on available system resources to prevent memory exhaustion. ([source](https://crawlee.dev/js/api/browser-crawler.md))
- [Execution Flow Control](https://awesome-repositories.com/f/software-engineering-architecture/execution-flow-control.md) — Manages crawler execution flow with start, pause, and stop operations, with a choice between graceful and immediate shutdown. ([source](https://crawlee.dev/js/api/basic-crawler/class/BasicCrawler.md))
- [Overload Signal Handlers](https://awesome-repositories.com/f/software-engineering-architecture/middleware/custom-middleware-implementations/overload-signal-handlers.md) — Defines custom logic to report resource pressure and manage crawler concurrency based on system health. ([source](https://crawlee.dev/js/api/core/interface/LoadSignal.md))
- [Robots Policy Enforcers](https://awesome-repositories.com/f/software-engineering-architecture/robots-policy-enforcers.md) — Reads each site's robots.txt file and follows its rules to keep crawls compliant with site crawling policies. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserCrawlerOptions.md))
- [Workflow Input Schemas](https://awesome-repositories.com/f/software-engineering-architecture/workflow-input-schemas.md) — Creates structured interfaces for crawler configuration, allowing users to provide dynamic parameters like target URLs and limits at runtime. ([source](https://crawlee.dev/blog/scrape-tiktok-python.md))
- [Event-Driven Hooks](https://awesome-repositories.com/f/software-engineering-architecture/event-driven-hooks.md) — Executes custom user logic at specific stages of the crawling process, such as navigation or browser launch.
- [Service Configuration Management](https://awesome-repositories.com/f/software-engineering-architecture/service-configuration-management.md) — Swaps core infrastructure components like storage clients or event managers to adapt the crawler to different execution environments. ([source](https://crawlee.dev/blog/crawlee-for-python-v05.md))
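
The resource-aware limiting described under Browser Task Limiters and Overload Signal Handlers above can be pictured as a small scaling rule. The following is a minimal sketch under stated assumptions, not Crawlee's implementation: `ScalingLimits`, `nextConcurrency`, and the `stepRatio` value are hypothetical names chosen for illustration.

```typescript
// Hypothetical limits for this sketch; the names are illustrative, not Crawlee options.
interface ScalingLimits {
  minConcurrency: number;
  maxConcurrency: number;
  // Fraction of the current concurrency to add or remove per adjustment.
  stepRatio: number;
}

// Returns the concurrency to use for the next interval:
// scale down when the host reports overload, scale up otherwise,
// and always stay inside the configured bounds.
function nextConcurrency(current: number, overloaded: boolean, limits: ScalingLimits): number {
  const step = Math.max(1, Math.ceil(current * limits.stepRatio));
  const desired = overloaded ? current - step : current + step;
  return Math.min(limits.maxConcurrency, Math.max(limits.minConcurrency, desired));
}

const exampleLimits: ScalingLimits = { minConcurrency: 1, maxConcurrency: 20, stepRatio: 0.25 };
console.log(nextConcurrency(8, false, exampleLimits)); // 10
console.log(nextConcurrency(8, true, exampleLimits)); // 6
```

Scaling by a ratio rather than a fixed step lets a large pool back off faster than a small one, while the clamp keeps at least `minConcurrency` tasks running.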

### Artificial Intelligence & ML

- [Autonomous Web Browsing Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/autonomous-web-browsing-agents.md) — Enables AI-driven interaction with web pages using natural language instructions instead of manual selectors. ([source](https://crawlee.dev/blog.md))
- [Crawl Progress Persisters](https://awesome-repositories.com/f/artificial-intelligence-ml/workflow-state-persistences/crawl-progress-persisters.md) — Saves the progress of a URL list to storage automatically, allowing crawlers to resume interrupted tasks. ([source](https://crawlee.dev/js.md))

### Networking & Communication

- [Proxy and Fingerprint Rotation](https://awesome-repositories.com/f/networking-communication/proxy-rotation-services/proxy-and-fingerprint-rotation.md) — Applies randomized browser fingerprints by default and rotates through configured proxies to bypass anti-scraping protections and prevent IP blocking during data collection. ([source](https://crawlee.dev/blog/scrapy-vs-crawlee.md))
- [Middleware-Based Request Pipelines](https://awesome-repositories.com/f/networking-communication/communication-protocols-architectures/request-processing-architectures/request-processing/middleware-based-request-pipelines.md) — Processes network requests through a sequence of modular functions to handle authentication, transformation, and proxy rotation.
- [Crawl Queue Batchers](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-routing-traffic-management/network-traffic-management/request-batching/crawl-queue-batchers.md) — Adds individual or batched URLs to the crawl queue while automatically deduplicating requests. ([source](https://crawlee.dev/js/api/core/class/RequestProvider.md))
- [Proxy Rotation Services](https://awesome-repositories.com/f/networking-communication/proxy-rotation-services.md) — Distributes network traffic across multiple proxy servers to maintain connectivity and bypass rate limits during large-scale scraping. ([source](https://crawlee.dev/index.md))
- [Proxy Configurations](https://awesome-repositories.com/f/networking-communication/proxy-servers/proxy-configurations.md) — Routes all browser connections through a pool of proxies to bypass restrictions and distribute traffic across different IP addresses. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserCrawlerOptions.md))
- [Traffic Routing Proxies](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-infrastructure-configuration/network-infrastructure/traffic-routing-proxies.md) — Directs network traffic through intermediary proxy servers to manage connection paths and bypass geographic restrictions. ([source](https://crawlee.dev/js/api/browser-pool.md))
- [Route Middleware](https://awesome-repositories.com/f/networking-communication/communication-protocols-architectures/request-processing-architectures/request-processing/route-middleware.md) — Executes registered functions sequentially before request handlers to perform logging or data transformation. ([source](https://crawlee.dev/js/api/core/class/Router.md))
- [Request Execution](https://awesome-repositories.com/f/networking-communication/communication-protocols-architectures/request-processing-architectures/request-execution.md) — Provides tools for configuring and executing network requests to fetch data from target URLs. ([source](https://crawlee.dev/js/api/core/interface/BaseHttpClient.md))
- [Request Locking Mechanisms](https://awesome-repositories.com/f/networking-communication/communication-protocols-architectures/request-processing-architectures/request-processing/request-locking-mechanisms.md) — Prevents concurrent processing of the same request by locking it during execution to maintain data integrity. ([source](https://crawlee.dev/js/api/basic-crawler/interface/CrawlerExperiments.md))
- [Proxy Management](https://awesome-repositories.com/f/networking-communication/proxy-management.md) — Routes traffic through specified proxy servers and isolates browser instances to improve anonymity. ([source](https://crawlee.dev/js/api/browser-crawler/interface/BrowserLaunchContext.md))
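
Several entries above describe distributing traffic across a pool of proxies. As a rough illustration of that idea, the sketch below rotates proxies round-robin and keeps a proxy pinned to a session once assigned. `ProxyRotator`, `pick`, and the proxy URLs are hypothetical and only stand in for the concept; they are not Crawlee's API.

```typescript
// Hypothetical round-robin proxy rotator with per-session stickiness.
class ProxyRotator {
  private readonly proxyUrls: string[];
  private readonly bySession = new Map<string, string>();
  private nextIndex = 0;

  constructor(proxyUrls: string[]) {
    if (proxyUrls.length === 0) {
      throw new Error("At least one proxy URL is required");
    }
    this.proxyUrls = proxyUrls;
  }

  // Returns the proxy for a session, or the next proxy in rotation
  // when no session is given or the session has no proxy yet.
  pick(sessionId?: string): string {
    if (sessionId !== undefined) {
      const existing = this.bySession.get(sessionId);
      if (existing !== undefined) return existing;
    }
    const proxy = this.proxyUrls[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
    if (sessionId !== undefined) this.bySession.set(sessionId, proxy);
    return proxy;
  }
}

const exampleRotator = new ProxyRotator(["http://proxy-1.example:8000", "http://proxy-2.example:8000"]);
console.log(exampleRotator.pick("session-a")); // http://proxy-1.example:8000
console.log(exampleRotator.pick("session-a")); // http://proxy-1.example:8000
```

Pinning a proxy to a session keeps the IP address stable for the lifetime of that session, which matters when a site ties cookies or tokens to the client address.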

### Security & Cryptography

- [Anti-Bot Evasion](https://awesome-repositories.com/f/security-cryptography/bot-detection/anti-bot-evasion.md) — Provides specialized HTTP clients that mimic browser TLS fingerprints and headers to evade detection by security services like Cloudflare. ([source](https://crawlee.dev/blog/scraping-dynamic-websites-using-python.md))
- [Anti-Abuse Systems](https://awesome-repositories.com/f/security-cryptography/anti-abuse-systems.md) — Combines proxy rotation and browser fingerprinting to get past security challenges and anti-scraping protections.
- [Circumvention Strategies](https://awesome-repositories.com/f/security-cryptography/bot-management/circumvention-strategies.md) — Detects and attempts to circumvent common anti-bot measures like rate limiting or challenge pages to ensure successful data extraction. ([source](https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlerOptions.md))
- [Stateful Session Persistence](https://awesome-repositories.com/f/security-cryptography/identity-access-management/session-management/stateful-session-persistence.md) — Maintains browser context and authentication state across multi-step web interactions to ensure reliable scraping sessions. ([source](https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlerOptions.md))
- [Browser Impersonation](https://awesome-repositories.com/f/security-cryptography/network-infrastructure-security/web-network-security/network-security/traffic-inspection-manipulation/request-impersonation-tools/browser-impersonation.md) — Masks HTTP requests with browser-specific fingerprints to bypass automated access protections and security challenges during web scraping. ([source](https://crawlee.dev/blog/scrape-crunchbase-python.md))
- [Challenge Resolution](https://awesome-repositories.com/f/security-cryptography/security/utilities/security-hardening-and-protection/challenge-resolution.md) — Detects and resolves common security challenges like Cloudflare to maintain uninterrupted access to protected web content. ([source](https://crawlee.dev/blog.md))
- [Session & Cookie Handlers](https://awesome-repositories.com/f/security-cryptography/session-cookie-handlers.md) — Extracts and injects session cookies to maintain authentication state across multiple web scraping requests. ([source](https://crawlee.dev/js/api/browser-pool/class/BrowserController.md))
- [Fingerprint Configuration](https://awesome-repositories.com/f/security-cryptography/device-fingerprinting/fingerprint-configuration.md) — Generates realistic browser headers, TLS fingerprints, and rendering characteristics to prevent detection by modern anti-bot security systems. ([source](https://cdn.jsdelivr.net/gh/apify/crawlee@master/README.md))
- [Fingerprint Randomization](https://awesome-repositories.com/f/security-cryptography/device-fingerprinting/fingerprint-randomization.md) — Creates randomized browser fingerprints including headers, user agents, and screen resolutions to help automated scrapers mimic human behavior and avoid detection by anti-bot systems. ([source](https://crawlee.dev/js/api/browser-pool/interface/FingerprintGenerator.md))
- [Stealth Navigation](https://awesome-repositories.com/f/security-cryptography/security/utilities/security-hardening-and-protection/stealth-navigation.md) — Employs browser fingerprinting and stealth techniques to mimic human behavior and prevent detection by anti-scraping systems. ([source](https://crawlee.dev/js.md))
- [Fingerprint Injection](https://awesome-repositories.com/f/security-cryptography/browser-fingerprinting-services/fingerprint-injection.md) — Injects realistic device signals and browser attributes into automated sessions to prevent detection by anti-bot systems that monitor for headless browser patterns. ([source](https://crawlee.dev/blog/crawlee-for-python-v1.md))
- [Device Fingerprinting](https://awesome-repositories.com/f/security-cryptography/device-fingerprinting.md) — Generates realistic browser headers and TLS fingerprints to mimic human behavior and evade detection by security services.
- [Request Limiters](https://awesome-repositories.com/f/security-cryptography/request-size-limiters/request-limiters.md) — Sets a hard limit on the number of pages processed during a crawl to prevent infinite loops and manage resource consumption. ([source](https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlerOptions.md))
- [Callback-Based Bypass Logic](https://awesome-repositories.com/f/security-cryptography/security-detection-logic/callback-based-bypass-logic.md) — Detects and solves automated bot protection challenges with configurable callbacks for custom detection logic and interaction behavior. ([source](https://crawlee.dev/blog/crawlee-v3-16.md))
- [Session Authentication](https://awesome-repositories.com/f/security-cryptography/session-authentication.md) — Automates the retrieval and storage of verification headers or tokens from web pages to maintain authenticated state across subsequent API requests. ([source](https://crawlee.dev/blog/scrape-using-jsdom.md))
- [Fingerprint Caching](https://awesome-repositories.com/f/security-cryptography/device-fingerprinting/fingerprint-caching.md) — Links specific browser fingerprints to individual sessions to ensure consistent identity across multiple requests and improve the reliability of automated scraping tasks. ([source](https://crawlee.dev/js/api/browser-pool/interface/FingerprintOptions.md))
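
The Session & Cookie Handlers entry above mentions extracting cookies from responses and injecting them into later requests. The sketch below shows that round trip in its simplest form. It is an assumption-heavy illustration: `SimpleCookieJar` is a hypothetical class, and it deliberately ignores cookie attributes such as domain, path, and expiry that a real cookie store must honor.

```typescript
// Hypothetical, heavily simplified cookie jar: keeps only name/value pairs.
class SimpleCookieJar {
  private readonly cookies = new Map<string, string>();

  // Extracts the name=value pair from each Set-Cookie header value.
  storeFromSetCookie(setCookieHeaders: string[]): void {
    for (const header of setCookieHeaders) {
      const attributesStart = header.indexOf(";");
      const pair = attributesStart === -1 ? header : header.slice(0, attributesStart);
      const separator = pair.indexOf("=");
      if (separator <= 0) continue; // skip malformed entries without a name
      const name = pair.slice(0, separator).trim();
      const value = pair.slice(separator + 1).trim();
      this.cookies.set(name, value);
    }
  }

  // Builds the value of a Cookie request header from the stored pairs.
  toCookieHeader(): string {
    return Array.from(this.cookies.entries())
      .map(([name, value]) => `${name}=${value}`)
      .join("; ");
  }
}

const exampleJar = new SimpleCookieJar();
exampleJar.storeFromSetCookie(["sid=abc123; Path=/; HttpOnly", "theme=dark"]);
console.log(exampleJar.toCookieHeader()); // sid=abc123; theme=dark
```

Keeping one jar per session is what lets a multi-step flow, such as logging in and then paginating, reuse the same authenticated state.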

### Testing & Quality Assurance

- [Browser Automation Interfaces](https://awesome-repositories.com/f/testing-quality-assurance/software-testing/testing-frameworks/test-frameworks/browser-and-ui-testing/browser-automation-frameworks/browser-automation-interfaces.md) — Provides a consistent interface for common browser operations across different automation engines. ([source](https://crawlee.dev/js/api/browser-pool.md))
- [Device and Network Emulators](https://awesome-repositories.com/f/testing-quality-assurance/automation-interaction-tools/user-interaction-simulation/device-and-network-emulators.md) — Configures browser automation to mimic specific hardware profiles like desktop or mobile for accurate content rendering. ([source](https://crawlee.dev/js/api/browser-pool/enum/DeviceCategory.md))
- [Browser Page Management](https://awesome-repositories.com/f/testing-quality-assurance/general-testing-utilities/test-utilities-assertions/browser-ui-interaction/browser-environment-emulation/browser-page-management.md) — Spawns new pages within browser instances to handle concurrent web navigation tasks. ([source](https://crawlee.dev/js/api/browser-pool/class/BrowserPool.md))
- [URL Pattern Matchers](https://awesome-repositories.com/f/testing-quality-assurance/general-testing-utilities/test-utilities-assertions/network-api-mocking/url-pattern-matchers.md) — Restricts crawling to specific URL patterns to ensure the crawler stays within defined domain boundaries. ([source](https://crawlee.dev/js/api/core/enum/EnqueueStrategy.md))
- [Cross-Browser Execution Engines](https://awesome-repositories.com/f/testing-quality-assurance/testing-infrastructure-management/test-infrastructure/cross-browser-execution-engines.md) — Opens pages across multiple browser engines simultaneously to facilitate cross-browser testing or parallel data extraction. ([source](https://crawlee.dev/js/api/browser-pool/class/BrowserPool.md))
- [Page Lifecycle Monitors](https://awesome-repositories.com/f/testing-quality-assurance/general-testing-utilities/test-utilities-assertions/browser-ui-interaction/browser-environment-emulation/browser-page-management/page-lifecycle-monitors.md) — Executes callbacks when browser pages are created or closed to track activity and manage sessions. ([source](https://crawlee.dev/js/api/browser-pool/interface/BrowserPoolEvents.md))

### Operating Systems & Systems Programming

- [Multi-Instance Process Isolations](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/multi-instance-process-isolations.md) — Supports running multiple isolated crawler instances with unique proxy and session configurations to prevent cross-request interference. ([source](https://crawlee.dev/blog/superscraper-with-crawlee.md))

### System Administration & Monitoring

- [Performance & Resource Management](https://awesome-repositories.com/f/system-administration-monitoring/performance-monitoring-tools/performance-resource-management.md) — Dynamically adjusts active browser instances based on system capacity to prevent resource exhaustion. ([source](https://crawlee.dev/js/api/browser-crawler/class/BrowserCrawler.md))
- [Error Snapshots](https://awesome-repositories.com/f/system-administration-monitoring/error-tracking/error-snapshots.md) — Saves a screenshot and the HTML content of a web page when an error occurs to assist with debugging. ([source](https://crawlee.dev/js/api/core/class/ErrorSnapshotter.md))
- [Metric and Performance Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors.md) — Tracks and logs performance metrics and request status to provide visibility into crawl progress. ([source](https://crawlee.dev/js/api/browser-crawler/class/BrowserCrawler.md))
- [Session Health Monitors](https://awesome-repositories.com/f/system-administration-monitoring/session-tracking/session-health-monitors.md) — Monitors session health by tracking usage and error scores to automatically retire blocked or unreliable sessions. ([source](https://crawlee.dev/js/api/core/class/Session.md))
- [Task Status Monitors](https://awesome-repositories.com/f/system-administration-monitoring/task-status-monitors.md) — Provides real-time metrics on pending and handled requests to track the progress of crawling tasks. ([source](https://crawlee.dev/js/api/core/class/RequestProvider.md))
- [Error Reporting](https://awesome-repositories.com/f/system-administration-monitoring/error-reporting.md) — Captures and reports application-level runtime errors and stack traces during scraping operations. ([source](https://crawlee.dev/js/api/core/interface/ErrorTrackerOptions.md))
- [Error Tracking](https://awesome-repositories.com/f/system-administration-monitoring/error-tracking.md) — Aggregates and summarizes runtime errors to identify failure patterns during automated scraping tasks. ([source](https://crawlee.dev/js/api/core/class/ErrorTracker.md))
- [System Usage Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/system-usage-monitoring.md) — Monitors memory consumption and raises an overload signal so concurrency can be reduced before the process crashes during data extraction. ([source](https://crawlee.dev/js/api/core/class/MemoryLoadSignal.md))
- [Event Loop Latency Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/operational-health-alerting/event-monitoring-systems/event-loop-latency-monitors.md) — Tracks event loop latency to trigger overload signals and prevent system instability. ([source](https://crawlee.dev/js/api/core/function/createEventLoopLoadSignal.md))
- [Page Lifecycle Trackers](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-status-pages/page-lifecycle-trackers.md) — Assigns unique identifiers to browser pages to monitor their state and retrieve specific instances during scraping tasks. ([source](https://crawlee.dev/js/api/browser-pool/class/BrowserPool.md))
- [Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/performance-monitoring.md) — Monitors browser instance resource usage to detect overload and trigger automated recovery. ([source](https://crawlee.dev/js/api/core/interface/ClientLoadSignalOptions.md))
- [Rate Limit Overload Monitors](https://awesome-repositories.com/f/system-administration-monitoring/rate-limit-monitoring-tools/rate-limit-overload-monitors.md) — Tracks HTTP 429 error frequency to trigger overload signals and manage request flow. ([source](https://crawlee.dev/js/api/core/function/createClientLoadSignal.md))
- [Resource Monitoring Tools](https://awesome-repositories.com/f/system-administration-monitoring/resource-monitoring-tools.md) — Captures system resource snapshots to identify potential overload states during automated data collection. ([source](https://crawlee.dev/js/api/core/interface/MemorySnapshot.md))
- [System Load Monitors](https://awesome-repositories.com/f/system-administration-monitoring/system-monitoring/system-load-monitors.md) — Tracks processor utilization and triggers signals to prevent performance degradation during high-load scraping. ([source](https://crawlee.dev/js/api/core/function/createCpuLoadSignal.md))
- [Web Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/web-performance-monitoring.md) — Records diagnostic traces of browser activity to monitor execution and debug performance issues. ([source](https://crawlee.dev/js/api/browser-pool/class/PlaywrightBrowser.md))
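
The memory, CPU, event loop, and rate-limit monitors listed above share one pattern: take periodic samples, then report overload when too many recent samples are bad. The sketch below shows that pattern with a rolling time window. `RollingLoadSignal`, `memorySample`, and all threshold values are hypothetical names and numbers used for illustration, not Crawlee's load signal API.

```typescript
// One measurement taken at a point in time.
interface LoadSample {
  timestampMs: number;
  overloaded: boolean;
}

// Hypothetical rolling-window load signal.
class RollingLoadSignal {
  private samples: LoadSample[] = [];
  private readonly windowMs: number;
  private readonly maxOverloadedRatio: number;

  constructor(windowMs: number, maxOverloadedRatio: number) {
    this.windowMs = windowMs;
    this.maxOverloadedRatio = maxOverloadedRatio;
  }

  // Stores a sample and drops samples older than the window.
  record(sample: LoadSample): void {
    this.samples.push(sample);
    const cutoff = sample.timestampMs - this.windowMs;
    this.samples = this.samples.filter((s) => s.timestampMs >= cutoff);
  }

  // Overloaded when the share of bad samples in the window exceeds the limit.
  isOverloaded(): boolean {
    if (this.samples.length === 0) return false;
    const bad = this.samples.filter((s) => s.overloaded).length;
    return bad / this.samples.length > this.maxOverloadedRatio;
  }
}

// Turns a memory reading into a sample; maxUsedRatio is an assumed threshold.
function memorySample(usedBytes: number, limitBytes: number, maxUsedRatio: number, timestampMs: number): LoadSample {
  return { timestampMs, overloaded: usedBytes / limitBytes > maxUsedRatio };
}

const exampleSignal = new RollingLoadSignal(5000, 0.4);
exampleSignal.record(memorySample(900, 1000, 0.7, 0));
console.log(exampleSignal.isOverloaded()); // true
```

Judging overload over a window instead of a single reading avoids reacting to short spikes, at the cost of a slower response to sustained pressure.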

### User Interface & Experience

- [Link Discovery Engines](https://awesome-repositories.com/f/user-interface-experience/links/link-discovery-engines.md) — Automatically identifies and adds links from pages to the crawl queue using pattern-based filtering. ([source](https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlingContext.md))
- [Links](https://awesome-repositories.com/f/user-interface-experience/links.md) — Adds discovered URLs to the crawl queue with support for pattern-based filtering. ([source](https://crawlee.dev/js/api/basic-crawler/interface/BasicCrawlingContext.md))
- [Element Availability Synchronizers](https://awesome-repositories.com/f/user-interface-experience/element-locators/element-availability-synchronizers.md) — Pauses execution until a specific element appears in the document to ensure content is fully loaded before extraction. ([source](https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlingContext.md))
- [Element Property Inspection](https://awesome-repositories.com/f/user-interface-experience/element-property-inspection.md) — Retrieves or updates specific attributes, data properties, and input values from matched DOM elements to extract or modify page content. ([source](https://crawlee.dev/js/api/basic-crawler/class/Cheerio.md))
- [Infinite Scroll Components](https://awesome-repositories.com/f/user-interface-experience/infinite-scroll-components.md) — Automates repeated scrolling to the bottom of webpages to capture all dynamic content during a crawl. ([source](https://crawlee.dev/blog/infinite-scroll-using-python.md))
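
The link discovery entries above combine two steps: filter discovered URLs by pattern, then add only unseen ones to the crawl queue. A minimal sketch of both steps follows. `LinkQueue` and its unique-key rule (drop the fragment and trailing slashes) are simplifying assumptions for illustration and do not reproduce how Crawlee's `enqueueLinks` or request queue work internally.

```typescript
// Hypothetical in-memory queue; a real crawler would persist this state.
class LinkQueue {
  private readonly seenKeys = new Set<string>();
  private readonly pending: string[] = [];

  // Simplified unique key: drop the fragment and any trailing slashes.
  static uniqueKey(url: string): string {
    const hashIndex = url.indexOf("#");
    const withoutFragment = hashIndex === -1 ? url : url.slice(0, hashIndex);
    return withoutFragment.replace(/\/+$/, "");
  }

  // Adds URLs that match at least one pattern and were not seen before.
  // Patterns should not use the global flag, since RegExp.test is stateful with it.
  // Returns how many URLs were actually added.
  enqueueLinks(urls: string[], include: RegExp[]): number {
    let added = 0;
    for (const url of urls) {
      const key = LinkQueue.uniqueKey(url);
      const matches = include.some((pattern) => pattern.test(key));
      if (!matches || this.seenKeys.has(key)) continue;
      this.seenKeys.add(key);
      this.pending.push(key);
      added += 1;
    }
    return added;
  }

  // Takes the next URL to process, or undefined when the queue is empty.
  next(): string | undefined {
    return this.pending.shift();
  }

  size(): number {
    return this.pending.length;
  }
}

const exampleQueue = new LinkQueue();
const exampleAdded = exampleQueue.enqueueLinks(
  ["https://example.com/products/1", "https://example.com/products/1#reviews", "https://example.com/about"],
  [/\/products\/\d+$/],
);
console.log(exampleAdded); // 1
```

Deduplicating on a normalized key rather than the raw URL is what stops fragment-only variants of the same page from being crawled twice.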