# omkarcloud/botasaurus

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/omkarcloud-botasaurus).**

3,970 stars · 342 forks · Python · MIT

## Links

- GitHub: https://github.com/omkarcloud/botasaurus
- Homepage: https://www.omkar.cloud/botasaurus/
- awesome-repositories: https://awesome-repositories.com/repository/omkarcloud-botasaurus.md

## Topics

`anti-bot` `anti-detect` `anti-detect-browser` `anti-detection` `antidetect-browser` `bot-detection` `bypass-cloudflare` `cloudflare-bypass` `cloudflare-scrape` `python-scraper` `python-web-scraper` `python-web-scraping` `scraping-framework` `scraping-python` `scraping-tool` `undetectable` `undetected` `undetected-chromedriver` `web-crawling` `web-scraping-python`

## Description

Botasaurus is a Python web scraping framework and headless browser automation system for building scalable data extraction tools. It also serves as an OCR document parser, so together these capabilities convert website content, images, and PDF files into structured formats such as JSON, CSV, and Excel.

What distinguishes the framework is its scraper management interface, which wraps Python functions in a web-based UI or packages them as standalone desktop applications. Non-technical users can then trigger extraction jobs and manage tasks through a graphical interface or REST API without writing code.

For large-scale data collection, it offers bot detection bypass, proxy rotation, and resource-aware parallel execution. Built-in utilities cover session persistence, asynchronous task orchestration, and document text extraction via optical character recognition (OCR).
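
Resource-aware parallelism boils down to sizing the worker pool from free memory instead of using a fixed number. The following is a minimal, framework-agnostic sketch rather than Botasaurus's own calculation: the 800 MiB per-browser budget, the optional `hard_cap`, and the helper names `available_memory_bytes` and `max_parallel_browsers` are all assumptions, and `SC_AVPHYS_PAGES` is available on Linux but not on every platform.

```python
import os
from typing import Optional

# Hypothetical budget: assume each headless browser needs about 800 MiB of RAM.
BYTES_PER_BROWSER = 800 * 1024 ** 2


def available_memory_bytes() -> int:
    """Currently free physical memory, read via sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES")


def max_parallel_browsers(available_bytes: int,
                          bytes_per_browser: int = BYTES_PER_BROWSER,
                          hard_cap: Optional[int] = None) -> int:
    """Number of browsers that fit in available_bytes, never fewer than one."""
    if bytes_per_browser <= 0:
        raise ValueError("bytes_per_browser must be positive")
    workers = available_bytes // bytes_per_browser
    if hard_cap is not None:
        workers = min(workers, hard_cap)
    return max(1, workers)
```

With 4 GiB free, `max_parallel_browsers(4 * 1024**3)` returns 5 under this budget; a caller would typically pass `available_memory_bytes()` instead of a constant.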

Data management is supported through interchangeable database backends, result caching, and interactive filtering and sorting tools for viewing extracted data.
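
Result caching, in its simplest form, keys each result on the function and its input so that a repeated call skips the network entirely. A minimal sketch assuming JSON-serializable inputs and outputs and a plain directory of files; the `cached` decorator and `scrape_heading` function are hypothetical and do not reproduce Botasaurus's cache API or storage layout.

```python
import functools
import hashlib
import json
import os
import tempfile


def cached(cache_dir):
    """Cache a one-argument function's JSON-serializable results as files in cache_dir."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data):
            # Key on the function name plus a canonical JSON encoding of the input.
            key_source = json.dumps({"fn": func.__name__, "data": data}, sort_keys=True)
            key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
            path = os.path.join(cache_dir, key + ".json")
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    return json.load(f)  # cache hit: the wrapped function is not called
            result = func(data)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, "w", encoding="utf-8") as f:
                json.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []


@cached(tempfile.mkdtemp())
def scrape_heading(url):
    calls.append(url)  # stands in for a real network request
    return {"url": url, "heading": "Example"}


scrape_heading("https://example.com")
scrape_heading("https://example.com")  # served from disk; calls still has one entry
```

Hashing a `sort_keys=True` JSON encoding means dictionary inputs with the same contents map to the same cache entry regardless of key order.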

## Tags

### Web Development

- [Web Automation and Scraping](https://awesome-repositories.com/f/web-development/web-automation-scraping.md) — Provides a comprehensive framework for programmatic browser control, data extraction, and automated web interactions. ([source](https://cdn.jsdelivr.net/gh/omkarcloud/botasaurus@master/README.md))
- [Browser Session Persistence](https://awesome-repositories.com/f/web-development/browser-session-persistence.md) — Saves browser fingerprints and cookies in lightweight profiles to maintain session state and bypass detection across runs.
- [Web Scraping Frameworks](https://awesome-repositories.com/f/web-development/web-scraping-frameworks.md) — Offers a comprehensive Python framework for building scalable web scrapers with parallel execution and bot bypass.
- [Scraper Application Development](https://awesome-repositories.com/f/web-development/desktop-development/scraper-application-development.md) — Enables building standalone local applications with GUIs for running data extraction tasks.
- [Resource Blocking](https://awesome-repositories.com/f/web-development/performance-optimizations/initial-page-load-optimizations/resource-blocking.md) — Blocks images and heavy assets during the scraping process to reduce load times and save system resources. ([source](https://www.omkar.cloud/botasaurus/docs/what-is-botasaurus))
- [Browser Session Managers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/browser-automation/browser-session-managers.md) — Controls browser profiles, network proxies, and authenticated sessions while keeping drivers alive to reduce overhead. ([source](https://www.omkar.cloud/botasaurus/docs/what-is-botasaurus))
- [Task Triggering Interfaces](https://awesome-repositories.com/f/web-development/web-infrastructure-deployment/task-triggering-interfaces.md) — Wraps functions in a web-based interface and REST API to let users trigger extraction jobs. ([source](https://cdn.jsdelivr.net/gh/omkarcloud/botasaurus@master/README.md))
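
Resource blocking usually reduces to a predicate consulted for each outgoing request. A minimal, framework-agnostic sketch, assuming file-extension patterns are enough to identify heavy assets (real interception often inspects the request's resource type instead); `should_block` and the pattern list are hypothetical, and how Botasaurus wires blocking into the browser is not shown here.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical block list: common image, font and stylesheet extensions.
BLOCKED_PATTERNS = ("*.png", "*.jpg", "*.jpeg", "*.gif", "*.webp", "*.svg",
                    "*.woff", "*.woff2", "*.css")


def should_block(url, patterns=BLOCKED_PATTERNS):
    """Return True if the URL's path (query string ignored) matches a blocked pattern."""
    path = urlparse(url).path.lower()  # lower-case so ".PNG" matches "*.png"
    return any(fnmatchcase(path, pattern) for pattern in patterns)


should_block("https://example.com/img/logo.PNG?v=2")  # True
should_block("https://example.com/api/items")         # False
```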

### Artificial Intelligence & ML

- [Optical Character Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/optical-character-recognition.md) — Converts images and PDF files into structured text and spreadsheets using optical character recognition.

### Part of an Awesome List

- [OCR Document Parsers](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/ocr-document-parsers.md) — Uses optical character recognition to extract structured text and data from images and PDF files.
- [Optical Character Recognitions](https://awesome-repositories.com/f/awesome-lists/more/text-extraction-and-ocr/optical-character-recognitions.md) — Converts images and PDFs into structured formats like Excel or LaTeX using optical character recognition. ([source](https://www.omkar.cloud/))
- [Bot Detection Evasion](https://awesome-repositories.com/f/awesome-lists/security/bot-detection-evasion.md) — Emulates human-like browser behavior and interaction patterns to circumvent automated bot detection systems. ([source](https://cdn.jsdelivr.net/gh/omkarcloud/botasaurus@master/README.md))

### Data & Databases

- [Data Extraction Pipelines](https://awesome-repositories.com/f/data-databases/data-extraction-pipelines.md) — Implements automated workflows for parsing and transforming raw web and document content into structured formats.
- [High-Volume Data Collection](https://awesome-repositories.com/f/data-databases/high-volume-data-collection.md) — Manages parallel browser instances and rotating proxies to collect massive amounts of information.
- [Web Data Extraction Tools](https://awesome-repositories.com/f/data-databases/web-data-extraction-tools.md) — Converts website content and documents into structured formats like JSON, CSV, and Excel.
- [Execution Result Caches](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/caching-performance/caching-strategies/query-result-caching/method-result-caches/execution-result-caches.md) — Caches the outputs of web scraping executions to avoid redundant network requests and improve performance. ([source](https://cdn.jsdelivr.net/gh/omkarcloud/botasaurus@master/README.md))
- [Database Backend Integration](https://awesome-repositories.com/f/data-databases/external-storage-integrations/database-backend-integration.md) — Integrates with production-grade databases using connection strings for scalable result management. ([source](https://github.com/omkarcloud/botasaurus/blob/master/advanced.md))
- [Resource-Aware Scaling Controllers](https://awesome-repositories.com/f/data-databases/horizontal-database-scaling/resource-scaling-strategies/resource-aware-scaling-controllers.md) — Calculates available system memory to dynamically determine the optimal number of concurrent browser instances.
- [Pluggable Database Backends](https://awesome-repositories.com/f/data-databases/persistent-storage-backends/pluggable-database-backends.md) — Supports interchangeable database backends, allowing users to switch between local storage and production-grade databases.
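
Interchangeable backends are usually achieved by coding against one small storage interface and choosing the implementation from configuration. A sketch under that assumption, using only the standard library; the `memory://` and `sqlite:///` connection-string schemes, `ResultStore`, and `open_store` are hypothetical and are not Botasaurus's own configuration format.

```python
import json
import sqlite3
from typing import Optional, Protocol


class ResultStore(Protocol):
    """Interface every backend implements."""

    def save(self, task_id: int, result: dict) -> None: ...

    def load(self, task_id: int) -> Optional[dict]: ...


class MemoryStore:
    """Keeps results in a dict; suitable for quick local runs."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, task_id: int, result: dict) -> None:
        self._rows[task_id] = result

    def load(self, task_id: int) -> Optional[dict]:
        return self._rows.get(task_id)


class SQLiteStore:
    """Persists results as JSON text in a SQLite table."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS results (task_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save(self, task_id: int, result: dict) -> None:
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO results (task_id, body) VALUES (?, ?)",
                (task_id, json.dumps(result)),
            )

    def load(self, task_id: int) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM results WHERE task_id = ?", (task_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def open_store(connection_string: str) -> ResultStore:
    """Pick a backend from a URL-style connection string (hypothetical scheme)."""
    if connection_string == "memory://":
        return MemoryStore()
    prefix = "sqlite:///"
    if connection_string.startswith(prefix):
        return SQLiteStore(connection_string[len(prefix):])
    raise ValueError(f"unsupported connection string: {connection_string}")
```

Scraping code only calls `save` and `load`, so moving from `memory://` to a file-backed `sqlite:///results.db` changes a single string.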

### Development Tools & Productivity

- [Headless Browser Automation](https://awesome-repositories.com/f/development-tools-productivity/headless-browser-automation.md) — Provides an automation system for controlling headless browsers to interact with page elements and manage sessions.
- [Resource-Aware Parallelization](https://awesome-repositories.com/f/development-tools-productivity/parallel-execution/custom-parallel-task-execution/resource-aware-parallelization.md) — Implements a system that calculates available memory to determine the optimal number of concurrent browser instances for data extraction. ([source](https://www.omkar.cloud/botasaurus/docs/what-is-botasaurus))
- [Input List Decomposition](https://awesome-repositories.com/f/development-tools-productivity/workflow-automations/task-decompositions/input-list-decomposition.md) — Splits lists of inputs into individual sub-tasks to enable parallel processing of multiple items. ([source](https://github.com/omkarcloud/botasaurus/blob/master/advanced.md))
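
Input list decomposition, parallel execution, and result aggregation can be combined in one helper: each list item becomes its own sub-task, the sub-tasks run concurrently, and their outputs are merged into a single list. A sketch with the standard library's `ThreadPoolExecutor`, assuming threads are adequate because scraping work is mostly I/O-bound; `run_each` and `fake_scrape` are hypothetical, and in a resource-aware setup `max_workers` would come from a memory-based estimate rather than a constant.

```python
from concurrent.futures import ThreadPoolExecutor


def run_each(func, data, max_workers=4):
    """Run func once per input item in parallel and merge the outputs.

    A single (non-list) input is treated as a one-item list. Sub-task results that
    are lists are flattened, so the caller receives one consolidated list.
    """
    items = data if isinstance(data, list) else [data]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(func, items))  # map() keeps input order
    merged = []
    for result in results:
        if isinstance(result, list):
            merged.extend(result)
        else:
            merged.append(result)
    return merged


def fake_scrape(query):
    # Stand-in for a real scraper: returns two "rows" per query.
    return [f"{query}-1", f"{query}-2"]


rows = run_each(fake_scrape, ["pizza", "sushi"], max_workers=2)
# rows == ["pizza-1", "pizza-2", "sushi-1", "sushi-2"]
```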

### DevOps & Infrastructure

- [Task Schedulers](https://awesome-repositories.com/f/devops-infrastructure/automation-orchestration/task-execution-frameworks/task-job-management/task-schedulers.md) — Schedules, caches, sorts, and filters data extraction jobs to organize the overall processing workflow. ([source](https://www.omkar.cloud/botasaurus/docs/botasaurus-desktop/introduction))
- [Distribution and Packaging](https://awesome-repositories.com/f/devops-infrastructure/distribution-packaging.md) — Generates installers and handles automatic updates to distribute logic as a standalone desktop application. ([source](https://www.omkar.cloud/botasaurus/docs/botasaurus-desktop/introduction))

### Networking & Communication

- [Proxy Rotation Services](https://awesome-repositories.com/f/networking-communication/proxy-rotation-services.md) — Distributes outgoing HTTP requests across a pool of rotating proxy servers to bypass IP-based rate limits.
- [Asynchronous Task Queues](https://awesome-repositories.com/f/networking-communication/communication-protocols-architectures/inter-process-communication/inter-process-communication-frameworks/asynchronous-task-queues.md) — Splits input lists into individual sub-tasks processed through an asynchronous background queue for concurrency.
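
Proxy rotation, at its simplest, hands each outgoing request the next proxy from a pool. A minimal round-robin sketch, assuming a static list of proxies (production rotation often also retires failing proxies, which is omitted here); `ProxyRotator` and `opener_for` are hypothetical names, while `urllib.request.ProxyHandler` and `urllib.request.build_opener` are standard-library calls.

```python
import itertools
import threading
import urllib.request


class ProxyRotator:
    """Hand out proxies from a fixed pool in round-robin order (thread-safe)."""

    def __init__(self, proxies):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next_proxy(self):
        with self._lock:  # guard the shared iterator against concurrent callers
            return next(self._cycle)


def opener_for(proxy):
    """Build a urllib opener that routes HTTP and HTTPS traffic through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


# Placeholder proxy addresses from the TEST-NET-3 documentation range.
rotator = ProxyRotator(["http://203.0.113.10:8080", "http://203.0.113.11:8080"])
picked = [rotator.next_proxy() for _ in range(3)]
# picked == ["http://203.0.113.10:8080", "http://203.0.113.11:8080", "http://203.0.113.10:8080"]
opener = opener_for(picked[0])  # opener.open(url) would send the request via that proxy
```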

### Software Engineering & Architecture

- [UI-to-Function Mappings](https://awesome-repositories.com/f/software-engineering-architecture/dynamic-function-mappings/ui-to-function-mappings.md) — Wraps Python functions in a web interface and REST API to automatically generate user-facing task controls.
- [Asynchronous Task Queues](https://awesome-repositories.com/f/software-engineering-architecture/asynchronous-task-queues.md) — Runs jobs in a background queue to perform concurrent operations such as scrolling and fetching page details. ([source](https://cdn.jsdelivr.net/gh/omkarcloud/botasaurus@master/README.md))
- [Concurrent Task Limiters](https://awesome-repositories.com/f/software-engineering-architecture/concurrent-task-runners/concurrent-task-limiters.md) — Provides utilities to cap the number of simultaneous browser instances and network requests to prevent resource exhaustion. ([source](https://github.com/omkarcloud/botasaurus/blob/master/advanced.md))
- [Task Result Aggregation](https://awesome-repositories.com/f/software-engineering-architecture/task-result-aggregation.md) — Combines outputs from multiple sub-tasks into a single consolidated result set for streamlined analysis. ([source](https://github.com/omkarcloud/botasaurus/blob/master/advanced.md))
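
A background queue lets a scraper keep producing work (for example, links found while a listing page is still scrolling) while detail pages are fetched concurrently; the worker count doubles as a cap on simultaneous tasks, and results are aggregated at the end. A thread-based sketch of that pattern; `BackgroundQueue`, `put`, and `get` are names chosen for illustration and may not match the library's own interface.

```python
import queue
import threading
import time

_STOP = object()  # sentinel telling a worker to exit


class BackgroundQueue:
    """Process submitted items on a fixed pool of worker threads.

    The number of workers caps how many tasks run at once; get() waits for every
    task and returns the results in submission order. Error handling is omitted.
    """

    def __init__(self, func, workers=2):
        self._func = func
        self._tasks = queue.Queue()
        self._results = []
        self._lock = threading.Lock()
        self._next_index = 0
        self._threads = [threading.Thread(target=self._work, daemon=True) for _ in range(workers)]
        for thread in self._threads:
            thread.start()

    def _work(self):
        while True:
            item = self._tasks.get()
            if item is _STOP:
                return
            index, data = item
            result = self._func(data)
            with self._lock:
                self._results.append((index, result))

    def put(self, data):
        """Enqueue one item without waiting for it to be processed."""
        with self._lock:
            index = self._next_index
            self._next_index += 1
        self._tasks.put((index, data))

    def get(self):
        """Stop the workers after all queued items finish, then return ordered results."""
        for _ in self._threads:
            self._tasks.put(_STOP)  # FIFO order: sentinels arrive after every real task
        for thread in self._threads:
            thread.join()
        return [result for _, result in sorted(self._results, key=lambda pair: pair[0])]


def fetch_details(link):
    time.sleep(0.01)  # stands in for loading a detail page
    return link.upper()


detail_queue = BackgroundQueue(fetch_details, workers=2)
for link in ["a", "b", "c"]:  # e.g. links discovered while scrolling a listing
    detail_queue.put(link)
details = detail_queue.get()
# details == ["A", "B", "C"]
```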

### User Interface & Experience

- [Scraper Management Interfaces](https://awesome-repositories.com/f/user-interface-experience/scraper-management-interfaces.md) — Provides a web and desktop UI for triggering extraction jobs and filtering results for non-technical users.
- [Application Generators](https://awesome-repositories.com/f/user-interface-experience/desktop-applications/application-generators.md) — Generates a graphical user interface for non-technical users to input parameters and manage extraction tasks. ([source](https://www.omkar.cloud/botasaurus/docs/botasaurus-desktop/quick-start))
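
Mapping a function to a UI means deriving one input control per parameter. A sketch that does this by introspecting the signature with `inspect.signature` and `typing.get_type_hints`; the annotation-to-control table, `controls_for`, and `scrape_reviews` are hypothetical, and Botasaurus's own mechanism for declaring input forms may differ.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from parameter annotations to form control kinds.
CONTROL_FOR_TYPE = {str: "text", int: "number", float: "number", bool: "checkbox"}


def controls_for(func):
    """Describe one form control per parameter of func, derived from its signature."""
    hints = get_type_hints(func)
    controls = []
    for name, param in inspect.signature(func).parameters.items():
        kind = CONTROL_FOR_TYPE.get(hints.get(name, str), "text")  # unannotated -> text box
        control = {"name": name, "control": kind,
                   "required": param.default is inspect.Parameter.empty}
        if not control["required"]:
            control["default"] = param.default  # pre-fill the form with the default
        controls.append(control)
    return controls


def scrape_reviews(query: str, max_results: int = 100, headless: bool = True):
    return []  # placeholder body; only the signature matters here


controls = controls_for(scrape_reviews)
# [{'name': 'query', 'control': 'text', 'required': True},
#  {'name': 'max_results', 'control': 'number', 'required': False, 'default': 100},
#  {'name': 'headless', 'control': 'checkbox', 'required': False, 'default': True}]
```

A form renderer (or REST endpoint validator) can then be driven entirely by this list, so adding a parameter to the scraper adds a control without touching UI code.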

### Business & Productivity Software

- [Standalone Data Extractors](https://awesome-repositories.com/f/business-productivity-software/standalone-data-extractors.md) — Creates standalone local applications to extract data from websites and documents without cloud dependencies. ([source](https://cdn.jsdelivr.net/gh/omkarcloud/botasaurus@master/README.md))

### Graphics & Multimedia

- [Asset Blocking](https://awesome-repositories.com/f/graphics-multimedia/media-asset-loading/asset-blocking.md) — Implements asset-filtering during page loads to reduce memory usage and increase scraping speed.

### Testing & Quality Assurance

- [Browser Automation Interfaces](https://awesome-repositories.com/f/testing-quality-assurance/software-testing/testing-frameworks/test-frameworks/browser-and-ui-testing/browser-automation-frameworks/browser-automation-interfaces.md) — Provides interfaces that enable users to trigger automated browser tasks and data extraction via web UIs.
