# firecrawl/firecrawl

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/firecrawl-firecrawl).**

133,479 stars · 7,820 forks · TypeScript · AGPL-3.0

## Links

- GitHub: https://github.com/firecrawl/firecrawl
- Homepage: https://firecrawl.dev
- awesome-repositories: https://awesome-repositories.com/repository/firecrawl-firecrawl.md

## Topics

`ai` `ai-agents` `ai-crawler` `ai-scraping` `ai-search` `crawler` `data-extraction` `html-to-markdown` `llm` `markdown` `scraper` `scraping` `web-crawler` `web-data` `web-data-extraction` `web-scraper` `web-scraping` `web-search` `webscraping`

## Description

Firecrawl is a web data extraction platform that converts unstructured web content into clean, LLM-ready formats such as markdown or JSON. It works as a crawler and scraper that can map entire domains, follow links recursively, and run multi-step data gathering tasks. It renders dynamic, JavaScript-heavy pages in orchestrated headless browsers, so content that only appears after scripts run is captured as well.
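
To make the idea of converting a page into LLM-ready markdown concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not Firecrawl's implementation or API: the class `MarkdownSketch`, the function `html_to_markdown`, and the set of skipped tags are all hypothetical, and only headings, paragraphs, and links are handled.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-markdown converter (hypothetical): headings, paragraphs, links."""

    # Assumption: these elements are page chrome, not content.
    SKIPPED = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.parts = []      # output fragments, joined at the end
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.href = None     # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in ("h1", "h2", "h3"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs so only tag-driven breaks produce newlines.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = re.sub(r"[ \t]+", " ", "".join(parser.parts))
    out = []
    for line in (raw.strip() for raw in text.split("\n")):
        # Keep a blank line only when it follows a non-blank line.
        if line or (out and out[-1]):
            out.append(line)
    return "\n".join(out).strip()
```

A real converter also has to deal with lists, tables, code blocks, images, and malformed markup, which this sketch deliberately ignores.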

The platform focuses on agentic workflows: its programmatic interface lets autonomous agents perform live web research, interact with pages, and carry out multi-step navigation. Crawling can be distributed across multiple worker nodes, with concurrency and long-running jobs managed through asynchronous queues. It also integrates with agent frameworks through standardized protocols such as the Model Context Protocol (MCP), so AI-powered clients and automated pipelines can call it as a tool.
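
The queue-plus-workers pattern behind bounded concurrency can be sketched in a few lines of `asyncio`. This is a generic, single-process illustration under stated assumptions, not Firecrawl's queueing code: `run_jobs`, `handler`, and the `concurrency` parameter are hypothetical names, and `handler` stands in for whatever coroutine would actually fetch a page.

```python
import asyncio


async def run_jobs(urls, handler, concurrency=2):
    """Process URLs through a queue with a fixed number of workers (hypothetical)."""
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    results = {}

    async def worker():
        while True:
            url = await queue.get()
            try:
                results[url] = await handler(url)
            except Exception as exc:
                # Record the failure and keep the worker alive for later jobs.
                results[url] = exc
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # returns once every queued job has been marked done
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Because only `concurrency` workers exist, at most that many handlers run at once regardless of how many URLs are queued. A distributed setup would replace the in-memory queue with a shared broker, which is outside the scope of this sketch.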

Beyond core extraction, the project provides developer tools for site mapping, batch scraping, and web search. Stateful session persistence (cookies and authentication state kept across sequential requests), webhook notifications, and configurable crawl depth give fine-grained control over how information is retrieved and processed.
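
As a rough picture of what configurable crawl depth means, the sketch below runs a breadth-first traversal that stops following links once a depth limit is reached. It is not Firecrawl's crawler: `crawl`, `get_links`, `max_depth`, and `limit` are hypothetical names, and `get_links` is assumed to return the outgoing links of a page (here it would be backed by an in-memory dictionary rather than real HTTP requests).

```python
from collections import deque


def crawl(start, get_links, max_depth=2, limit=100):
    """Breadth-first traversal that does not follow links beyond max_depth."""
    seen = {start}
    order = []
    frontier = deque([(start, 0)])
    while frontier and len(order) < limit:
        url, depth = frontier.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # the page is visited, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return order
```

With a site map such as `{"/": ["/a", "/b"], "/a": ["/a/1"]}`, calling `crawl("/", lambda u: site.get(u, []), max_depth=1)` visits the start page and its direct links only. The `seen` set prevents revisiting pages that link back to each other.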

API documentation and SDKs support integration into backend services and local development environments. The crawling infrastructure can be self-hosted inside a private network or used as a managed cloud service.

## Tags

### Artificial Intelligence & ML

- [Autonomous Web Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agent-orchestration-multi-agent/autonomous-agents/autonomous-web-agents.md) — Interprets natural language instructions to navigate sites and gather data without human oversight. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Autonomous Web Researchers](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agent-orchestration-multi-agent/autonomous-agents/autonomous-web-researchers.md) — Navigates and synthesizes information from live web sources to perform complex research tasks autonomously.
- [Agentic Web Browsing](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/integration-deployment/agentic-domains/agentic-web-browsing.md) — Equips agents with the capability to perform live web searches and interact with pages for real-time problem solving.
- [Web Access Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agent-orchestration-multi-agent/integration-surfaces/web-access-interfaces.md) — Acts as a bridge for agents to fetch and parse web content into usable data formats. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Application Integration SDKs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agent-orchestration-multi-agent/integration-surfaces/application-integration-sdks.md) — Embeds web extraction capabilities directly into agentic workflows through dedicated software development kits and programmatic interfaces. ([source](https://firecrawl.dev/agent-onboarding/SKILL.md))
- [Automated Workflow Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-coding-assistants/automated-workflow-generators.md) — Triggers specialized data collection routines that synthesize raw web information into finished research briefs or technical audits. ([source](https://firecrawl.dev/agent-onboarding/SKILL.md))

### Data & Databases

- [LLM-Ready Data Extractors](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/web-extraction-engines/llm-ready-data-extractors.md) — Transforms unstructured web pages into clean, structured formats specifically optimized for language model ingestion.
- [LLM-Driven Data Extractors](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-transformation/data-parsing-extraction/llm-driven-data-extractors.md) — Leverages language models to intelligently parse and convert raw HTML into clean, semantic data structures.
- [LLM Data Preparation Tools](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/document-llm-preparation/llm-data-preparation-tools.md) — Prepares raw web content for AI by converting it into clean, structured formats like markdown or JSON.
- [Web Content Scrapers](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/web-extraction-engines/web-content-scrapers.md) — Scrapes information from web pages and converts the retrieved content into structured formats for data pipelines. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Web Search APIs](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/query-interfaces-dsls/web-search-apis.md) — Translates natural language search queries into actionable web requests to retrieve relevant links and content from across the internet. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Web Data Connectors](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/web-extraction-engines/web-data-connectors.md) — Links web-based information sources to external AI platforms via pre-built connectors that simplify data ingestion and processing. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Web Data Pipelines](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/event-driven-data-pipelines/web-data-pipelines.md) — Streamlines the flow of live web data into downstream analytical services by standardizing extraction and transformation protocols.

### Testing & Quality Assurance

- [Browser Automation Interfaces](https://awesome-repositories.com/f/testing-quality-assurance/software-testing/testing-frameworks/test-frameworks/browser-and-ui-testing/browser-automation-frameworks/browser-automation-interfaces.md) — Exposes programmatic endpoints for controlling browser environments to extract and interact with live web content. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))

### Web Development

- [Web Crawling](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling.md) — Maps site structures and discovers URLs to facilitate systematic content indexing across the web. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Autonomous Web Crawlers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling/autonomous-web-crawlers.md) — Provides recursive navigation to map site hierarchies and aggregate data from interconnected web pages.
- [Distributed Crawling Infrastructures](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling/distributed-crawling-infrastructures.md) — Scales concurrent data collection tasks across distributed architectures while managing complex web interactions.
- [Large-Scale Domain Crawlers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-crawling/large-scale-domain-crawlers.md) — Systematically indexes entire domains to retrieve comprehensive datasets suitable for large-scale analysis.
- [Web Crawlers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/web-crawlers.md) — Automates the traversal of multiple pages to discover and extract information from diverse websites. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Web Scraping APIs](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/web-scraping-apis.md) — Delivers a managed API for retrieving scraped website content in clean, machine-readable formats like markdown or JSON. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Headless Browser Orchestrators](https://awesome-repositories.com/f/web-development/web-automation-scraping/browser-orchestration-systems/headless-browser-orchestrators.md) — Manages isolated headless browser instances to render dynamic, JavaScript-heavy content for extraction.
- [Autonomous Research Agents](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/autonomous-research-agents.md) — Retrieves structured information from target URLs by applying user-defined prompts and output schemas. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Batch Scrapers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/batch-scrapers.md) — Processes multiple URLs in parallel to convert unstructured web pages into clean, structured formats suitable for large-scale data operations. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Full Page Screenshots](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/browser-automation/full-page-screenshots.md) — Captures high-fidelity visual snapshots of full-length, scrollable web pages to provide accurate documentation of site layouts. ([source](https://api.firecrawl.dev/v2/scrape))

### Networking & Communication

- [Distributed Crawl Coordination](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/distributed-crawl-coordination.md) — Coordinates discovery tasks by partitioning and synchronizing web data collection across multiple worker nodes.

### Software Engineering & Architecture

- [Model Context Protocol Integrations](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/programmatic-interfaces/model-context-protocol-integrations.md) — Implements standardized protocols to integrate web data extraction capabilities directly into AI-powered applications. ([source](https://cdn.jsdelivr.net/gh/firecrawl/firecrawl@main/README.md))
- [Asynchronous Data Processing](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/reactive-messaging/reactive-event-driven-systems/asynchronous-data-processing.md) — Offloads resource-intensive crawling operations to background workers to maintain non-blocking execution.
- [Web Data Service Integrations](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/third-party-service-connectors/web-data-service-integrations.md) — Integrates web-derived data into diverse development environments using consistent protocols that maintain data quality across third-party services. ([source](https://www.firecrawl.dev/integrations))

### Part of an Awesome List

- [AI Data Collection](https://awesome-repositories.com/f/awesome-lists/ai/ai-data-collection.md) — API for scraping and interacting with web content at scale.
- [Development Frameworks and Tools](https://awesome-repositories.com/f/awesome-lists/ai/development-frameworks-and-tools.md) — Web crawling and data extraction service for AI applications.
- [Web Scraping](https://awesome-repositories.com/f/awesome-lists/data/web-scraping.md) — API for converting websites into LLM-ready data.

### User Interface & Experience

- [Agentic Browsing Interfaces](https://awesome-repositories.com/f/user-interface-experience/graphical-user-interfaces/ai-specific-ux-design/agentic-browsing-interfaces.md) — Facilitates autonomous web navigation by providing standardized tool definitions that agents use to interact with and extract content from live websites.

### Security & Cryptography

- [Stateful Session Persistence](https://awesome-repositories.com/f/security-cryptography/identity-access-management/session-management/stateful-session-persistence.md) — Preserves browser cookies and authentication states across sequential requests to ensure continuous access during complex, multi-step web interactions.
