# psf/requests-html

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/psf-requests-html).**

13,826 stars · 1,000 forks · Python · MIT

## Links

- GitHub: https://github.com/psf/requests-html
- Homepage: http://html.python-requests.org
- awesome-repositories: https://awesome-repositories.com/repository/psf-requests-html.md

## Topics

`beautifulsoup` `css-selectors` `html` `http` `kennethreitz` `lxml` `pyquery` `python` `requests` `scraping`

## Description

requests-html is a Python library for HTML parsing and web scraping. It pairs an HTTP client, usable synchronously or asynchronously, with JavaScript rendering in a headless browser, so pages can be fetched, rendered, and parsed for structured data extraction.

The project integrates a headless browser to execute JavaScript, allowing it to retrieve dynamically generated content that standard HTML parsers cannot see. It provides tools for automated data extraction using CSS selectors and XPath expressions to isolate specific text or attributes from HTML structures.
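
To make selector-based extraction concrete, the sketch below pulls text and attributes out of a small document using path expressions. It is not the library's implementation: it uses only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset and requires well-formed markup, whereas requests-html exposes `find()` for CSS selectors and `xpath()` for XPath on the parsed page. The `PAGE` markup and the `select` helper are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a fetched page.
PAGE = """
<html>
  <body>
    <div id="about"><a href="/about/">About</a></div>
    <ul class="menu">
      <li><a href="/downloads/">Downloads</a></li>
      <li><a href="/docs/">Docs</a></li>
    </ul>
  </body>
</html>
"""


def select(root, path):
    """Return (text, attributes) pairs for every element matching path."""
    return [((el.text or "").strip(), dict(el.attrib)) for el in root.findall(path)]


root = ET.fromstring(PAGE)

# Attribute predicates narrow the match to one part of the tree.
menu_links = select(root, ".//ul[@class='menu']/li/a")
about_link = select(root, ".//div[@id='about']/a")
```

Real-world HTML is rarely well-formed XML, which is one reason a scraping library relies on a forgiving HTML parser rather than a strict XML one.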

The framework covers network operations including asynchronous page fetching, session state management with cookies, and connection pooling. It also includes utilities for hyperlink retrieval to harvest and normalize URLs from websites.
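
As an illustration of what harvesting and normalizing hyperlinks involves, here is a minimal standard-library sketch: collect every `href`, resolve it against the page URL, and deduplicate. In requests-html the comparable results are exposed as `r.html.links` and `r.html.absolute_links`; the code below is not that implementation. `LinkCollector`, `harvest_links`, and the sample markup are hypothetical, and dropping URL fragments is a choice made for this example, not a claim about the library's behavior.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def harvest_links(html_text, base_url):
    """Return the sorted set of absolute URLs, with fragments dropped."""
    collector = LinkCollector()
    collector.feed(html_text)
    absolute = set()
    for href in collector.hrefs:
        # urljoin resolves relative paths; urldefrag strips "#section" parts.
        url, _fragment = urldefrag(urljoin(base_url, href))
        absolute.add(url)
    return sorted(absolute)


SAMPLE = (
    '<a href="/about/">About</a> '
    '<a href="docs/#intro">Docs</a> '
    '<a href="https://pypi.org/">PyPI</a>'
)
links = harvest_links(SAMPLE, "https://example.org/start/")
```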

## Tags

### Web Development

- [Web Scraping](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping.md) — Provides a framework for extracting structured information from websites that require JavaScript execution.
- [Web Page Retrievers](https://awesome-repositories.com/f/web-development/web-page-retrievers.md) — Fetches HTML content from URLs and automatically handles character encoding for correct text decoding. ([source](https://github.com/psf/requests-html/blob/master/requests_html.py))
- [Headless Browsers](https://awesome-repositories.com/f/web-development/headless-browsers.md) — Integrates a headless browser to execute JavaScript and capture the final rendered state of web pages.
- [Headless Rendering Engines](https://awesome-repositories.com/f/web-development/headless-browsers/headless-rendering-engines.md) — Integrates a headless rendering engine to execute JavaScript and retrieve dynamically generated content.
- [Asynchronous Fetching](https://awesome-repositories.com/f/web-development/web-page-retrievers/asynchronous-fetching.md) — Provides asynchronous support to fetch multiple web pages concurrently, significantly reducing total network wait time. ([source](https://github.com/psf/requests-html#readme))
- [Web Scraping Frameworks](https://awesome-repositories.com/f/web-development/web-scraping-frameworks.md) — Offers a complete toolset for automating data collection from websites using CSS selectors and XPath.
- [Web Scraping Selectors](https://awesome-repositories.com/f/web-development/web-scraping-selectors.md) — Locates specific HTML elements using CSS selectors or XPath expressions to retrieve text and attributes. ([source](https://github.com/psf/requests-html#readme))
- [Hyperlink Harvesting](https://awesome-repositories.com/f/web-development/web-page-retrievers/hyperlink-harvesting.md) — Automatically collects and normalizes all hyperlinks from a website to map page structures or build crawlers.

### Part of an Awesome List

- [HTML Parsing](https://awesome-repositories.com/f/awesome-lists/data/html-parsing.md) — Provides a human-friendly Python interface for fetching and parsing HTML web pages.
- [Hyperlink Extraction](https://awesome-repositories.com/f/awesome-lists/devtools/markup-extraction/hyperlink-extraction.md) — Collects all hyperlinks from a page and provides them as relative or absolute URLs. ([source](https://github.com/psf/requests-html/blob/master/README.rst))

### Content Management & Publishing

- [Web Page Scraping Extractors](https://awesome-repositories.com/f/content-management-publishing/web-page-scraping-extractors.md) — Extracts and structures data from web pages by parsing raw HTML via a human-friendly interface. ([source](https://github.com/psf/requests-html/blob/master/setup.py))
- [JavaScript Rendering](https://awesome-repositories.com/f/content-management-publishing/web-page-scraping-extractors/javascript-rendering.md) — Executes JavaScript on a web page using a headless browser to retrieve content generated after initial load. ([source](https://github.com/psf/requests-html#readme))

### Data & Databases

- [Structured Data Extraction](https://awesome-repositories.com/f/data-databases/structured-data-extraction.md) — Uses CSS selectors and XPath to programmatically isolate and pull specific text or attributes from HTML structures.

### Networking & Communication

- [Asynchronous HTTP Clients](https://awesome-repositories.com/f/networking-communication/asynchronous-http-clients.md) — Functions as an asynchronous HTTP client that fetches multiple web pages concurrently to improve collection speed.
- [Connection Pooling](https://awesome-repositories.com/f/networking-communication/connection-pooling.md) — Maintains a set of reusable TCP connections to minimize the overhead of opening sockets to the same host.
- [Session Management](https://awesome-repositories.com/f/networking-communication/session-management.md) — Maintains cookies and implements connection pooling to preserve state and handle redirects across multiple requests. ([source](https://github.com/psf/requests-html#readme))
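
The concurrency pattern behind asynchronous fetching can be sketched with `asyncio` alone. The example starts several requests at once and collects the results in input order. It is an assumption-laden stand-in: `fake_fetch` simulates network latency with a sleep instead of performing HTTP, and none of the names below come from requests-html, which provides its own `AsyncHTMLSession` for this purpose.

```python
import asyncio


async def fake_fetch(url, delay=0.1):
    """Stand-in for a network request: wait, then return a fake body."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def fetch_all(urls):
    """Start every fetch at once and wait for all of them to finish."""
    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


URLS = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.org/c",
]
pages = asyncio.run(fetch_all(URLS))
```

Because the waits overlap, the total time is governed by the slowest simulated request rather than by the sum of all delays.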

### Software Engineering & Architecture

- [CSS Selector Engines](https://awesome-repositories.com/f/software-engineering-architecture/syntax-query-definitions/css-selector-engines.md) — Provides a CSS selector engine to map high-level queries to DOM elements for data extraction.

### Programming Languages & Runtimes

- [Event Loop Concurrency](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-execution-environments/runtime-environments/runtime-internals-foundations/runtime-architecture/event-loop-concurrency.md) — Implements an asynchronous event loop to handle multiple network requests simultaneously without blocking the main thread.

### Security & Cryptography

- [Stateful Session Persistence](https://awesome-repositories.com/f/security-cryptography/identity-access-management/session-management/stateful-session-persistence.md) — Persists cookies and headers across multiple requests to simulate a continuous user session.
- [Web Session Management](https://awesome-repositories.com/f/security-cryptography/web-session-management.md) — Handles cookies and connection pooling to maintain a consistent state across multiple requests to a server.
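
To show what persisting cookies and headers across requests amounts to, the following sketch keeps a small cookie store and replays it on each outgoing request. It is a simplified illustration under stated assumptions, not how requests-html does it: the library's sessions build on the Requests session object, while `TinySession` and its methods are hypothetical and ignore cookie attributes such as path, domain, and expiry.

```python
from http.cookies import SimpleCookie


class TinySession:
    """Remember cookies and default headers between requests."""

    def __init__(self):
        self.cookies = {}
        self.headers = {"User-Agent": "tiny-session/0.1"}

    def absorb(self, set_cookie_header):
        """Store the cookie carried by one Set-Cookie response header."""
        jar = SimpleCookie()
        jar.load(set_cookie_header)
        for name, morsel in jar.items():
            self.cookies[name] = morsel.value

    def request_headers(self):
        """Build the headers for the next request, including stored cookies."""
        headers = dict(self.headers)
        if self.cookies:
            headers["Cookie"] = "; ".join(
                f"{name}={value}" for name, value in self.cookies.items()
            )
        return headers


session = TinySession()
session.absorb("sessionid=abc123; Path=/; HttpOnly")
session.absorb("theme=dark; Path=/")
outgoing = session.request_headers()
```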
