Requests Html | Awesome Repository

requests-html is a Python library for fetching and parsing HTML and for web scraping. It combines an HTTP client, which can also run asynchronously, with JavaScript rendering, so that pages can be fetched, rendered, and parsed for structured data extraction.

The project integrates a headless browser to execute JavaScript, allowing it to retrieve dynamically generated content that standard HTML parsers cannot see. It provides tools for automated data extraction using CSS selectors and XPath expressions to isolate specific text or attributes from HTML structures.
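The selector-based extraction described above can be sketched on an inline HTML string, so no network access or browser is needed. This is a minimal illustration, assuming requests-html is installed. The sample markup and the helper name `extract` are made up for this example; `HTML`, `find`, `xpath`, `.text`, and `.attrs` are the library's documented names.

```python
# Hypothetical sample markup, used only for this illustration.
SAMPLE = """
<html><body>
  <h1 class="title">Release notes</h1>
  <ul>
    <li><a href="/v1">Version 1</a></li>
    <li><a href="/v2">Version 2</a></li>
  </ul>
</body></html>
"""


def extract(markup):
    """Pull a title, link targets, and link labels out of an HTML string."""
    # Imported lazily so this sketch can be loaded where requests-html is absent.
    from requests_html import HTML

    doc = HTML(html=markup)

    # CSS selector: first element matching h1.title, then its text content.
    title = doc.find("h1.title", first=True).text

    # CSS selector returning every match; .attrs is a dict of the tag's attributes.
    hrefs = [a.attrs["href"] for a in doc.find("li a")]

    # XPath: text() nodes are returned as plain strings.
    labels = doc.xpath("//li/a/text()")

    return {"title": title, "hrefs": hrefs, "labels": labels}


if __name__ == "__main__":
    print(extract(SAMPLE))
```

For pages that build their content with JavaScript, the library documents a `render()` method on the `html` object of a fetched response. It is left out of this sketch because it needs network access and downloads Chromium the first time it runs.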

The framework covers network operations including asynchronous page fetching, session state management with cookies, and connection pooling. It also includes utilities that collect the hyperlinks found on a page and resolve them to absolute URLs.
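A rough sketch of the link harvesting and asynchronous fetching mentioned above follows, assuming requests-html is installed. The helper names `harvest_links` and `fetch_link_counts`, the sample markup, and the example URLs are hypothetical. `HTML`, `absolute_links`, `AsyncHTMLSession`, and its `run` method come from the library's README. The fetching function is only defined, not called, because it needs network access.

```python
# Hypothetical page fragment with a relative, an absolute, and an anchor link.
PAGE = """
<a href="/about">About</a>
<a href="https://other.example/x">Elsewhere</a>
<a href="#top">Back to top</a>
"""


def harvest_links(markup, page_url):
    """Return the set of absolute URLs linked from an HTML string."""
    # Imported lazily so this sketch can be loaded where requests-html is absent.
    from requests_html import HTML

    # The url argument gives relative links a base to resolve against.
    doc = HTML(html=markup, url=page_url)
    return doc.absolute_links


def fetch_link_counts(urls):
    """Fetch several pages concurrently and count the links on each one."""
    from requests_html import AsyncHTMLSession

    # The session is built on requests.Session, so cookies set by one
    # response are sent with later requests made through the same session.
    session = AsyncHTMLSession()

    def make_task(url):
        # run() expects coroutine functions, so each URL gets its own one.
        async def task():
            response = await session.get(url)
            return url, len(response.html.absolute_links)
        return task

    # run() schedules all tasks on the session's event loop and waits for them.
    return session.run(*[make_task(u) for u in urls])


if __name__ == "__main__":
    print(harvest_links(PAGE, "https://example.org/"))
```

Results from `run` arrive in completion order rather than input order, which is why each task in this sketch returns its URL along with the count.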

Features

Web Scraping - Provides a framework for extracting structured information from websites that require JavaScript execution.
Web Page Retrievers - Fetches HTML content from URLs and automatically handles character encoding for correct text decoding.
HTML Parsing - Provides a human-friendly Python interface for fetching and parsing HTML web pages.
Web Page Scraping Extractors - Extracts and structures data from web pages by parsing raw HTML via a human-friendly interface.
