# kr1s77/python-crawler-tutorial-starts-from-zero

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/kr1s77-python-crawler-tutorial-starts-from-zero).**

4,567 stars · 760 forks · Python

## Links

- GitHub: https://github.com/Kr1s77/Python-crawler-tutorial-starts-from-zero
- awesome-repositories: https://awesome-repositories.com/repository/kr1s77-python-crawler-tutorial-starts-from-zero.md

## Description

This project is a Python web scraping tutorial and framework designed for building automated data extraction tools and web crawlers. It provides a structured approach to navigating websites and persisting scraped data to databases.
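As a rough illustration of "navigating and persisting", the sketch below parses one page with the standard library, stores its title in SQLite, and returns the links a crawler would visit next. The `LinkParser` class, the `save_page` helper, the `pages` table, and the sample HTML are all assumptions made for this example and are not taken from the repository.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the page title and every link found in <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def save_page(conn, url, html):
    """Parse one page, store it in SQLite, and return the links to visit next."""
    parser = LinkParser(url)
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
    )
    # The primary key on url keeps re-crawled pages from being duplicated.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
        (url, parser.title),
    )
    conn.commit()
    return parser.links


# Hypothetical page content; a real crawler would download it over HTTP.
SAMPLE_HTML = """
<html><head><title>Example index</title></head>
<body><a href="/page/1">One</a> <a href="https://example.org/x">X</a></body></html>
"""

conn = sqlite3.connect(":memory:")
next_links = save_page(conn, "https://example.com/", SAMPLE_HTML)
stored = conn.execute("SELECT url, title FROM pages").fetchall()
```

Returning the discovered links from `save_page` keeps the storage step and the crawl frontier separate, so the caller decides which links to follow.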

The project includes a toolset for web API analysis, focusing on reverse engineering obfuscated API requests and inspecting network traffic to extract structured data. It also covers optical character recognition (OCR) workflows that convert text embedded in images into machine-readable strings.
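Reverse engineering an API often ends with re-implementing a request signature in Python once the logic has been read out of the site's JavaScript. The sketch below assumes a purely hypothetical scheme (MD5 over the sorted query parameters plus a secret string); the `sign_params` function, the `sign` field name, and the secret are invented for illustration and do not describe any real service or the repository's own code.

```python
import hashlib
from urllib.parse import urlencode


def sign_params(params, secret):
    """Return a copy of params with a 'sign' field added.

    Hypothetical scheme: MD5 over the sorted, URL-encoded parameters
    followed by a secret string recovered from the site's JavaScript.
    """
    # Sorting makes the signature independent of parameter order.
    canonical = urlencode(sorted(params.items()))
    digest = hashlib.md5((canonical + secret).encode("utf-8")).hexdigest()
    signed = dict(params)
    signed["sign"] = digest
    return signed


# Placeholder parameters and secret, used only to exercise the function.
signed = sign_params({"page": 2, "keyword": "python"}, secret="demo-secret")
query_string = urlencode(signed)
```

The resulting `query_string` would be appended to the endpoint URL that was observed in the network traffic.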

The project also covers headless browser automation for pages that depend on JavaScript and dynamic elements, along with methods for scripting browser interactions and building scalable web crawlers.

## Tags

### Data & Databases

- [Web Crawlers](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-collection-tools/web-crawlers.md) — Provides a comprehensive framework for building automated web crawlers to extract data at scale. ([source](https://github.com/Kr1s77/Python-crawler-tutorial-starts-from-zero/tree/master//))
- [CSS and XPath Query Engines](https://awesome-repositories.com/f/data-databases/content-extraction/xpath-2-0-parsing/css-and-xpath-query-engines.md) — Implements data extraction from webpages using CSS selectors and XPath query engines.
- [Automated Web Scraping](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-collection-tools/web-crawlers/automated-web-scraping.md) — Automates the process of navigating websites and extracting data while managing sessions. ([source](https://cdn.jsdelivr.net/gh/kr1s77/python-crawler-tutorial-starts-from-zero@master/README.md))
- [Web Data Extraction](https://awesome-repositories.com/f/data-databases/web-data-extraction.md) — Implements programmatic scraping and processing of web content to prepare data for analysis.
- [Scraped Data Persistence](https://awesome-repositories.com/f/data-databases/scraped-data-persistence.md) — Provides a structured approach to persisting scraped data into databases for long-term storage and analysis.
- [Scraped Data Storage](https://awesome-repositories.com/f/data-databases/scraped-data-storage.md) — Enables the storage of large volumes of unstructured scraped data in databases for future analysis. ([source](https://github.com/Kr1s77/Python-crawler-tutorial-starts-from-zero/tree/master//))
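To make the XPath-based extraction mentioned above concrete, here is a minimal sketch that uses the limited XPath subset in Python's standard `xml.etree.ElementTree`. It assumes well-formed markup; the sample document and field names are made up for this example, and real-world HTML usually needs a more forgiving parser such as lxml or BeautifulSoup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed listing page used only for illustration.
SAMPLE = """
<html>
  <body>
    <div class="item"><a href="/a">Alpha</a><span class="price">10</span></div>
    <div class="item"><a href="/b">Beta</a><span class="price">25</span></div>
    <div class="ad"><a href="/ad">Sponsored</a></div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE.strip())

rows = []
# Select every <div class="item"> anywhere in the tree, skipping the ad block.
for item in root.findall(".//div[@class='item']"):
    link = item.find("a")
    price = item.find("span[@class='price']")
    rows.append(
        {"name": link.text, "href": link.get("href"), "price": int(price.text)}
    )
```

Each entry in `rows` is a plain dictionary, which maps directly onto a database row or a JSON record.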

### Education & Learning Resources

- [Web Scraping Tutorials](https://awesome-repositories.com/f/education-learning-resources/educational-resources/reference-and-media/tutorials-media-curated-lists/technical-tutorials/data-analytics/web-scraping-tutorials.md) — Provides a comprehensive guide and project-based materials for automated data extraction from web sources using Python.

### Development Tools & Productivity

- [Headless Browser Automation](https://awesome-repositories.com/f/development-tools-productivity/headless-browser-automation.md) — Controls headless browser engines to automate interactions and extract content from dynamic web pages. ([source](https://github.com/Kr1s77/Python-crawler-tutorial-starts-from-zero/tree/master//))

### Software Engineering & Architecture

- [API Reverse-Engineering Tools](https://awesome-repositories.com/f/software-engineering-architecture/application-lifecycle-management/data-integration-and-processing/http-api-references/reverse-engineered-api-references/api-reverse-engineering-tools.md) — Provides utilities to reconstruct API specifications and discover hidden endpoints from intercepted traffic.

### Web Development

- [Web Crawlers](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping/web-crawlers.md) — Offers a structured framework for developing Python-based web crawlers that traverse websites at scale.
- [API Reverse Engineering](https://awesome-repositories.com/f/web-development/web-scraping-engines/api-reverse-engineering.md) — Provides tools to study network traffic and reverse engineer obfuscated requests to interact with protected services. ([source](https://cdn.jsdelivr.net/gh/kr1s77/python-crawler-tutorial-starts-from-zero@master/README.md))

### Networking & Communication

- [Browser Mimicking Requests](https://awesome-repositories.com/f/networking-communication/browser-mimicking-requests.md) — Implements browser mimicking requests to interact with hidden API endpoints and bypass restrictions.
- [Request Header Configuration](https://awesome-repositories.com/f/networking-communication/request-header-configuration.md) — Ships tools for configuring request headers to mimic browsers and bypass bot detection.
- [Traffic Interception](https://awesome-repositories.com/f/networking-communication/traffic-interception.md) — Provides techniques for intercepting network traffic to analyze API logic and data formats.
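Request header configuration can be sketched with the standard library alone. The example below builds, but does not send, a request carrying browser-like headers; the header values, the `build_request` helper, and the `example.com` URLs are placeholders chosen for this illustration rather than values from the repository.

```python
from urllib.request import Request

# Placeholder header set resembling what a desktop browser sends with an
# XHR call; copy real values from the browser's network panel as needed.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/list",
    "X-Requested-With": "XMLHttpRequest",
}


def build_request(url, extra_headers=None):
    """Return a GET request carrying the default headers plus any overrides."""
    headers = dict(BROWSER_HEADERS)
    headers.update(extra_headers or {})
    return Request(url, headers=headers, method="GET")


req = build_request(
    "https://example.com/api/items?page=1",
    {"Cookie": "session=placeholder"},
)
```

Passing `req` to `urllib.request.urlopen` would send it. Note that `urllib` stores header names in capitalized form, so the user agent is read back with `req.get_header("User-agent")`.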

### Security & Cryptography

- [JavaScript De-obfuscation](https://awesome-repositories.com/f/security-cryptography/javascript-de-obfuscation.md) — Provides methods for analyzing and de-obfuscating JavaScript to discover hidden API endpoints. ([source](https://github.com/Kr1s77/Python-crawler-tutorial-starts-from-zero/tree/master//))
