# adbar/trafilatura

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/adbar-trafilatura).**

5,319 stars · 344 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/adbar/trafilatura
- Homepage: https://trafilatura.readthedocs.io
- awesome-repositories: https://awesome-repositories.com/repository/adbar-trafilatura.md

## Topics

`article-extractor` `corpus-builder` `corpus-tools` `crawler` `html-to-markdown` `html2text` `llm` `news-aggregator` `news-crawler` `nlp` `rag` `readability` `rss-feed` `scraping` `tei` `text-cleaning` `text-extraction` `text-mining` `text-preprocessing` `web-scraping`

## Tags

### Part of an Awesome List

- [Data Scraping Tools](https://awesome-repositories.com/f/awesome-lists/ai/data-scraping-tools.md) — Python tool for gathering text and metadata from web pages.
- [Web Scraping](https://awesome-repositories.com/f/awesome-lists/devtools/web-scraping.md) — Tool to gather text and metadata from the web.
