# uscdatascience/sparkler

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/uscdatascience-sparkler).**

422 stars · 135 forks · Java · Apache-2.0

## Links

- GitHub: https://github.com/USCDataScience/sparkler
- Homepage: http://irds.usc.edu/sparkler/
- awesome-repositories: https://awesome-repositories.com/repository/uscdatascience-sparkler.md

## Description

A web crawler is a bot that fetches resources from the web to build applications such as search engines and knowledge bases. Sparkler (a contraction of Spark-Crawler) is a new web crawler that makes use of recent advancements in distributed computing and information…

## Tags

### Part of an Awesome List

- [Java Crawling Frameworks](https://awesome-repositories.com/f/awesome-lists/devtools/java-crawling-frameworks.md) — A web crawler that evolved from Apache Nutch and runs on Apache Spark.
