# apache/spark

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/apache-spark).**

43,467 stars · 29,230 forks · Scala · Apache-2.0

## Links

- GitHub: https://github.com/apache/spark
- Homepage: https://spark.apache.org/
- awesome-repositories: https://awesome-repositories.com/repository/apache-spark.md

## Topics

`big-data` `java` `jdbc` `python` `r` `scala` `spark` `sql`

## Description

Apache Spark - A unified analytics engine for large-scale data processing

## Tags

### Part of an Awesome List

- [Machine Learning](https://awesome-repositories.com/f/awesome-lists/ai/machine-learning.md) — Includes MLlib, Apache Spark's scalable machine learning library for distributed computing.
- [Big Data](https://awesome-repositories.com/f/awesome-lists/data/big-data.md) — Unified analytics engine for large-scale data.
- [Data Processing and Analysis](https://awesome-repositories.com/f/awesome-lists/data/data-processing-and-analysis.md) — High-performance engine for large-scale data processing and analytics.
- [Data Processing Engines](https://awesome-repositories.com/f/awesome-lists/data/data-processing-engines.md) — Unified framework for large-scale data processing and query optimization.
- [Stream Processing](https://awesome-repositories.com/f/awesome-lists/data/stream-processing.md) — Handles micro-batch stream processing with stateful semantics.
- [Data Engineering](https://awesome-repositories.com/f/awesome-lists/devops/data-engineering.md) — Engine for large-scale data processing and analytics.
- [Distributed Computing](https://awesome-repositories.com/f/awesome-lists/devops/distributed-computing.md) — Includes PySpark, the Python API for Apache Spark.
- [Streaming Engines](https://awesome-repositories.com/f/awesome-lists/devtools/streaming-engines.md) — Scalable fault-tolerant engine for streaming applications.