# oxnr/awesome-bigdata

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/oxnr-awesome-bigdata).**

14,241 stars · 2,594 forks · MIT license

## Links

- GitHub: https://github.com/oxnr/awesome-bigdata
- Homepage: https://github.com/onurakpolat/awesome-bigdata
- awesome-repositories: https://awesome-repositories.com/repository/oxnr-awesome-bigdata.md

## Topics

`awesome` `awesome-list` `bigdata` `data` `data-analytics` `data-science` `data-stream` `data-visualization` `data-warehouse` `database` `distributed-database` `series-database` `stream-processing` `streaming-data` `visualize-data`

## Description

This project is a curated directory of software, frameworks, and educational resources for building, scaling, and maintaining distributed data processing and storage architectures. It indexes the distributed computing ecosystem to help users identify appropriate tools for managing large-scale information systems.

The repository groups data engineering technologies by category, covering batch and stream processing, machine learning, and interactive querying. This organization helps readers design data pipelines and select infrastructure components for massive datasets.

## Tags

### Repository Format

- [Awesome List](https://awesome-repositories.com/f/repository-format/awesome-list.md) — A community-curated directory that catalogs and links out to other open-source projects, rather than a standalone tool you run yourself.

### Content Management & Publishing

- [Curated Software Directories](https://awesome-repositories.com/f/content-management-publishing/content-management-systems/content-management-platforms/enterprise-specialized-systems/knowledge-management-systems/categorical-directory-indexing/curated-software-directories.md) — Acts as a structured directory of high-quality software projects and frameworks organized by technical domain for distributed data architectures.

### Data & Databases

- [Data Analytics Engines](https://awesome-repositories.com/f/data-databases/data-analysis-visualization/analytical-platforms-engines/data-analytics-engines.md) — Provides a comprehensive index of high-performance computational engines for executing complex analytical queries on massive datasets.
- [Data Discovery Tools](https://awesome-repositories.com/f/data-databases/data-discovery-tools.md) — Helps identify and evaluate databases and processing frameworks for large-scale data infrastructure.
- [Data Pipeline Orchestration](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration.md) — Lists tools that schedule recurring batch jobs and resolve task dependencies for reliable data pipeline execution. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))
- [Distributed Computing](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks/distributed-computing.md) — Lists frameworks that execute data processing tasks across interconnected nodes to handle massive datasets through parallel computation.
- [Columnar Storage Engines](https://awesome-repositories.com/f/data-databases/columnar-storage-engines.md) — Lists storage engines that organize data into columns to optimize analytical query performance and compression on massive datasets.
- [Distributed Computing Engines](https://awesome-repositories.com/f/data-databases/data-engineering/distributed-compute-frameworks/distributed-computing-engines.md) — Indexes a wide range of distributed computing engines and frameworks for batch, stream, and interactive data processing.
- [Stream Processing Systems](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/stream-processing-systems.md) — Lists systems that process continuous data flows through distributed buffers for real-time analysis and state updates.
- [Distributed Data Processing](https://awesome-repositories.com/f/data-databases/distributed-data-processing.md) — Lists frameworks that execute batch and real-time data workflows across computing clusters using parallel programming models. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))
- [High-Performance Data Infrastructures](https://awesome-repositories.com/f/data-databases/high-performance-data-infrastructures.md) — Lists scalable, high-performance storage systems for structured and unstructured data across cloud environments. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))
- [Log-Structured Storage](https://awesome-repositories.com/f/data-databases/log-structured-storage.md) — Lists distributed storage systems that use append-only file structures to optimize write throughput and retrieval.
- [Query Languages](https://awesome-repositories.com/f/data-databases/query-languages.md) — Lists engines that support interactive analysis of large datasets using standard query languages. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))
- [Data Ingestion](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-ingestion.md) — Lists tools that collect and synchronize streaming data from external sources into storage or processing systems. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))
- [Data Visualization Dashboards](https://awesome-repositories.com/f/data-databases/data-visualization-dashboards.md) — Lists tools for creating interactive dashboards and charts to visualize complex data insights. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))

### Artificial Intelligence & ML

- [Large-Scale Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training-frameworks.md) — Curates infrastructure and orchestration frameworks for scaling machine learning model training across massive compute clusters.
- [Machine Learning Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training.md) — Lists frameworks that support training predictive models on large distributed datasets to improve analytical accuracy. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))

### Education & Learning Resources

- [Developer Resource Hubs](https://awesome-repositories.com/f/education-learning-resources/developer-resource-hubs.md) — Serves as a centralized hub for accessing educational materials, libraries, and tools for designing large-scale data pipelines.
- [Engineering Guides](https://awesome-repositories.com/f/education-learning-resources/data-engineering-curricula/engineering-guides.md) — Offers educational resources and libraries for designing and maintaining distributed data processing architectures. ([source](https://github.com/oxnr/awesome-bigdata#readme))

### Software Engineering & Architecture

- [Directed Acyclic Graph Engines](https://awesome-repositories.com/f/software-engineering-architecture/directed-acyclic-graph-engines.md) — Lists engines that manage complex data pipeline dependencies and execution order across distributed environments.
- [Query Optimization Engines](https://awesome-repositories.com/f/software-engineering-architecture/query-optimization-engines.md) — Lists engines that translate high-level analytical requests into optimized execution plans for distributed processing.

### Security & Cryptography

- [Access Control](https://awesome-repositories.com/f/security-cryptography/identity-access-management/access-control.md) — Lists tools that implement authentication and authorization controls to protect sensitive data within distributed environments. ([source](https://github.com/oxnr/awesome-bigdata/blob/master/README.md))
- [Role-Based Access Control](https://awesome-repositories.com/f/security-cryptography/role-based-access-control.md) — Lists tools that enforce security by mapping user identities to specific permissions for data and infrastructure access.
