# gunnarmorling/1brc

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/gunnarmorling-1brc).**

8,062 stars · 2,234 forks · Java · Apache-2.0

## Links

- GitHub: https://github.com/gunnarmorling/1brc
- Homepage: https://www.morling.dev/blog/one-billion-row-challenge/
- awesome-repositories: https://awesome-repositories.com/repository/gunnarmorling-1brc.md

## Topics

`1brc` `challenges`

## Description

The 1BRC (One Billion Row Challenge) is a Java performance challenge: read a text file of one billion temperature records and compute the minimum, mean, and maximum temperature per weather station. It serves as both a programming exercise and a benchmark of how efficiently a Java program can parse and aggregate structured data from a plain text file.
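
To make the task concrete, here is a minimal, unoptimized sketch of the aggregation. It assumes each record is a line of the form `station;temperature` with one fractional digit and that results are reported as `min/mean/max`; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class BaselineAggregator {
    /** Running statistics for a single station. */
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    /** Aggregates lines of the assumed form "station;temperature", sorted by station name. */
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {Bulawayo=8.9/8.9/8.9, Hamburg=12.0/23.1/34.2}
        System.out.println(aggregate(List.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;34.2")));
    }
}
```

Every line here allocates substrings and goes through `Double.parseDouble`, which is exactly the kind of per-record overhead the optimization techniques listed in this description aim to remove.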

The project distinguishes itself through a collection of performance-oriented techniques for high-throughput data processing: branchless temperature parsing using bitwise operations, CPU-core-local aggregation maps that eliminate lock contention, a custom primitive hash map with `long` keys and `int` values that minimizes object overhead, and garbage-collection-aware allocation that creates all working data structures upfront. Further techniques include JIT-friendly loop unrolling, memory-mapped file I/O, parallel stream processing across file chunks, and direct memory access via `sun.misc.Unsafe` to bypass bounds checks.
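
As one illustration of these techniques, a primitive hash map with `long` keys and `int` values might look like the sketch below. It is an assumption-laden example, not code from the repository: it uses open addressing with linear probing, a fixed power-of-two capacity, and reserves `Long.MIN_VALUE` as an "empty slot" marker. The class name `LongIntHashMap` is hypothetical.

```java
import java.util.Arrays;

final class LongIntHashMap {
    // Sentinel marking an unused slot; this key itself cannot be stored (an assumption of this sketch).
    private static final long EMPTY = Long.MIN_VALUE;

    private final long[] keys;
    private final int[] values;
    private final int mask;
    private int size;

    LongIntHashMap(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 2 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new long[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, EMPTY);
    }

    /** Finds the slot holding the key, or the empty slot where it would be inserted. */
    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative (Fibonacci-style) hash
        int i = (int) (h >>> 32) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    void put(long key, int value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        int i = slot(key);
        if (keys[i] == EMPTY) {
            // Keep at least one slot empty so probing always terminates.
            if (size == mask) throw new IllegalStateException("map is full");
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int getOrDefault(long key, int defaultValue) {
        if (key == EMPTY) return defaultValue;
        int i = slot(key);
        return keys[i] == key ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        LongIntHashMap map = new LongIntHashMap(16);
        map.put(42L, 7);
        System.out.println(map.getOrDefault(42L, -1)); // prints 7
    }
}
```

Because keys and values live in two flat primitive arrays, lookups involve no boxing and no per-entry objects, which is the property the description attributes to the custom map.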

The project also supports benchmarking and profiling: synthetic dataset generation with configurable parameters for reproducible testing, and CPU profiling with flamegraphs to show where execution time is spent. Together these make it possible to measure and optimize Java code against a fixed data processing challenge and to identify bottlenecks.
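
Reproducible synthetic data usually comes down to a seeded random generator. The sketch below is a hypothetical generator, not the repository's own tool: the station list, the row count, the seed, and the temperature range of -99.9 to 99.9 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

final class SyntheticMeasurements {
    /** Generates rows of the assumed form "station;temperature"; the same seed yields the same rows. */
    static List<String> generate(String[] stations, int rows, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            String station = stations[random.nextInt(stations.length)];
            // Tenths of a degree in [-999, 999], so every value has exactly one fractional digit.
            int tenths = random.nextInt(1999) - 999;
            lines.add(String.format(Locale.ROOT, "%s;%.1f", station, tenths / 10.0));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : generate(new String[] {"Oslo", "Lima"}, 5, 1L)) {
            System.out.println(line);
        }
    }
}
```

For a real benchmark file the rows would be streamed to disk (for example with a `BufferedWriter`) rather than collected in a list, since a billion rows do not fit comfortably in memory.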

## Tags

### Data & Databases

- [Data Aggregation Challenges](https://awesome-repositories.com/f/data-databases/large-scale-data-computation/data-aggregation-challenges.md) — A programming exercise that processes one billion temperature records from a text file to compute per-station statistics.
- [Memory-Mapped File Access](https://awesome-repositories.com/f/data-databases/data-access-querying/memory-mapped-file-access.md) — Reads the input file by mapping it directly into virtual memory, avoiding traditional buffered reads for faster access.
- [Grouped Aggregations](https://awesome-repositories.com/f/data-databases/grouped-aggregations.md) — Computing summary statistics like min, mean, and max across grouped data records.
- [Primitive](https://awesome-repositories.com/f/data-databases/hash-maps/primitive.md) — Uses a hand-optimised hash map with primitive long keys and int values to minimise object overhead and garbage collection.
- [Chunked File Processing](https://awesome-repositories.com/f/data-databases/parallel-processing/chunked-file-processing.md) — Splits the file into chunks processed concurrently by multiple threads, aggregating partial results before merging.
- [Text File Processing Benchmarks](https://awesome-repositories.com/f/data-databases/text-file-parsers/text-file-processing-benchmarks.md) — A benchmark that tests how efficiently a Java program can parse and aggregate structured data from a plain text file.
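
Memory mapping and chunking combine naturally: map the file, pick roughly equal split points, then move each split forward to the next newline so no record straddles two chunks. The following is a minimal sketch under the assumption of `\n`-terminated records; `ChunkSplitter` and `chunkOffsets` are hypothetical names. A single `MappedByteBuffer` is limited to 2 GiB, so a file as large as the challenge input would need several mappings or a different API.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class ChunkSplitter {
    /**
     * Returns chunkCount + 1 offsets; chunk i covers bytes [offsets[i], offsets[i + 1]).
     * Interior offsets always point just past a newline. Chunks may be empty for tiny files.
     */
    static long[] chunkOffsets(Path file, int chunkCount) throws IOException {
        if (chunkCount < 1) throw new IllegalArgumentException("chunkCount must be >= 1");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("sketch handles files up to 2 GiB only");
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long[] offsets = new long[chunkCount + 1];
            offsets[chunkCount] = size;
            for (int i = 1; i < chunkCount; i++) {
                // Start at the nominal split point, never before the previous boundary.
                long pos = Math.max(offsets[i - 1], size * i / chunkCount);
                while (pos < size && buffer.get((int) pos) != '\n') {
                    pos++;
                }
                offsets[i] = Math.min(size, pos + 1); // first byte after the newline
            }
            return offsets;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChunkSplitter.java <file> ; splits into one chunk per available core.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(Arrays.toString(chunkOffsets(Path.of(args[0]), cores)));
    }
}
```

Each `[start, end)` range can then be handed to its own worker, which parses only complete records.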

### Programming Languages & Runtimes

- [Performance Benchmarks](https://awesome-repositories.com/f/programming-languages-runtimes/programming-language-varieties/programming-languages/jvm-languages/java/performance-benchmarks.md) — Measuring and optimizing the execution speed of Java programs processing large datasets.
- [JIT-Friendly Loop Unrolling](https://awesome-repositories.com/f/programming-languages-runtimes/looping-constructs/loop-unrolling-transformations/jit-friendly-loop-unrolling.md) — Writes tight, manually unrolled loops that the JIT compiler can further optimise into efficient native machine code.
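
Manual unrolling can be sketched on a simple task such as counting newlines in a byte array. The example below is illustrative only: the unroll factor of four and the use of separate counters are assumptions, and whether it beats the plain loop depends on the JVM and hardware, so it should be measured rather than assumed.

```java
final class UnrolledNewlineCounter {
    /** Counts '\n' bytes, handling four bytes per iteration with independent counters. */
    static int countNewlines(byte[] data) {
        int c0 = 0, c1 = 0, c2 = 0, c3 = 0;
        int i = 0;
        int limit = data.length - 3; // last index at which i + 3 is still in bounds, plus one
        for (; i < limit; i += 4) {
            c0 += data[i] == '\n' ? 1 : 0;
            c1 += data[i + 1] == '\n' ? 1 : 0;
            c2 += data[i + 2] == '\n' ? 1 : 0;
            c3 += data[i + 3] == '\n' ? 1 : 0;
        }
        // Tail: the remaining zero to three bytes.
        for (; i < data.length; i++) {
            c0 += data[i] == '\n' ? 1 : 0;
        }
        return c0 + c1 + c2 + c3;
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\n\nc\nd\n".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(countNewlines(data)); // prints 5
    }
}
```

The independent counters avoid a single dependency chain through one accumulator, which gives the JIT compiler and the CPU more room to overlap work.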

### System Administration & Monitoring

- [Aggregated Temperature Statistics](https://awesome-repositories.com/f/system-administration-monitoring/logging-and-telemetry/metric-data-ingestion/temperature-metrics/aggregated-temperature-statistics.md) — Reads a large text file of weather station temperature readings and computes the min, mean, and max per station. ([source](https://cdn.jsdelivr.net/gh/gunnarmorling/1brc@main/README.md))
- [Thread-Local Aggregation](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/virtual-thread-monitors/thread-safe-metric-counters/thread-local-aggregation.md) — Assigns each processing thread its own aggregation map to eliminate lock contention, merging results only at the end.
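
The per-thread map pattern can be sketched as follows: each task fills a private `HashMap` with no synchronization, and the partial maps are merged on the calling thread once all tasks finish. This is a simplified illustration, not the repository's code; the class name, the `station;temperature` line format, and the storage of temperatures as integer tenths of a degree are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadAggregation {
    /** Statistics in tenths of a degree, so all arithmetic stays in integers. */
    static final class Stats {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long sum;
        long count;

        void add(int tenths) {
            min = Math.min(min, tenths);
            max = Math.max(max, tenths);
            sum += tenths;
            count++;
        }

        void merge(Stats other) {
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
            sum += other.sum;
            count += other.count;
        }
    }

    /** Runs on a worker thread; the map it builds is never shared while being written. */
    static Map<String, Stats> aggregateChunk(List<String> chunk) {
        Map<String, Stats> local = new HashMap<>();
        for (String line : chunk) {
            int sep = line.indexOf(';');
            int tenths = (int) Math.round(Double.parseDouble(line.substring(sep + 1)) * 10);
            local.computeIfAbsent(line.substring(0, sep), k -> new Stats()).add(tenths);
        }
        return local;
    }

    static Map<String, Stats> aggregate(List<List<String>> chunks)
            throws InterruptedException, ExecutionException {
        int threads = Math.max(1, Math.min(chunks.size(), Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Stats>>> futures = new ArrayList<>();
            for (List<String> chunk : chunks) {
                futures.add(pool.submit(() -> aggregateChunk(chunk)));
            }
            // Merge on the calling thread only, after each partial result is complete.
            Map<String, Stats> merged = new TreeMap<>();
            for (Future<Map<String, Stats>> future : futures) {
                future.get().forEach((station, stats) ->
                        merged.computeIfAbsent(station, k -> new Stats()).merge(stats));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Stats> result = aggregate(List.of(List.of("A;1.0", "B;-2.5"), List.of("A;3.0")));
        System.out.println(result.get("A").min + " " + result.get("A").max); // prints 10 30
    }
}
```

The merge step is cheap relative to parsing because it touches one entry per station per chunk rather than one per record.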

### Software Engineering & Architecture

- [Branch-Less Parsing Techniques](https://awesome-repositories.com/f/software-engineering-architecture/branching-depth-reducers/branch-less-parsing-techniques.md) — Parses temperature values using bitwise operations and integer arithmetic instead of branching, reducing CPU pipeline stalls.
- [Zero-Allocation Architectures](https://awesome-repositories.com/f/software-engineering-architecture/zero-allocation-architectures.md) — Pre-allocates all working data structures upfront and avoids object creation during the hot loop to eliminate GC pauses.
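
A simplified way to parse a temperature without `if` statements is shown below. It is a sketch under a strict assumption about the input, namely ASCII text matching `-?\d?\d\.\d` (an optional minus sign, one or two integer digits, a dot, one fractional digit), and it is not the word-at-a-time variant that highly tuned solutions may use. The method returns the value in tenths of a degree and allocates nothing. `BranchlessTemperature` and `parseTenths` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class BranchlessTemperature {
    /**
     * Parses the number starting at b[off] and returns it in tenths of a degree.
     * Relies on ASCII layout: digits are 0x30..0x39 (bit 4 set), while '-' (0x2D)
     * and '.' (0x2E) have bit 4 clear.
     */
    static int parseTenths(byte[] b, int off) {
        int neg = (~b[off] >> 4) & 1;       // 1 if the first byte is '-', else 0
        int p = off + neg;                  // index of the first digit
        int two = (b[p + 1] >> 4) & 1;      // 1 if there is a second integer digit, else 0
        int d0 = b[p] - '0';
        int d1 = b[p + 1] - '0';            // garbage when two == 0, but multiplied by 0 below
        int frac = b[p + 2 + two] - '0';    // digit after the dot
        int intPart = d0 * (1 + 9 * two) + d1 * two; // d0 * 10 + d1, or just d0
        int abs = intPart * 10 + frac;
        return (abs ^ -neg) + neg;          // two's-complement negate when neg == 1
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;-12.3".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseTenths(line, 8)); // prints -123
    }
}
```

The source code contains no data-dependent branches, although the JVM still performs array bounds checks and is free to compile the arithmetic however it sees fit.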

### Testing & Quality Assurance

- [Java Benchmarking Tools](https://awesome-repositories.com/f/testing-quality-assurance/performance-benchmarking-tools/java-benchmarking-tools.md) — A tool for measuring and optimizing Java code execution speed against a fixed data processing challenge.

### Web Development

- [Large File Processing](https://awesome-repositories.com/f/web-development/client-side-data-ingestion/large-file-processing.md) — Efficiently reading and aggregating data from a text file with one billion rows.

### Development Tools & Productivity

- [CPU Profilers](https://awesome-repositories.com/f/development-tools-productivity/debugging-profiling-testing/debugging-diagnostics/performance-resource-profilers/cpu-profilers.md) — Identifying performance bottlenecks in Java code using flamegraphs and execution time analysis.

### Operating Systems & Systems Programming

- [Direct Memory Access](https://awesome-repositories.com/f/operating-systems-systems-programming/direct-memory-access.md) — Leverages Unsafe for direct memory operations on the mapped file, bypassing bounds checks for maximum throughput.
