# johnkerl/miller

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/johnkerl-miller).**

9,911 stars · 237 forks · Go · NOASSERTION

## Links

- GitHub: https://github.com/johnkerl/miller
- Homepage: https://miller.readthedocs.io
- awesome-repositories: https://awesome-repositories.com/repository/johnkerl-miller.md

## Topics

`command-line` `command-line-tools` `csv` `csv-format` `data-cleaning` `data-processing` `data-reduction` `data-regression` `devops` `devops-tools` `json` `json-data` `miller` `statistical-analysis` `statistics` `streaming-algorithms` `streaming-data` `tabular-data` `tsv` `unix-toolkit`

## Description

Miller is a command-line data processor for filtering, transforming, and aggregating name-indexed tabular data. It queries and reshapes records across multiple file formats and converts between CSV, JSON, and YAML.

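The tool distinguishes itself with a name-indexed data model, so users manipulate fields by name rather than by numeric position. Where possible it uses single-pass streaming algorithms that hold only a few records in memory at a time, which lets it compute statistics and summaries on datasets larger than available system memory. Some operations, such as sorting, still need to read all records before producing output.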

Its capabilities cover data transformation and analysis, including computing new fields, filtering records, and sorting. Individual operations, which Miller calls verbs, can be chained with the `then` keyword into a linear pipeline for complex cleaning and statistical aggregation tasks.
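Here is a minimal sketch of `then`-style chaining. It assumes every verb is streaming: each one takes a single record and either passes it on (possibly modified) or drops it. The `Record`, `Verb`, `Then`, `FilterAbove`, `PutProduct`, and `Cut` names are hypothetical and only loosely mirror Miller's `filter`, `put`, and `cut` verbs. A Miller command in the same spirit would be `mlr --icsv --opprint filter '$quantity > 40' then put '$total = $quantity * $rate' then cut -f shape,total example.csv`, where `example.csv` is a placeholder.

```go
package main

import (
	"fmt"
	"strconv"
)

// Record is name-indexed (Values) but also remembers field order (Keys).
type Record struct {
	Keys   []string
	Values map[string]string
}

// NewRecord pairs header names with one row's values.
func NewRecord(names, values []string) Record {
	r := Record{Values: make(map[string]string, len(names))}
	for i, n := range names {
		r.Set(n, values[i])
	}
	return r
}

// Set assigns a field by name, appending the name if the field is new.
func (r *Record) Set(name, value string) {
	if _, ok := r.Values[name]; !ok {
		r.Keys = append(r.Keys, name)
	}
	r.Values[name] = value
}

// Verb transforms one record; returning false drops it from the stream.
type Verb func(Record) (Record, bool)

// Then chains verbs left to right and stops as soon as one drops the record.
func Then(verbs ...Verb) Verb {
	return func(r Record) (Record, bool) {
		for _, v := range verbs {
			var keep bool
			if r, keep = v(r); !keep {
				return r, false
			}
		}
		return r, true
	}
}

// FilterAbove keeps records whose numeric field exceeds a threshold.
func FilterAbove(field string, threshold float64) Verb {
	return func(r Record) (Record, bool) {
		x, err := strconv.ParseFloat(r.Values[field], 64)
		return r, err == nil && x > threshold
	}
}

// PutProduct sets out = a * b, updating the record's map in place.
func PutProduct(out, a, b string) Verb {
	return func(r Record) (Record, bool) {
		x, errA := strconv.ParseFloat(r.Values[a], 64)
		y, errB := strconv.ParseFloat(r.Values[b], 64)
		if errA == nil && errB == nil {
			r.Set(out, strconv.FormatFloat(x*y, 'f', -1, 64))
		}
		return r, true
	}
}

// Cut keeps only the named fields, in the order they already appear.
func Cut(fields ...string) Verb {
	return func(r Record) (Record, bool) {
		out := Record{Values: make(map[string]string)}
		for _, k := range r.Keys {
			for _, f := range fields {
				if k == f {
					out.Set(k, r.Values[k])
				}
			}
		}
		return out, true
	}
}

func main() {
	header := []string{"shape", "quantity", "rate"}
	rows := [][]string{{"square", "48", "2"}, {"triangle", "72", "0.5"}, {"circle", "30", "3"}}
	pipeline := Then(FilterAbove("quantity", 40), PutProduct("total", "quantity", "rate"), Cut("shape", "total"))
	for _, row := range rows {
		if out, keep := pipeline(NewRecord(header, row)); keep {
			fmt.Println(out.Keys, out.Values)
		}
	}
}
```

The `Keys` slice sits alongside the name-to-value map so output can preserve input column order while verbs still address fields by name.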

## Tags

### Data & Databases

- [Name-Indexed Data Models](https://awesome-repositories.com/f/data-databases/name-indexed-data-models.md) — Uses a name-indexed data model to allow manipulation of fields by name rather than numeric position.
- [Tabular Data Processors](https://awesome-repositories.com/f/data-databases/tabular-data-processors.md) — Provides utilities for filtering, aggregating, and joining delimited text data in CSV, JSON, and YAML formats. ([source](https://miller.readthedocs.io))
- [Statistical Aggregators](https://awesome-repositories.com/f/data-databases/data-analysis-visualization/analytical-platforms-engines/advanced-analytics-functions/statistical-aggregators.md) — Computes summary statistics such as sums, averages, and counts across grouped tabular datasets.
- [Data Cleaning Utilities](https://awesome-repositories.com/f/data-databases/data-cleaning-utilities.md) — Filters, reshapes, and modifies name-indexed data formats using a set of command-line operations. ([source](https://miller.readthedocs.io/10min/))
- [Data Filtering](https://awesome-repositories.com/f/data-databases/data-filtering.md) — Retains only the records that satisfy a specific logical expression based on field values. ([source](https://miller.readthedocs.io/en/latest/10min))
- [Data Format Interoperability](https://awesome-repositories.com/f/data-databases/data-format-interoperability.md) — Transforms data between formats such as CSV, JSON, and YAML to move information between systems. ([source](https://cdn.jsdelivr.net/gh/johnkerl/miller@main/README.md))
- [Data Format Translators](https://awesome-repositories.com/f/data-databases/data-format-translators.md) — Provides a common internal representation for converting between CSV, JSON, and YAML formats.
- [Stream Processing](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/stream-processing-systems/stream-processing.md) — Processes data one record at a time to handle datasets that exceed available system memory.
- [Format Converters](https://awesome-repositories.com/f/data-databases/data-serialization-formats/data-formats/json/format-converters.md) — Transforms data between CSV, JSON, and YAML formats while preserving structured fields.
- [Data Transformation Tools](https://awesome-repositories.com/f/data-databases/data-transformation-tools.md) — Cleans and aggregates records using named fields across various formats without relying on positional indices. ([source](https://cdn.jsdelivr.net/gh/johnkerl/miller@main/README.md))
- [Large Dataset Streaming](https://awesome-repositories.com/f/data-databases/incremental-data-streaming/large-dataset-streaming.md) — Employs incremental streaming techniques to process massive files that exceed available system memory.
- [Interactive Data Querying Tools](https://awesome-repositories.com/f/data-databases/interactive-data-querying-tools.md) — Offers terminal-based filtering and interactive exploration of structured, name-indexed data. ([source](https://miller.readthedocs.io/))
- [Computed Fields](https://awesome-repositories.com/f/data-databases/computed-fields.md) — Creates new data fields by applying mathematical or string operations to existing values. ([source](https://miller.readthedocs.io/en/latest/10min))
- [Data Aggregation Tools](https://awesome-repositories.com/f/data-databases/data-aggregation-tools.md) — Consolidates datasets into grouped totals or summary reports based on indexed fields. ([source](https://miller.readthedocs.io/10min/))
- [Streaming Aggregations](https://awesome-repositories.com/f/data-databases/data-analysis-visualization/analytical-platforms-engines/advanced-analytics-functions/statistical-aggregators/streaming-aggregations.md) — Reduces large datasets using single-pass algorithms that operate on streaming data to calculate summaries. ([source](https://cdn.jsdelivr.net/gh/johnkerl/miller@main/README.md))
- [Data Shaping](https://awesome-repositories.com/f/data-databases/data-shaping.md) — Removes unnecessary columns or creates new ones using programming statements to clean datasets. ([source](https://miller.readthedocs.io))
- [Data Sorting Engines](https://awesome-repositories.com/f/data-databases/data-sorting-engines.md) — Sorts tabular datasets on one or more fields, in lexical or numerical order. ([source](https://miller.readthedocs.io/en/latest/10min))
- [Field Selection](https://awesome-repositories.com/f/data-databases/field-selection.md) — Extracts specific subsets of fields and reorders them for the final output. ([source](https://miller.readthedocs.io/en/latest/10min))
- [Field Transformations](https://awesome-repositories.com/f/data-databases/field-transformations.md) — Modifies datasets by removing unwanted columns or calculating new fields using logical expressions. ([source](https://miller.readthedocs.io/))
- [Streaming Aggregators](https://awesome-repositories.com/f/data-databases/incremental-data-streaming/large-dataset-streaming/streaming-aggregators.md) — Computes statistics and summaries on large datasets using memory-efficient single-pass streaming algorithms.
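Sorting is the main exception to record-at-a-time processing: every record must be read before the first one can be emitted. The sketch below illustrates multi-field sorting on name-indexed records, using a lexical key followed by a numeric descending key. This roughly matches what `mlr --icsv --opprint sort -f shape -nr quantity example.csv` requests in Miller, where `example.csv` is a placeholder. `SortKey` and `SortRecords` are hypothetical names, and this is not Miller's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// SortKey is one sort criterion: a field name, numeric or lexical comparison,
// and ascending or descending direction.
type SortKey struct {
	Field   string
	Numeric bool
	Desc    bool
}

// compareField returns -1, 0, or +1 for the named field of a versus b.
// In this sketch, numeric values that fail to parse are treated as 0.
func compareField(a, b map[string]string, k SortKey) int {
	var c int
	if k.Numeric {
		x, _ := strconv.ParseFloat(a[k.Field], 64)
		y, _ := strconv.ParseFloat(b[k.Field], 64)
		switch {
		case x < y:
			c = -1
		case x > y:
			c = 1
		}
	} else {
		c = strings.Compare(a[k.Field], b[k.Field])
	}
	if k.Desc {
		c = -c
	}
	return c
}

// SortRecords orders records by the keys in priority order. The whole input
// must already be in memory; SliceStable keeps ties in arrival order.
func SortRecords(records []map[string]string, keys ...SortKey) {
	sort.SliceStable(records, func(i, j int) bool {
		for _, k := range keys {
			if c := compareField(records[i], records[j], k); c != 0 {
				return c < 0
			}
		}
		return false
	})
}

func main() {
	records := []map[string]string{
		{"shape": "square", "quantity": "10"},
		{"shape": "circle", "quantity": "5"},
		{"shape": "square", "quantity": "48"},
		{"shape": "circle", "quantity": "30"},
	}
	// Shape ascending (lexical), then quantity descending (numeric).
	SortRecords(records, SortKey{Field: "shape"}, SortKey{Field: "quantity", Numeric: true, Desc: true})
	for _, r := range records {
		fmt.Printf("shape=%s quantity=%s\n", r["shape"], r["quantity"])
	}
}
```

The numeric flag matters. In ascending lexical order `5` sorts after `30`, whereas numerically it sorts before.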

### Development Tools & Productivity

- [Command-Line Data Processors](https://awesome-repositories.com/f/development-tools-productivity/command-line-data-processors.md) — Serves as a CLI tool for parsing, transforming, and aggregating structured data streams.
- [Command-Line Data Tools](https://awesome-repositories.com/f/development-tools-productivity/command-line-data-tools.md) — Offers terminal-based utilities for retrieving, filtering, and displaying information from name-indexed tabular files.
- [Expression Evaluators](https://awesome-repositories.com/f/development-tools-productivity/mathematical-calculators/expression-evaluators.md) — Evaluates mathematical and string expressions against record fields to generate new data columns.

### Software Engineering & Architecture

- [Data Operation Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/data-operation-pipelines.md) — Sequences multiple data transformation verbs into a linear execution flow for incremental processing.

### Part of an Awesome List

- [Command Line Utilities](https://awesome-repositories.com/f/awesome-lists/devtools/command-line-utilities.md) — Processes and queries structured data files.
