Miller | Awesome Repository

Miller is a command-line data processor for filtering, transforming, and aggregating name-indexed tabular data. It queries and reshapes records across multiple file formats and can convert between CSV, JSON, and YAML.

Miller's distinguishing feature is its name-indexed data model, which lets users manipulate fields by name rather than numeric position. It uses single-pass streaming algorithms to compute statistics and summaries, so these operations work on datasets larger than available system memory.
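The two ideas above, addressing fields by name and keeping only running totals in memory, can be illustrated with a minimal Python sketch. This is not Miller's implementation; the function name `streaming_mean` and the sample data are assumptions made for illustration.

```python
import csv
import io


def streaming_mean(lines, field):
    """Compute count, sum, and mean of one named field in a single pass."""
    count = 0
    total = 0.0
    # DictReader yields one record at a time, keyed by header name,
    # so only the running totals are held in memory.
    for record in csv.DictReader(lines):
        count += 1
        total += float(record[field])
    mean = total / count if count else None
    return {"count": count, "sum": total, "mean": mean}


# Hypothetical sample data; any iterable of lines (such as an open file) works.
sample = io.StringIO("name,qty,price\napple,3,0.5\npear,5,0.75\nfig,2,2.0\n")
stats = streaming_mean(sample, "price")
print(stats)
```

Because the field is looked up by its header name, reordering the columns in the input would not change the result.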

Its capabilities cover data transformation and analysis, including field computation, record filtering, and sorting. Multiple operations can be chained into a linear pipeline to perform complex cleaning and statistical aggregation tasks.
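A linear pipeline of this kind can be sketched in Python with generators, where each stage consumes the records produced by the previous one. The stage names `keep`, `compute`, and `sort_by` and the sample records are hypothetical and chosen only for illustration; they are not Miller's own operations.

```python
def keep(records, predicate):
    """Filtering stage: pass through only records matching the predicate."""
    for record in records:
        if predicate(record):
            yield record


def compute(records, name, func):
    """Field-computation stage: add a new named field to each record."""
    for record in records:
        yield {**record, name: func(record)}


def sort_by(records, name, reverse=False):
    """Sorting stage: must see every record, so it is not streaming."""
    return sorted(records, key=lambda record: record[name], reverse=reverse)


# Hypothetical sample records.
records = [
    {"item": "apple", "qty": 3, "price": 0.5},
    {"item": "pear", "qty": 5, "price": 0.75},
    {"item": "fig", "qty": 2, "price": 2.0},
]

# Chain the stages: filter, then compute a field, then sort.
filtered = keep(records, lambda r: r["qty"] >= 3)
with_total = compute(filtered, "total", lambda r: r["qty"] * r["price"])
result = sort_by(with_total, "total", reverse=True)

for record in result:
    print(record)
```

The first two stages handle one record at a time, while the sorting stage has to collect all of its input before emitting anything.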

Features

Name-Indexed Data Models - Lets users manipulate fields by name rather than numeric position.
Tabular Data Processors - Provides utilities for filtering, aggregating, and joining tabular data in CSV, JSON, and YAML formats.
Statistical Aggregators - Computes summary statistics such as sums, averages, and counts across grouped tabular datasets.
Data Cleaning Utilities - Filters, reshapes, and modifies name-indexed records using a set of command-line operations.
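
As a rough illustration of grouped summary statistics, the following Python sketch accumulates a count and sum per group in one pass and derives the mean at the end. The function name `group_stats` and the sample rows are assumptions for illustration, not part of Miller.

```python
from collections import defaultdict


def group_stats(records, group_field, value_field):
    """Compute count, sum, and mean of value_field for each group_field value."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    # One pass over the records; memory grows with the number of groups,
    # not with the number of records.
    for record in records:
        slot = totals[record[group_field]]
        slot["count"] += 1
        slot["sum"] += float(record[value_field])
    return {
        key: {"count": t["count"], "sum": t["sum"], "mean": t["sum"] / t["count"]}
        for key, t in totals.items()
    }


# Hypothetical sample rows, with values as strings as they would arrive from CSV.
rows = [
    {"shape": "circle", "size": "2"},
    {"shape": "square", "size": "4"},
    {"shape": "circle", "size": "6"},
]
summary = group_stats(rows, "shape", "size")
print(summary)
```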
