csvkit is a composable, Unix-style command-line toolkit for converting, filtering, and analyzing CSV files directly from the terminal. It provides a suite of focused, single-purpose commands that can be combined with pipes to build complex data-processing workflows. Its modular architecture includes a column-type inference engine that detects data types automatically and a streaming-pipeline design that handles tabular data efficiently.
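To make the idea of column-type inference concrete, here is a minimal sketch in Python using only the standard library. It is not csvkit's inference engine: the function names `infer_type` and `infer_column_types`, the sample data, and the simple rule (try `int`, then `float`, else fall back to `str`) are all assumptions made for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, str that parses every non-empty value."""
    non_empty = [v for v in values if v != ""]
    for candidate in (int, float):
        try:
            for v in non_empty:
                candidate(v)
        except ValueError:
            # At least one value does not parse; try the next, wider type.
            continue
        return candidate
    return str


def infer_column_types(text):
    """Map each header name to the name of its inferred type."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = zip(*body)  # transpose rows into columns
    return {name: infer_type(col).__name__ for name, col in zip(header, columns)}


# Hypothetical sample data, used only for this illustration.
SAMPLE = "id,price,city\n1,9.50,Oslo\n2,12,Lima\n"
print(infer_column_types(SAMPLE))
```

A real inference engine would also need to recognize dates, booleans, and null markers; this sketch covers only the numeric-versus-text distinction.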
The toolkit distinguishes itself through its SQL-engine abstraction layer, which lets users run SQL queries directly against CSV files, treating each file as a database table, without requiring a database server. It also offers a format-agnostic serialization bridge for converting JSON, Excel, and fixed-width files to CSV and CSV to JSON, an in-memory aggregation engine for computing summary statistics, and an interactive Python shell that pre-loads CSV data as lists for ad-hoc analysis.
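The idea of querying a CSV file as if it were a database table, with no server involved, can be sketched with Python's built-in `sqlite3` module and an in-memory database. This is an illustration of the concept, not csvkit's code: the helper `query_csv`, the table name `orders`, and the sample data are hypothetical. In this sketch every value is loaded as text, so the query casts explicitly before summing.

```python
import csv
import io
import sqlite3


def query_csv(text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one query against it."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")  # no database server, nothing written to disk
    try:
        cols = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample data, used only for this illustration.
ORDERS = "city,amount\nOslo,10\nLima,5\nOslo,7\n"
result = query_csv(
    ORDERS,
    "orders",
    "SELECT city, SUM(CAST(amount AS INTEGER)) FROM orders "
    "GROUP BY city ORDER BY city",
)
print(result)
```

Combining a loader like this with type inference would let the table declare numeric columns up front, so queries would not need the explicit `CAST`.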
Beyond these core capabilities, csvkit covers a broad range of CSV operations: inspecting file structure and schema, cleaning and validating data to remove duplicates and fix malformed rows, filtering and sorting by column values, joining multiple files on common columns, and splitting data based on column values. It also supports database integration, importing CSV data into SQL databases such as PostgreSQL and exporting query results back to CSV, and it can display tabular data in the terminal as aligned tables.
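Joining files on a common column is the least obvious of these operations, so a small sketch may help. The Python below performs an inner join of two CSV inputs on a shared key using only the standard library. It is a conceptual illustration rather than csvkit's implementation: `inner_join`, the key name `id`, and both sample tables are assumptions.

```python
import csv
import io


def inner_join(left_text, right_text, key):
    """Return rows present in both inputs, matched on the shared `key` column."""
    left = list(csv.DictReader(io.StringIO(left_text)))
    right = list(csv.DictReader(io.StringIO(right_text)))

    # Index the right-hand rows by key so each left row is matched in one lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            merged = dict(row)
            # Copy every right-hand column except the key, which is already present.
            merged.update({k: v for k, v in match.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical sample data, used only for this illustration.
PEOPLE = "id,name\n1,Ada\n2,Lin\n3,Sam\n"
CITIES = "id,city\n1,Oslo\n3,Lima\n"
joined = inner_join(PEOPLE, CITIES, "id")
print(joined)
```

Rows whose key appears in only one input (here, `id` 2) are dropped, which is the defining behavior of an inner join; an outer join would keep them and leave the missing columns empty.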