11 Repos

Awesome GitHub Repositories · Command-Line Data Processors

High-performance utilities for manipulating, filtering, and analyzing structured datasets via a command-line interface.

Distinct from Rust-Implemented Tooling: entries in that category focus on Rust language internals, compilers, or serialization libraries rather than on high-level CLI toolkits for data processing.

Explore 11 awesome GitHub repositories matching data & databases · Command-Line Data Processors. Refine with filters or upvote what's useful.

BurntSushi/xsv
10,750
xsv is a suite of high-performance command-line utilities written in Rust for the analysis, manipulation, and statistical processing of large delimited datasets. It provides a toolkit for processing comma-separated value files through a command line interface. The project provides capabilities for statistical analysis, including the computation of column statistics, value frequencies, and descriptive metrics. It also includes data manipulation utilities for joining, slicing, sampling, and reformatting records. The toolkit covers a broad range of data operations including column selection, da
Provides a comprehensive suite of high-performance Rust-based command-line tools for processing large CSV datasets.
Rust
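The kind of per-column summary described above (value frequencies, descriptive metrics) can be sketched in a few lines of Python. This is only an illustration of the idea, not xsv's implementation or interface; the function names and the `SAMPLE` data are made up for the example.

```python
import csv
import io
from collections import Counter

def column_frequencies(csv_text, column):
    """Count how often each value appears in one column, reading row by row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

def column_mean(csv_text, column):
    """Compute the mean of a numeric column in a single pass over the rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total, count = 0.0, 0
    for row in reader:
        total += float(row[column])
        count += 1
    return total / count if count else None

# Hypothetical sample data, used only for this illustration.
SAMPLE = "city,population\nOslo,700\nBergen,290\nOslo,710\n"

if __name__ == "__main__":
    print(column_frequencies(SAMPLE, "city").most_common())
    print(column_mean(SAMPLE, "population"))
```

Both helpers consume the rows one at a time, which is the property that lets this style of tool handle files larger than memory.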
dinedal/textql
9,109
TextQL is a command line SQL query engine designed to execute relational queries directly against structured text files, such as CSV and TSV, without requiring a database import. It functions as a relational text file analyzer and a CSV processor that treats plain text files as virtual tables for filtering, joining, and aggregating data. The tool is built as a pipe-compatible data transformation utility, allowing it to process data from standard input and output formatted datasets. It enables relational joins across multiple files or directories within a single query to analyze relationships
Provides a high-performance CLI utility for manipulating and analyzing structured datasets via SQL.
Go
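To make the "text file as a virtual table" idea concrete, here is a minimal Python sketch that loads CSV text into an in-memory SQLite table and queries it. It assumes the first row is a header; `query_csv`, the table name `data`, and the `SAMPLE` rows are hypothetical names chosen for the example, and the sketch makes no claim about how TextQL itself is built.

```python
import csv
import io
import sqlite3

def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Quote identifiers so header names with spaces still work.
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# Hypothetical sample data, used only for this illustration.
SAMPLE = "name,team,score\nana,red,7\nbo,blue,4\ncy,red,5\n"

if __name__ == "__main__":
    sql = ("SELECT team, SUM(CAST(score AS INTEGER)) "
           "FROM data GROUP BY team ORDER BY team")
    print(query_csv(SAMPLE, sql))
```

Values arrive as text, so the query casts `score` explicitly before aggregating.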
missing-semester-cn/missing-semester-cn.github.io
7,311
This is an open-source educational website that translates and localizes MIT's Missing Semester course, teaching practical computing skills for computer science students. The curriculum covers developer tooling, shell scripting, version control, security fundamentals, and open-source collaboration, with a focus on core computing skills including data processing pipelines, workflow automation, secure remote access, shell productivity, Vim editing, and Git version control. The project distinguishes itself by teaching command-line mastery, shell scripting, and automation to boost daily developer
Teaches generating simple plots from command-line data using tools like gnuplot.
Markdown
OSGeo/gdal
5,942
GDAL is an MIT-licensed open-source translator library that provides a single abstract data model for reading and writing geospatial raster and vector data across hundreds of file formats. It serves as a foundational geospatial translation library, giving access to diverse geospatial formats through one consistent interface. The library exposes its core functionality through command-line utilities that let users translate, convert, and process geospatial data between formats. A coordinate transformation engine handles conversions between spatial reference systems, while a format-driver plugin system loads format-specific read and write logic at runtime. The virtual file system layer provides uniform I/O access across local files, HTTP, cloud storage, and compressed archives, and a raster block cache manages in-memory tile caching to reduce I/O operations. GDAL supports reading and writing both raster and vector geospatial data, and its vector feature iteration streams features one at a time without loading entire datasets into memory. Through this extensive format support, the project enables data exchange between different geospatial software ecosystems.
Runs command-line utilities to translate and analyze geospatial raster and vector datasets.
C++
andmarti1424/sc-im
5,638
sc-im is a text user interface spreadsheet calculator and data manager. It provides a keyboard-driven environment for performing mathematical computations and managing data grids within a command line interface. The application is scriptable, supporting custom functions, event-driven triggers, and the integration of external scripts to automate calculation tasks. It further allows for the loading of external compiled modules at runtime to extend its mathematical capabilities. The system covers data management through row sorting, filtering, and subtotal calculations. It supports data interop
Offers a command-line interface for manipulating structured datasets through sorting, filtering, and multi-format I/O.
C · console · console-application · ncurses
missing-semester/missing-semester
5,525
The Missing Semester is a free, open-source educational curriculum designed to bridge the gap between theoretical computer science and the practical tooling every software engineer needs. Organized as a structured course, it covers Unix shell mastery, version control with Git, software debugging and profiling, system administration fundamentals, and computer security practices — the skills often left out of traditional degree programs. The project is maintained as a collaborative set of lecture notes, exercises, and guides that function as both a professional development tools course and a Uni
The Missing Semester teaches computing statistics and plotting data using command-line tools like bc, R, and gnuplot.
CSS
red-data-tools/YouPlot
4,761
YouPlot is a command-line plotting utility and terminal data visualization tool that renders statistical plots and charts directly in a terminal using Unicode characters. It works as a Unix pipeline plotter, letting users visualize numeric data without leaving the shell. The project also operates as a real-time data visualizer, drawing plots progressively as data streams in. It integrates into command-line pipelines by reading from standard input, which supports real-time stream monitoring and data analysis. The tool covers a range of rendering functions, including line charts, scatter plots, histograms, bar charts, box plots, and density plots. These are backed by internal systems for dynamic axis scaling and coordinate mapping that adapt to the terminal's dimensions.
Generates statistical charts and graphs from tabular or streamed data using Unicode characters in the command line.
Ruby · cli · csv · ruby
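Rendering a chart from Unicode characters comes down to scaling each value to a number of terminal columns. The Python sketch below shows that scaling for a horizontal bar chart; `bar_chart` and its `width` parameter are hypothetical names for this example and do not reflect YouPlot's own code or options.

```python
def bar_chart(pairs, width=20):
    """Render (label, value) pairs as horizontal bars built from block characters."""
    pairs = list(pairs)
    if not pairs:
        return ""
    peak = max(value for _, value in pairs)
    label_width = max(len(label) for label, _ in pairs)
    lines = []
    for label, value in pairs:
        # Scale each bar against the largest value so the chart fits `width` columns.
        length = round(width * value / peak) if peak > 0 else 0
        lines.append(f"{label.rjust(label_width)} ┤{'█' * length} {value}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(bar_chart([("apples", 12), ("pears", 5), ("plums", 9)]))
```

A streaming variant would redraw the chart each time new rows arrive on standard input; this sketch only covers the static case.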
amperser/proselint
4,542
Proselint is a prose linter and rule-based text analyzer designed to identify stylistic errors, clichés, and jargon in written text. It scans documents against a curated registry of linguistic and typographic rules to uphold professional editorial standards and improve writing quality. The project functions as a command-line text processor, a programmable analysis library, and a Git pre-commit hook. Its modular architecture allows the core engine to be embedded in other applications, served through a REST API, or integrated into text editors. The tool supports recursive directory traversal for batch analysis and accepts text on standard input for use in command-line pipelines. It offers configuration options to enable or disable specific linguistic checks and can export diagnostic results as structured JSON.
Functions as a terminal-based processor that accepts standard input and outputs structured linting results.
Python
zu1k/nali
4,089
Nali is a suite of command-line tools for resolving IP addresses to geographic locations and identifying content delivery network providers using offline databases. It functions as an offline IP geolocation tool and database resolver that maps addresses to physical locations and network owners without requiring an active internet connection. The project distinguishes itself through an offline-first approach to network analysis, using pluggable database providers and local file metadata caching to ensure data privacy and independence from external APIs. It includes a dedicated utility for iden
Processes IP address streams via standard input to add geographic and provider metadata.
Go · cdn · cdn-provider · chunzhen
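Annotating IP addresses in a text stream from an offline source can be sketched as a find-and-replace over each line. The Python example below is an illustration under stated assumptions: the `NETWORKS` table is a hypothetical stand-in for a real local geolocation database (it only contains reserved documentation ranges), and `lookup` and `annotate` are names invented for the example, not part of Nali.

```python
import ipaddress
import re

# Hypothetical offline table; a real tool would read a local geolocation database.
NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): "TEST-NET-1 (documentation)",
    ipaddress.ip_network("198.51.100.0/24"): "TEST-NET-2 (documentation)",
}

# Loose IPv4 pattern; candidates are validated by ipaddress afterwards.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def lookup(text):
    """Return the label of the first network containing the address, else None."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return None
    for net, label in NETWORKS.items():
        if addr in net:
            return label
    return None

def annotate(line):
    """Append a bracketed label after every known IPv4 address in the line."""
    def repl(match):
        label = lookup(match.group(0))
        return f"{match.group(0)} [{label}]" if label else match.group(0)
    return IPV4.sub(repl, line)

if __name__ == "__main__":
    print(annotate("connect from 192.0.2.10 port 22"))
```

Applying `annotate` to each line read from standard input would turn the sketch into a pipeline filter; addresses that are invalid or not in the table pass through unchanged.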
jeroenjanssens/data-science-at-the-command-line
3,952
This project provides a framework for carrying out data science tasks with command-line tools and scripts. It focuses on processing and analyzing text and structured data directly in the terminal. The approach centers on using Unix pipes to stream data between independent processes and on shell scripts to automate repetitive data science workflows. It uses plain-text interchange formats such as CSV to move information between utilities. Functional areas include text-based data processing, command-line data analysis, and terminal-based data visualization. These are achieved by chaining discrete executables into linear transformation pipelines.
Analyzes datasets using high-performance terminal tools for quick calculations and data manipulations.
HTML · bash · book · bookdown
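The pipeline style described above, where small independent stages pass plain-text records along, can be mimicked with Python generators. This is a loose analogy rather than anything taken from the project: the stage names echo familiar Unix tools only for readability, and `LOG` is hypothetical sample data.

```python
from collections import Counter

def grep(lines, needle):
    """Keep only the lines that contain the given substring."""
    return (line for line in lines if needle in line)

def cut(lines, field, delimiter=","):
    """Extract one delimited field from each line."""
    return (line.split(delimiter)[field] for line in lines)

def count_sorted(values):
    """Count values and sort by descending count, then by value."""
    return sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample records: method, path, status.
LOG = ["GET,/home,200", "GET,/about,404", "POST,/home,200", "GET,/home,500"]

if __name__ == "__main__":
    # Each stage consumes the previous one lazily, like processes joined by pipes.
    print(count_sorted(cut(grep(LOG, "GET"), 1)))
```

Because `grep` and `cut` are generators, no stage holds the full input in memory; only the final counting step accumulates state.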
medialab/xan
3,752
Xan is a command-line tool and data transformation engine for processing CSV, TSV, and JSONL datasets. It functions as a processor for compressed files, enabling random access and seeking within gzipped and Zstd files, and serves as a converter for specialized bioinformatics data formats. The tool handles large datasets without requiring full memory loads by utilizing stream-based processing. It provides capabilities for merging, sorting, and deduplicating massive files, as well as converting data between various tabular formats. The project covers a broad range of data wrangling and analysi
Provides high-performance command-line utilities for manipulating, filtering, and analyzing structured CSV, TSV, and JSONL datasets.
Rust · cli · csv · rust