11 repositories
High-performance utilities for manipulating, filtering, and analyzing structured datasets via a command-line interface.
Distinct from Rust-Implemented Tooling: Existing candidates focus on Rust language internals, compilers, or serialization libraries rather than a high-level CLI toolkit for data processing.
Explore 11 awesome GitHub repositories matching data & databases · Command-Line Data Processors.
xsv is a suite of high-performance command-line utilities written in Rust for the analysis, manipulation, and statistical processing of large delimited datasets. It provides a toolkit for processing comma-separated value files through a command line interface. The project provides capabilities for statistical analysis, including the computation of column statistics, value frequencies, and descriptive metrics. It also includes data manipulation utilities for joining, slicing, sampling, and reformatting records. The toolkit covers a broad range of data operations including column selection, da
Provides a comprehensive suite of high-performance Rust-based command-line tools for processing large CSV datasets.
TextQL is a command line SQL query engine designed to execute relational queries directly against structured text files, such as CSV and TSV, without requiring a database import. It functions as a relational text file analyzer and a CSV processor that treats plain text files as virtual tables for filtering, joining, and aggregating data. The tool is built as a pipe-compatible data transformation utility, allowing it to process data from standard input and output formatted datasets. It enables relational joins across multiple files or directories within a single query to analyze relationships
Provides a high-performance CLI utility for manipulating and analyzing structured datasets via SQL.
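The idea of querying a text file as if it were a table can be illustrated with a minimal Python sketch. This is not TextQL's implementation or its command-line interface; it only assumes Python's standard `csv` and `sqlite3` modules, and the function name `query_csv`, the table name `t`, and the sample data are hypothetical.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are created without types, so every value is stored as text.
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', records)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample data, used only for illustration.
SAMPLE = "city,population\nOslo,700000\nBergen,290000\nTromso,78000\n"

if __name__ == "__main__":
    query = (
        "SELECT city FROM t "
        "WHERE CAST(population AS INTEGER) > 100000 ORDER BY city"
    )
    for row in query_csv(SAMPLE, query):
        print(row[0])
```

Because the values are loaded as text, numeric comparisons need an explicit `CAST`, as in the query above.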
This is an open-source educational website that translates and localizes MIT's Missing Semester course, teaching practical computing skills for computer science students. The curriculum covers developer tooling, shell scripting, version control, security fundamentals, and open-source collaboration, with a focus on core computing skills including data processing pipelines, workflow automation, secure remote access, shell productivity, Vim editing, and Git version control. The project distinguishes itself by teaching command-line mastery, shell scripting, and automation to boost daily developer
Teaches generating simple plots from command-line data using tools like gnuplot.
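GDAL is an MIT-licensed open-source translator library that provides a unified abstract data model for reading and writing geospatial raster and vector data in hundreds of file formats. It serves as a foundational geospatial data translation library, giving access to many geospatial data formats through a single, consistent interface. The library exposes its core functionality through command-line utilities that let users convert between formats and process geospatial data. A coordinate transformation engine handles conversions between spatial reference systems, while a format driver plugin system loads format-specific read and write logic at runtime. A virtual file system layer provides uniform I/O access across local files, HTTP, cloud storage, and compressed archives, and a raster block cache manages an in-memory tile cache to reduce I/O operations. GDAL supports reading and writing raster and vector geospatial data, and its vector feature iterator can stream features one at a time without loading the entire dataset into memory. Through its broad format support, the project enables data exchange between different geospatial software ecosystems and so promotes geospatial interoperability.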
Runs command-line utilities to translate and analyze geospatial raster and vector datasets.
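sc-im is a text user interface spreadsheet calculator and data manager. It provides a keyboard-driven environment for performing mathematical calculations and managing data grids within a command-line interface. The application is scriptable, supporting custom functions, event-driven triggers, and integration with external scripts to automate calculation tasks. It also allows external compiled modules to be loaded at runtime to extend its mathematical functionality. The system covers data management through row sorting, filtering, and subtotal calculations. It supports data interoperability by importing and exporting CSV, TAB, Markdown, and XLSX formats. Other features include a non-interactive execution mode for headless data processing and the ability to send data to external plotting software for visualization.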
Offers a command-line interface for manipulating structured datasets through sorting, filtering, and multi-format I/O.
The Missing Semester is a free, open-source educational curriculum designed to bridge the gap between theoretical computer science and the practical tooling every software engineer needs. Organized as a structured course, it covers Unix shell mastery, version control with Git, software debugging and profiling, system administration fundamentals, and computer security practices — the skills often left out of traditional degree programs. The project is maintained as a collaborative set of lecture notes, exercises, and guides that function as both a professional development tools course and a Uni
The Missing Semester teaches computing statistics and plotting data using command-line tools like bc, R, and gnuplot.
YouPlot is a command line plotting utility and terminal data visualization tool used to render statistical plots and charts directly within a terminal interface using Unicode characters. It functions as a Unix pipeline plotter, allowing users to visualize numerical data without leaving the shell. The project operates as a real-time data visualizer, drawing plots progressively as data streams into the system. It integrates into command line pipelines by reading data from standard input to provide real-time stream monitoring and data analysis. The tool covers a variety of rendering capabilitie
Generates statistical charts and graphs from tabular or streamed data using Unicode characters in the command line.
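Proselint is a prose linter and rule-based text analyzer designed to identify stylistic errors, clichés, and jargon in written text. It scans documents against a predefined registry of language and typography rules to maintain professional editorial standards and improve writing quality. The project can be used as a command-line text processor, a programmable analysis library, and a git pre-commit hook. Its modular architecture allows the core engine to be embedded in other applications, exposed through a REST API, or integrated into text editors. The tool supports recursive directory traversal for batch analysis and accepts text from standard input for use in command-line pipelines. It provides configuration options to enable or disable specific language checks and can export diagnostics in structured JSON format.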
Functions as a terminal-based processor that accepts standard input and outputs structured linting results.
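Nali is a suite of command-line tools that uses offline databases to resolve IP addresses to geographic locations and to identify content delivery network providers. It acts as an offline IP geolocation tool and database parser, mapping addresses to physical locations and network owners without an active internet connection. The project stands out for its offline-first approach to network analysis, using pluggable database providers and a local file metadata cache to ensure data privacy and independence from external APIs. It includes a dedicated utility for identifying content delivery network providers and a system for managing and updating local geographic data files. The toolset supports interactive and automated workflows, with a read-eval-print loop (REPL) for sequential manual lookups and a metadata processor that reads a stream of IP addresses from standard input. This allows geographic and provider metadata to be integrated into shell pipelines. Data storage and configuration file locations are configured through system environment variables.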
Processes IP address streams via standard input to add geographic and provider metadata.
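This project provides a framework for performing data science tasks with command-line tools and scripts. It focuses on processing and analyzing text and structured data directly in the terminal. Its approach centers on using Unix pipes to pass data between independent processes and on shell scripts to automate repetitive data science workflows. It uses plain-text interchange formats such as CSV to move information between tools. Functional areas include text-based data processing, command-line data analysis, and terminal-based data visualization. These capabilities are achieved by chaining discrete executables into linear transformation pipelines.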
Analyzes datasets using high-performance terminal tools for quick calculations and data manipulations.
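As a rough illustration of one stage in such a pipeline, the following Python sketch reads CSV rows from a stream and computes the mean of one column. It is not taken from the project; the function name `column_mean` and the file and column names in the usage comment are hypothetical.

```python
import csv
import sys


def column_mean(lines, column):
    """Stream CSV rows from an iterable of lines and return the mean of one column.

    Returns None when there are no data rows.
    """
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else None


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage as a pipeline stage:
    #   cat data.csv | python mean.py price
    print(column_mean(sys.stdin, sys.argv[1]))
```

The function consumes rows one at a time, so it can sit at the end of a pipe without holding the whole input in memory.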
Xan is a command-line tool and data transformation engine for processing CSV, TSV, and JSONL datasets. It functions as a processor for compressed files, enabling random access and seeking within gzipped and Zstd files, and serves as a converter for specialized bioinformatics data formats. The tool handles large datasets without requiring full memory loads by utilizing stream-based processing. It provides capabilities for merging, sorting, and deduplicating massive files, as well as converting data between various tabular formats. The project covers a broad range of data wrangling and analysi
Provides high-performance command-line utilities for manipulating, filtering, and analyzing structured CSV, TSV, and JSONL datasets.