2 repositories
Tools that run a shell command for each row in a dataset and ingest the resulting output as new columns.
Distinct from Shell Command Interpolations: this is a data wrangling feature that uses the shell as a function, not general shell automation or parameter interpolation.
Explore 2 awesome GitHub repositories matching data & databases · Shell-Augmented Data Processing.
VisiData is a terminal-based interactive data analysis tool and browser designed for exploring, filtering, and sorting large tabular datasets. It functions as a structured data inspector that loads and flattens complex formats like JSON, XML, and PCAP into interactive sheets, as well as a terminal file manager for navigating directories and performing staged filesystem operations. The project distinguishes itself by rendering data visualizations, such as scatter plots and histograms, directly in the terminal using Unicode Braille characters. It provides a Python-based data wrangling environme
Augments tabular data by executing shell commands for each row and capturing the output as new columns.
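The per-row pattern described above can be sketched in plain Python. This is a minimal illustration of the general technique, not VisiData's own implementation or API; the function `add_shell_column`, its parameters, and the sample rows are all hypothetical, and a POSIX shell with `printf` and `tr` is assumed.

```python
import shlex
import subprocess


def add_shell_column(rows, template, new_column):
    """Run one shell command per row and store its stdout under new_column.

    `template` is a str.format-style string whose placeholders are column
    names (an assumption made for this sketch).
    """
    augmented = []
    for row in rows:
        # Quote every value so the shell sees it as a single literal word.
        quoted = {key: shlex.quote(str(value)) for key, value in row.items()}
        command = template.format(**quoted)
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, check=False
        )
        # Keep the original columns and append the captured output.
        augmented.append({**row, new_column: result.stdout.strip()})
    return augmented


# Hypothetical sample data: uppercase each name with `tr`.
rows = [{"name": "alpha"}, {"name": "beta gamma"}]
augmented = add_shell_column(rows, "printf %s {name} | tr a-z A-Z", "upper")
```

Quoting each value with `shlex.quote` before interpolation matters here because row contents are untrusted input to the shell; a failed command simply yields whatever it wrote to stdout, since `check=False` is used.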
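This project is a high-performance tabular data processing framework for R, designed to handle very large datasets with memory efficiency and speed. It provides an enhanced data structure that uses reference semantics and in-place modification to perform complex transformations without the overhead of unnecessary object copies. The library stands out for its low-level architectural optimizations, including multithreaded parallel processing, radix sorting, and memory-mapped file parsing. By offloading key data manipulation and aggregation routines to compiled C code, it executes otherwise computationally expensive tasks quickly. Its core engine supports advanced relational operations such as non-equi joins, rolling joins, and overlapping interval joins, as well as automatic secondary indexing to speed up repeated data access. Beyond its main processing features, the project also offers a comprehensive set of tools for data lifecycle management. These include high-speed ingestion and serialization tools with automatic type detection, plus specialized support for time series analysis and multidimensional aggregation. The framework is designed for scalability, allowing users to run complex grouping, filtering, and reshaping operations on datasets containing billions of rows while maintaining system stability and performance.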
Executes command-line utilities on input files to filter or transform data before loading it into the environment.
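The pre-load filtering idea can be illustrated outside R as well. The following Python sketch runs a command-line utility on a file and parses only the filtered output; it is not the R project's API, and the helper `load_filtered`, the sample file, and the choice of `grep` are assumptions made for illustration.

```python
import csv
import io
import os
import subprocess
import tempfile


def load_filtered(command_args, path):
    """Run a command on `path` and parse its stdout as CSV records."""
    result = subprocess.run(
        [*command_args, path], capture_output=True, text=True, check=True
    )
    # Only the command's output is parsed; the raw file is never loaded.
    return list(csv.DictReader(io.StringIO(result.stdout)))


# Hypothetical sample file with a comment line that should be dropped.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("# exported by a hypothetical tool\nid,score\n1,10\n2,20\n")
    sample_path = handle.name

# `grep -v '^#'` removes comment lines before the CSV parser sees them.
records = load_filtered(["grep", "-v", "^#"], sample_path)
os.remove(sample_path)
```

Passing the command as an argument list avoids shell quoting issues with the file path. Note that `grep` exits with a non-zero status when no lines are selected, so with `check=True` an entirely filtered-out file raises `subprocess.CalledProcessError` in this sketch.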