1 repository
Techniques for processing datasets that exceed available system memory by utilizing disk-based storage.
Distinct from External Memory Block Compression: broadens the scope from compression alone to general sorting and deduplication using external memory.
One GitHub repository matching Data & Databases · External Memory Processing.
qsv is a high-performance command line toolkit for querying, transforming, and analyzing comma-separated value files. It functions as a data wrangling interface and a tabular data profiler, featuring a query engine capable of executing SQL statements and joins directly on flat files without requiring a database. The project is distinguished by its ability to process massive datasets that exceed available system memory. This is achieved through disk-based external memory processing, including multithreaded merge sorting, on-disk hash tables for deduplication, and lightweight file indexing for
Uses disk-based external memory to sort and deduplicate massive datasets that exceed available system RAM.
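The general idea of sorting and deduplicating through disk can be sketched as follows. This is a minimal illustration in Python and not qsv's actual implementation. It uses sort-based deduplication (duplicates become adjacent after the merge) rather than an on-disk hash table. The function name `external_sort_dedup`, the helper `_write_run`, and the `chunk_size` parameter are all hypothetical names chosen for this example.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_run(sorted_lines):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as run:
        run.writelines(sorted_lines)
    return path


def external_sort_dedup(lines, chunk_size=100_000):
    """Yield the unique lines of `lines` in sorted order.

    Assumption for this sketch: a single chunk of `chunk_size` lines fits
    in memory, even though the whole input may not.
    """
    lines = iter(lines)
    run_paths = []
    try:
        # Phase 1: cut the input into chunks, sort each in memory,
        # and spill every sorted chunk ("run") to disk.
        while True:
            chunk = list(islice(lines, chunk_size))
            if not chunk:
                break
            # Make sure every line ends with a newline so runs stay line-aligned.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort()
            run_paths.append(_write_run(chunk))

        # Phase 2: k-way merge of the runs. heapq.merge reads lazily,
        # so only one pending line per run is held in memory.
        files = [open(p, "r", encoding="utf-8") for p in run_paths]
        try:
            previous = None
            for line in heapq.merge(*files):
                # After sorting, duplicates are adjacent: skip repeats.
                if line != previous:
                    yield line
                    previous = line
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files once the merge is done.
        for p in run_paths:
            os.remove(p)
```

A possible usage, streaming a large file without loading it whole (the file names are placeholders):

```python
with open("input.txt", encoding="utf-8") as src, \
        open("output.txt", "w", encoding="utf-8") as dst:
    dst.writelines(external_sort_dedup(src, chunk_size=500_000))
```

The chunk size trades memory for the number of run files: larger chunks mean fewer runs to merge but a higher peak memory footprint.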