1 Repo
Executing external commands to transform or normalize file content before performing comparisons or hashing.
Distinguishing note: None of the candidates cover the use of external commands to normalize file content for deduplication purposes.
Topic: operating systems & systems programming · Content Normalization Preprocessing.
fclones is a command-line tool designed to locate identical files across a filesystem by comparing file sizes and cryptographic hashes. It functions as a parallel filesystem scanner and a deduplication utility that identifies duplicate files to reclaim disk space. The tool distinguishes itself through a persistent hash cache system that stores hashes and metadata on disk to accelerate repeated scans. It employs a multi-phase scanning process and device-aware parallel I/O, which adjusts thread pools based on whether the storage is an SSD or HDD to maximize throughput. Beyond discovery, the pr
Runs an external command on a copy of each file to normalize content before comparing for duplicates.
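As a rough illustration of this technique (a minimal sketch, not fclones's actual implementation), the Python below pipes each file's bytes through an external command and groups files by the SHA-256 hash of the command's output. The function names `normalized_digest` and `find_duplicates` are hypothetical, and the sketch assumes the normalizer reads from standard input and writes to standard output.

```python
import hashlib
import subprocess
from collections import defaultdict
from pathlib import Path


def normalized_digest(path, command):
    """Return the SHA-256 hex digest of `command`'s output for this file.

    `command` is an argument list (hypothetical interface): it receives the
    file's bytes on stdin and must write the normalized bytes to stdout.
    """
    data = Path(path).read_bytes()
    # The file on disk is never modified; the command only sees a copy
    # of its bytes. check=True raises if the command exits with an error.
    result = subprocess.run(command, input=data, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def find_duplicates(paths, command):
    """Group paths whose normalized content hashes are identical."""
    groups = defaultdict(list)
    for path in paths:
        groups[normalized_digest(path, command)].append(str(path))
    # Only groups with more than one member are duplicates.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

On a Unix-like system, passing a command such as `["tr", "-d", "\r"]` would make files that differ only in carriage returns compare as equal. This sketch reads each file fully into memory and hashes every file; a real tool would likely stream the data and avoid unnecessary work, which is omitted here for brevity.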