MinishLab/semhash

936 stars · 57 forks · Python · MIT
minish.ai/packages/semhash/introduction

Features

- Data Curation and Filtering: fuzzy deduplication tool using fast embedding generation.
- LLM Development Tools: library for near-deduplication and decontamination of text datasets.
- Training Datasets: listed in the "Training Datasets" section of the LLM Course awesome list.
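
To make "fuzzy deduplication using embeddings" concrete, here is a minimal sketch in plain Python. It does not use semhash or reproduce its API. The embedding is a toy character-trigram count vector standing in for a real embedding model, and the names `embed`, `cosine`, `deduplicate` and the `0.8` threshold are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def embed(text: str, n: int = 3) -> Counter:
    """Toy 'embedding': character n-gram counts (a stand-in for a real model)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(texts: list[str], threshold: float = 0.8) -> list[str]:
    """Greedily keep a text only if it is below the threshold against all kept texts."""
    kept: list[tuple[str, Counter]] = []
    for text in texts:
        vector = embed(text)
        if all(cosine(vector, other) < threshold for _, other in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]


# Hypothetical sample records: the first two differ only in punctuation.
records = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog!",
    "A completely different sentence about datasets.",
]
unique = deduplicate(records)
```

This greedy pass compares every record against every kept record, so it scales quadratically. A production tool would typically swap in a learned embedding model and an approximate nearest-neighbor index to stay fast on large datasets.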