Cleanlab is a data-centric AI library designed to improve machine learning model performance by detecting label errors and raising overall dataset quality. It implements the confident learning framework, which estimates label noise by comparing each example's given label against a model's predicted class probabilities and flags the examples most likely to be mislabeled. The project provides utilities for active learning that help select the most informative examples to label or re-label. It also includes an outlier detection tool to identify atypical data poin
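The core idea behind confident learning can be illustrated with a simplified, dependency-free sketch. This is not Cleanlab's API or implementation: the function names `per_class_thresholds` and `find_suspect_labels` are hypothetical, and the sketch assumes the predicted probabilities were produced out-of-sample (for example via cross-validation) so the model has not memorized the given labels.

```python
from typing import List, Sequence


def per_class_thresholds(labels: Sequence[int],
                         pred_probs: Sequence[Sequence[float]]) -> List[float]:
    """Threshold for class k = mean predicted probability of class k
    over the examples whose given label is k."""
    n_classes = len(pred_probs[0])
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # Assumption: a class with no labeled examples can never be "confident".
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)
    return thresholds


def find_suspect_labels(labels: Sequence[int],
                        pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices whose confidently predicted class differs from the given label."""
    thresholds = per_class_thresholds(labels, pred_probs)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k, pk in enumerate(p) if pk >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                suspects.append(i)
    return suspects


if __name__ == "__main__":
    # Toy data: the third example is labeled 0 but predicted as class 1.
    labels = [0, 0, 0, 1, 1, 1]
    pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                  [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
    print(find_suspect_labels(labels, pred_probs))
```

Using a separate threshold per class, rather than a single global cutoff, keeps the check from being biased toward classes the model predicts with systematically higher confidence.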
CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances (NeurIPS 2020)