CleverHans is an adversarial machine learning library for generating adversarial examples, incorporating them into training loops, and benchmarking how robust machine learning models are against them. Its attack framework, built largely on gradient-based methods, supports both white-box attacks, which use the target model's gradients, and black-box attacks, which only query the model. Both kinds search for inputs that the model misclassifies.
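The core of a gradient-based white-box attack fits in a few lines. The sketch below implements the Fast Gradient Sign Method (FGSM), which CleverHans includes among its attacks, as x_adv = x + eps * sign(grad_x L(x, y)). It does not use the library's API: the two-feature logistic model, its weights, and the helper names (`predict_proba`, `input_gradient`, `fgsm`) are hypothetical, chosen so the input gradient can be written by hand instead of computed by a deep learning framework.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # Probability of class 1 under a hypothetical linear logistic model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    # Gradient of binary cross-entropy with respect to the INPUT x:
    # dL/dx = (p - y) * w, where p is the predicted probability of class 1.
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # One step of size eps in the direction that increases the loss,
    # i.e. an L-infinity perturbation of at most eps per feature.
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.3)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)  # prints: True False
```

The clean input is classified correctly (logit 0.4), while the perturbed input [0.0, 0.5] moves no feature by more than 0.3 yet lands on the wrong side of the decision boundary (logit -0.5).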
The project includes tools for benchmarking model robustness, letting users measure how well a model withstands evasion attacks, that is, small, deliberately crafted input perturbations intended to cause misclassification. It also supports adversarial training, which makes a model more resistant to such perturbations by adding adversarial examples directly to the training process.
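Benchmarking and adversarial training can be sketched together on a toy problem. This minimal example, written in plain Python rather than with CleverHans, assumes a hypothetical logistic-regression model, synthetic two-cluster data, and an FGSM budget of eps = 0.5. Robustness is reported as accuracy on FGSM-perturbed inputs next to clean accuracy. Adversarial training crafts an FGSM example against the current weights at every step and gives the clean and adversarial losses equal weight.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm(w, x, y, eps):
    # For binary cross-entropy on a linear model, dL/dx_i = (p - y) * w_i, and
    # since 0 < p < 1, (p - y) always has the sign of (1 - 2y).
    s = 1 - 2 * y
    return [xi + eps * s * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w)]

def accuracy(w, b, data, eps=0.0):
    # Clean accuracy when eps == 0; accuracy under FGSM with budget eps otherwise.
    correct = 0
    for x, y in data:
        x_eval = fgsm(w, x, y, eps) if eps > 0 else x
        correct += int((logit(w, b, x_eval) > 0) == bool(y))
    return correct / len(data)

def sgd_step(w, b, x, y, lr):
    # One SGD step on binary cross-entropy; dL/dlogit = p - y.
    err = sigmoid(logit(w, b, x)) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)], b - lr * err

def train(data, epochs=50, lr=0.1, adv_eps=0.0):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            if adv_eps > 0:
                # Craft the adversarial example against the current weights,
                # then take half-size steps on the clean and adversarial inputs.
                x_adv = fgsm(w, x, y, adv_eps)
                w, b = sgd_step(w, b, x, y, lr / 2)
                w, b = sgd_step(w, b, x_adv, y, lr / 2)
            else:
                w, b = sgd_step(w, b, x, y, lr)
    return w, b

# Hypothetical synthetic data: two Gaussian clusters, for illustration only.
rng = random.Random(0)
data = [([rng.gauss(1, 0.5), rng.gauss(1, 0.5)], 1) for _ in range(100)]
data += [([rng.gauss(-1, 0.5), rng.gauss(-1, 0.5)], 0) for _ in range(100)]

standard = train(data)
robust = train(data, adv_eps=0.5)
for name, (w, b) in [("standard", standard), ("adversarial", robust)]:
    print(name, "clean:", accuracy(w, b, data), "under FGSM:", accuracy(w, b, data, eps=0.5))
```

Reporting both numbers matters because a single attack gives only an empirical upper bound on robust accuracy; a stronger attack could lower the second number further.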
The library covers a broad range of attack and evaluation techniques, including gradient-based perturbations, loss-function optimization, and black-box strategies such as training a substitute model to imitate the target. These tools are supported by a framework-agnostic backend and command-line utilities for running attacks against saved models.
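Substitute-model imitation can be illustrated in three steps: query the target for labels only, train a substitute on those labels, then run a white-box attack on the substitute and check whether the adversarial examples transfer to the target. The sketch below is a simplified, hypothetical setup, not CleverHans code. The "black-box" oracle is a hidden linear classifier, the substitute is logistic regression, and queries are drawn uniformly at random, whereas the original substitute attack of Papernot et al. chooses queries with Jacobian-based dataset augmentation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical black-box target. The attacker may call oracle(x) and read the
# predicted label, but never sees _HIDDEN_W, _HIDDEN_B, or any gradient.
_HIDDEN_W, _HIDDEN_B = [1.5, 1.0], -0.2

def oracle(x):
    return int(linear(_HIDDEN_W, _HIDDEN_B, x) > 0)

def train_substitute(points, labels, epochs=20, lr=0.1):
    # Logistic-regression substitute fitted by SGD to the oracle's labels.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            err = sigmoid(linear(w, b, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def fgsm_on_substitute(w, x, y, eps):
    # White-box FGSM against the substitute only: sign(dL/dx_i) = (1 - 2y) * sign(w_i).
    s = 1 - 2 * y
    return [xi + eps * s * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w)]

rng = random.Random(0)

# 1. Query the oracle on synthetic inputs (label-only access).
queries = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(300)]
labels = [oracle(x) for x in queries]

# 2. Train the substitute to imitate the oracle.
sub_w, sub_b = train_substitute(queries, labels)
agreement = sum(
    int((linear(sub_w, sub_b, x) > 0) == bool(y)) for x, y in zip(queries, labels)
) / len(queries)

# 3. Craft adversarial examples on the substitute and test whether they transfer.
targets = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(200)]
flipped = 0
for x in targets:
    y = oracle(x)
    flipped += int(oracle(fgsm_on_substitute(sub_w, x, y, eps=0.5)) != y)
transfer_rate = flipped / len(targets)
print(f"substitute agreement: {agreement:.2f}, transfer rate: {transfer_rate:.2f}")
```

The attack never reads the oracle's parameters or gradients; it succeeds only to the extent that the substitute's decision boundary resembles the target's, which is why imitation quality (agreement) is reported alongside the transfer rate.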