ScanCode Toolkit is a software composition analysis tool and scanning framework designed to identify open-source licenses and copyright statements in source code and binary files. It functions as an open-source license detector, a dependency vulnerability scanner, and a generator for standardized software bills of materials in SPDX and CycloneDX formats.
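As a rough illustration of how such a scan might be started, the sketch below only assembles a command line and prints it; it does not run anything. It assumes the `scancode` command-line entry point and the `--license`, `--copyright`, `--package`, and `--json-pp` options. The helper name `build_scancode_command` and both paths are hypothetical, and the options should be checked against `scancode --help` for the installed version.

```python
import shlex


def build_scancode_command(target, output_json):
    """Build an argv list for a license, copyright, and package scan.

    Assumption: the flag names below match the installed ScanCode release.
    """
    return [
        "scancode",
        "--license",      # detect licenses
        "--copyright",    # detect copyright statements
        "--package",      # detect package manifests
        "--json-pp", output_json,  # pretty-printed JSON output file
        target,
    ]


if __name__ == "__main__":
    # Placeholder paths; nothing is executed, the command is only printed.
    command = build_scancode_command("path/to/source", "scan-results.json")
    print(shlex.join(command))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if the command is later passed to `subprocess.run`.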
The project is built as a plugin-based scanning framework, so custom detection logic, specialized analyzers, and modified scanning behaviors can be integrated at runtime. It can also produce formal legal compliance reports and attribution documents from customizable templates.
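The plugin and template ideas can be sketched generically in a few lines of Python. This is not ScanCode's actual plugin API: the registry `PLUGINS`, the `register` decorator, both example plugins, and the `ATTRIBUTION` template are hypothetical names chosen only to show the pattern of registering scanners by name and rendering attribution text from a template.

```python
from string import Template
from typing import Callable, Dict

# Hypothetical registry: maps a plugin name to a function that scans text.
ScanPlugin = Callable[[str], dict]
PLUGINS: Dict[str, ScanPlugin] = {}


def register(name: str) -> Callable[[ScanPlugin], ScanPlugin]:
    """Decorator that adds a scan function to the registry under `name`."""
    def decorator(func: ScanPlugin) -> ScanPlugin:
        PLUGINS[name] = func
        return func
    return decorator


@register("line_count")
def count_lines(text: str) -> dict:
    return {"lines": len(text.splitlines())}


@register("mit_marker")
def find_mit_marker(text: str) -> dict:
    # Naive keyword check, used only to illustrate a custom detector.
    return {"mentions_mit": "MIT License" in text}


def run_plugins(text: str) -> dict:
    """Run every registered plugin and collect results keyed by plugin name."""
    return {name: plugin(text) for name, plugin in PLUGINS.items()}


# Hypothetical attribution template; $-placeholders are filled per component.
ATTRIBUTION = Template("$name is distributed under: $license")


def render_attribution(name: str, license_name: str) -> str:
    return ATTRIBUTION.substitute(name=name, license=license_name)


if __name__ == "__main__":
    sample = "MIT License\nCopyright (c) 2020 Example Author\n"
    print(run_plugins(sample))
    print(render_attribution("example-lib", "MIT"))
```

In this sketch, adding a new detector means writing one decorated function. The scan loop does not change, which is the main appeal of a plugin-based design.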
The toolkit covers several core capability areas: it extracts copyright declarations using regular expressions and resolves transitive dependency trees from package manifests. A multi-format serialization pipeline exports scan data as JSON, YAML, HTML, CSV, SPDX, or CycloneDX. It also includes security analysis capabilities that cross-reference identified dependencies against vulnerability databases.
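Three of these capabilities can be illustrated with a small standard-library sketch: regex-based copyright extraction, transitive dependency resolution, and export to two formats. It is a simplified approximation rather than the toolkit's own implementation. The pattern `COPYRIGHT_RE`, the manifest shape (a dict mapping each package to its direct dependencies), and all function names are assumptions made for the example.

```python
import csv
import io
import json
import re

# Simplified pattern: "Copyright", optional (c) or ©, a year or year range,
# then the holder up to the end of the line. Real statements vary far more.
COPYRIGHT_RE = re.compile(
    r"Copyright\s+(?:\(c\)|©)?\s*(?P<years>\d{4}(?:\s*-\s*\d{4})?)\s+(?P<holder>[^\n]+)",
    re.IGNORECASE,
)


def extract_copyrights(text):
    """Return one dict per copyright statement found in `text`."""
    return [
        {"years": m.group("years"), "holder": m.group("holder").strip()}
        for m in COPYRIGHT_RE.finditer(text)
    ]


def resolve_transitive(manifest, root):
    """Collect every package reachable from `root` via declared dependencies.

    The `seen` set prevents revisiting a package, so cycles terminate.
    """
    seen = set()
    stack = [root]
    while stack:
        package = stack.pop()
        for dep in manifest.get(package, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)


def to_json(records):
    return json.dumps(records, indent=2, sort_keys=True)


def to_csv(records, fieldnames):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    source = "# Copyright (c) 2019-2021 Example Corp\n# Copyright 2020 Jane Doe\n"
    records = extract_copyrights(source)
    print(to_json(records))
    print(to_csv(records, ["years", "holder"]))
    manifest = {"app": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
    print(resolve_transitive(manifest, "app"))
```

Keeping the scan results as plain lists of dictionaries lets each output format be a thin function over the same data, which is one simple way to structure a multi-format export step.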