This project is a Keras model zoo: a collection of deep learning architectures and pre-trained weights for image classification and audio tagging. The models can be used immediately for inference or as a starting point for transfer learning.
The library includes a music tagging framework that classifies audio recordings by feeding mel-spectrograms to a convolutional recurrent neural network. For visual data, it provides implementations of architectures such as ResNet, VGG, and Xception, together with weights trained on large datasets such as ImageNet.
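As a rough illustration of using one of these image architectures for inference, the sketch below builds ResNet50 and classifies a batch of images. It assumes the `tensorflow.keras.applications` versions of the model rather than this project's own module layout, and the helper names `build_classifier` and `classify` are hypothetical. Passing `weights="imagenet"` would download the ImageNet weights on first use; the demo call uses `weights=None` (random initialization) so it runs offline, which means its predictions are not meaningful.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input


def build_classifier(weights=None):
    """Build a ResNet50 classifier with a 1000-way softmax output.

    weights="imagenet" loads pre-trained ImageNet weights (downloaded on
    first use); weights=None leaves the network randomly initialized.
    """
    return ResNet50(weights=weights)


def classify(model, batch):
    """Return class probabilities for a batch of RGB images.

    batch: array of shape (n, 224, 224, 3) with pixel values in [0, 255].
    """
    # Copy to float32 so the caller's array is not modified in place.
    x = preprocess_input(np.array(batch, dtype="float32"))
    return model.predict(x, verbose=0)


# Demo with random pixels and untrained weights (assumption: offline run).
model = build_classifier(weights=None)
dummy_batch = np.random.uniform(0, 255, size=(1, 224, 224, 3))
probs = classify(model, dummy_batch)
print(probs.shape)
```

With `weights="imagenet"`, the probabilities could then be mapped to class labels using `decode_predictions` from the same `resnet50` module.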
Beyond classification, the project supports extracting feature maps from a chosen layer of a network and provides workflows for adapting pre-trained networks to new datasets.
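Both ideas can be sketched in a few lines. The example below is a minimal sketch, assuming the `tensorflow.keras.applications` VGG16 model rather than this project's own files; the helper names `build_feature_extractor` and `build_transfer_model`, the choice of the `block4_pool` layer, and the five-class head are all illustrative assumptions. As before, `weights=None` keeps the demo offline; in practice `weights="imagenet"` would be the usual choice.

```python
import numpy as np
from tensorflow.keras import Model, layers
from tensorflow.keras.applications.vgg16 import VGG16


def build_feature_extractor(layer_name="block4_pool", weights=None):
    """Return a model that outputs the feature map of one VGG16 layer."""
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    return Model(inputs=base.input, outputs=base.get_layer(layer_name).output)


def build_transfer_model(num_classes, weights=None):
    """Return VGG16 with a frozen convolutional base and a new softmax head."""
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # only the new head is updated during training
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Feature extraction on a blank image (untrained weights, offline demo).
extractor = build_feature_extractor()
features = extractor.predict(np.zeros((1, 224, 224, 3), dtype="float32"),
                             verbose=0)
print(features.shape)

# Transfer-learning model for a hypothetical 5-class dataset.
transfer_model = build_transfer_model(num_classes=5)
print(transfer_model.output_shape)
```

Freezing the base and training only the new head is one common way to adapt a network to a small dataset; unfreezing some of the later convolutional blocks afterwards for fine-tuning is another option.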