MMAction2 is a PyTorch video understanding toolbox for training and evaluating deep learning models. It serves as a framework for action recognition, temporal localization, and spatio-temporal action detection, with specialized tools for both pixel-based video analysis and skeleton-based action recognition. The project distinguishes itself through a modular architecture featuring registry-based component discovery and hierarchical, config-driven model assembly. It supports multi-modal feature fusion, integrating RGB frames, optical flow, and audio, and includes capabilities for…
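To make "registry-based component discovery" and "hierarchical, config-driven model assembly" concrete, here is a minimal sketch in plain Python. It is not MMAction2's actual API: the `Registry` class, the `MODELS` instance, and the `ToyBackbone`, `ToyHead`, and `ToyRecognizer` components are hypothetical names chosen for illustration, and the convention that a nested dict with a `"type"` key is built recursively is an assumption of this sketch.

```python
from typing import Any, Dict


class Registry:
    """Maps class names to classes so a config can refer to components by name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        # Intended for use as a class decorator; the class name is the lookup key.
        if cls.__name__ in self._modules:
            raise KeyError(f"{cls.__name__} is already registered in {self.name}")
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg: Dict[str, Any]) -> Any:
        # Work on a copy so the caller's config dict is not mutated.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        # Nested dicts that carry a "type" key are built first (hierarchical assembly).
        kwargs = {
            key: self.build(value) if isinstance(value, dict) and "type" in value else value
            for key, value in args.items()
        }
        return cls(**kwargs)


MODELS = Registry("models")  # hypothetical registry instance


@MODELS.register
class ToyBackbone:
    def __init__(self, depth: int) -> None:
        self.depth = depth


@MODELS.register
class ToyHead:
    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes


@MODELS.register
class ToyRecognizer:
    def __init__(self, backbone: ToyBackbone, head: ToyHead) -> None:
        self.backbone = backbone
        self.head = head


# A nested config describes the whole model; no component is imported by name here.
config = {
    "type": "ToyRecognizer",
    "backbone": {"type": "ToyBackbone", "depth": 50},
    "head": {"type": "ToyHead", "num_classes": 400},
}
model = MODELS.build(config)
```

The design choice this illustrates is that swapping a component only requires changing a string in the config, provided the replacement class has been registered.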
This GitHub repo provides a PyTorch implementation of the Mixture-of-Embeddings-Experts (MEE) model [1].
An official implementation of "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval".
This repo provides code from the HowTo100M paper, including: our training procedure on HowTo100M for learning a joint text-video embedding; our evaluation code on MSR-VTT, YouCook2 and LSMDC for text-to-video retrieval; a pretrained model on HowTo100M; feature extraction…
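As an illustration of what text-to-video retrieval evaluation typically measures, the sketch below computes Recall@K from a text-video similarity matrix. This is not the repository's evaluation code: the function name `recall_at_k` is hypothetical, and the sketch assumes a square matrix in which the correct video for caption `i` is video `i`, which may not match how the actual benchmarks pair captions and clips.

```python
from typing import Dict, Sequence


def recall_at_k(sim: Sequence[Sequence[float]], ks: Sequence[int] = (1, 5, 10)) -> Dict[int, float]:
    """Recall@K for text-to-video retrieval.

    sim[i][j] is the similarity between caption i and video j.
    Assumption of this sketch: the ground-truth video for caption i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Zero-based rank: how many videos score strictly higher than the correct one.
        ranks.append(sum(1 for score in row if score > row[i]))
    n = len(ranks)
    # A caption counts as a hit at K if its correct video is among the top K.
    return {k: sum(1 for r in ranks if r < k) / n for k in ks}


# Toy example: captions 0 and 2 rank their own video first; caption 1 ranks it last.
similarity = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]
scores = recall_at_k(similarity, ks=(1, 2, 3))
```

In a joint embedding setup the similarity matrix would usually come from dot products or cosine similarities between caption and video embeddings; here it is hard-coded to keep the example self-contained.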