This repo provides the PyTorch source code for our paper, VideoAgent: Long-form Video Understanding with Large Language Model as Agent (ECCV 2024). Check out the project page here!
Features
Agent Memory Systems - Memory-augmented multimodal agent for video understanding tasks.