YouTube-BoundingBoxes is a large-scale dataset of video URLs with densely sampled, high-quality single-object bounding box annotations.
The dataset consists of approximately 380,000 video segments, each 15-20 s long, extracted from 240,000 different publicly visible YouTube videos. The segments were automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera.
All of these video segments were annotated by humans with high-precision classifications and bounding boxes at 1 frame per second.
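
To make the 1 frame per second sampling concrete, the following minimal Python sketch lists the timestamps at which a single segment would carry an annotation, and derives a rough range for the total number of annotated frames from the figures above. The function name `annotation_timestamps_ms`, the millisecond units, the inclusive end point, and the example segment boundaries are all assumptions made for illustration; they are not taken from the dataset's actual file format.

```python
from typing import List


def annotation_timestamps_ms(start_ms: int, end_ms: int, rate_hz: float = 1.0) -> List[int]:
    """Return timestamps (in ms) sampled at rate_hz over [start_ms, end_ms].

    Hypothetical helper: including both end points is an assumption,
    not a documented property of the dataset.
    """
    step_ms = int(round(1000 / rate_hz))
    return list(range(start_ms, end_ms + 1, step_ms))


# Hypothetical 18-second segment starting 42 s into a video.
stamps = annotation_timestamps_ms(42_000, 60_000)
print(stamps[:3], len(stamps))

# Order-of-magnitude range for annotated frames, using only the
# stated figures: ~380,000 segments of 15-20 s each at 1 frame/s.
segments = 380_000
low, high = segments * 15, segments * 20
print(low, high)
```

The range printed at the end is only an estimate implied by the segment count and durations; the exact number of annotations depends on details not stated here.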
Our goal with the public release of this dataset is to help advance the state of the art in machine learning for video understanding.