Quantitative results

Results on the Sports-1M test set (200,000 videos and 4,000,000 clips) are summarized in the following table. The multiple-network approach consistently and significantly outperforms the feature-based baseline. The feature-based approach computes visual words densely over the full duration of each video and bases its predictions on the complete video-level feature vector, whereas the authors' networks see only 20 randomly sampled clips per video, each processed individually:

Results on the 200,000 videos of the Sports-1M test set. Hit@k values indicate the fraction of test samples that contained at least one of the ground truth labels ...
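To make the Hit@k metric concrete, the following is a minimal sketch in NumPy rather than the authors' evaluation code. It assumes the conventional reading of Hit@k, where a test video counts as a hit if at least one of its ground-truth labels appears among the k highest-scoring predicted classes. It also assumes that the per-clip scores of a video are averaged to obtain a video-level prediction. The function names `average_clip_scores` and `hit_at_k` and the toy scores are hypothetical and used only for illustration.

```python
import numpy as np


def average_clip_scores(clip_scores):
    """Average per-clip class scores into one video-level score vector.

    clip_scores has shape (num_clips, num_classes).
    Averaging is an assumption made for this sketch.
    """
    return np.asarray(clip_scores, dtype=float).mean(axis=0)


def hit_at_k(video_scores, true_labels, k):
    """Fraction of videos whose top-k classes contain a ground-truth label.

    video_scores has shape (num_videos, num_classes).
    true_labels is a list with one set of class indices per video.
    """
    video_scores = np.asarray(video_scores, dtype=float)
    # Indices of the k highest-scoring classes for every video.
    top_k = np.argsort(-video_scores, axis=1)[:, :k]
    hits = [
        bool(set(row.tolist()) & set(labels))
        for row, labels in zip(top_k, true_labels)
    ]
    return float(np.mean(hits))


# Toy data: 2 videos, 3 clips each, 4 classes (all values are made up).
clips_a = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1]]
clips_b = [[0.1, 0.1, 0.2, 0.6], [0.1, 0.2, 0.2, 0.5], [0.1, 0.1, 0.4, 0.4]]

video_scores = np.stack([average_clip_scores(clips_a),
                         average_clip_scores(clips_b)])
labels = [{1}, {3}]  # one ground-truth label set per video

hit1 = hit_at_k(video_scores, labels, k=1)
hit2 = hit_at_k(video_scores, labels, k=2)
print(hit1, hit2)
```

In this toy case the first video's true class is only the second-highest prediction, so it is missed at k=1 and counted at k=2, which is why Hit@k can only stay the same or grow as k increases.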
