In addition to comparing CNN architectures with each other, the authors also report the accuracy of a feature-based approach. A standard bag-of-words pipeline was used to extract several types of features at all frames of the videos, followed by discretizing them using k-means vector quantization and accumulating words into histograms with spatial pyramid encoding and soft quantization.
Feature histogram baselines
Get Neural Network Programming with TensorFlow now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.