More data is always beneficial

In the paper Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, Google researchers constructed an internal dataset, JFT-300M, containing 300 million labeled images, far larger than ImageNet. They then trained several state-of-the-art architectures on this dataset, increasing the amount of data shown to the models from 10 million to 30 million, 100 million, and finally 300 million observations. In doing so, they showed that model performance increased linearly with the log of the number of training observations, demonstrating that more data always helps in the source domain.
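To see what this kind of scaling study looks like in practice, here is a minimal sketch of the same experiment in Keras. It uses CIFAR-10 and a deliberately tiny CNN as stand-ins for Google's internal dataset and their state-of-the-art architectures (such as ResNet-101); the training-set sizes, architecture, and training budget are placeholders chosen so the loop runs quickly, not the values used in the paper.

```python
import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Load a small public dataset as a stand-in for the (non-public) 300M-image set.
(x_train, y_train), (x_val, y_val) = cifar10.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

# Shuffle once so each training slice is a representative sample.
rng = np.random.default_rng(42)
idx = rng.permutation(len(x_train))
x_train, y_train = x_train[idx], y_train[idx]

def build_model():
    # Deliberately tiny CNN so the experiment runs quickly; the paper used
    # much larger state-of-the-art architectures.
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        Flatten(),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train the same architecture on progressively larger slices of the data
# and record validation accuracy at each size.
results = []
for n in [5_000, 10_000, 25_000, 50_000]:
    model = build_model()
    model.fit(x_train[:n], y_train[:n], epochs=5, batch_size=128, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    results.append((n, acc))
    print(f"n={n:>6}  val_acc={acc:.3f}")

# Fitting accuracy against log10(n) makes the paper's qualitative finding
# visible: performance grows roughly linearly with the log of the data volume.
ns, accs = zip(*results)
slope, intercept = np.polyfit(np.log10(ns), accs, 1)
print(f"accuracy ~ {slope:.3f} * log10(n) + {intercept:.3f}")
```

Plotting or fitting validation accuracy against log10(n), rather than against n itself, is what exposes the log-linear relationship the paper reports.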

But what about the target domain? We repeated the Google experiment using a few ...
