You might be wondering how to make sure that the returned recommendations make sense. The only way to be truly certain of how effective the recommendations are is to use A/B testing in a live system, with real users. For example, group A receives a random item as a recommendation, while group B receives an item recommended by our engine; we then compare how the two groups respond.
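A key detail of an A/B test is assigning each user to a group consistently, so the same user always sees the same variant. The following is a minimal sketch of one common approach, hashing the user ID with an experiment-specific salt; the function name, salt, and user IDs are all illustrative, not part of any particular framework:

```python
import hashlib

def ab_group(user_id, salt="rec-engine-exp-1"):
    """Deterministically assign a user to group 'A' or 'B'.

    Hashing (rather than a random draw) keeps a user in the same
    group across sessions. The salt is a hypothetical experiment
    name; changing it reshuffles the assignment.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

# Group A would be served a random item, group B an item
# recommended by our engine.
print(ab_group("alice"))
print(ab_group("bob"))
```

Because the assignment is a pure function of the user ID, it needs no stored state, and over many users the split is close to 50/50.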
As this is not always possible (or practical), we can get an estimate with offline statistical evaluation. One way to proceed is to use k-fold cross-validation, which was introduced in Chapter 1, Applied Machine Learning Quick Start. We partition a dataset into multiple sets; some are used to train our recommendation engine, and the rest are used to test how well it recommends ...
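The partitioning step can be sketched as follows. This is a minimal, library-free illustration of k-fold splitting; the `ratings` triples and the function name are hypothetical placeholders for whatever data and recommender the engine actually uses:

```python
import random

def k_fold_splits(items, k=5, seed=42):
    """Partition a dataset into k folds and yield (train, test) pairs.

    Each fold serves once as the held-out test set while the
    remaining k-1 folds form the training set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)       # shuffle reproducibly
    folds = [items[i::k] for i in range(k)]  # round-robin split
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical (user, item, rating) triples standing in for real data
ratings = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(20)]

for fold_no, (train, test) in enumerate(k_fold_splits(ratings), start=1):
    # Train the recommender on `train`, then score its
    # recommendations against the held-out `test` ratings.
    print(f"fold {fold_no}: {len(train)} train / {len(test)} test")
```

Averaging the evaluation metric over all k folds gives a more stable estimate than a single train/test split, since every rating is used for testing exactly once.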