APPENDIX F A DATA MINING EXAMPLE

INTRODUCTION

This example uses just one statistical technique, the a-priori algorithm, which is used to find association rules in data. The algorithm considers only items, or combinations of items, that appear in more than a certain percentage of the records; this percentage is the ‘support threshold’.
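As a minimal sketch of what ‘support’ means, the Python fragment below computes the fraction of visits in which each item appears and keeps the items above a threshold. The basket contents, the function names `support` and `frequent_items`, and the 50 per cent threshold are hypothetical and chosen only for illustration; they are not taken from the supermarket data. Treating the comparison as strictly ‘more than’ is also an assumption.

```python
from collections import Counter

def support(baskets):
    """Return the fraction of baskets in which each item appears."""
    counts = Counter()
    for basket in baskets:
        counts.update(set(basket))  # count an item at most once per basket
    return {item: n / len(baskets) for item, n in counts.items()}

def frequent_items(baskets, threshold):
    """Keep items whose support is strictly greater than the threshold (an assumption)."""
    return {item for item, s in support(baskets).items() if s > threshold}

# Hypothetical toy data: four visits, not the real supermarket records
baskets = [
    ["own-label tea", "branded coffee"],
    ["own-label tea", "own-label milk"],
    ["branded coffee", "own-label tea"],
    ["own-label milk"],
]

print(sorted(frequent_items(baskets, 0.5)))  # ['own-label tea']
```

Here only the tea appears in more than half of the visits (3 of 4); the coffee and the milk each appear in exactly half, so they fall at the boundary and are dropped.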

THE SCENARIO

A supermarket chain wishes to determine whether customers opt for either ‘own-label’ products or branded products.

Raw data is available for each customer’s purchases, recording the quantities of each product bought during each supermarket visit. The data from 500 such visits will be investigated.

The support threshold is 15 per cent.
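For the 500 visits under investigation, the threshold can be restated as a number of visits. A quick check of the arithmetic, assuming the percentage is applied to the count of visits (the variable names are illustrative only):

```python
visits = 500
support_threshold_percent = 15

# 15 per cent of 500 visits; integer arithmetic avoids floating-point rounding
min_visits = visits * support_threshold_percent // 100

print(min_visits)  # 75
```

A product category therefore has to appear in about 75 of the 500 visits to be considered. Whether exactly 75 visits is enough depends on whether the comparison is inclusive.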

Step 1

The raw data is scanned to determine the frequency of each product category bought during a visit. The results satisfying ...
