Appendix A. Data Sets

This appendix describes the data sets used throughout the book. All data sets are provided as SQL backup files. The data sets are generally available in each of the chapter-specific downloads on the companion website (www.wiley.com/go/data_mining_SQL_2008) as well as in the Appendix A downloads.

MovieClick Data Set

The MovieClick data set consists of almost 3,200 responses to a survey taken by Microsoft employees in November 2002. The survey asked about respondents' movie-watching behavior, demographics, and favorite hobbies, movies, actors, and directors. Table A-1 shows the questions that were asked in the survey. The results were used to test and exercise the data mining capabilities of SQL Server 2005 while it was in development.

The survey resulted in eight tables: one for the main survey responses, and one each for questions 8, 9, 13, 14, 15, 26, and 27. The main table serves as the case table for data mining analysis, and the additional tables become nested tables. Figure A-1 shows a data source view (DSV) representing the relationships between these tables.
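The case table and nested table relationship can be illustrated with a minimal Python sketch. The table names, column names, and rows below are hypothetical values invented for illustration; they are not taken from the MovieClick data set, and the sketch does not reproduce how SQL Server itself builds nested tables. It only shows the idea: each case row (one respondent) carries zero or more rows from a nested table, matched on a shared key.

```python
from collections import defaultdict

# Hypothetical case table: one row per survey respondent.
survey = [
    {"respondent_id": 1, "preferred_format": "DVD"},
    {"respondent_id": 2, "preferred_format": "VHS"},
    {"respondent_id": 3, "preferred_format": "DVD"},
]

# Hypothetical nested table: zero or more rows per respondent.
favorite_movies = [
    {"respondent_id": 1, "movie": "Movie A"},
    {"respondent_id": 1, "movie": "Movie B"},
    {"respondent_id": 2, "movie": "Movie C"},
]


def attach_nested(cases, nested_rows, key, column, name):
    """Return copies of the case rows, each with a list of nested values."""
    grouped = defaultdict(list)
    for row in nested_rows:
        grouped[row[key]].append(row[column])
    # A case with no matching nested rows gets an empty list.
    return [{**case, name: grouped.get(case[key], [])} for case in cases]


cases = attach_nested(
    survey, favorite_movies, "respondent_id", "movie", "favorite_movies"
)
for case in cases:
    print(case)
```

Each additional multi-answer question would be attached the same way, which gives one case table with several nested tables hanging off the same respondent key.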

Note

Some flaws in the survey methodology were discovered when the results were examined. Favorite movies, actors, and directors were selected from an alphabetical list. This resulted in an unexpectedly large number of selections starting with the letter A, which is evident in the resulting mining models.

Table A-1. Movie Survey Questions

1. What is your preferred format for pre-recorded movies?

2. How often ...
