Appendix A

Data Summarization and Visualization

Here we present a very brief review of methods for summarizing and visualizing data. For deeper coverage, see Discovering Statistics, Second Edition, by Daniel Larose (W.H. Freeman, 2013).

Part 1: Summarization 1: Building Blocks of Data Analysis

  • Descriptive statistics refers to methods for summarizing and organizing the information in a data set.

    Consider Table A.1, which we will use to illustrate some statistical concepts.

    Table A.1 Characteristics of 10 loan applicants

    Applicant  Marital Status  Mortgage  Income ($)  Rank  Year  Risk
    1          Single          Y         38,000      2     2009  Good
    2          Married         Y         32,000      7     2010  Good
    3          Other           N         25,000      9     2011  Good
    4          Other           N         36,000      3     2009  Good
    5          Other           Y         33,000      4     2010  Good
    6          Other           N         24,000      10    2008  Bad
    7          Married         Y         25,100      8     2010  Good
    8          Married         Y         48,000      1     2007  Good
    9          Married         Y         32,100      6     2009  Bad
    10         Married         Y         32,200      5     2010  Good
  • The entities for which information is collected are called the elements. In Table A.1, the elements are the 10 applicants. Elements are also called cases or subjects.
  • A variable is a characteristic of an element, which takes on different values for different elements. The variables in Table A.1 are marital status, mortgage, income, rank, year, and risk. Variables are also called attributes.
  • The set of variable values for a particular element is an observation. Observations are also called records. The observation for Applicant 2 is:
Applicant ...
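
The terms defined above (element, variable, observation) map naturally onto a simple data structure. The following is a minimal sketch in Python, not something taken from the text: the field names in FIELDS and the helper observation_for are illustrative assumptions, and the values are copied from Table A.1. Each applicant (element) becomes one record (observation), and the column names other than the applicant identifier are the variables. The final lines show one very simple act of summarizing, a count of applicants by marital status.

```python
from collections import Counter

# Column names are illustrative choices, not names used in the text.
FIELDS = ("applicant", "marital_status", "mortgage",
          "income", "rank", "year", "risk")

# Values copied from Table A.1 (income in dollars).
ROWS = [
    (1, "Single", "Y", 38000, 2, 2009, "Good"),
    (2, "Married", "Y", 32000, 7, 2010, "Good"),
    (3, "Other", "N", 25000, 9, 2011, "Good"),
    (4, "Other", "N", 36000, 3, 2009, "Good"),
    (5, "Other", "Y", 33000, 4, 2010, "Good"),
    (6, "Other", "N", 24000, 10, 2008, "Bad"),
    (7, "Married", "Y", 25100, 8, 2010, "Good"),
    (8, "Married", "Y", 48000, 1, 2007, "Good"),
    (9, "Married", "Y", 32100, 6, 2009, "Bad"),
    (10, "Married", "Y", 32200, 5, 2010, "Good"),
]

# One observation (record) per element (applicant):
# a mapping from variable name to that applicant's value.
observations = [dict(zip(FIELDS, row)) for row in ROWS]

# The variables are the recorded characteristics; the applicant
# number only identifies the element, so it is excluded here.
variables = [name for name in FIELDS if name != "applicant"]


def observation_for(applicant_id):
    """Return the record for one applicant (hypothetical helper)."""
    for obs in observations:
        if obs["applicant"] == applicant_id:
            return obs
    raise KeyError(applicant_id)


n_elements = len(observations)   # number of elements in the data set
obs2 = observation_for(2)        # the observation for Applicant 2

# A very simple summary: how many applicants fall in each category.
marital_counts = Counter(obs["marital_status"] for obs in observations)
```

A list of dictionaries keeps the sketch dependency-free. In practice a data frame library would serve the same purpose, with rows as observations and columns as variables.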
