Chapter 5

Basic Statistical Ideas

In This Chapter

Understanding the basic types of measurement

Learning the fundamental statistical measures of central tendency

Understanding hypothesis testing

This chapter introduces some of the most important statistical concepts you need to get started with big data. It also introduces several summary measures that represent the key properties of a dataset.

A dataset may consist of the elements of a population of interest, or it may take the form of a sample. A sample is a subset of a population; it’s chosen in such a way that it accurately represents the underlying population. For most empirical applications, sample data is used instead of population data due to the time and cost required to analyze an entire population.

Warning: When taking random samples, it is critical that the samples be truly random. That means using a random number generator and making sure that each observation in your dataset is equally likely to be selected. Just taking the first 10 percent of the observations or selecting every tenth observation until ...
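To make the warning concrete, here is a minimal sketch in Python of drawing a simple random sample with a random number generator, compared with taking the first 10 percent of the observations. The dataset, the helper names `simple_random_sample` and `mean`, and the seed value are all hypothetical and chosen only for illustration; they do not come from the chapter.

```python
import random


def simple_random_sample(observations, fraction, seed=None):
    """Draw a simple random sample without replacement.

    Every observation is equally likely to be selected. The seed is
    optional and only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    k = round(len(observations) * fraction)
    return rng.sample(observations, k)


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Hypothetical dataset: 1,000 observations stored in ascending order.
population = list(range(1000))

# A 10 percent simple random sample versus the first 10 percent.
sample = simple_random_sample(population, 0.10, seed=42)
first_tenth = population[:100]

print("population mean: ", mean(population))   # 499.5
print("first 10 percent:", mean(first_tenth))  # 49.5
print("random sample:   ", mean(sample))
```

Because this made-up dataset happens to be sorted, the first 10 percent contains only the smallest values and its mean is far from the population mean. The random sample draws from the whole range, so its mean will typically land much closer.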

Get Statistics for Big Data For Dummies now with the O’Reilly learning platform.