Unit 46: Recollecting Statistical Measures

From the point of view of exploratory (non-inference-based) data science, statistics answers four important questions:

Where is the data?

The sample mean is the average of all observations:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Here $N$ is the number of observations and $x_i$ is the $i$-th observation.

You can use the sample mean to represent the whole sample when the distribution of data is close to normal (“bell-shaped”) and the standard deviation is low.
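As a minimal sketch of the definition above, the sample mean can be computed either by hand or with Python's standard `statistics` module. The `observations` list is made-up data used only for illustration.

```python
import statistics

# Hypothetical observations, invented for this illustration
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sample mean: the sum of all observations divided by their count
mean_manual = sum(observations) / len(observations)

# The same value from the standard library
mean_stdlib = statistics.mean(observations)

print(mean_manual, mean_stdlib)
```

Both approaches should agree. The library call is usually preferable because it also raises an error on empty input.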

How broad is the data?

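The sample standard deviation is a measure of spread. It is calculated as the square root of the average squared deviation from the sample mean (for a sample, the sum is conventionally divided by $N - 1$ rather than $N$):

$$s_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

Here $\bar{x}$ is the sample mean and $N$ is the number of observations $x_i$.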

A high $s_x$ means that the data is widely spread.
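The calculation can be sketched in Python as follows. This is an illustration under two assumptions: the `observations` list is made-up data, and the sample standard deviation uses the common $N - 1$ divisor (Bessel's correction). The variant that divides by $N$ is shown alongside for comparison.

```python
import math
import statistics

# Hypothetical observations, invented for this illustration
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(observations)

# Squared deviation of each observation from the sample mean
squared_devs = [(x - mean) ** 2 for x in observations]

# Sample standard deviation: divide by N - 1 (assumed convention)
s_x = math.sqrt(sum(squared_devs) / (len(observations) - 1))

# Population-style standard deviation: divide by N
sigma_x = math.sqrt(sum(squared_devs) / len(observations))

print(s_x, sigma_x)
```

The standard library offers both conventions directly: `statistics.stdev` divides by $N - 1$ and `statistics.pstdev` divides by $N$. The two values converge as the number of observations grows.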

How skewed is the data?

Sample skewness is a ...
