5 Single Samples

Suppose we have a single sample. The questions we might want to answer are these:

  • what is the mean value?
  • is the mean value significantly different from current expectation or theory?
  • what is the level of uncertainty associated with our estimate of the mean value?

In order to be reasonably confident that our inferences are correct, we need to establish some facts about the distribution of the data:

  • are the values normally distributed or not?
  • are there outliers in the data?
  • if data were collected over a period of time, is there evidence for serial correlation?

Non-normality, outliers and serial correlation can all invalidate inferences made by standard parametric tests such as Student's t test. In cases with non-normality and/or outliers, it is much better to use a non-parametric technique such as Wilcoxon's signed-rank test. If there is serial correlation in the data, then you need to use time series analysis or mixed effects models.
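As a rough sketch of how these checks might look in R, the code below runs a normality test, flags potential outliers, computes the autocorrelation, and then carries out both the parametric and the non-parametric one-sample test. The vector x is simulated purely for illustration and the hypothesised value mu0 is an assumption; neither comes from the example data, and these particular functions are only one reasonable choice among several.

```r
# Hypothetical sample: simulated values standing in for real data
set.seed(1)
x <- rnorm(30, mean = 2.4, sd = 0.25)

# Normality: Shapiro-Wilk test (a small p value suggests non-normality)
normality <- shapiro.test(x)

# Outliers: values lying beyond the whiskers of a box-and-whisker plot
outliers <- boxplot.stats(x)$out

# Serial correlation: autocorrelation at successive lags
# (only meaningful if x is stored in the order it was collected)
serial <- acf(x, plot = FALSE)

# One-sample tests against a hypothesised value (mu0 is an assumed value)
mu0 <- 2.3
t_result <- t.test(x, mu = mu0)       # parametric: Student's t test
w_result <- wilcox.test(x, mu = mu0)  # non-parametric: Wilcoxon's signed-rank test

# The t test object also carries a 95% confidence interval for the mean
ci <- t_result$conf.int
```

Printing normality, t_result or w_result shows the test statistic and p value for each test, while ci gives one measure of the uncertainty associated with the estimated mean.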

Data Summary in the One-Sample Case

To see what is involved, read the data called y from the file called example.csv:

data <- read.csv("c:\\temp\\example.csv")
attach(data)
names(data)
[1] "y"

Summarizing the data could not be simpler. We use the built-in function called summary like this:

summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.904   2.241   2.414   2.419   2.568   2.984

This gives us six pieces of information about the vector called y. The smallest value is 1.904 (labelled Min. for minimum) and the largest value ...
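To make the six pieces of information concrete, here is a minimal sketch that reproduces each component of summary with the individual built-in functions. The vector y below is a short made-up example, not the data from example.csv, and the object names s and by_hand are arbitrary.

```r
# Hypothetical vector standing in for y (not the values in example.csv)
y <- c(2.1, 2.4, 1.9, 2.6, 2.3, 2.5, 2.9, 2.2)

# The six-number summary in one call
s <- summary(y)

# The same six quantities computed one at a time
by_hand <- c(
  min(y),             # Min.
  quantile(y, 0.25),  # 1st Qu.
  median(y),          # Median
  mean(y),            # Mean
  quantile(y, 0.75),  # 3rd Qu.
  max(y)              # Max.
)
```

Comparing s with by_hand shows that summary is simply a convenient wrapper around these separate calculations, printed to a small number of significant figures.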
