- Data Analysis with Open Source Tools

In this chapter, we look at simulations as a way to understand data. It may seem strange to find simulations included in a book on data analysis: don’t simulations just generate even *more* data that needs to be analyzed? Not necessarily. As we will see, simulations in the form of *resampling methods* provide a family of techniques for extracting information from data. In addition, simulations can be useful when developing and validating models, and in this way, they facilitate our understanding of data. Finally, in the context of this chapter we can take a brief look at a few other relevant topics, such as discrete event simulations and queueing theory.

A technical comment: I assume that your programming environment includes a random-number generator—not only for uniformly distributed random numbers but also for other distributions (this is a pretty safe bet). I also assume that this random-number generator produces random numbers of sufficiently high quality. This is probably a reasonable assumption, but there’s no guarantee: although the theory of random-number generators is well understood, broken implementations apparently continue to ship. Most books on simulation methods will contain information on random-number generators—look there if you feel that you need more detail.
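To make the assumption about the programming environment concrete, here is a minimal sketch using Python's standard `random` module. The seed, the sample size, and the distribution parameters are arbitrary choices for illustration and do not come from the text. The comparison of sample means against theoretical means is only a crude sanity check. It would catch a grossly broken generator, but it is not a test of random-number quality.

```python
import random
import statistics

# A fixed seed makes the run reproducible; the value 42 is arbitrary.
rng = random.Random(42)

n = 100_000  # sample size, chosen only for illustration

# Draw samples from three common distributions.
uniform_draws = [rng.random() for _ in range(n)]        # uniform on [0, 1)
gauss_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]   # mean 0, std dev 1
expo_draws = [rng.expovariate(2.0) for _ in range(n)]   # rate 2, so mean 1/2

# Crude sanity check: sample means should lie close to the theoretical means.
uniform_mean = statistics.fmean(uniform_draws)
gauss_mean = statistics.fmean(gauss_draws)
expo_mean = statistics.fmean(expo_draws)

print(f"uniform mean:     {uniform_mean:.4f} (theory: 0.5)")
print(f"gaussian mean:    {gauss_mean:.4f} (theory: 0.0)")
print(f"exponential mean: {expo_mean:.4f} (theory: 0.5)")
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the generator state local, which makes simulation runs easier to reproduce.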

As a warm-up to demonstrate how simulations can help us analyze data, consider the following example. We are given a data set with the results of eight tosses of a coin: ...