Book description
Today, interpreting data is a critical decision-making factor for businesses and organizations. If your job requires you to manage and analyze all kinds of data, turn to Head First Data Analysis, where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool. You'll learn how to:
Determine which data sources to use for collecting information
Assess data quality and distinguish signal from noise
Build basic data models to illuminate patterns, and assimilate new information into the models
Cope with ambiguous information
Design experiments to test hypotheses and draw conclusions
Use segmentation to organize your data within discrete market groups
Visualize data distributions to reveal new relationships and persuade others
Predict the future with sampling and probability models
Clean your data to make it useful
Communicate the results of your analysis to your audience
Drawing on the latest research in cognitive science and learning theory to craft a multisensory learning experience, Head First Data Analysis uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.
Table of Contents
 Head First Data Analysis
 Dedication
 A Note Regarding Supplemental Files
 Advance Praise for Head First Data Analysis
 Praise for other Head First books
 Author of Head First Data Analysis
 How to Use This Book: Intro

1. Introduction to Data Analysis: Break it down
 Acme Cosmetics needs your help
 The CEO wants data analysis to help increase sales
 Data analysis is careful thinking about evidence
 Define the problem
 Your client will help you define your problem
 Acme’s CEO has some feedback for you
 Break the problem and data into smaller pieces
 Now take another look at what you know
 Evaluate the pieces
 Analysis begins when you insert yourself
 Make a recommendation
 Your report is ready
 The CEO likes your work
 An article just came across the wire
 You let the CEO’s beliefs take you down the wrong path
 Your assumptions and beliefs about the world are your mental model
 Your statistical model depends on your mental model
 Mental models should always include what you don’t know
 The CEO tells you what he doesn’t know
 Acme just sent you a huge list of raw data
 Time to drill further into the data
 General American Wholesalers confirms your impression
 Here’s what you did
 Your analysis led your client to a brilliant decision

2. Experiments: Test your theories
 It’s a coffee recession!
 The Starbuzz board meeting is in three months
 The Starbuzz Survey
 Always use the method of comparison
 Comparisons are key for observational data
 Could value perception be causing the revenue decline?
 A typical customer’s thinking
 Observational studies are full of confounders
 How location might be confounding your results
 Manage confounders by breaking the data into chunks
 It’s worse than we thought!
 You need an experiment to say which strategy will work best
 The Starbuzz CEO is in a big hurry
 Starbuzz drops its prices
 One month later...
 Control groups give you a baseline
 Not getting fired 101
 Let’s experiment for real!
 One month later...
 Confounders also plague experiments
 Avoid confounders by selecting groups carefully
 Randomization selects similar groups
 Your experiment is ready to go
 The results are in
 Starbuzz has an empirically tested sales strategy

3. Optimization: Take it to the max
 You’re now in the bath toy game
 Constraints limit the variables you control
 Decision variables are things you can control
 You have an optimization problem
 Find your objective with the objective function
 Your objective function
 Show product mixes with your other constraints
 Plot multiple constraints on the same chart
 Your good options are all in the feasible region
 Your new constraint changed the feasible region
 Your spreadsheet does optimization
 Solver crunched your optimization problem in a snap
 Profits fell through the floor
 Your model only describes what you put into it
 Calibrate your assumptions to your analytical objectives
 Watch out for negatively linked variables
 Your new plan is working like a charm
 Your assumptions are based on an ever-changing reality

4. Data Visualization: Pictures make you smarter
 New Army needs to optimize their website
 The results are in, but the information designer is out
 The last information designer submitted these three infographics
 What data is behind the visualizations?
 Show the data!
 Here’s some unsolicited advice from the last designer
 Too much data is never your problem
 Making the data pretty isn’t your problem either
 Data visualization is all about making the right comparisons
 Your visualization is already more useful than the rejected ones
 Use scatterplots to explore causes
 The best visualizations are highly multivariate
 Show more variables by looking at charts together
 The visualization is great, but the web guru’s not satisfied yet
 Good visual designs help you think about causes
 The experiment designers weigh in
 The experiment designers have some hypotheses of their own
 The client is pleased with your work
 Orders are coming in from everywhere!

5. Hypothesis Testing: Say it ain’t so
 Gimme some skin...
 When do we start making new phone skins?
 PodPhone doesn’t want you to predict their next move
 Here’s everything we know
 ElectroSkinny’s analysis does fit the data
 ElectroSkinny obtained this confidential strategy memo
 Variables can be negatively or positively linked
 Causes in the real world are networked, not linear
 Hypothesize PodPhone’s options
 You have what you need to run a hypothesis test
 Falsification is the heart of hypothesis testing
 Diagnosticity helps you find the hypothesis with the least disconfirmation
 You can’t rule out all the hypotheses, but you can say which is strongest
 You just got a picture message...
 It’s a launch!

6. Bayesian Statistics: Get past first base
 The doctor has disturbing news
 Let’s take the accuracy analysis one claim at a time
 How common is lizard flu really?
 You’ve been counting false positives
 All these terms describe conditional probabilities
 You need to count
 1 percent of people have lizard flu
 Your chances of having lizard flu are still pretty low
 Do complex probabilistic thinking with simple whole numbers
 Bayes’ rule manages your base rates when you get new data
 You can use Bayes’ rule over and over
 Your second test result is negative
 The new test has different accuracy statistics
 New information can change your base rate
 What a relief!

7. Subjective Probabilities: Numerical belief
 Backwater Investments needs your help
 Their analysts are at each other’s throats
 Subjective probabilities describe expert beliefs
 Subjective probabilities might show no real disagreement after all
 The analysts responded with their subjective probabilities
 The CEO doesn’t see what you’re up to
 The CEO loves your work
 The standard deviation measures how far points are from the average
 You were totally blindsided by this news
 Bayes’ rule is great for revising subjective probabilities
 The CEO knows exactly what to do with this new information
 Russian stock owners rejoice!

8. Heuristics: Analyze like a human
 LitterGitters submitted their report to the city council
 The LitterGitters have really cleaned up this town
 The LitterGitters have been measuring their campaign’s effectiveness
 The mandate is to reduce the tonnage of litter
 Tonnage is unfeasible to measure
 Give people a hard question, and they’ll answer an easier one instead
 Littering in Dataville is a complex system
 You can’t build and implement a unified litter-measuring model
 Heuristics are a middle ground between going with your gut and optimization
 Use a fast and frugal tree
 Is there a simpler way to assess LitterGitters’ success?
 Stereotypes are heuristics
 Your analysis is ready to present
 Looks like your analysis impressed the city council members

9. Histograms: The shape of numbers
 Your annual review is coming up
 Going for more cash could play out in a bunch of different ways
 Here’s some data on raises
 Histograms show frequencies of groups of numbers
 Gaps between bars in a histogram mean gaps among the data points
 Install and run R
 Load data into R
 R creates beautiful histograms
 Make histograms from subsets of your data
 Negotiation pays
 What will negotiation mean for you?

10. Regression: Prediction
 What are you going to do with all this money?
 An analysis that tells people what to ask for could be huge
 Behold... the Raise Reckoner!
 Inside the algorithm will be a method to predict raises
 Scatterplots compare two variables
 A line could tell your clients where to aim
 Predict values in each strip with the graph of averages
 The regression line predicts what raises people will receive
 The line is useful if your data shows a linear correlation
 You need an equation to make your predictions precise
 Tell R to create a regression object
 The regression equation goes hand in hand with your scatterplot
 The regression equation is the Raise Reckoner algorithm
 Your raise predictor didn’t work out as planned...

11. Error: Err Well
 Your clients are pretty ticked off
 What did your raise prediction algorithm do?
 The segments of customers
 The guy who asked for 25% went outside the model
 How to handle the client who wants a prediction outside the data range
 The guy who got fired because of extrapolation has cooled off
 You’ve only solved part of the problem
 What does the data for the screwy outcomes look like?
 Chance errors are deviations from what your model predicts
 Error is good for you and your client
 Specify error quantitatively
 Quantify your residual distribution with Root Mean Squared error
 Your model in R already knows the R.M.S. error
 R’s summary of your linear model shows your R.M.S. error
 Segmentation is all about managing error
 Good regressions balance explanation and prediction
 Your segmented models manage error better than the original model
 Your clients are returning in droves

12. Relational Databases: Can you relate?
 The Dataville Dispatch wants to analyze sales
 Here’s the data they keep to track their operations
 You need to know how the data tables relate to each other
 A database is a collection of data with well-specified relations to each other
 Trace a path through the relations to make the comparison you need
 Create a spreadsheet that goes across that path
 Your summary ties article count and sales together
 Looks like your scatterplot is going over really well
 Copying and pasting all that data was a pain
 Relational databases manage relations for you
 Dataville Dispatch built an RDBMS with your relationship diagram
 Dataville Dispatch extracted your data using the SQL language
 Comparison possibilities are endless if your data is in a RDBMS
 You’re on the cover

13. Cleaning Data: Impose order
 Just got a client list from a defunct competitor
 The dirty secret of data analysis
 Head First Head Hunters wants the list for their sales team
 Cleaning messy data is all about preparation
 Once you’re organized, you can fix the data itself
 Use the # sign as a delimiter
 Excel split your data into columns using the delimiter
 Use SUBSTITUTE to replace the carat character
 You cleaned up all the first names
 The last name pattern is too complex for SUBSTITUTE
 Handle complex patterns with nested text formulas
 R can use regular expressions to crunch complex data patterns
 The sub command fixed your last names
 Now you can ship the data to your client
 Maybe you’re not quite done yet...
 Sort your data to show duplicate values together
 The data is probably from a relational database
 Remove duplicate names
 You created nice, clean, unique records
 Head First Head Hunters is recruiting like gangbusters!
 Leaving town...
 It’s been great having you here in Dataville!
 A. Leftovers: The Top Ten Things (we didn’t cover)
 B. Install R: Start R up!
 C. Install Excel Analysis Tools: The ToolPak
 Index
 About the Author
 Copyright