Large-Scale Inference

Book Description

We live in a new age for statistical inference, where modern scientific technology such as microarrays and fMRI machines routinely produce thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once is more than repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated using a large number of real examples.

Table of Contents

  1. Cover
  2. Title
  3. Copyright
  4. Contents
  5. Prologue
  6. Acknowledgments
  7. 1 Empirical Bayes and the James–Stein Estimator
    1. 1.1 Bayes Rule and Multivariate Normal Estimation
    2. 1.2 Empirical Bayes Estimation
    3. 1.3 Estimating the Individual Components
    4. 1.4 Learning from the Experience of Others
    5. 1.5 Empirical Bayes Confidence Intervals
    6. Notes
  8. 2 Large-Scale Hypothesis Testing
    1. 2.1 A Microarray Example
    2. 2.2 Bayesian Approach
    3. 2.3 Empirical Bayes Estimates
    4. 2.4 Fdr(Z) as a Point Estimate
    5. 2.5 Independence versus Correlation
    6. 2.6 Learning from the Experience of Others II
    7. Notes
  9. 3 Significance Testing Algorithms
    1. 3.1 p-Values and z-Values
    2. 3.2 Adjusted p-Values and the FWER
    3. 3.3 Stepwise Algorithms
    4. 3.4 Permutation Algorithms
    5. 3.5 Other Control Criteria
    6. Notes
  10. 4 False Discovery Rate Control
    1. 4.1 True and False Discoveries
    2. 4.2 Benjamini and Hochberg’s FDR Control Algorithm
    3. 4.3 Empirical Bayes Interpretation
    4. 4.4 Is FDR Control “Hypothesis Testing”?
    5. 4.5 Variations on the Benjamini–Hochberg Algorithm
    6. 4.6 Fdr and Simultaneous Tests of Correlation
    7. Notes
  11. 5 Local False Discovery Rates
    1. 5.1 Estimating the Local False Discovery Rate
    2. 5.2 Poisson Regression Estimates for f(z)
    3. 5.3 Inference and Local False Discovery Rates
    4. 5.4 Power Diagnostics
    5. Notes
  12. 6 Theoretical, Permutation, and Empirical Null Distributions
    1. 6.1 Four Examples
    2. 6.2 Empirical Null Estimation
    3. 6.3 The MLE Method for Empirical Null Estimation
    4. 6.4 Why the Theoretical Null May Fail
    5. 6.5 Permutation Null Distributions
    6. Notes
  13. 7 Estimation Accuracy
    1. 7.1 Exact Covariance Formulas
    2. 7.2 Rms Approximations
    3. 7.3 Accuracy Calculations for General Statistics
    4. 7.4 The Non-Null Distribution of z-Values
    5. 7.5 Bootstrap Methods
    6. Notes
  14. 8 Correlation Questions
    1. 8.1 Row and Column Correlations
    2. 8.2 Estimating the Root Mean Square Correlation
    3. 8.3 Are a Set of Microarrays Independent of Each Other?
    4. 8.4 Multivariate Normal Calculations
    5. 8.5 Count Correlations
    6. Notes
  15. 9 Sets of Cases (Enrichment)
    1. 9.1 Randomization and Permutation
    2. 9.2 Efficient Choice of a Scoring Function
    3. 9.3 A Correlation Model
    4. 9.4 Local Averaging
    5. Notes
  16. 10 Combination, Relevance, and Comparability
    1. 10.1 The Multi-Class Model
    2. 10.2 Small Subclasses and Enrichment
    3. 10.3 Relevance
    4. 10.4 Are Separate Analyses Legitimate?
    5. 10.5 Comparability
    6. Notes
  17. 11 Prediction and Effect Size Estimation
    1. 11.1 A Simple Model
    2. 11.2 Bayes and Empirical Bayes Prediction Rules
    3. 11.3 Prediction and Local False Discovery Rates
    4. 11.4 Effect Size Estimation
    5. 11.5 The Missing Species Problem
    6. Notes
  18. Appendix A Exponential Families
    1. A.1 Multiparameter Exponential Families
    2. A.2 Lindsey’s Method
  19. Appendix B Data Sets and Programs
  20. References
  21. Index