Probability Theory: The Logic of Science

Book Description

The standard rules of probability can be interpreted as uniquely valid principles in logic. In this book, E. T. Jaynes dispels the imaginary distinction between 'probability theory' and 'statistical inference', leaving a logical unity and simplicity which provides greater technical power and flexibility in applications. It goes beyond the conventional mathematics of probability theory, viewing the subject in a wider context. New results are discussed, along with applications of probability theory to a wide variety of problems in physics, mathematics, economics, chemistry and biology. It contains many exercises and problems, and is suitable for use as a textbook in graduate-level courses involving data analysis. The material is aimed at readers who are already familiar with applied mathematics at an advanced undergraduate level or higher. The book will be of interest to scientists working in any area where inference from incomplete information is necessary.
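The 'standard rules' referred to above are the product and sum rules derived in Chapter 2. For orientation only, a minimal statement in the book's notation (not a substitute for the derivations given there):

\begin{align*}
  p(AB \mid C) &= p(A \mid BC)\, p(B \mid C) && \text{(product rule)} \\
  p(A \mid C) + p(\bar{A} \mid C) &= 1       && \text{(sum rule)}
\end{align*}

Bayes' theorem, for example, follows immediately from the product rule and the symmetry p(AB|C) = p(BA|C).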

Table of Contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright
  5. Dedication
  6. Contents
  7. Editor’s foreword
  8. Preface
  9. Part I: Principles and elementary applications
    1. 1. Plausible reasoning
      1. 1.1 Deductive and plausible reasoning
      2. 1.2 Analogies with physical theories
      3. 1.3 The thinking computer
      4. 1.4 Introducing the robot
      5. 1.5 Boolean algebra
      6. 1.6 Adequate sets of operations
      7. 1.7 The basic desiderata
      8. 1.8 Comments
        1. 1.8.1 Common language vs. formal logic
        2. 1.8.2 Nitpicking
    2. 2. The quantitative rules
      1. 2.1 The product rule
      2. 2.2 The sum rule
      3. 2.3 Qualitative properties
      4. 2.4 Numerical values
      5. 2.5 Notation and finite-sets policy
      6. 2.6 Comments
        1. 2.6.1 ‘Subjective’ vs. ‘objective’
        2. 2.6.2 Gödel’s theorem
        3. 2.6.3 Venn diagrams
        4. 2.6.4 The ‘Kolmogorov axioms’
    3. 3. Elementary sampling theory
      1. 3.1 Sampling without replacement
      2. 3.2 Logic vs. propensity
      3. 3.3 Reasoning from less precise information
      4. 3.4 Expectations
      5. 3.5 Other forms and extensions
      6. 3.6 Probability as a mathematical tool
      7. 3.7 The binomial distribution
      8. 3.8 Sampling with replacement
        1. 3.8.1 Digression: a sermon on reality vs. models
      9. 3.9 Correction for correlations
      10. 3.10 Simplification
      11. 3.11 Comments
        1. 3.11.1 A look ahead
    4. 4. Elementary hypothesis testing
      1. 4.1 Prior probabilities
      2. 4.2 Testing binary hypotheses with binary data
      3. 4.3 Nonextensibility beyond the binary case
      4. 4.4 Multiple hypothesis testing
        1. 4.4.1 Digression on another derivation
      5. 4.5 Continuous probability distribution functions
      6. 4.6 Testing an infinite number of hypotheses
        1. 4.6.1 Historical digression
      7. 4.7 Simple and compound (or composite) hypotheses
      8. 4.8 Comments
        1. 4.8.1 Etymology
        2. 4.8.2 What have we accomplished?
    5. 5. Queer uses for probability theory
      1. 5.1 Extrasensory perception
      2. 5.2 Mrs Stewart’s telepathic powers
        1. 5.2.1 Digression on the normal approximation
        2. 5.2.2 Back to Mrs Stewart
      3. 5.3 Converging and diverging views
      4. 5.4 Visual perception – evolution into Bayesianity?
      5. 5.5 The discovery of Neptune
        1. 5.5.1 Digression on alternative hypotheses
        2. 5.5.2 Back to Newton
      6. 5.6 Horse racing and weather forecasting
        1. 5.6.1 Discussion
      7. 5.7 Paradoxes of intuition
      8. 5.8 Bayesian jurisprudence
      9. 5.9 Comments
        1. 5.9.1 What is queer?
    6. 6. Elementary parameter estimation
      1. 6.1 Inversion of the urn distributions
      2. 6.2 Both N and R unknown
      3. 6.3 Uniform prior
      4. 6.4 Predictive distributions
      5. 6.5 Truncated uniform priors
      6. 6.6 A concave prior
      7. 6.7 The binomial monkey prior
      8. 6.8 Metamorphosis into continuous parameter estimation
      9. 6.9 Estimation with a binomial sampling distribution
        1. 6.9.1 Digression on optional stopping
      10. 6.10 Compound estimation problems
      11. 6.11 A simple Bayesian estimate: quantitative prior information
        1. 6.11.1 From posterior distribution function to estimate
      12. 6.12 Effects of qualitative prior information
      13. 6.13 Choice of a prior
      14. 6.14 On with the calculation!
      15. 6.15 The Jeffreys prior
      16. 6.16 The point of it all
      17. 6.17 Interval estimation
      18. 6.18 Calculation of variance
      19. 6.19 Generalization and asymptotic forms
      20. 6.20 Rectangular sampling distribution
      21. 6.21 Small samples
      22. 6.22 Mathematical trickery
      23. 6.23 Comments
    7. 7. The central, Gaussian or normal distribution
      1. 7.1 The gravitating phenomenon
      2. 7.2 The Herschel–Maxwell derivation
      3. 7.3 The Gauss derivation
      4. 7.4 Historical importance of Gauss’s result
      5. 7.5 The Landon derivation
      6. 7.6 Why the ubiquitous use of Gaussian distributions?
      7. 7.7 Why the ubiquitous success?
      8. 7.8 What estimator should we use?
      9. 7.9 Error cancellation
      10. 7.10 The near irrelevance of sampling frequency distributions
      11. 7.11 The remarkable efficiency of information transfer
      12. 7.12 Other sampling distributions
      13. 7.13 Nuisance parameters as safety devices
      14. 7.14 More general properties
      15. 7.15 Convolution of Gaussians
      16. 7.16 The central limit theorem
      17. 7.17 Accuracy of computations
      18. 7.18 Galton’s discovery
      19. 7.19 Population dynamics and Darwinian evolution
      20. 7.20 Evolution of humming-birds and flowers
      21. 7.21 Application to economics
      22. 7.22 The great inequality of Jupiter and Saturn
      23. 7.23 Resolution of distributions into Gaussians
      24. 7.24 Hermite polynomial solutions
      25. 7.25 Fourier transform relations
      26. 7.26 There is hope after all
      27. 7.27 Comments
        1. 7.27.1 Terminology again
    8. 8. Sufficiency, ancillarity, and all that
      1. 8.1 Sufficiency
      2. 8.2 Fisher sufficiency
        1. 8.2.1 Examples
        2. 8.2.2 The Blackwell–Rao theorem
      3. 8.3 Generalized sufficiency
      4. 8.4 Sufficiency plus nuisance parameters
      5. 8.5 The likelihood principle
      6. 8.6 Ancillarity
      7. 8.7 Generalized ancillary information
      8. 8.8 Asymptotic likelihood: Fisher information
      9. 8.9 Combining evidence from different sources
      10. 8.10 Pooling the data
        1. 8.10.1 Fine-grained propositions
      11. 8.11 Sam’s broken thermometer
      12. 8.12 Comments
        1. 8.12.1 The fallacy of sample re-use
        2. 8.12.2 A folk theorem
        3. 8.12.3 Effect of prior information
        4. 8.12.4 Clever tricks and gamesmanship
    9. 9. Repetitive experiments: probability and frequency
      1. 9.1 Physical experiments
      2. 9.2 The poorly informed robot
      3. 9.3 Induction
      4. 9.4 Are there general inductive rules?
      5. 9.5 Multiplicity factors
      6. 9.6 Partition function algorithms
        1. 9.6.1 Solution by inspection
      7. 9.7 Entropy algorithms
      8. 9.8 Another way of looking at it
      9. 9.9 Entropy maximization
      10. 9.10 Probability and frequency
      11. 9.11 Significance tests
        1. 9.11.1 Implied alternatives
      12. 9.12 Comparison of psi and chi-squared
      13. 9.13 The chi-squared test
      14. 9.14 Generalization
      15. 9.15 Halley’s mortality table
      16. 9.16 Comments
        1. 9.16.1 The irrationalists
        2. 9.16.2 Superstitions
    10. 10. Physics of ‘random experiments’
      1. 10.1 An interesting correlation
      2. 10.2 Historical background
      3. 10.3 How to cheat at coin and die tossing
        1. 10.3.1 Experimental evidence
      4. 10.4 Bridge hands
      5. 10.5 General random experiments
      6. 10.6 Induction revisited
      7. 10.7 But what about quantum theory?
      8. 10.8 Mechanics under the clouds
      9. 10.9 More on coins and symmetry
      10. 10.10 Independence of tosses
      11. 10.11 The arrogance of the uninformed
  10. Part II: Advanced applications
    1. 11. Discrete prior probabilities: the entropy principle
      1. 11.1 A new kind of prior information
      2. 11.2 Minimum ∑ pᵢ²
      3. 11.3 Entropy: Shannon’s theorem
      4. 11.4 The Wallis derivation
      5. 11.5 An example
      6. 11.6 Generalization: a more rigorous proof
      7. 11.7 Formal properties of maximum entropy distributions
      8. 11.8 Conceptual problems – frequency correspondence
      9. 11.9 Comments
    2. 12. Ignorance priors and transformation groups
      1. 12.1 What are we trying to do?
      2. 12.2 Ignorance priors
      3. 12.3 Continuous distributions
      4. 12.4 Transformation groups
        1. 12.4.1 Location and scale parameters
        2. 12.4.2 A Poisson rate
        3. 12.4.3 Unknown probability for success
        4. 12.4.4 Bertrand’s problem
      5. 12.5 Comments
    3. 13. Decision theory, historical background
      1. 13.1 Inference vs. decision
      2. 13.2 Daniel Bernoulli’s suggestion
      3. 13.3 The rationale of insurance
      4. 13.4 Entropy and utility
      5. 13.5 The honest weatherman
      6. 13.6 Reactions to Daniel Bernoulli and Laplace
      7. 13.7 Wald’s decision theory
      8. 13.8 Parameter estimation for minimum loss
      9. 13.9 Reformulation of the problem
      10. 13.10 Effect of varying loss functions
      11. 13.11 General decision theory
      12. 13.12 Comments
        1. 13.12.1 ‘Objectivity’ of decision theory
        2. 13.12.2 Loss functions in human society
        3. 13.12.3 A new look at the Jeffreys prior
        4. 13.12.4 Decision theory is not fundamental
        5. 13.12.5 Another dimension?
    4. 14. Simple applications of decision theory
      1. 14.1 Definitions and preliminaries
      2. 14.2 Sufficiency and information
      3. 14.3 Loss functions and criteria of optimum performance
      4. 14.4 A discrete example
      5. 14.5 How would our robot do it?
      6. 14.6 Historical remarks
        1. 14.6.1 The classical matched filter
      7. 14.7 The widget problem
        1. 14.7.1 Solution for Stage 2
        2. 14.7.2 Solution for Stage 3
        3. 14.7.3 Solution for Stage 4
      8. 14.8 Comments
    5. 15. Paradoxes of probability theory
      1. 15.1 How do paradoxes survive and grow?
      2. 15.2 Summing a series the easy way
      3. 15.3 Nonconglomerability
      4. 15.4 The tumbling tetrahedra
      5. 15.5 Solution for a finite number of tosses
      6. 15.6 Finite vs. countable additivity
      7. 15.7 The Borel–Kolmogorov paradox
      8. 15.8 The marginalization paradox
        1. 15.8.1 On to greater disasters
      9. 15.9 Discussion
        1. 15.9.1 The DSZ Example #5
        2. 15.9.2 Summary
      10. 15.10 A useful result after all?
      11. 15.11 How to mass-produce paradoxes
      12. 15.12 Comments
    6. 16. Orthodox methods: historical background
      1. 16.1 The early problems
      2. 16.2 Sociology of orthodox statistics
      3. 16.3 Ronald Fisher, Harold Jeffreys, and Jerzy Neyman
      4. 16.4 Pre-data and post-data considerations
      5. 16.5 The sampling distribution for an estimator
      6. 16.6 Pro-causal and anti-causal bias
      7. 16.7 What is real, the probability or the phenomenon?
      8. 16.8 Comments
        1. 16.8.1 Communication difficulties
    7. 17. Principles and pathology of orthodox statistics
      1. 17.1 Information loss
      2. 17.2 Unbiased estimators
      3. 17.3 Pathology of an unbiased estimate
      4. 17.4 The fundamental inequality of the sampling variance
      5. 17.5 Periodicity: the weather in Central Park
        1. 17.5.1 The folly of pre-filtering data
      6. 17.6 A Bayesian analysis
      7. 17.7 The folly of randomization
      8. 17.8 Fisher: common sense at Rothamsted
        1. 17.8.1 The Bayesian safety device
      9. 17.9 Missing data
      10. 17.10 Trend and seasonality in time series
        1. 17.10.1 Orthodox methods
        2. 17.10.2 The Bayesian method
        3. 17.10.3 Comparison of Bayesian and orthodox estimates
        4. 17.10.4 An improved orthodox estimate
        5. 17.10.5 The orthodox criterion of performance
      11. 17.11 The general case
      12. 17.12 Comments
    8. 18. The A_p distribution and rule of succession
      1. 18.1 Memory storage for old robots
      2. 18.2 Relevance
      3. 18.3 A surprising consequence
      4. 18.4 Outer and inner robots
      5. 18.5 An application
      6. 18.6 Laplace’s rule of succession
      7. 18.7 Jeffreys’ objection
      8. 18.8 Bass or carp?
      9. 18.9 So where does this leave the rule?
      10. 18.10 Generalization
      11. 18.11 Confirmation and weight of evidence
        1. 18.11.1 Is indifference based on knowledge or ignorance?
      12. 18.12 Carnap’s inductive methods
      13. 18.13 Probability and frequency in exchangeable sequences
      14. 18.14 Prediction of frequencies
      15. 18.15 One-dimensional neutron multiplication
        1. 18.15.1 The frequentist solution
        2. 18.15.2 The Laplace solution
      16. 18.16 The de Finetti theorem
      17. 18.17 Comments
    9. 19. Physical measurements
      1. 19.1 Reduction of equations of condition
      2. 19.2 Reformulation as a decision problem
        1. 19.2.1 Sermon on Gaussian error distributions
      3. 19.3 The underdetermined case: K is singular
      4. 19.4 The overdetermined case: K can be made nonsingular
      5. 19.5 Numerical evaluation of the result
      6. 19.6 Accuracy of the estimates
      7. 19.7 Comments
        1. 19.7.1 A paradox
    10. 20. Model comparison
      1. 20.1 Formulation of the problem
      2. 20.2 The fair judge and the cruel realist
        1. 20.2.1 Parameters known in advance
        2. 20.2.2 Parameters unknown
      3. 20.3 But where is the idea of simplicity?
      4. 20.4 An example: linear response models
        1. 20.4.1 Digression: the old sermon still another time
      5. 20.5 Comments
        1. 20.5.1 Final causes
    11. 21. Outliers and robustness
      1. 21.1 The experimenter’s dilemma
      2. 21.2 Robustness
      3. 21.3 The two-model model
      4. 21.4 Exchangeable selection
      5. 21.5 The general Bayesian solution
      6. 21.6 Pure outliers
      7. 21.7 One receding datum
    12. 22. Introduction to communication theory
      1. 22.1 Origins of the theory
      2. 22.2 The noiseless channel
      3. 22.3 The information source
      4. 22.4 Does the English language have statistical properties?
      5. 22.5 Optimum encoding: letter frequencies known
      6. 22.6 Better encoding from knowledge of digram frequencies
      7. 22.7 Relation to a stochastic model
      8. 22.8 The noisy channel
  11. Appendix A: Other approaches to probability theory
    1. A.1 The Kolmogorov system of probability
    2. A.2 The de Finetti system of probability
    3. A.3 Comparative probability
    4. A.4 Holdouts against universal comparability
    5. A.5 Speculations about lattice theories
  12. Appendix B: Mathematical formalities and style
    1. B.1 Notation and logical hierarchy
    2. B.2 Our ‘cautious approach’ policy
    3. B.3 Willy Feller on measure theory
    4. B.4 Kronecker vs. Weierstrasz
    5. B.5 What is a legitimate mathematical function?
      1. B.5.1 Delta-functions
      2. B.5.2 Nondifferentiable functions
      3. B.5.3 Bogus nondifferentiable functions
    6. B.6 Counting infinite sets?
    7. B.7 The Hausdorff sphere paradox and mathematical diseases
    8. B.8 What am I supposed to publish?
    9. B.9 Mathematical courtesy
  13. Appendix C: Convolutions and cumulants
    1. C.1 Relation of cumulants and moments
    2. C.2 Examples
  14. References
  15. Bibliography
  16. Author index
  17. Subject index