Probability and Statistics for Computer Scientists, Second Edition

Book Description

Student-Friendly Coverage of Probability, Statistical Methods, Simulation, and Modeling Tools
Incorporating feedback from instructors and researchers who used the previous edition, Probability and Statistics for Computer Scientists, Second Edition helps students understand general methods of stochastic modeling, simulation, and data analysis; make optimal decisions under uncertainty; model and evaluate computer systems and networks; and prepare for advanced probability-based courses. Written in a lively style with simple language, this classroom-tested book can now be used in both one- and two-semester courses.

New to the Second Edition

  • Axiomatic introduction of probability
  • Expanded coverage of statistical inference, including standard errors of estimates and their estimation, inference about variances, chi-square tests for independence and goodness of fit, nonparametric statistics, and bootstrap
  • More exercises at the end of each chapter
  • Additional MATLAB® codes, particularly new commands of the Statistics Toolbox

In-Depth yet Accessible Treatment of Computer Science-Related Topics
Starting with the fundamentals of probability, the text takes students through topics heavily featured in modern computer science, computer engineering, software engineering, and associated fields, such as computer simulations, Monte Carlo methods, stochastic processes, Markov chains, queuing theory, statistical inference, and regression. It also meets the requirements of the Accreditation Board for Engineering and Technology (ABET).

Encourages Practical Implementation of Skills
Using simple MATLAB commands (easily translatable to other computer languages), the book provides short programs for implementing the methods of probability and statistics as well as for visualizing randomness, the behavior of random variables and stochastic processes, convergence results, and Monte Carlo simulations. Preliminary knowledge of MATLAB is not required. Along with numerous computer science applications and worked examples, the text presents interesting facts and paradoxical statements. Each chapter concludes with a short summary and many exercises.
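To give a flavor of the kind of short program the book describes (this is an illustrative sketch, not code taken from the book), a few lines of MATLAB suffice to run a complete Monte Carlo experiment — here, estimating the probability that two fair dice sum to at least 10:

```matlab
% Monte Carlo estimate of P(sum of two fair dice >= 10)
% Illustrative sketch only; not a program from the book.
N = 100000;                      % number of simulated experiments
die1 = randi(6, N, 1);           % N rolls of the first die
die2 = randi(6, N, 1);           % N rolls of the second die
p_hat = mean(die1 + die2 >= 10); % fraction of successes estimates the probability
disp(p_hat)                      % should be near the exact value 6/36 = 1/6
```

As the description notes, programs of this style translate almost line-for-line into other languages, and no prior MATLAB experience is needed to follow them.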

Table of Contents

  1. Preliminaries
  2. Dedication
  3. Preface
    1. For whom this book is written
    2. Recommended courses
    3. Prerequisites, and use of the appendix
    4. Style and motivation
    5. Computers, demos, illustrations, and MATLAB®
    6. Second edition
    7. Thanks and acknowledgments
      1. Figure 1
      1. Table 0.1
  4. Chapter 1 Introduction and Overview
    1. 1.1 Making decisions under uncertainty
      1. Summary and conclusion
    2. 1.2 Overview of this book
    3. Summary and conclusions
    4. Exercises
      1. Figure 1.1
  5. Part I Probability and Random Variables
    1. Chapter 2 Probability
      1. 2.1 Events and their probabilities
        1. 2.1.1 Outcomes, events, and the sample space
        2. 2.1.2 Set operations
      2. 2.2 Rules of Probability
        1. 2.2.1 Axioms of Probability
        2. 2.2.2 Computing probabilities of events
          1. Extreme cases
          2. Union
          3. Complement
          4. Intersection of independent events
        3. 2.2.3 Applications in reliability
      3. 2.3 Combinatorics
        1. 2.3.1 Equally likely outcomes
        2. 2.3.2 Permutations and combinations
          1. Permutations with replacement
          2. Permutations without replacement
          3. Combinations without replacement
          4. Computational shortcuts
          5. Combinations with replacement
      4. 2.4 Conditional probability and independence
        1. Conditional probability
        2. Independence
        3. Bayes' Rule
        4. Law of Total Probability
      5. Summary and conclusions
      6. Exercises
        1. Figure 2.1
        2. Figure 2.2
        3. Figure 2.3
        4. Figure 2.4
        5. Figure 2.5
        6. Figure 2.6
        7. Figure 2.7
        8. Figure 2.8
    2. Chapter 3 Discrete Random Variables and Their Distributions
      1. 3.1 Distribution of a random variable
        1. 3.1.1 Main concepts
        2. 3.1.2 Types of random variables
      2. 3.2 Distribution of a random vector
        1. 3.2.1 Joint distribution and marginal distributions
        2. 3.2.2 Independence of random variables
      3. 3.3 Expectation and variance
        1. 3.3.1 Expectation
        2. 3.3.2 Expectation of a function
        3. 3.3.3 Properties
        4. 3.3.4 Variance and standard deviation
        5. 3.3.5 Covariance and correlation
        6. 3.3.6 Properties
        7. 3.3.7 Chebyshev’s inequality
        8. 3.3.8 Application to finance
      4. 3.4 Families of discrete distributions
        1. 3.4.1 Bernoulli distribution
        2. 3.4.2 Binomial distribution
          1. MATLAB notes
        3. 3.4.3 Geometric distribution
        4. 3.4.4 Negative Binomial distribution
        5. 3.4.5 Poisson distribution
        6. 3.4.6 Poisson approximation of Binomial distribution
      5. Summary and conclusions
      6. Exercises
        1. Figure 3.1
        2. Figure 3.2
        3. Figure 3.3
        4. Figure 3.4
        5. Figure 3.5
        6. Figure 3.6
    3. Chapter 4 Continuous Distributions
      1. 4.1 Probability density
        1. Analogy: pmf versus pdf
        2. Joint and marginal densities
        3. Expectation and variance
      2. 4.2 Families of continuous distributions
        1. 4.2.1 Uniform distribution
          1. The Uniform property
          2. Standard Uniform distribution
          3. Expectation and variance
        2. 4.2.2 Exponential distribution
          1. Times between rare events are Exponential
          2. Memoryless property
        3. 4.2.3 Gamma distribution
          1. Expectation, variance, and some useful integration remarks
          2. Gamma-Poisson formula
        4. 4.2.4 Normal distribution
          1. Standard Normal distribution
      3. 4.3 Central Limit Theorem
        1. Normal approximation to Binomial distribution
        2. Continuity correction
      4. Summary and conclusions
      5. Exercises
        1. Figure 4.1
        2. Figure 4.2
        3. Figure 4.3
        4. Figure 4.4
        5. Figure 4.5
        6. Figure 4.6
        1. Table 4.1
        2. Table 4.2
        3. Table 4.3
    4. Chapter 5 Computer Simulations and Monte Carlo Methods
      1. 5.1 Introduction
        1. 5.1.1 Applications and examples
      2. 5.2 Simulation of random variables
        1. 5.2.1 Random number generators
        2. 5.2.2 Discrete methods
          1. Arbitrary discrete distribution
        3. 5.2.3 Inverse transform method
          1. Arbitrary continuous distribution
          2. Discrete distributions revisited
          3. Exponential-Geometric relation
        4. 5.2.4 Rejection method
        5. 5.2.5 Generation of random vectors
        6. 5.2.6 Special methods
          1. Poisson distribution
          2. Normal distribution
      3. 5.3 Solving problems by Monte Carlo methods
        1. 5.3.1 Estimating probabilities
          1. Accuracy of a Monte Carlo study
        2. 5.3.2 Estimating means and standard deviations
        3. 5.3.3 Forecasting
        4. 5.3.4 Estimating lengths, areas, and volumes
          1. Lengths
          2. Areas and volumes
          3. Areas of arbitrary regions with unknown boundaries
        5. 5.3.5 Monte Carlo integration
          1. Accuracy of results
          2. Improved Monte Carlo integration method
          3. Accuracy of the improved method
      4. Summary and conclusions
      5. Exercises
        1. Figure 5.1
        2. Figure 5.2
        3. Figure 5.3
        4. Figure 5.4
        5. Figure 5.5
        6. Figure 5.6
        7. Figure 5.7
        8. Figure 5.8
  6. Part II Stochastic Processes
    1. Chapter 6 Stochastic Processes
      1. 6.1 Definitions and classifications
      2. 6.2 Markov processes and Markov chains
        1. 6.2.1 Markov chains
          1. Characteristics of a Markov chain
        2. 6.2.2 Matrix approach
        3. 6.2.3 Steady-state distribution
          1. Computing the steady-state distribution
          2. The limit of P^h
          3. Steady state
          4. Existence of a steady-state distribution. Regular Markov chains
          5. Conclusion
      3. 6.3 Counting processes
        1. 6.3.1 Binomial process
          1. Relation to real time: frames
          2. Markov property
        2. 6.3.2 Poisson process
          1. Continuous time
          2. Poisson process as the limiting case
          3. Rare events and modeling
      4. 6.4 Simulation of stochastic processes
          1. Discrete-time processes
          2. Markov chains
          3. Binomial process
          4. Continuous-time processes
          5. Poisson process
      5. Summary and conclusions
      6. Exercises
        1. Figure 6.1
        2. Figure 6.2
        3. Figure 6.3
        4. Figure 6.4
        5. Figure 6.5
        6. Figure 6.6
        7. Figure 6.7
        8. Figure 6.8
        9. Figure 6.9
        10. Figure 6.10
    2. Chapter 7 Queuing Systems
      1. 7.1 Main components of a queuing system
          1. Arrival
          2. Queuing and routing to servers
          3. Service
          4. Departure
      2. 7.2 Little's Law
      3. 7.3 Bernoulli single-server queuing process
          1. Markov property
          2. Steady-state distribution
        1. 7.3.1 Systems with limited capacity
      4. 7.4 M/M/1 system
          1. M/M/1 as a limiting case of a Bernoulli queuing process
          2. Steady-state distribution for an M/M/1 system
        1. 7.4.1 Evaluating the system’s performance
          1. Utilization
          2. Waiting time
          3. Response time
          4. Queue
          5. Little’s Law revisited
          6. When a system gets nearly overloaded
      5. 7.5 Multiserver queuing systems
        1. 7.5.1 Bernoulli k-server queuing process
          1. Markov property
          2. Transition probabilities
        2. 7.5.2 M/M/k systems
          1. Steady-state distribution
        3. 7.5.3 Unlimited number of servers and M/M/∞
          1. M/M/∞ queuing system
      6. 7.6 Simulation of queuing systems
          1. Markov case
          2. General case
          3. Example: simulation of a multiserver queuing system
      7. Summary and conclusions
      8. Exercises
        1. Figure 7.1
        2. Figure 7.2
        3. Figure 7.3
        4. Figure 7.4
        5. Figure 7.5
  7. Part III Statistics
    1. Chapter 8 Introduction to Statistics
      1. 8.1 Population and sample, parameters and statistics
          1. Sampling and non-sampling errors
      2. 8.2 Simple descriptive statistics
        1. 8.2.1 Mean
          1. Unbiasedness
          2. Consistency
          3. Asymptotic Normality
        2. 8.2.2 Median
            1. Understanding the shape of a distribution
            2. Computation of a population median
            3. Computing sample medians
        3. 8.2.3 Quantiles, percentiles, and quartiles
        4. 8.2.4 Variance and standard deviation
          1. Computation
        5. 8.2.5 Standard errors of estimates
        6. 8.2.6 Interquartile range
          1. Detection of outliers
          2. Handling of outliers
      3. 8.3 Graphical statistics
        1. 8.3.1 Histogram
          1. How else can histograms look?
          2. Mixtures
          3. The choice of bins
        2. 8.3.2 Stem-and-leaf plot
        3. 8.3.3 Boxplot
          1. Parallel boxplots
        4. 8.3.4 Scatter plots and time plots
          1. MATLAB notes
      4. Summary and conclusions
      5. Exercises
        1. Figure 8.1
        2. Figure 8.2
        3. Figure 8.3
        4. Figure 8.4
        5. Figure 8.5
        6. Figure 8.6
        7. Figure 8.7
        8. Figure 8.8
        9. Figure 8.9
        10. Figure 8.10
        11. Figure 8.11
        12. Figure 8.12
    2. Chapter 9 Statistical Inference I
      1. 9.1 Parameter estimation
        1. 9.1.1 Method of moments
          1. Moments
          2. Estimation
        2. 9.1.2 Method of maximum likelihood
          1. Discrete case
          2. Continuous case
        3. 9.1.3 Estimation of standard errors
      2. 9.2 Confidence intervals
        1. 9.2.1 Construction of confidence intervals: a general method
        2. 9.2.2 Confidence interval for the population mean
        3. 9.2.3 Confidence interval for the difference between two means
        4. 9.2.4 Selection of a sample size
        5. 9.2.5 Estimating means with a given precision
      3. 9.3 Unknown standard deviation
        1. 9.3.1 Large samples
        2. 9.3.2 Confidence intervals for proportions
        3. 9.3.3 Estimating proportions with a given precision
        4. 9.3.4 Small samples: Student's t distribution
        5. 9.3.5 Comparison of two populations with unknown variances
          1. Case 1. Equal variances
      4. 9.4 Hypothesis testing
        1. 9.4.1 Hypothesis and alternative
        2. 9.4.2 Type I and Type II errors: level of significance
        3. 9.4.3 Level α tests: general approach
          1. Step 1. Test statistic
          2. Step 2. Acceptance region and rejection region
          3. Step 3: Result and its interpretation
        4. 9.4.4 Rejection regions and power
        5. 9.4.5 Standard Normal null distribution (Z-test)
        6. 9.4.6 Z-tests for means and proportions
        7. 9.4.7 Pooled sample proportion
        8. 9.4.8 Unknown σ: T-tests
        9. 9.4.9 Duality: two-sided tests and two-sided confidence intervals
        10. 9.4.10 P-value
          1. How do we choose α?
          2. P-value
          3. Testing hypotheses with a P-value
          4. Computing P-values
          5. Understanding P-values
      5. 9.5 Inference about variances
        1. 9.5.1 Variance estimator and Chi-square distribution
        2. 9.5.2 Confidence interval for the population variance
        3. 9.5.3 Testing variance
          1. Level α test
          2. P-value
        4. 9.5.4 Comparison of two variances. F-distribution.
        5. 9.5.5 Confidence interval for the ratio of population variances
        6. 9.5.6 F-tests comparing two variances
          1. MATLAB notes
      6. Summary and conclusions
      7. Exercises
        1. Figure 9.1
        2. Figure 9.2
        3. Figure 9.3
        4. Figure 9.4
        5. Figure 9.5
        6. Figure 9.6
        7. Figure 9.7
        8. Figure 9.8
        9. Figure 9.9
        10. Figure 9.10
        11. Figure 9.11
        12. Figure 9.12
        13. Figure 9.13
        14. Figure 9.14
        15. Figure 9.15
        1. Table 9.1
        2. Table 9.2
        3. Table 9.3
        4. Table 9.4
        5. Table 9.5
        6. Table 9.6
    3. Chapter 10 Statistical Inference II
      1. 10.1 Chi-square tests
        1. 10.1.1 Testing a distribution
        2. 10.1.2 Testing a family of distributions
        3. 10.1.3 Testing independence
      2. 10.2 Nonparametric statistics
        1. 10.2.1 Sign test
        2. 10.2.2 Wilcoxon signed rank test
          1. Null distribution of Wilcoxon test statistic
          2. Exact distribution
          3. Normal approximation
        3. 10.2.3 Mann-Whitney-Wilcoxon rank sum test
          1. Mann-Whitney-Wilcoxon test in MATLAB
          2. Null distribution of Mann-Whitney-Wilcoxon test statistic
          3. Normal approximation
      3. 10.3 Bootstrap
        1. 10.3.1 Bootstrap distribution and all bootstrap samples
          1. Bootstrap distribution
        2. 10.3.2 Computer-generated bootstrap samples
        3. 10.3.3 Bootstrap confidence intervals
          1. Parametric method, based on the bootstrap estimation of the standard error
          2. Nonparametric method, based on the bootstrap quantiles
      4. 10.4 Bayesian inference
        1. 10.4.1 Prior and posterior
          1. Conjugate distribution families
          2. Gamma family is conjugate to the Poisson model
          3. Beta family is conjugate to the Binomial model
          4. Normal family is conjugate to the Normal model
        2. 10.4.2 Bayesian estimation
        3. 10.4.3 Bayesian credible sets
        4. 10.4.4 Bayesian hypothesis testing
          1. Loss and risk
      5. Summary and conclusions
      6. Exercises
        1. Figure 10.1
        2. Figure 10.2
        3. Figure 10.3
        4. Figure 10.4
        5. Figure 10.5
        6. Figure 10.6
        7. Figure 10.7
        8. Figure 10.8
        9. Figure 10.9
        10. Figure 10.10
        11. Figure 10.11
        12. Figure 10.12
        13. Figure 10.13
        14. Figure 10.14
        1. Table 10.1
        2. Table 10.2
    4. Chapter 11 Regression
      1. 11.1 Least squares estimation
        1. 11.1.1 Examples
        2. 11.1.2 Method of least squares
        3. 11.1.3 Linear regression
          1. Estimation in linear regression
        4. 11.1.4 Regression and correlation
        5. 11.1.5 Overfitting a model
      2. 11.2 Analysis of variance, prediction, and further inference
        1. 11.2.1 ANOVA and R-square
        2. 11.2.2 Tests and confidence intervals
          1. Degrees of freedom and variance estimation
          2. Inference about the regression slope
          3. ANOVA F-test
          4. F-test and T-test
        3. 11.2.3 Prediction
          1. Confidence interval for the mean of responses
          2. Prediction interval for the individual response
          3. Prediction bands
      3. 11.3 Multivariate regression
        1. 11.3.1 Introduction and examples
        2. 11.3.2 Matrix approach and least squares estimation
          1. Matrix approach to multivariate linear regression
          2. Least squares estimates
        3. 11.3.3 Analysis of variance, tests, and prediction
          1. Testing significance of the entire model
          2. Variance estimator
          3. Testing individual slopes
          4. Prediction
      4. 11.4 Model building
        1. 11.4.1 Adjusted R-square
        2. 11.4.2 Extra sum of squares, partial F-tests, and variable selection
          1. Stepwise (forward) selection
          2. Backward elimination
        3. 11.4.3 Categorical predictors and dummy variables
          1. Avoid singularity by creating only (C − 1) dummies
          2. Interpretation of slopes for dummy variables
          3. MATLAB notes
      5. Summary and conclusions
      6. Exercises
        1. Figure 11.1
        2. Figure 11.2
        3. Figure 11.3
        4. Figure 11.4
        5. Figure 11.5
        6. Figure 11.6
        1. Table 11.1
  8. Part IV Appendix
    1. Chapter 12 Appendix
      1. 12.1 Inventory of distributions
        1. 12.1.1 Discrete families
        2. 12.1.2 Continuous families
      2. 12.2 Distribution tables
      3. 12.3 Calculus review
        1. 12.3.1 Inverse function
        2. 12.3.2 Limits and continuity
        3. 12.3.3 Sequences and series
        4. 12.3.4 Derivatives, minimum, and maximum
          1. Computing maxima and minima
        5. 12.3.5 Integrals
          1. Improper integrals
          2. Integration by substitution
          3. Integration by parts
          4. Computing areas
          5. Gamma function and factorial
      4. 12.4 Matrices and linear systems
          1. Multiplying a row by a column
          2. Multiplying matrices
          3. Transposition
          4. Solving systems of equations
          5. Inverse matrix
          6. Matrix operations in MATLAB
      5. 12.5 Answers to selected exercises
        1. Figure 12.1
        2. Figure 12.2
        3. Figure 12.3
        4. Figure 12.4
        5. Figure 12.5
        6. Figure 12.6
        7. Figure 12.7
        8. Figure 12.8
        9. Figure 12.9
        1. Table A1
        2. Table A2
        3. Table A3
        4. Table A4
        5. Table A5
        6. Table A6
        7. Table A7
        8. Table A8
        9. Table A9
        10. Table 12.1