O'Reilly logo

Think Bayes by Allen Downey

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Predictive distributions

I introduced the hidden species problem in the form of four related questions. We have answered the first two by computing the posterior distribution for n and the prevalence of each species.

The other two questions are:

  • If we are planning to collect additional reads, can we predict how many new species we are likely to discover?

  • How many additional reads are needed to increase the fraction of observed species to a given threshold?

To answer predictive questions like this we can use the posterior distributions to simulate possible future events and compute predictive distributions for the number of species, and fraction of the total, we are likely to see.

The kernel of these simulations looks like this:

  1. Choose n from its posterior distribution.

  2. Choose a prevalence for each species, including possible unseen species, using the Dirichlet distribution.

  3. Generate a random sequence of future observations.

  4. Compute the number of new species, num_new, as a function of the number of additional reads, k.

  5. Repeat the previous steps and accumulate the joint distribution of num_new and k.

And here’s the code. RunSimulation runs a single simulation:

# class Subject

    def RunSimulation(self, num_reads):
        m, seen = self.GetSeenSpecies()
        n, observations = self.GenerateObservations(num_reads)

        curve = []
        for k, obs in enumerate(observations):
            seen.add(obs)

            num_new = len(seen) - m
            curve.append((k+1, num_new))

        return curve

num_reads is the number of additional reads to simulate. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required