I’ll start with a simplified version of the problem where we know that there are exactly three species. Let’s call them lions, tigers and bears. Suppose we visit a wild animal preserve and see 3 lions, 2 tigers and one bear.
If we have an equal chance of observing any animal in the preserve,
the number of each species we see is governed by the multinomial
distribution. If the prevalences of lions, tigers and bears are
p_lion, p_tiger and p_bear, the likelihood of seeing 3 lions, 2 tigers
and one bear is proportional to
p_lion**3 * p_tiger**2 * p_bear**1
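This minimal sketch in plain Python (3.8 or later, for math.prod) shows why "proportional to" is enough. It splits the multinomial probability into a coefficient and the product above. The helper names multinomial_coefficient and likelihood_kernel are hypothetical illustrations, not part of thinkbayes. The coefficient depends only on the counts, so it is the same for every candidate set of prevalences and cancels when we compare them.

```python
from math import factorial, prod

def multinomial_coefficient(counts):
    """n! / (k1! k2! ... km!) for the observed counts (hypothetical helper)."""
    coef = factorial(sum(counts))
    for k in counts:
        coef //= factorial(k)  # each partial quotient is still an integer
    return coef

def likelihood_kernel(counts, probs):
    """The part of the likelihood that depends on the prevalences: prod p_i ** k_i."""
    return prod(p ** k for p, k in zip(probs, counts))

counts = (3, 2, 1)  # lions, tigers, bears

observed_rates = (3 / 6, 2 / 6, 1 / 6)
equal_rates = (1 / 3, 1 / 3, 1 / 3)

coef = multinomial_coefficient(counts)  # 60 for counts (3, 2, 1)
k_obs = likelihood_kernel(counts, observed_rates)
k_eq = likelihood_kernel(counts, equal_rates)

# The full multinomial probability is coef * kernel, but the ratio between
# two hypotheses does not depend on coef.
print(coef, coef * k_obs, k_obs / k_eq)  # ratio is about 1.69
```

The observed rates are about 1.69 times as likely as equal prevalences. That ratio comes out the same whether or not the coefficient is included.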
An approach that is tempting, but not correct, is to use beta
distributions, as in the section “The beta distribution,” to describe the prevalence
of each species separately. For example, we saw 3 lions and 3 non-lions;
if we think of that as 3 “heads” and 3 “tails,” then the posterior
distribution of p_lion is:
import thinkbayes

beta = thinkbayes.Beta()  # uniform prior on p_lion
beta.Update((3, 3))       # 3 lions ("heads"), 3 non-lions ("tails")
print(beta.MaximumLikelihood())
The maximum likelihood estimate for
p_lion is the observed rate, 50%. Similarly, the MLEs for
p_tiger and p_bear are 33% and 17%.
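The sketch below applies the tempting approach to all three species using only the standard library. It does not use thinkbayes. It assumes a uniform Beta(1, 1) prior for each species and the standard conjugate update, and the helper names are hypothetical. A species seen k times out of 6 gets the posterior Beta(1 + k, 1 + 6 - k). The mode of that posterior is k/6, which is the maximum likelihood estimate under a uniform prior.

```python
counts = {"lion": 3, "tiger": 2, "bear": 1}
n = sum(counts.values())  # 6 animals observed

def beta_posterior(heads, tails, alpha=1, beta=1):
    """Conjugate update of a Beta(alpha, beta) prior (hypothetical helper)."""
    return alpha + heads, beta + tails

def beta_mode(a, b):
    """Most likely value of Beta(a, b); valid when a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def beta_mean(a, b):
    return a / (a + b)

modes, means = {}, {}
for species, k in counts.items():
    # Treat each species separately: k "heads" versus n - k "tails"
    a, b = beta_posterior(k, n - k)
    modes[species] = beta_mode(a, b)
    means[species] = beta_mean(a, b)

print(modes)
print(means, sum(means.values()))
```

The modes reproduce the 50%, 33% and 17% estimates. The posterior means (0.5, 0.375 and 0.25) add up to 1.125 rather than 1, one visible consequence of treating the three prevalences as if they were unrelated.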
But there are two problems:
1. We have implicitly used a prior for each species that is uniform from 0 to 1, but since we know that there are three species, that prior is not correct. The right prior should have a mean of 1/3, and there should be zero likelihood that any species has a prevalence of 100%.
2. The distributions for each species are not independent, because the prevalences have to add up to 1. To capture this ...