In Section 2.6, I discussed some general properties of maximum likelihood estimators. Here, we consider the details of the construction and maximization of the likelihood function. Although this section can be skipped without loss of continuity, I strongly encourage you to work through it to the best of your ability. A basic understanding of maximum likelihood for the logit model can help to remove much of the mystery of the technique. It can also help you understand how and why things sometimes go wrong.
Let’s start with some notation and assumptions. We have data for n individuals (i = 1, ..., n) who are assumed to be statistically independent. For each individual i, the data consists of yi and