Imagine if, instead of a deep neural network, we were using k logistic regressions, where each regression predicts membership in a single class. That collection of logistic regressions, one for each class, would look like this:
The problem with using this group of logistic regressions is that the output of each individual logistic regression is independent of the others. Imagine a case where several of the logistic regressions in our set were uncertain of membership in their particular class, each producing an answer around P(Y=k) = 0.5. Because these outputs need not sum to 1, we cannot interpret them together as an overall probability distribution over class membership.
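A minimal sketch of this problem, using hypothetical logits for k = 3 classes where each independent logistic regression is near its uncertain midpoint. The independent sigmoid outputs sum to well over 1, while normalizing the same logits with a softmax yields a valid probability distribution:

```python
import math

def sigmoid(z):
    """Output of a single independent logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Normalize a vector of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: all three classifiers are uncertain (near 0.5).
logits = [0.1, 0.0, -0.1]

independent = [sigmoid(z) for z in logits]
print(sum(independent))  # roughly 1.5 -- not a valid distribution

normalized = softmax(logits)
print(sum(normalized))   # 1.0 -- a proper probability distribution
```

The independent outputs are each sensible on their own, but only the softmax ties them together into mutually exclusive class probabilities.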