Replication and Variations of the Prediction Model

It is really important to provide evidence that methods work by replicating studies. Using the same statistical model with the same releases of a given subject system generally will not provide new information, because the algorithms used to make the predictions are deterministic. Therefore, replication in this arena usually means either applying the same model to different releases of the same system or applying the model to different systems.

We have demonstrated both types of replication. In Table 9-1, we indicated the number of different releases included in the study for each of six different systems. In this section, we discuss another form of replication involving model variations.

Although the prediction accuracies achieved by the negative binomial models described in the previous section have generally been very good, we have continued to investigate ways to further improve our predictions.

This has led us to experiment with the use of different predictor variables for the NBR model, as well as comparing the NBR model against other statistical models. These trials have shown that it’s very difficult to improve on the standard NBR model. The best we’ve achieved with additional predictor variables is accuracy improvements of roughly 1%, and none of the other statistical models produced better results than those achieved by the NBR model.

In addition to being a search for improved prediction methods, these studies are another ...

Get Making Software now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.