Designing a Study

Once you have decided which data to leverage, you have to choose a methodology, that is, a plan for how to conduct your investigation. This book collects a number of successful approaches and offers plenty of ideas for investigations to pursue.

There is a caveat, though. Even though many mining and research projects follow the same basic principles, individual projects differ in many small ways that can lead your mining efforts into a dead end. Two such factors are the nature of the project and the underlying development process. A good example is the difference between open source software (e.g., Eclipse) and industrial software (e.g., Microsoft or SAP projects). Differences in the physical and organizational environments that surround software projects engender fundamental differences in development processes. For example, open source projects tend to draw developers from around the world, none of whom share an office, so pair programming and group code reviews become difficult if not impossible. Even a quick face-to-face chat about a problem requires advanced technology and must respect different time zones.

Differences in development environments and processes will have a fundamental impact on the project history and thus must be considered when mining that history. Recent research projects and replication studies mining data sets from both distributed open source projects and industry projects showed that some mining activities ...
