Validating Crowdsourced Data

As data is collected by different researchers using different methods, incompatible values are likely to arise. These may appear as outliers or simply as a wide spread of results. Traditionally, with no additional information, researchers had little choice but to give equal weight to each measurement or to apply statistical methods to exclude outliers. Because we have adopted an Open Notebook approach, which requires the full record of how each measurement was carried out, each measurement can instead be evaluated in the context of that record. In several cases this allows a scientist familiar with the reported methods to exclude questionable data points on the basis of inappropriate conditions or a failure to report an important parameter.
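The contrast between equal weighting and record-based review can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Measurement` record, its field names, the 30-minute threshold, and every number below are hypothetical and are not taken from the project's data or tooling.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Measurement:
    # Hypothetical record layout, chosen for illustration only.
    value: float                      # reported result, arbitrary units
    mixing_time_min: Optional[float]  # None means the parameter was not reported


def is_acceptable(m: Measurement, min_mixing_min: float) -> bool:
    """Keep a measurement only if the parameter was reported and is adequate."""
    return m.mixing_time_min is not None and m.mixing_time_min >= min_mixing_min


def naive_mean(measurements: List[Measurement]) -> float:
    """Equal weight for every measurement, with no use of the notebook record."""
    return mean(m.value for m in measurements)


def reviewed_mean(measurements: List[Measurement], min_mixing_min: float) -> float:
    """Average only the measurements whose recorded conditions pass review."""
    kept = [m for m in measurements if is_acceptable(m, min_mixing_min)]
    if not kept:
        raise ValueError("no measurement passed the review criteria")
    return mean(m.value for m in kept)


# Made-up values: two short or unreported mixing times, two long ones.
measurements = [
    Measurement(value=1.0, mixing_time_min=5),
    Measurement(value=1.2, mixing_time_min=None),
    Measurement(value=2.0, mixing_time_min=60),
    Measurement(value=2.2, mixing_time_min=120),
]

equal_weight_result = naive_mean(measurements)
reviewed_result = reviewed_mean(measurements, min_mixing_min=30)
```

In this sketch the exclusion rule is explicit and tied to a recorded parameter, so a reader can see why each point was dropped rather than only that it was a statistical outlier.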

In the case of solubility, mixing time and evaporation conditions proved to be important factors. A good example is the determination of the solubility of 4-nitrobenzaldehyde in methanol. Of the five measurements taken, three are significantly lower than the other two (Figure 16-2, shown later in this chapter; http://oru.edu/cccda/sl/solubility/ugidata.php?solute=4-nitrobenzaldehyde&solvent=methanol). The method used was based on preparing a saturated solution of 4-nitrobenzaldehyde in methanol, evaporating the methanol, and then weighing the residue left behind. It is crucial that a fully saturated solution is prepared, and this was generally done by adding solute with mixing until visible solid remained in the tube. ...
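The arithmetic behind such an evaporate-and-weigh measurement can be sketched as follows. This is a minimal illustration, not the project's own calculation: it assumes the result is expressed in mol/L and that a known volume of saturated solution was evaporated. The function name and parameters are hypothetical; the molar mass is computed from the formula C7H5NO3.

```python
# Molar mass of 4-nitrobenzaldehyde (C7H5NO3), in g/mol.
MOLAR_MASS_4_NITROBENZALDEHYDE = 151.12


def solubility_molar(residue_mass_g: float,
                     aliquot_volume_ml: float,
                     molar_mass_g_per_mol: float) -> float:
    """Return the concentration in mol/L implied by the weighed residue.

    Assumes all solvent was removed and that the residue is pure solute;
    the value is only a solubility if the solution was truly saturated.
    """
    if residue_mass_g < 0:
        raise ValueError("residue mass cannot be negative")
    if aliquot_volume_ml <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("volume and molar mass must be positive")
    moles = residue_mass_g / molar_mass_g_per_mol
    litres = aliquot_volume_ml / 1000.0
    return moles / litres
```

Because the residue mass sits in the numerator, a solution that was not fully saturated leaves less residue and yields a lower value, while solvent left behind by incomplete evaporation adds mass and yields a higher one. This is why mixing time and evaporation conditions matter for this method.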
