Guest commentary on chapter 5: Advances in biomarker discovery with gene expression data
By measuring the expression levels of thousands of genes simultaneously in a single experiment, global gene expression profiling technologies such as microarrays and serial analysis of gene expression (SAGE) offer significant advantages in the search for new biomarkers. However, the massive amounts of genome-wide expression data they generate pose a great challenge for data mining and analysis. It has been shown that traditional statistical and classification techniques are not sufficient to address some fundamental issues in the search for novel and meaningful biomarkers. For example, one common practice is to apply statistical tests to score genes on the basis of their association with specific clinical outcomes and then to select the top-ranked genes as biomarker candidates. Because each gene is scored independently of the others, this may result in a set of highly correlated biomarkers that carry largely redundant information. Gerszten and Wang (2008) argued that, in order to achieve a significant improvement in predictive performance, new orthogonal biomarkers associated with new disease pathways are needed. Unsupervised clustering techniques and recent advances in network-based analysis offer great benefits in this endeavour.
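The redundancy problem with top-ranked gene selection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the score is a Welch t-statistic (the text does not name a specific test), and the helper names `welch_t`, `top_k_genes` and `mean_abs_correlation` are hypothetical. Ten genes are made to follow a single simulated pathway signal, so ranking by univariate score would be expected to pick genes that are strongly correlated with one another.

```python
import numpy as np

def welch_t(X, y):
    """Per-gene Welch t-statistic between class 1 and class 0 samples.

    X has shape (n_samples, n_genes); y holds binary labels (0 or 1).
    """
    a, b = X[y == 1], X[y == 0]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / se

def top_k_genes(X, y, k):
    """Indices of the k genes with the largest absolute t-statistic."""
    scores = np.abs(welch_t(X, y))
    return np.argsort(scores)[::-1][:k]

def mean_abs_correlation(X, genes):
    """Mean absolute pairwise Pearson correlation among the chosen genes."""
    corr = np.corrcoef(X[:, genes], rowvar=False)
    off_diagonal = corr[~np.eye(len(genes), dtype=bool)]
    return float(np.abs(off_diagonal).mean())

# Synthetic data (an assumption, not real expression measurements):
# 60 samples, 200 genes, two outcome classes of equal size.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))

# Genes 0-9 all track one latent "pathway" signal that differs by class.
pathway = 2.0 * y + rng.normal(scale=0.5, size=n_samples)
X[:, :10] = pathway[:, None] + rng.normal(scale=0.3, size=(n_samples, 10))

selected = top_k_genes(X, y, k=10)
redundancy = mean_abs_correlation(X, selected)
print("selected genes:", sorted(selected.tolist()))
print("mean |correlation| among selected genes:", round(redundancy, 2))
```

In this toy setting the selected genes all come from the same simulated pathway, so a classifier built on them gains little from any gene beyond the first. This is the situation that motivates the search for orthogonal biomarkers.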
Unsupervised clustering approaches
Clustering is the process of ...