Chapter 24. Bioconductor

Most of this book is applicable across multiple areas of study, but this chapter focuses on a single field: bioinformatics. In particular, we’re going to look at the Bioconductor project. Bioconductor is an open source software project for analyzing genomic data in R. It initially focused on gene expression data alone, but it now includes tools for analyzing other types of data, such as serial analysis of gene expression (SAGE), proteomic, single-nucleotide polymorphism (SNP), and gene sequence data.

Biological data isn’t much different from other types of data we’ve seen in the book: data is stored in vectors, arrays, and data frames. You can process and analyze this data using the same tools that R provides for other types of data, including data access tools, statistical models, and graphics.
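
As a small illustration of that point, the sketch below treats a gene expression table as an ordinary R matrix (genes in rows, samples in columns) and analyzes it with base R alone: a log transform, group means, a per-gene Welch t-test, and a data frame of results. The gene names, sample names, and intensity values are hypothetical, made up only for this example. The code does not use any Bioconductor API; Bioconductor packages wrap this kind of matrix in richer classes, but the underlying operations are the same.

```r
# Hypothetical expression intensities: rows are genes, columns are samples.
# matrix() fills column by column, so each group of three values is one sample.
expr <- matrix(
  c(120, 340,  95,   # ctrl1
    110, 360,  80,   # ctrl2
    400,  50, 210,   # trt1
    420,  45, 190),  # trt2
  nrow = 3,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("ctrl1", "ctrl2", "trt1", "trt2")
  )
)

# Expression intensities are commonly analyzed on a log2 scale
log_expr <- log2(expr)

# Column positions of the two (hypothetical) sample groups
ctrl_idx <- 1:2
trt_idx  <- 3:4

# Per-gene log2 fold change: treatment mean minus control mean
log_fc <- rowMeans(log_expr[, trt_idx]) - rowMeans(log_expr[, ctrl_idx])

# Per-gene Welch two-sample t-test, applied row by row
p_values <- apply(log_expr, 1, function(x) {
  t.test(x[trt_idx], x[ctrl_idx])$p.value
})

# Collect the results in a data frame, largest absolute change first
results <- data.frame(
  gene = rownames(expr),
  log2FC = unname(log_fc),
  p_value = unname(p_values)
)
results <- results[order(-abs(results$log2FC)), ]
print(results)
```

With only two replicates per group the t-test has very little power, so treat this as a demonstration of the tools rather than a sound experimental design.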

Bioconductor provides tools for each step of the analysis process: loading, cleaning, and analyzing data. Depending on the type of data you are working with, you might need to use other software in conjunction with Bioconductor. For example, if you are working with Affymetrix GeneChip arrays, you will need to use the Affymetrix GeneChip Command Console software to scan the arrays and produce probe cell intensity data (CEL files). You can then load these CEL files into Bioconductor for further processing.

This chapter provides a very brief overview of Bioconductor. In this chapter, we’ll first look at an example, using publicly ...
