Chapter 64. Web Databases the Genome Project Way

Lincoln D. Stein

This article is going to be a bit different. Instead of describing a neat trick or technique for Perl web programming, I’m going to talk about my own work in the Human Genome Project.

The data generated by the Genome Project is more complex than the type of data one usually sees in business applications. Instead of a few simple relationships between objects, biological objects are woven into a rich web of interconnections. For example, a DNA sequence will contain a number of genes, each of which encodes one or more proteins, each of which has a confirmed or predicted function. The protein functions, in turn, are related to diseases, which are related to disease mapping information, which is related to genes, which are related back to DNA sequences. You can describe biological information in the familiar terms of a relational database schema, but you might not like the results. Inevitably, you “fracture” the original biological objects into many small tables. Some Oracle-based genome databases use relational schemas of over 600 tables and require a database guru just to formulate and execute useful queries!
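To make the web of interconnections concrete, here is a minimal Python sketch of the chain described above, held as directly linked objects. All class, attribute, and record names (`DNASequence`, `mapped_genes`, `seq1`, and so on) are hypothetical placeholders chosen for illustration; they are not taken from any actual Genome Project database.

```python
# Hypothetical classes sketching the chain:
# sequence -> genes -> proteins -> diseases -> mapped genes -> sequence

class DNASequence:
    def __init__(self, name):
        self.name = name
        self.genes = []           # genes contained in this sequence


class Gene:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence  # back-reference to the containing sequence
        self.proteins = []        # proteins this gene encodes
        sequence.genes.append(self)


class Protein:
    def __init__(self, name, function):
        self.name = name
        self.function = function  # confirmed or predicted function
        self.diseases = []        # diseases related to that function


class Disease:
    def __init__(self, name):
        self.name = name
        self.mapped_genes = []    # genes implicated by disease mapping


def related_sequences(sequence):
    """Follow the full chain from a sequence and collect the names of
    the sequences reached at the far end."""
    found = []
    for gene in sequence.genes:
        for protein in gene.proteins:
            for disease in protein.diseases:
                for mapped in disease.mapped_genes:
                    if mapped.sequence.name not in found:
                        found.append(mapped.sequence.name)
    return found


# Placeholder records, not real biological data.
seq = DNASequence("seq1")
gene = Gene("geneA", seq)
protein = Protein("proteinA", function="predicted (placeholder)")
gene.proteins.append(protein)
disease = Disease("diseaseX")
protein.diseases.append(disease)
disease.mapped_genes.append(gene)

print(related_sequences(seq))  # ['seq1']: the chain loops back to its start
```

In a relational schema, each of these lists would typically become its own linking table, so the same traversal would be written as a multi-way join; that is one way to picture the fracturing into many small tables.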

A more natural solution for storing biological data is to use an object-oriented system. In such a system, real-world objects like genes and DNA sequences are mapped directly onto database objects. This makes it easier for the end users (the biologists) to understand the database, and facilitates communication ...
