Chapter 13. The Design of the Gene Sorter

Jim Kent

This chapter is about a moderate-sized program I wrote called the Gene Sorter. At about 20,000 lines in all, the Gene Sorter code is larger than the projects described in most of the other chapters. Though there are some smaller pieces of the Gene Sorter that are quite nice, for me the real beauty is how easy it is to read, understand, and extend the program as a whole. In this chapter, I’ll present an overview of what the Gene Sorter does, highlight some of the more important parts of the code, and then discuss the issues involved in making programs longer than a thousand lines enjoyable and even beautiful to work with.

The Gene Sorter helps scientists rapidly sift through the roughly 25,000 genes in the human genome to find those most relevant to their research. The program is part of the http://genome.ucsc.edu web site, which also contains many other tools for working with data generated by the Human Genome Project. The Gene Sorter design is simple and flexible. It incorporates many lessons we learned in two previous generations of programs that serve biomedical data over the Web. The program uses CGI to gather input from the user, makes queries into a MySQL database, and presents the results in HTML. About half of the program code resides in libraries shared with other genome.ucsc.edu tools.
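To make the request flow just described concrete (gather input, query a database, present HTML), here is a minimal sketch in Python. It is not the Gene Sorter's code: the table layout, the `prefix` parameter, the gene names, and the function names are all hypothetical, and an in-memory SQLite database stands in for MySQL so the example is self-contained.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def build_demo_db():
    """Create an in-memory table with made-up rows (not real Gene Sorter data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (name TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [
            ("DEMO1", "hypothetical kinase"),
            ("DEMO2", "hypothetical <receptor>"),
            ("OTHER3", "hypothetical transporter"),
        ],
    )
    return conn


def handle_request(query_string, conn):
    """Turn a CGI-style query string into an HTML table of matching genes."""
    # Step 1: gather input from the user (here, a single 'prefix' parameter).
    params = parse_qs(query_string)
    prefix = params.get("prefix", [""])[0]

    # Step 2: query the database, using a bound parameter rather than
    # string concatenation so user input cannot alter the SQL.
    rows = conn.execute(
        "SELECT name, description FROM gene WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    ).fetchall()

    # Step 3: present the results as HTML, escaping every value.
    cells = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(n), html.escape(d))
        for n, d in rows
    )
    return "<table>{}</table>".format(cells)
```

Calling `handle_request("prefix=DEMO", build_demo_db())` returns a two-row table. The three steps are kept separate so that each one could be swapped out, for example by replacing the SQLite connection with a MySQL client, without touching the others.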

The human genome is a digital code that somehow contains all of the information needed to build a human body, including that most ...
