Preface

This preface provides information I expect will be important for someone reading and using this book. The first part introduces the book itself. The second talks about Python. The third part contains other notes of various kinds.

Introduction

I would like to begin with some comments about this book, the field of bioinformatics, and the kinds of people I think will find it useful.

About This Book

The purpose of this book is to show the reader how to use the Python programming language to facilitate and automate the wide variety of data manipulation tasks encountered in life science research and development. It is designed to be accessible to readers with a range of interests and backgrounds, both scientific and technical. It emphasizes practical programming, using meaningful examples of useful code. In addition to meeting the needs of individual readers, it can also be used as a textbook for a one-semester upper-level undergraduate or graduate-level course.

The book differs from traditional introductory programming texts in a variety of ways. It does not attempt to detail every possible variation of the mechanisms it describes, emphasizing instead the most frequently used. It offers an introduction to Python programming that is more rapid and in some ways more superficial than what would be found in a text devoted solely to Python or introductory programming. At the same time, it includes some advanced features, techniques, and topics that are often omitted from entry-level Python books. These are included because of their wide applicability in bioinformatics programming, and they are used extensively in the book’s examples.

Python’s installation includes a large selection of optional components called “modules.” Python books usually cover a small selection of the most generally useful modules, and perhaps some others in less detail. Having bioinformatics programming as this book’s target had some interesting effects on the choice of which modules to discuss, and at what depth. The modules (or parts of modules) that are covered in this book are the ones that are most likely to be particularly valuable in bioinformatics programming. In some cases the discussions are more substantial than would be found in a generic Python book, and many of the modules covered here appear in few other books. Chapter 6, in particular, describes a large number of narrowly focused “utility” modules.

The remaining chapters focus on particular areas of programming technology: pattern matching, processing structured text (HTML and XML), web programming (opening web pages, programming HTTP requests, interacting with web servers, etc.), relational databases (SQL), and structured graphics (Tk and SVG). They each introduce one or two modules that are essential for working with these technologies, but the chapters have a much larger scope than simply describing those modules.

Unlike many technical books, this one really should be read linearly. Even in the later chapters, which deal extensively with particular kinds of programming work, examples will often use material from an earlier chapter. In most places the text says so and provides cross-references to the earlier examples, so you’ll at least know when you’ve encountered something that depends on earlier material. If you do jump from one place to another, these cross-references will provide a path back to what you’ve missed.

Each chapter ends with a special “Tips, Traps, and Tracebacks” section. The tips provide guidance for applying the concepts, mechanisms, and techniques discussed in the chapter. In earlier chapters, many of the tips also provide advice and recommendations for learning Python, using development tools, and organizing programs. The traps are details, warnings, and clarifications regarding common sources of confusion or error for Python programmers (especially new ones). You’ll soon learn what a traceback is; for now it is enough to say that tracebacks are error messages likely to be encountered when writing code based on the chapter’s material.

About Bioinformatics

Any title with the word “bioinformatics” in it is intrinsically ambiguous. There are (at least) three quite different kinds of activities that fall within this term’s wide scope. Both the nature of the work performed and the educational backgrounds and technical talents of the people who perform these various activities differ significantly. The three main areas of bioinformatics are:

Computational biology

Concerned with the development of algorithms for mining biological data and modeling biological phenomena

Software development

Focused on writing software to implement computational biology algorithms, visualize complex data, and support research and development activity, with particular attention to the challenges of organizing, searching, and manipulating enormous quantities of biological data

Life science research and development

Focused on the application of the tools and results provided by the other two areas to probe the processes of life

This book is designed to teach you bioinformatics software development. There is no computational biology here: no statistics, formulas, equations—not even explanations of the algorithms that underlie commonly used informatics software. The book’s examples are all based on the kind of data life science researchers work with and what they do with it.

The book focuses on practical data management and manipulation tasks. The term “data” has a wide scope here, including not only the contents of databases but also the contents of text files, web pages, and other information sources. Examples focus on genomics. Relative to other areas, genomics is more mature and easier to introduce to people new to the scientific content of bioinformatics, and its data is more amenable to representation and manipulation in software. Also, and not incidentally, it is the part of bioinformatics with which the author is most familiar.

About the Reader

This book assumes no prior programming experience. Its introduction to and use of Python are completely self-contained. Even if you do have some programming experience, the nature of Python and the book’s presentation of technical matter won’t necessarily relate directly to anything you’ve learned before: you too might find much to explore here.

The book also assumes no particular knowledge of or experience in bioinformatics or any of the scientific fields to which it relates. It uses real examples from real biological data, and while nearly all of the topics should be familiar to anyone working in the field, there’s nothing conceptually daunting about them. Fundamentally, the goal here is to teach you how to write programs that manipulate data.

This book was written with several audiences in mind:

  • Life scientists

  • Life sciences students, both undergraduate and graduate

  • Technical staff supporting life science research

  • Software developers interested in the use of Python in the life sciences

To each of these groups, I offer an introductory message:

Scientists

Presumably you are reading this book because you’ve found yourself doing, or wanting to do, some programming to support your work, but you lack the computer science or software engineering background to do it as well as you’d like. The book’s introduction to Python programming is straightforward, and its examples are drawn from bioinformatics. You should find the book readable even if you are just curious about programming and don’t plan to do any yourself.

Students

This book could serve as a textbook for a one-semester course in bioinformatics programming or an equivalent independent study effort. If you are majoring in a life science, the technical competence you can gain from this book will enable you to make significant contributions to the projects in which you participate. If you are majoring in computer science or software engineering but are intrigued by bioinformatics, this book will give you an opportunity to apply your technical education in that field. In any case, nothing in the book should be intimidating to any student with a basic background either in one of the life sciences or in computing.

Technical staff

You’re probably already doing some work managing and manipulating data in support of life science research and development, and you may be accustomed to writing small scripts and performing system maintenance tasks. Perhaps you’re frustrated by the limits of your knowledge of computing techniques. Regardless, you have developed an interest in the science and technology of bioinformatics. You want to learn more about those fields and develop your skills in working with biological data. Whatever your training and responsibilities, you should find this book both approachable and helpful.

Programmers

Bioinformatics software differs from most other software in important, though hard to pin down, ways. Python also differs from other programming languages in ways that you will probably find intriguing. This book moves quickly into significant technical material—it does not follow the pattern of a traditional kind of “Programming in...” or “Learning...” or “Introduction to...” book. Though it makes no attempt to provide a bioinformatics primer, the book includes sufficient examples and explanations to intrigue programmers curious about the field and its unusual software needs.

Note

I would like to point out to computer scientists and experienced software developers who may read this book that some very particular choices were made for the purposes of presentation to its intended audience. At the risk of sounding arrogant, I assure you that these are backed by deep theoretical knowledge, extensive experience, and a full awareness of alternatives. These choices were made with the intention of simplifying technical vocabulary and presenting as clear and uniform a view of Python programming as possible. They also were based on the assumption that most people making use of what they learn in this book will not move on to more advanced programming or large-scale software development.

Some things that will appear strange to anyone with significant programming experience are in reality true to a pure “Pythonic” approach. It is delightful to have the opportunity to write in this vocabulary without the need to accommodate more traditional terminology.

The most significant example of this is that the word “variable” is never used in the context of assignment statements or function calls. Python does not assign values to variables in the way that traditional “values in a box” languages do. Instead, like some of the languages that influenced its design, what Python does is assign names to values. The assignment statement should be read from left to right as assigning a name to an existing value. This is a very real distinction that goes beyond the ways languages such as Java and C++ refer to objects through pointer-valued variables.
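A minimal sketch of this distinction follows. The names and values are invented for illustration. It shows two names assigned to one list, and then one of the names reassigned to a different list.

```python
bases = ['A', 'C', 'G', 'T']    # the name "bases" is assigned to a new list
nucleotides = bases             # a second name for that same list, not a copy

nucleotides.append('U')         # modify the list through one name...
print(bases)                    # ...and the other name shows the change
print(bases is nucleotides)     # both names refer to one object

bases = ['A', 'C', 'G', 'T']    # reassigning "bases" names a different list
print(nucleotides)              # the original list is unaffected by the reassignment
```

Nothing was ever copied into a box: the append changed the one list that both names referred to, and the final assignment only moved a name.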

Another aspect of the book’s heavily Pythonic approach is its routine use of comprehensions. Approached by someone familiar with other languages, these can appear quite mysterious. For someone learning Python as a first language, though, they can be more natural and easier to use than the corresponding combinations of assignments, tests, and loops or iterations.
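As a rough illustration, here is one small task written both ways. The sequence and the names are hypothetical, chosen only for this sketch.

```python
sequence = 'ATGCGTTAA'   # hypothetical example data

# Loop version: an assignment, an iteration, a test, and an append.
gc_positions_loop = []
for index, base in enumerate(sequence):
    if base in 'GC':
        gc_positions_loop.append(index)

# Comprehension version: the same result described in a single expression.
gc_positions = [index for index, base in enumerate(sequence) if base in 'GC']

print(gc_positions_loop)
print(gc_positions)
```

Both versions produce the same list of positions. The comprehension simply states what the list contains rather than spelling out how to build it.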

Python

This section introduces the Python language and gives instructions for installing and running Python on your machine.

Some Context

There are many kinds of programming languages, with different purposes, styles, intended uses, etc. Professional programmers often spend large portions of their careers working with a single language, or perhaps a few similar ones. As a result, they are often unaware of the many ways and levels at which programming languages can differ. For educational and professional development purposes, it can be extremely valuable for programmers to encounter languages that are fundamentally different from the ones with which they are familiar.

The effects of such an encounter are similar to learning a foreign human language from a different culture or language family. Learning Portuguese when you know Spanish is not much of a mental stretch. Learning Russian when you are a native English speaker is. Similarly, learning Java is quite easy for experienced C++ programmers, but learning Lisp, Smalltalk, ML, or Perl would be a completely different experience.

Broadly speaking, programming languages embody combinations of four paradigms. Some were designed with the intention of staying within the bounds of just one, or perhaps two. Others mix multiple paradigms, although in these cases one is usually dominant. The paradigms are:

Procedural

This is the traditional kind of programming language in which computation is described as a series of steps to be executed by the computer, along with a few mechanisms for branching, repetition, and subroutine calling. It dates back to the earliest days of computing and is still a core aspect of most modern languages, including those designed for other paradigms.

Declarative

Declarative programming is based on statements of facts and logical deduction systems that derive further facts from those. The primary embodiment of this paradigm, in the form known as logic programming, is Prolog, a language used fairly widely in Artificial Intelligence (AI) research and applications starting in the 1980s. As a purely logic-based language, Prolog expresses computation as a series of predicate calculus assertions, in effect creating a puzzle for the system to solve.

Functional

In a purely functional language, all computation is expressed as function calls. In a truly pure language there aren’t even any variable assignments, just function parameters. Lisp was the earliest functional programming language, dating back to 1958. Its name is an acronym for “LISt Processing language,” a reference to the kind of data structure on which it is based.

Lisp became the dominant language of AI in the 1960s and still plays a major role in AI research and applications. The language has evolved substantially from its early beginnings and spawned many implementations and dialects, although most of these disappeared as hardware platforms and operating systems became more standardized in the 1980s.

A huge standardization effort combining ideas from several major dialects and a great many extensions, including a complete object-oriented (see below) component, was undertaken in the late 1980s. This effort resulted in the now-dominant Common Lisp.[1] Two important dialects with long histories and extensive current use are Scheme and Emacs Lisp, the scripting language for the Emacs editor. Other functional programming languages in current use are ML and Haskell.

Object-oriented

Object-oriented programming was invented in the late 1960s, developed in the research community in the 1970s, and incorporated into languages that spread widely into both academic and commercial environments in the 1980s (primarily Smalltalk, Objective-C, and C++). In the 1990s this paradigm became a key part of modern software development approaches. Smalltalk and Lisp continued to be used, C++ became dominant, and Java was introduced. Mac OS X, though built on a Unix-like kernel, uses Objective-C for upper layers of the system, especially the user interface, as do applications built for Mac OS X. JavaScript, used primarily to program web browser actions, is another object-oriented language. Once a radical innovation, object-oriented programming is today very much a mainstream paradigm.

Another dimension that distinguishes programming languages is their primary intended use. There have been languages focused on string matching, languages designed for embedded devices, languages meant to be easy to learn, languages built for efficient execution, languages designed for portability, languages that could be used interactively, languages based largely on list data structures, and many other kinds.

Language designers, whether consciously or not, make choices in these and other dimensions. Subsequent evolutions of their languages are subject to market forces, intellectual trends, hardware developments, and so on. These influences may help a language mature and reach a wider audience. They may also steer the language in directions somewhat different from those originally intended.

The Python Language

Simply put, Python is a beautiful language. It is effective for everything from teaching new programmers to advanced computer science study, from simple scripts to sophisticated advanced applications. It has always had some purchase in bioinformatics, and in recent years its popularity has been increasing rapidly. One goal of this book is to help significantly expand Python’s use for bioinformatics programming.

Python features a syntax in which the ends of statements are marked only by the end of a line, and statements that form part of a compound statement are indented relative to the lines of code that introduce them. The semicolons or keywords that end statements and the braces that group statements in other languages are entirely absent.
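A short sketch of what this looks like in practice. The function and its name are hypothetical, written only to show the layout.

```python
def describe(sequence):
    # Each statement ends at the end of its line; no semicolons are needed.
    if 'U' in sequence:        # the colon introduces an indented group of statements
        kind = 'RNA'
    else:
        kind = 'DNA'
    # Returning to the previous indentation level ends the compound statement.
    return kind + ' of length ' + str(len(sequence))

print(describe('AUGC'))
print(describe('ATGCAT'))
```

Indentation alone shows which statements belong to the if, the else, and the function as a whole.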

Programmers familiar with “standard syntax” languages often find Python’s uncluttered syntax deeply disconcerting. New programmers have no such problem, and for them, this simple and readable syntax is far easier to deal with than the visually arcane constructions using punctuation (with the attendant compilation errors that must be confronted). Traditional programmers should reconsider Python’s syntax after performing this experiment:

  1. Open a file containing some well-formatted code.

  2. Delete all semicolons, braces, and terminal keywords such as end, endif, etc.

  3. Look at the result.

To the human eye, the simplified code is easier to read—and it looks an awful lot like Python. It turns out that the semicolons, terminal keywords, and braces are primarily for the benefit of the compiler. They are not really necessary for human writers and readers of program code. Python frees the programmer from the drudgery of serving as a compiler assistant.

Python is an interesting and powerful language with respect to computing paradigms. Its skeleton is procedural, and it has been significantly influenced by functional programming, but it has evolved into a fundamentally object-oriented language. (There is no declarative programming component—of the four paradigms, declarative programming is the one least amenable to fitting together with another.) Few, if any, other languages provide a blend like this as seamlessly and elegantly as does Python.
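To make the blend concrete, here is one small computation, the fraction of G and C bases in a sequence, sketched in each of the three styles Python supports. All of the names are hypothetical, and none of the three should be read as the one preferred form.

```python
# Procedural: a series of steps with a loop and a test.
def gc_fraction_procedural(sequence):
    count = 0
    for base in sequence:
        if base in 'GC':
            count += 1
    return count / len(sequence)

# Functional: the result is expressed entirely as function calls.
def gc_fraction_functional(sequence):
    return sum(map(lambda base: base in 'GC', sequence)) / len(sequence)

# Object-oriented: the data and the code that works on it are defined together.
class DNASequence:
    def __init__(self, bases):
        self.bases = bases

    def gc_fraction(self):
        return (self.bases.count('G') + self.bases.count('C')) / len(self.bases)

print(gc_fraction_procedural('ATGC'))
print(gc_fraction_functional('ATGC'))
print(DNASequence('ATGC').gc_fraction())
```

All three can live side by side in the same program, which is the sense in which the paradigms blend.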

Installing Python

This book uses Python 3, the language’s first non-backward-compatible release. With a few minor changes, noted where applicable, Python 2.x will work for most of the book’s examples. The major exception is that print was a statement in Python 2 but is now a function, allowing for more flexibility. Also, Python 3 reorganized and renamed some of its library modules and their contents, so using Python 2.x with examples that demonstrate the use of certain modules would involve more than a few minor changes. There are a few notes about Python 2 in Chapters 1, 3, and 5; they are there to help you not only if you find yourself using Python 2 for some work, but also when you read Python 2 code.
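As a brief sketch of the flexibility that the function form of print allows, the following uses its sep, end, and file keyword arguments. The output is sent to an in-memory buffer only so that the result can be inspected; the buffer name is made up for this illustration.

```python
import io

# In Python 2 this would have been a statement:   print "A", "C", "G", "T"
# In Python 3 print is an ordinary function, so it accepts keyword arguments.
print("A", "C", "G", "T")                 # default: values separated by spaces
print("A", "C", "G", "T", sep="-")        # choose the separator

buffer = io.StringIO()                    # an in-memory stand-in for a file
print("A", "C", "G", "T", sep="-", end="!\n", file=buffer)   # choose the line ending and destination
print(repr(buffer.getvalue()))
```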

The current release of Python can be downloaded from http://python.org/download/. Installers are available for OS X and Windows. With most distributions of Linux, you should be able to install Python through the usual package mechanisms. (Get help from someone who knows how to do that if you don’t.) You can also download the source, unpack the archive, and, following the steps in the “Build Instructions” section of the README file it contains, configure, “make,” and install the software.

Warning

If you are installing Python from its source code, you may need to download, configure, make, and install several libraries that Python uses if available. At the end of the “make” process, a list of missing optional libraries is printed. It is not necessary to obtain all the libraries. The ones you’ll want to have are:

  • curses

  • gdbm

  • sqlite3[2]

  • Tcl/Tk[3]

  • readline

All of these should be available through standard package installers.
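Once Python is built, one way to see whether these libraries were picked up is to try importing the corresponding Python modules. The sketch below assumes the usual Python 3 module names for each library (for example, dbm.gnu for gdbm and tkinter for Tcl/Tk). The dictionary and function names are hypothetical.

```python
import importlib

# Library name from the list above -> Python 3 module that depends on it.
# The pairing is an assumption based on the standard library's module names.
OPTIONAL_MODULES = {
    'curses': 'curses',
    'gdbm': 'dbm.gnu',
    'sqlite3': 'sqlite3',
    'Tcl/Tk': 'tkinter',
    'readline': 'readline',
}

def check_optional_modules():
    """Return a dictionary reporting, for each library, whether its module imports."""
    status = {}
    for library, module_name in OPTIONAL_MODULES.items():
        try:
            importlib.import_module(module_name)
            status[library] = True
        except ImportError:
            status[library] = False
    return status

for library, available in check_optional_modules().items():
    print(library, 'available' if available else 'MISSING')
```

A module reported as missing is not necessarily a problem. Some of these are not provided on every platform, and only the examples that use a given module need it.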

Running Python

You can start Python in one of two ways:

  1. Type python3 on the command line.[4]

  2. Run an IDE. Python comes with one called IDLE, which is sufficient for the work you’ll do in this book and is a good place to start even if you eventually decide to move on to a more sophisticated IDE.

The term Unix in this book refers to all flavors thereof, including Linux and Mac OS X. The term command line refers to where you type commands to a “shell”—in particular, a Unix shell such as tcsh or bash or a Windows command window—as opposed to typing to the Python interpreter. The term interpreter may refer to either the interpreter running in a shell, the “Python Shell” window in IDLE, or the corresponding window in whatever other development environment you might be using.

When Python starts interactively, it prints some information about its version. Then it repeats a cycle in which it:

  1. Prints the prompt >>> to indicate that it is waiting for you to type something

  2. Reads what you type

  3. Interprets its meaning to obtain a value

  4. Prints that value
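The cycle can be imitated in a few lines. This is only a toy sketch under simplifying assumptions: it handles one-line expressions, not statements or multiline input, and the function name is hypothetical. It is not how the interpreter is actually implemented.

```python
def toy_interpreter_cycle(inputs):
    """Imitate the prompt-read-interpret-print cycle for a list of one-line expressions."""
    transcript = []
    for text in inputs:
        transcript.append('>>> ' + text)   # steps 1 and 2: the prompt and what was typed
        value = eval(text)                 # step 3: interpret the input to obtain a value
        if value is not None:              # step 4: print the value (None is not shown)
            transcript.append(repr(value))
    return transcript

for line in toy_interpreter_cycle(["1 + 2", "'ACGT'.lower()", "len('ATGAAC')"]):
    print(line)
```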

Throughout the book, the appearance of the >>> prompt in examples indicates the use of the interpreter. Like nearly all command-line interactive applications, the Python interpreter won’t pay any attention to what you’ve typed until you press the Return (Enter) key. Pressing the Return key does not always provide complete input for Python to process, though; as you’ll see, it is not unusual to enter multiline inputs. In the command-line interpreter, Python will indicate that it is still waiting for you to complete your input by starting each continuation line with the prompt ... instead of >>>. IDLE, unfortunately, gives no such indication.
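The standard library’s codeop module offers a way to ask whether some text is complete input. The sketch below uses it to approximate the choice of prompt; the function name is hypothetical, and the real interpreter’s handling of the blank line that ends a multiline input is not modeled here.

```python
import codeop

def next_prompt(lines_so_far):
    """Return the prompt that would precede the next line of input."""
    if not lines_so_far:
        return '>>> '
    source = '\n'.join(lines_so_far)
    # compile_command returns None when the text is valid so far but unfinished.
    if codeop.compile_command(source) is None:
        return '... '
    return '>>> '

print(repr(next_prompt([])))
print(repr(next_prompt(["1 + 2"])))
print(repr(next_prompt(["for base in 'ACG':"])))
```

A line that opens a compound statement, such as the for line above, is incomplete on its own, so the interpreter keeps waiting for more.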

Both IDLE and the command-line interpreter provide simple keyboard shortcuts for editing the current line before pressing Return. There are also keys to recall previous inputs. IDLE provides quite a few additional keyboard shortcuts that are worth learning early on. In addition, if you are using an IDE—IDLE, in particular—you’ll be able to use the mouse to click to move the cursor on the input line.

To get more information about using IDLE, select “IDLE Help” from its Help menu. That won’t show you the keyboard shortcuts, though; they are listed in the “Keys” tab of IDLE’s preferences dialog. Note that you can use that dialog to change the keystroke assignments, as well as the fonts, colors, window size, and so on.

When you want to quit a command-line Python interpreter, simply type Ctrl-D in Unix (including Linux and the Mac OS X Terminal application). In Windows, type Ctrl-Z followed by Return. You exit an IDE with the usual Quit menu command.

Notes

I end this preface with some notes about things I think will help you make the most of your experience with this book.

Reading and Reference Recommendations

The documentation that comes with the Python installation is excellent, extensive, and well organized, but it can be overwhelming to newcomers. Moreover, the topics this book presents and the way it presents them are designed specifically with bioinformatics students and professionals in mind (though of course it’s hoped that it will be valuable to a much wider audience than that). Unless you find yourself needing more information about the Python language or library than is provided in this book while you’re reading it, it’s probably best to wait until you finish it before spending much time with the documentation. The documentation is aimed primarily at Python programmers. You’ll be one when you finish this book, at which point you’ll use the documentation all the time.

With respect to the bioinformatics side of things, I trust you won’t encounter anything unfamiliar here. But if you do, or you want to delve deeper, Wikipedia is a remarkably deep resource for bioinformatics—for programming and computer science too, for that matter. There are two astoundingly extensive bioinformatics references you should at least have access to, if not actually own:

  • Bioinformatics and Functional Genomics, Second Edition, by Jonathan Pevsner (Wiley-Blackwell)

  • Bioinformatics: Sequence and Genome Analysis, Second Edition, by David W. Mount (Cold Spring Harbor Laboratory Press)

An unusual collection of essays containing detailed information about approaches to analyzing bioinformatics data and the use of a vast array of online resources and tools is:

  • Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data, Second Edition, by Michael R. Barnes (Ed.) (Wiley)

Example Code

All the code for this book’s examples, additional code, some lists of URLs, data for the examples, and so forth are found on the book’s website. In many cases, there is a sequence of files for the same example or set of examples that shows the evolution of the example from its simplest start to where it ends up. In a few cases, there are versions of the code that go beyond what is shown in the book. There are also some examples that are not in the book at all, as well as exercises for each chapter.

Within the book’s code examples, statement keywords are in boldface. Comments and documentation are in serif typeface. Some examples use oblique monospace to indicate descriptive “pseudocode” meant to be replaced with actual Python code. A shaded background indicates either code that has changed from the previous version of an example or code that implements a point made in the preceding paragraph(s).

Unfortunate and Unavoidable Vocabulary Overlap

This book’s vocabulary is drawn from three domains that, for the most part, are independent of each other: computer science (including data structures and programming language concepts), Python (which has its own names for the program and data structures it offers), and biology. Certain words appear in two or even all three of these domains, often playing a major role and having a different meaning in each. The result is unfortunate and unavoidable collisions of vocabulary. Care was taken to establish sufficient context for the use of these words to make their meanings clear, but the reader should be prepared for the occasional mental backtrack to try another meaning for a term while attempting to make sense of what is being said.

Even within a single domain, a term’s importance does not necessarily rescue it from ambiguity. Consider the almost unrelated meanings of the term “frame” as the offset from the start of a DNA sequence and in the phrase “open reading frame.” There can be many open reading frames in a frame and many frames with open reading frames. And sometimes there are three frames to consider, and sometimes also the reverse complement frames, making six. Open reading frames can appear in any of the six reading frames.
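The “frame as offset” meaning is easy to show in code. The following minimal sketch, with hypothetical function names, produces the three frames of a sequence and the three frames of its reverse complement. It assumes a sequence containing only the letters A, C, G, and T, and it does not look for open reading frames.

```python
COMPLEMENT_TABLE = str.maketrans('ACGT', 'TGCA')

def reverse_complement(sequence):
    """Complement each base, then reverse the result."""
    return sequence.translate(COMPLEMENT_TABLE)[::-1]

def six_frames(sequence):
    """Return the sequence read from offsets 0, 1, and 2 on each strand."""
    frames = []
    for strand in (sequence, reverse_complement(sequence)):
        for offset in range(3):
            frames.append(strand[offset:])
    return frames

for frame in six_frames('ATGAAC'):
    print(frame)
```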

The vocabulary overlap is so omnipresent that it is almost humorous. Then again, the words involved are fine words that have meanings in a great many other domains too, so we should not be surprised to encounter them in our three. Even though you have not yet been properly introduced to them, Table 1 lists some of the most vexing examples. Stay particularly alert when you encounter these, especially when you see the words in code examples.

Table 1. Domain-ambiguous terms

| Term | Biology | Programming | Python |
| --- | --- | --- | --- |
| Sequence | Part of a DNA or RNA molecule; more often refers to the abstraction thereof, as represented with letters | (Usually) one of a number of data structures that arrange their elements linearly | A linear, and therefore numerically indexable, collection of values |
| Base | A single nucleotide in a DNA or RNA molecule | Base 10, 16, 2, etc. | Base 10, 16, 2, etc., as used in input and output operations |
| String | A series of letters representing a DNA, RNA, or amino acid sequence | A sequence of characters, often a “primitive type” of a language | An immutable sequence type named str |
| Expression | The production of proteins under the control of cellular machinery influenced by life stage, the organ containing the cell, internal states (disease, hunger), and external conditions (dryness, heat) | (1) (Generally) a combination of primitive values, operators, and function calls, with specifics differing significantly among languages | (1) A combination of primitive values, operators, and function calls |
| | | (2) Regular expression: a pattern describing a set of strings with notations for types of characters, grouping, repetition, and so on, the details of which differ among languages and editors | (2a) A regular expression string; (2b) A regular expression string compiled into a regular expression object |
| Type | The specimen of an organism first used to describe and name it | A theoretical construct defined differently in different contexts and implemented differently by different programming languages; corresponds roughly to “the kind of thing” something is and “the kind of operations” it supports | Synonymous with “class,” but often used in the context of Python’s built-in types, as opposed to classes defined in a Python or externally obtained library or in user code |
| Translate, translate | Convert DNA codons (base triples) to amino acids according to the genetic code of the organism | Convert computer code in one language into computer code in another, typically lower-level, language | A method of str that uses a table to produce a new str with all the characters of the original replaced by the corresponding entries in the table |
| Class | One of the levels in the standard taxonomic classification of organisms | In languages that support object-oriented programming, the encapsulated definition of data and related code | As in programming; more specifically, the type of an object, which itself is an object that defines the methods for its instances |
| Loop | A property of RNA secondary structures (among other meanings) | An action performed repeatedly until some condition is no longer true | An action performed repeatedly until some condition is no longer true |
| Library | A collection of related sequences, most commonly used in the context of a library of expressed RNA in cDNA form | Like a program, but meant to be used by other programs rather than as a freestanding application; most languages use a core set of libraries and provide a large selection of optional ones | A collection of modules, each containing a collection of related definitions, as in “Python comes with an extensive library of optional tools and facilities” |
| Complement | The nucleotide with which another always pairs | “Two’s complement” is the standard representation of negative integers | |
| R.E. | Restriction enzyme | | Regular expression |
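Two of the Python meanings in Table 1 can be seen side by side in a short sketch: a regular expression as a string and as a compiled object, and “base” as used in input and output. Fittingly, the regular expression here looks for a restriction enzyme site (GAATTC, the EcoRI recognition site). The sequence and the names are invented for illustration.

```python
import re

# (2a) A regular expression string.
site_pattern = 'GAATTC'

# (2b) The same string compiled into a regular expression object.
site_regex = re.compile(site_pattern)

# Hypothetical sequence made up for this example.
sequence = 'TTGAATTCAAGAATTCGG'

# Start position of every occurrence of the site in the sequence.
site_positions = [match.start() for match in site_regex.finditer(sequence)]
print(site_positions)

# "Base" in the Python sense: number bases in input and output operations.
print(int('ff', 16))      # read a string as a base-16 number
print(format(10, 'b'))    # write a number in base 2
```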

Fortunately, while the term “sequence” has a conceptual meaning in Python, there is nothing defined in the language by that name, so we can use it in our descriptions and code examples. Likewise, the name of the string type is str, so we can use the term “string” in descriptions and examples. The lack of overlap in these instances saves a fair amount of awkward clarification that would otherwise be required.
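A brief sketch of how several of these terms meet in ordinary code: a DNA string held as a str, treated as an indexable but immutable sequence, and passed through the translate method of str. The names below are hypothetical. Note that translate here has the Python meaning, replacing characters by table lookup; it has nothing to do with codons.

```python
dna = 'ATGC'                      # a "string" in both the biological and the Python sense

print(dna[0], len(dna))           # a str is a sequence: indexable, with a length

try:
    dna[0] = 'G'                  # a str is immutable, so this assignment fails
except TypeError as error:
    print('cannot change a str in place:', error)

# translate in the Python sense: replace each character using a table.
complement_table = str.maketrans('ACGT', 'TGCA')
complement = dna.translate(complement_table)
print(complement)                 # a new str; the original is unchanged
print(dna)
```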

Comments

Write code as you read: it will really help you understand and benefit from what you are reading. Read through the code examples. Look for more detailed code, additional examples, and exercises on the book’s website.

Bioinformatics is a fascinating field. Python is a wonderful language. Programming is an exciting challenge. The technologies covered here are important. This book is an invitation to investigate, experience, and learn (more) about all of these topics. Enjoy!

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, filenames, and file extensions, and is used for documentation and comments in code examples

Constant width

Used for names of values, functions, methods, classes, and other code elements, as well as the code in examples, the contents of files, and printed output

Constant width bold

Indicates user input to be typed at an interactive prompt, and highlights Python statement keywords in examples

Constant width italic

Shows text that should be replaced with user-supplied values; also used in examples for “pseudocode” meant to be replaced with actual Python code

Note

This icon signifies a tip, suggestion, or general note.

Warning

This icon indicates a warning or caution.

Some chapters include text and code in specially labeled sidebars. These capture explanations and examples in an abstract form. They are meant to serve as aids to learning the first time you go through the material, and useful references later. There are three kinds of these special sidebars:

We’d Like to Hear from You

Every example in this book has been tested, but occasionally you may encounter problems. Mistakes and oversights can occur, and we will gratefully receive details of any that you find, as well as any suggestions you would like to make for future editions. You can contact the author and editor at:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/9780596154509/

To comment or ask technical questions about this book, send email to the following, quoting the book’s ISBN number (9780596154509):

For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our website at:

http://www.oreilly.com

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Bioinformatics Programming using Python by Mitchell L Model. Copyright 2010 Mitchell L Model, 978-0-596-15450-9.”

If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at .

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

Acknowledgments

The many O’Reilly people who worked with me to turn my draft into a book were an impressive lot. Two of them deserve particular thanks. The term “editor” does not come close to describing the roles Mike Loukides played in this project, among them manager, confidante, and contributor; it was delightful to work with him and to have him as an audience for my technical musings. Rachel Head, copyeditor extraordinaire, contributed extensively to the clarity and accuracy of this book; I enjoyed working with her and was amazed by her ability to detect even tiny technical inconsistencies.

My thanks to James Tisdall, whose O’Reilly books Beginning Perl for Bioinformatics and Mastering Perl for Bioinformatics were the original impetus—though longer ago than I would like to remember—for my writing a similar book using Python and whose encouragements I much appreciated. A number of reviewers made helpful comments. Foremost among them was my friend and colleague Tom Stambaugh, founder of Zeetix LLC, who gave one draft an extremely close reading that resulted in many changes after hours of discussion. Though I initially resisted much of what reviewers suggested, I eventually acceded to most of it, which considerably improved the book.

I thank my students at Northeastern University’s Professional Masters in Bioinformatics program for their patience, suggestions, and error detection while using earlier versions of the book’s text and code. Special thanks go to Jyotsna Guleria, a graduate of that program, who wrote test programs for the example code that uncovered significant latent errors. (Extended versions of the test programs can be found on the book’s website.) Finally, I hope what I have produced justifies what my friends, family, and colleagues endured during its creation—especially Janet, my wife, whose unwavering support during the book’s writing made the project possible.



[2] You can find precompiled binaries for most platforms at http://sqlite.org/download.html.

[4] On OS X, a command-line shell is obtained by running the Terminal application, found in the Utilities folder in the Applications folder. On most versions of Windows, a “Command Prompt” window can be opened either by selecting Run from the Start menu and typing cmd or by selecting Accessories from the Start menu, then the Command Prompt entry of that menu. You may also find an Open Command Line Here entry when you right-click on a folder in a Windows Explorer window; this is perhaps the best way to start a command-line Python interpreter in Windows because it starts Python with the selected folder as the current directory. You may have to change your path settings to include the directory that contains the Python executable file. On a Unix-based system, you do that in the “rc” file of the shell you are using (e.g., ~/.bashrc). On Windows, you need to set the path as an environment variable, a rather arcane procedure that differs among different versions of Windows. You can also type the full path to the Python executable; on Windows, for example, that would probably be C:\python3.1\python.exe.
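One way to see whether the path settings described in this note are needed is to ask Python which executables a given command name would resolve to. The sketch below is only an illustration: find_on_path is a hypothetical helper name, and it relies on shutil.which, which exists only in Python 3.3 and later, so it will not run on older interpreters.

```python
import shutil
import sys


def find_on_path(command):
    """Return the full path of command if it is on the search path, else None."""
    return shutil.which(command)


# The interpreter running this script, regardless of path settings.
print("Running interpreter:", sys.executable)

# Command names are assumptions; installations differ in which ones they provide.
for command in ("python3", "python"):
    print(command, "->", find_on_path(command))
```

A result of None for a command suggests that the directory containing that executable is not on the path, in which case either the path settings or the full path to the executable would be needed.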
