Although the algorithms are described in words with explanations of the formulae involved, it's much more useful (and probably easier to follow) to have actual code for the algorithms and example problems. All the example code in this book is written in Python, an excellent, high-level language. I chose Python because it is:
Concise: Code written in dynamically typed languages such as Python tends to be shorter than code written in other mainstream languages. This means there's less typing for you when working through the examples, and it also makes it easier to fit the algorithm in your head and really understand what it's doing.
Easy to read: Python has at times been referred to as "executable pseudocode." While this is clearly an exaggeration, it makes the point that most experienced programmers can read Python code and understand what it is supposed to do. Some of the less obvious constructs in Python are explained in the "Python Tips" section below.
Easily extensible: Python comes standard with many libraries, including those for mathematical functions, XML (Extensible Markup Language) parsing, and downloading web pages. The nonstandard libraries used in the book, such as the RSS (Really Simple Syndication) parser, are free and easy to download, install, and use. (The SQLite interface has since been added to the standard library as the sqlite3 module, starting with Python 2.5.)
Interactive: When working through an example, it's useful to try out the functions as you write them without writing another program just for testing. Python can run programs directly from the command line, and it also has an interactive prompt that lets you type in function calls, create objects, and test packages interactively.
Multiparadigm: Python supports object-oriented, procedural, and functional styles of programming. Machine-learning algorithms vary greatly, and the clearest way to implement one may use a different paradigm than another. Sometimes it's useful to pass around functions as parameters and other times to capture state in an object. Python supports both approaches.
Multiplatform and free: Python has a single reference implementation for all the major platforms and is free for all of them. The code described in this book will work on Windows, Linux, and Macintosh.
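As a rough illustration of the two approaches described above, passing functions as parameters versus capturing state in an object, the following sketch does both. The names euclidean, manhattan, closest, and RunningMean are hypothetical and are not taken from the book's own code.

```python
import math

def euclidean(a, b):
  # Straight-line distance between two points of equal length
  return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
  # Sum of the absolute differences of the coordinates
  return sum(abs(x - y) for x, y in zip(a, b))

def closest(target, points, distance=euclidean):
  # The distance function is passed in as a parameter,
  # so the same search works with any metric
  return min(points, key=lambda p: distance(target, p))

class RunningMean:
  # An object that captures state (a count and a total) between calls
  def __init__(self):
    self.count = 0
    self.total = 0.0

  def add(self, value):
    # Record a new value and return the mean of all values seen so far
    self.count += 1
    self.total += value
    return self.total / self.count
```

With the Euclidean default, closest((0, 0), [(3, 3), (0, 5)]) returns (3, 3); passing distance=manhattan instead returns (0, 5), without any change to closest itself.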
For beginners interested in learning about programming in Python, I recommend reading Learning Python by Mark Lutz and David Ascher (O'Reilly), which gives an excellent overview. Programmers coming from other languages should find the Python code relatively easy to follow, although be aware that throughout this book I use some of Python's idiosyncratic syntax because it lets me express the algorithm or fundamental concepts more directly. Here's a quick overview for those of you who aren't Python programmers:
Python has a good set of built-in types, and two that are used heavily throughout this book are list and dictionary. A list is an ordered sequence of values of any type, and it is constructed with square brackets:
number_list=[1,2,3,4]
string_list=['a', 'b', 'c', 'd']
mixed_list=['a', 3, 'c', 8]
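A dictionary is a set of key/value pairs, similar to a hash map in other languages, in which values are looked up by key rather than by position. (Older Python versions do not preserve the order of the keys; since Python 3.7, dictionaries keep keys in insertion order.) It is constructed with curly braces, with a colon between each key and its value.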
The elements of lists and dictionaries can be accessed by putting the index or key in square brackets after the variable name:
string_list[2]  # returns 'c'
ages['Sarah']   # returns 28
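The constructors and lookups above can be tried together as a small runnable snippet. The ages dictionary here is an assumption: only the entry 'Sarah': 28 is implied by the lookup example, and the other names and ages are illustrative.

```python
# Lists: ordered, indexed from 0, and able to mix value types
number_list = [1, 2, 3, 4]
string_list = ['a', 'b', 'c', 'd']
mixed_list = ['a', 3, 'c', 8]

# Dictionary: key: value pairs inside curly braces
# (only 'Sarah': 28 comes from the text; the other entries are illustrative)
ages = {'John': 24, 'Sarah': 28, 'Mike': 31}

print(string_list[2])  # prints c (the third element, since indexing starts at 0)
print(ages['Sarah'])   # prints 28
print(mixed_list[1])   # prints 3
```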
Unlike most languages, Python actually uses the indentation of the code to define code blocks. Consider this snippet:
x=1
if x==1:
  print('x is 1')
  print('Still in if block')
print('outside if block')
The interpreter knows that the first two print lines are executed only when x is 1 because the code is indented. Indentation can be any number of spaces, as long as it is consistent. This book uses two spaces for indentation. When entering the code you'll need to be careful to copy the indentation correctly.
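To see how indentation alone decides which lines belong to the if block, here is a minimal sketch that wraps the snippet in a hypothetical function, describe, which collects the lines that would be printed instead of printing them. It keeps the book's two-space indentation.

```python
def describe(x):
  lines = []
  if x == 1:
    # Indented under the if, so these run only when x is 1
    lines.append('x is 1')
    lines.append('Still in if block')
  # Back at the function's own indentation level, so this runs for every x
  lines.append('outside if block')
  return lines

print(describe(1))  # ['x is 1', 'Still in if block', 'outside if block']
print(describe(2))  # ['outside if block']
```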
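A list comprehension is a convenient way of converting one list to another by filtering it and applying functions to its items. A list comprehension is written as:
[expression for variable in list]
or, to keep only the items that satisfy a condition:
[expression for variable in list if condition]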
For example, the following code:
l1=[1,2,3,4,5,6,7,8,9]
print([v*10 for v in l1 if v>4])
would print this list:
[50, 60, 70, 80, 90]
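For comparison, here is a sketch of the same filtering and transformation written as an explicit loop; the comprehension builds the same list in a single expression.

```python
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Loop equivalent of [v*10 for v in l1 if v>4]
result = []
for v in l1:
  if v > 4:                # filter: keep only items greater than 4
    result.append(v * 10)  # transform: multiply each kept item by 10

print(result)                               # [50, 60, 70, 80, 90]
print(result == [v*10 for v in l1 if v>4])  # True
```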
List comprehensions are used frequently in this book because they are an extremely concise way to apply a function to an entire list or to remove bad items. The other manner in which they are often used is with the dict constructor:
l1=[1,2,3,4,5,6,7,8,9]
timesten=dict([(v,v*10) for v in l1])
This code will create a dictionary with the items of the original list as the keys and each item multiplied by 10 as the value:
{1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60, 7: 70, 8: 80, 9: 90}
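Python 2.7 and Python 3 also provide a dictionary comprehension, which builds the same mapping directly without first creating a list of tuples. A minimal sketch comparing the two forms:

```python
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# dict() applied to a list of (key, value) tuples
timesten = dict([(v, v*10) for v in l1])

# Dictionary comprehension: key: value pairs written directly in curly braces
timesten2 = {v: v*10 for v in l1}

print(timesten == timesten2)  # True
print(timesten2[7])           # 70
```

Either form produces the same dictionary; the dict() version also runs on Python versions older than 2.7, which lack the comprehension syntax.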