You can interpret the clusters more clearly by viewing them as a dendrogram. Hierarchical clustering results are usually viewed this way, since dendrograms pack a lot of information into a relatively small space. Since the dendrograms will be graphical and saved as JPGs, you'll need to download the Python Imaging Library (PIL), which is available at http://pythonware.com.
This library comes with an installer for Windows and source distributions for other platforms. More information on downloading and installing the PIL is available in Appendix A. The PIL makes it very easy to generate images with text and lines, which is all you'll really need to construct a dendrogram. Add the import statement to the beginning of clusters.py:
from PIL import Image,ImageDraw
The first step is to define a function that returns the total height of a given cluster. You need these heights both to determine the overall height of the image and to decide where to place the various nodes. If the cluster is an endpoint (i.e., it has no branches), its height is 1; otherwise, its height is the sum of the heights of its branches. This is easily defined as a recursive function, which you can add to clusters.py:
def getheight(clust):
    # Is this an endpoint? Then the height is just 1
    if clust.left is None and clust.right is None:
        return 1

    # Otherwise the height is the sum of the heights of
    # each branch
    return getheight(clust.left) + getheight(clust.right)
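To see how the recursion behaves, you can try it on a tiny hand-built tree. The sketch below is only an illustration: the Node class is a hypothetical stand-in that carries nothing but the left and right attributes getheight relies on, and it is not the cluster class used elsewhere in clusters.py.

```python
class Node:
    """Hypothetical stand-in for a cluster; only left/right are modeled."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def getheight(clust):
    # An endpoint has no branches, so it occupies one row
    if clust.left is None and clust.right is None:
        return 1
    # Otherwise the height is the sum of the heights of each branch
    return getheight(clust.left) + getheight(clust.right)


# Three endpoints merged in two steps: a and b first, then that pair with c
a, b, c = Node(), Node(), Node()
root = Node(left=Node(left=a, right=b), right=c)

print(getheight(a))     # 1
print(getheight(root))  # 3
```

The height of a cluster is therefore just the number of endpoints beneath it, which is the number of rows the dendrogram needs to draw that cluster.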
The other thing you need to know ...