Cover by Toby Segaran

Drawing the Dendrogram

You can interpret the clusters more clearly by viewing them as a dendrogram. Hierarchical clustering results are usually viewed this way, because dendrograms pack a lot of information into a relatively small space. The dendrograms here will be drawn as images and saved as JPGs, so you'll need to download the Python Imaging Library (PIL), which is available at http://pythonware.com.

This library comes with an installer for Windows and source distributions for other platforms. More information on downloading and installing the PIL is available in Appendix A. The PIL makes it very easy to generate images with text and lines, which is all you'll need to construct a dendrogram. Add this import statement to the beginning of clusters.py:

from PIL import Image,ImageDraw
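
As a rough illustration of the kind of drawing calls involved, the sketch below creates a blank canvas, draws one line and one text label, and encodes the result as a JPEG. It assumes the PIL package is importable (for example, through the maintained Pillow fork). The function name make_sample_image, the canvas size, and the label text are placeholders invented for this example and are not part of clusters.py.

```python
import io

from PIL import Image, ImageDraw


def make_sample_image(width=200, height=100):
    # Hypothetical helper for illustration only; not part of clusters.py.
    # Start with a white RGB canvas.
    img = Image.new('RGB', (width, height), (255, 255, 255))
    draw = ImageDraw.Draw(img)

    # A horizontal red line across the middle of the canvas.
    draw.line((10, height // 2, width - 10, height // 2), fill=(255, 0, 0))

    # A black text label below the line, in PIL's default font.
    draw.text((10, height // 2 + 10), 'sample node', fill=(0, 0, 0))
    return img


img = make_sample_image()

# Encode as JPEG into memory; passing a filename to save() instead
# would write the file to disk.
buf = io.BytesIO()
img.save(buf, 'JPEG')
jpeg_bytes = buf.getvalue()
```

Lines and text labels like these are the only primitives a dendrogram needs: lines for the branches and a label at each endpoint.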

The first step is to write a function that returns the total height of a given cluster. To determine the overall height of the image and where to place each node, you need to know the total height of every cluster. If a cluster is an endpoint (i.e., it has no branches), then its height is 1; otherwise, its height is the sum of the heights of its branches. This is easily defined as a recursive function, which you can add to clusters.py:

def getheight(clust):
  # Is this an endpoint? Then the height is just 1
  if clust.left is None and clust.right is None: return 1

  # Otherwise the height is the sum of the heights of
  # each branch
  return getheight(clust.left)+getheight(clust.right)
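
To see the recursion at work, here is a minimal sketch that runs getheight on a small hand-built tree. The Node class is a hypothetical stand-in that carries only the left and right attributes the function reads; the real cluster objects in clusters.py hold more than this. The function is repeated so the example runs on its own.

```python
class Node:
    # Hypothetical stand-in for a cluster: only the two branches matter here.
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def getheight(clust):
    # An endpoint (no branches) has height 1.
    if clust.left is None and clust.right is None:
        return 1
    # Otherwise the height is the sum of the heights of both branches.
    return getheight(clust.left) + getheight(clust.right)


# Tree shaped like ((a, b), c): three endpoints in total.
pair = Node(Node(), Node())
tree = Node(pair, Node())

print(getheight(pair))  # 2
print(getheight(tree))  # 3
```

Defined this way, the height of a cluster equals the number of endpoints beneath it, so each endpoint can be given one row of the image.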

The other thing you need to know ...
