Building Tag Clouds in Perl and PHP by Jim Bumgardner

Magnifying the Long Tail (Inverse Power Mapping in PHP)

The uniformity of the font sizes I noted earlier is still a problem. The reason for this is that the tag counts are arranged in a power curve (Figure 23). Power curves are a very common phenomenon found in popularity or frequency data collected from human activity.

Figure 23. A power curve

There tend to be very few large values in the data, and lots and lots of small values. The problem with mapping a power curve to a limited set of font sizes is that the "long tail" of the power curve ends up being represented by just one or two font sizes. Many of the intermediate font sizes won't get used at all because of the large gaps between the counts of the most popular words.
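To make the problem concrete, here is a minimal sketch of a plain linear mapping from counts to font sizes. The sample counts, the 10 to 36 point size range, and the function name `linearFontSize` are all assumptions made for illustration; none of them come from the original code.

```php
<?php
// Hypothetical tag counts that roughly follow a power curve
// (illustrative data, not taken from the original tag cloud).
$counts = [
    'php' => 1000, 'perl' => 400, 'web' => 150, 'cloud' => 60, 'tags' => 25,
    'fonts' => 10, 'css' => 5, 'html' => 3, 'log' => 2, 'tail' => 1,
];

$minSize = 10; // assumed smallest font size, in points
$maxSize = 36; // assumed largest font size, in points

// Map a count onto the font-size range in direct proportion to
// its position between the smallest and largest counts.
function linearFontSize(int $count, int $minCount, int $maxCount, int $minSize, int $maxSize): int
{
    if ($maxCount === $minCount) {
        return $minSize; // all counts equal: avoid division by zero
    }
    $weight = ($count - $minCount) / ($maxCount - $minCount);
    return (int) round($minSize + $weight * ($maxSize - $minSize));
}

$minCount = min($counts);
$maxCount = max($counts);

$linearSizes = [];
foreach ($counts as $tag => $count) {
    $linearSizes[$tag] = linearFontSize($count, $minCount, $maxCount, $minSize, $maxSize);
    printf("%-6s count=%4d size=%d\n", $tag, $count, $linearSizes[$tag]);
}
```

With this sample data, the least-used tags collapse onto the smallest font size, while most of the sizes between the two most popular tags go unused.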

The way to make this tag cloud look better is to use a logarithmic function to reverse the power curve's effects. Essentially, we will map the linear range of font values to the logarithmic range of tag counts, magnifying the differences between smaller counts and making the "long tail" of the power curve more visible (Figures 24 and 25).

Figure 24. Linear mapping of x to y

Figure 25. Logarithmic mapping of x to y

To do this, we'll add a logarithmic measure of the tag ...
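One way a logarithmic mapping of this kind could be written is sketched below. This is not the book's implementation: the sample counts, the 10 to 36 point size range, and the function name `logFontSize` are assumptions chosen for illustration. The sketch takes the natural logarithm of each count and then interpolates linearly between the logarithms of the smallest and largest counts.

```php
<?php
// Hypothetical tag counts that roughly follow a power curve
// (illustrative data, not taken from the original tag cloud).
$counts = [
    'php' => 1000, 'perl' => 400, 'web' => 150, 'cloud' => 60, 'tags' => 25,
    'fonts' => 10, 'css' => 5, 'html' => 3, 'log' => 2, 'tail' => 1,
];

$minSize = 10; // assumed smallest font size, in points
$maxSize = 36; // assumed largest font size, in points

// Map a count onto the font-size range according to the position
// of log(count) between log(minCount) and log(maxCount).
// Assumes every count is at least 1, so that log() is defined.
function logFontSize(int $count, int $minCount, int $maxCount, int $minSize, int $maxSize): int
{
    $minLog = log($minCount);
    $maxLog = log($maxCount);
    if ($maxLog == $minLog) {
        return $minSize; // all counts equal: avoid division by zero
    }
    $weight = (log($count) - $minLog) / ($maxLog - $minLog);
    return (int) round($minSize + $weight * ($maxSize - $minSize));
}

$minCount = min($counts);
$maxCount = max($counts);

$logSizes = [];
foreach ($counts as $tag => $count) {
    $logSizes[$tag] = logFontSize($count, $minCount, $maxCount, $minSize, $maxSize);
    printf("%-6s count=%4d size=%d\n", $tag, $count, $logSizes[$tag]);
}
```

The largest and smallest counts still map to the largest and smallest font sizes, but the tags in between are spread across the range instead of bunching up at the bottom.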
