Magnifying the Long Tail (Inverse Power Mapping in PHP)

The uniformity of the font sizes I noted earlier is still a problem. The reason for this is that the tag counts are arranged in a power curve (Figure 23). Power curves are a very common phenomenon found in popularity or frequency data collected from human activity.

Figure 23. A power curve

There tend to be very few large values in the data and a great many small ones. The problem with mapping a power curve to a limited set of font sizes is that the "long tail" of the power curve ends up represented by just one or two font sizes. Many of the intermediate font sizes won't get used at all because of the large gaps between the counts of the most popular words.
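To make the problem concrete, here is a minimal sketch of a straight linear mapping. The tag names, the counts, the seven font-size steps, and the helper name linearSize() are all assumptions made up for illustration; they are not taken from the tag cloud built earlier.

```php
<?php
// Hypothetical tag counts shaped roughly like a power curve (illustrative data only).
$counts = [
    'php' => 1000, 'perl' => 400, 'web' => 90, 'css' => 30,
    'ajax' => 12, 'rss' => 5, 'xml' => 2, 'atom' => 1,
];

// Hypothetical helper: scale a count proportionally into font sizes 1..$steps.
function linearSize(int $count, int $min, int $max, int $steps): int
{
    if ($max === $min) {
        return 1; // every tag has the same count, so there is nothing to scale
    }
    return 1 + (int) floor(($steps - 1) * ($count - $min) / ($max - $min));
}

$min = min($counts);
$max = max($counts);

$linearSizes = [];
foreach ($counts as $tag => $count) {
    $linearSizes[$tag] = linearSize($count, $min, $max, 7);
}

print_r($linearSizes);
```

With this sample data, six of the eight tags land on size 1, and sizes 2, 4, 5, and 6 are never used.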

The way to make this tag cloud look better is to use a logarithmic function to reverse the power curve's effects. Essentially, we will map the linear range of font values to the logarithmic range of tag counts, magnifying the differences between smaller counts and making the "long tail" of the power curve more visible (Figures 24 and 25).
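The idea can be sketched in a few lines of PHP. This is only an illustration under stated assumptions, not the code developed in this section: the tag names, the counts, the seven font-size steps, and the helper name logSize() are hypothetical, and the sketch assumes every count is at least 1 so that log() is defined.

```php
<?php
// Hypothetical tag counts shaped roughly like a power curve (illustrative data only).
$counts = [
    'php' => 1000, 'perl' => 400, 'web' => 90, 'css' => 30,
    'ajax' => 12, 'rss' => 5, 'xml' => 2, 'atom' => 1,
];

// Hypothetical helper: place a count on a logarithmic scale between the
// smallest and largest counts, then map that position to font sizes 1..$steps.
// Assumes $count, $min, and $max are all >= 1.
function logSize(int $count, int $min, int $max, int $steps): int
{
    if ($max === $min) {
        return 1; // avoid dividing by zero when all counts are equal
    }
    $ratio = (log($count) - log($min)) / (log($max) - log($min));
    return 1 + (int) floor(($steps - 1) * $ratio);
}

$min = min($counts);
$max = max($counts);

$logSizes = [];
foreach ($counts as $tag => $count) {
    $logSizes[$tag] = logSize($count, $min, $max, 7);
}

print_r($logSizes);
```

For this sample data the sizes come out as 7, 6, 4, 3, 3, 2, 1, 1, so the small counts in the tail are spread across several font sizes instead of being squeezed into one.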

Figure 24. Linear mapping of x to y

Figure 25. Logarithmic mapping of x to y

To do this, we'll add a logarithmic measure of the tag ...
