2.17. Calculating Soundex

Problem

You need the Soundex code of a word or a name.

Solution

Use Jakarta Commons Codec’s Soundex. Supply a surname or a word, and Soundex will produce a phonetic encoding:

// Required import declaration
import org.apache.commons.codec.language.Soundex;

// Code body
Soundex soundex = new Soundex( );
String obrienSoundex = soundex.soundex( "O'Brien" );
String obrianSoundex = soundex.soundex( "O'Brian" );
String obryanSoundex = soundex.soundex( "O'Bryan" );

System.out.println( "O'Brien soundex: " + obrienSoundex );
System.out.println( "O'Brian soundex: " + obrianSoundex );
System.out.println( "O'Bryan soundex: " + obryanSoundex );

This will produce the following output for three similar surnames:

O'Brien soundex: O165
O'Brian soundex: O165
O'Bryan soundex: O165

Discussion

Soundex.soundex( ) takes a string, preserves the first letter as a letter code, and proceeds to calculate a code based on consonants contained in a string. So, names such as “O’Bryan,” “O’Brien,” and “O’Brian,” all being common variants of the Irish surname, are given the same encoding: “O165.” The 1 corresponds to the B, the 6 corresponds to the R, and the 5 corresponds to the N; vowels are discarded from a string before the Soundex code is generated.

The Soundex algorithm can be used in a number of situations, but Soundex is usually associated with surnames, as the United States historical census records are indexed using Soundex. In addition to the role Soundex plays in the census, ...

Get Jakarta Commons Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.