What is a phonetic encoding?

We can convert a word into a representation of its pronunciation. Of course, this might vary by accents, and by the conversion technique as well.

Yet, over time, two or three popular ways have emerged so that we can do this. Each of these methods takes a single string and returns a coded representation. I encourage you to Google each of these terms:

  • American Soundex (the 1930s): Implemented in popular database software such as PostgreSQL, MySQL, and SQLite
  • NYSIIS (New York State Identification and Intelligence System) (the 1970s)
  • Metaphone (the 1990s)
  • Match rating codex (the early 2000s)

Let's take a quick preview of the same:

jellyfish.soundex('Jellyfish')# 'J412'

For NYSIIS, we will use the following:  ...

Get Natural Language Processing with Python Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.