How do we handle out-of-vocabulary (OOV) words?

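The authors of word2vec (Mikolov et al.) extended it at Facebook to create fastText. Instead of treating each word as an indivisible unit, fastText works on character n-grams, so a word is represented through the subword pieces it contains. Character n-grams are especially effective in morphologically rich languages, where many word forms share stems, prefixes, and suffixes.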
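To make the idea of character n-grams concrete, here is a minimal sketch of how a word can be split into n-grams. The function name `char_ngrams` is our own, not part of any library. The `<` and `>` boundary markers and the default n-gram lengths of 3 to 6 follow the convention described for fastText, but treat them as illustrative assumptions rather than the library's exact implementation.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` (hypothetical helper)."""
    # Boundary markers let prefixes and suffixes differ from
    # n-grams that occur in the middle of a word.
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Two related words, such as "running" and "runner", share several of these n-grams, which is how information can be shared between them.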

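We can train our own fastText embeddings. Because a word's vector is built from its character n-grams, the model can also produce a vector for an OOV token by combining the n-grams it has already seen.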
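The following toy sketch illustrates why an OOV token still gets a vector. It is not the fastText library: the n-gram vectors here are random stand-ins for trained ones, `DIM` and `BUCKETS` are arbitrary illustrative sizes, `zlib.crc32` replaces fastText's own hash function, and the helper names are hypothetical.

```python
import random
import zlib

DIM = 8         # illustrative vector size (assumption)
BUCKETS = 1000  # illustrative number of hash buckets (assumption)

rng = random.Random(0)
# Stand-in for trained n-gram vectors: one random vector per bucket.
bucket_vectors = [
    [rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)
]


def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, with < and > as boundary markers."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


def oov_vector(word):
    """Average the bucket vectors of the word's character n-grams."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * DIM  # no n-grams to combine (e.g. empty string)
    # Hash each n-gram to a bucket; crc32 is a stand-in hash.
    rows = [
        bucket_vectors[zlib.crc32(g.encode("utf-8")) % BUCKETS]
        for g in grams
    ]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Any string gets a vector, even one never seen during "training".
vec = oov_vector("unseenword")
print(len(vec))
```

In a trained model the bucket vectors are learned from data, so an unseen word that shares n-grams with known words ends up close to them in the embedding space.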
