Chapter 5. Embedding Words and Types

When implementing natural language processing tasks, we need to deal with different kinds of discrete types. The most obvious example is words. Words come from a finite set, also known as a vocabulary. Other examples of discrete types include characters, part-of-speech tags, named entities, named entity types, parse features, items in a product catalog, and so on. Essentially, any input feature that comes from a finite (or countably infinite) set is a discrete type.
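To make the idea of a vocabulary concrete, here is a minimal sketch of one: a bidirectional mapping between tokens (a discrete type) and integer indices. The Vocabulary class and its method names are illustrative choices for this sketch, not an API from any particular library.

```python
# A minimal sketch of a vocabulary: a bijection between discrete types
# (here, words) and integer indices. Names are illustrative only.
class Vocabulary:
    def __init__(self):
        self.token_to_idx = {}
        self.idx_to_token = []

    def add_token(self, token):
        """Add a token if unseen and return its integer index."""
        if token not in self.token_to_idx:
            self.token_to_idx[token] = len(self.idx_to_token)
            self.idx_to_token.append(token)
        return self.token_to_idx[token]

    def lookup_token(self, token):
        """Return the index of a previously added token."""
        return self.token_to_idx[token]


vocab = Vocabulary()
for word in "time flies like an arrow".split():
    vocab.add_token(word)
print(vocab.lookup_token("flies"))  # -> 1
```

Every discrete input is reduced to such an index before it is handed to a model; the model never sees the raw string.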

Representing discrete types (e.g., words) as dense vectors is at the core of deep learning’s successes in NLP. The terms “representation learning” and “embedding” refer to learning a mapping from a discrete type to a point in a vector space. When the discrete types are words, the dense vector representation is called a word embedding. We saw examples of count-based embedding methods, like Term-Frequency-Inverse-Document-Frequency (TF-IDF), in Chapter 2. In this chapter, we focus on learning-based or prediction-based (Baroni et al., 2014) embedding methods, in which the representations are learned by maximizing an objective for a specific learning task; for example, predicting a word based on its context. Learning-based embedding methods are now the de facto standard because of their broad applicability and performance. In fact, the ubiquity of word embeddings in NLP tasks has earned them the title of the “Sriracha of NLP,” because you can utilize word embeddings in any NLP task and expect the performance of the task to improve.
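In PyTorch, the mapping from integer indices to dense, learnable vectors is provided by nn.Embedding. The sketch below shows the lookup; the vocabulary size, embedding dimension, and the example indices are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

# A minimal sketch: an embedding layer maps integer indices for discrete
# types to dense, learnable vectors. Sizes here are arbitrary.
vocab_size = 5         # number of distinct tokens in the vocabulary
embedding_dim = 3      # dimensionality of each dense vector

embedding = nn.Embedding(num_embeddings=vocab_size,
                         embedding_dim=embedding_dim)

# Look up dense vectors for a batch of token indices, e.g. the sentence
# "time flies like an arrow" encoded as indices [0, 1, 2, 3, 4].
token_indices = torch.tensor([[0, 1, 2, 3, 4]], dtype=torch.long)
dense_vectors = embedding(token_indices)
print(dense_vectors.shape)  # torch.Size([1, 5, 3])

# During training, these vectors are updated by backpropagation so that
# they maximize the objective of the task (e.g., predicting a word from
# its context), which is what "learning-based embedding" refers to.
```

The embedding matrix starts out randomly initialized; it is the training objective that shapes the vectors into useful representations.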
