Word embeddings are calculated using a neural network built specifically for the task. I'll give an overview of that network here. Once the word embeddings for a corpus have been calculated, they can easily be reused for other applications, which makes this technique a candidate for transfer learning, similar to the techniques we looked at in Chapter 8, Transfer Learning with Pretrained CNNs.
When we're done training this word-embedding network, the weights of the single hidden layer of our network will become a lookup table for our word embeddings. For each word in our vocabulary, we will have learned a vector for that word.
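To make the lookup-table idea concrete, here is a minimal sketch in NumPy. The vocabulary, embedding dimension, and randomly generated weight matrix are all hypothetical stand-ins for what training would actually produce; the point is only that retrieving a word's embedding amounts to selecting one row of the hidden layer's weight matrix.

```python
import numpy as np

# Hypothetical vocabulary and embedding size, for illustration only.
vocab = ["king", "queen", "man", "woman"]
word_to_index = {w: i for i, w in enumerate(vocab)}
vocab_size, embedding_dim = len(vocab), 3

# After training, the hidden layer's weights form a
# (vocab_size, embedding_dim) matrix. We fake it with random
# values here; a real one would come from the trained network.
rng = np.random.default_rng(0)
hidden_weights = rng.normal(size=(vocab_size, embedding_dim))

def embedding(word):
    """Look up a word's vector: just a row of the weight matrix."""
    return hidden_weights[word_to_index[word]]

print(embedding("queen"))  # a length-3 vector for "queen"
```

Because the lookup is nothing more than indexing a matrix by row, the learned weights can be exported and reused in any downstream model without re-running the embedding network.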
This hidden layer will contain fewer neurons than the input space, forcing ...