We can use Redis and execnet together to do distributed word scoring. In the Calculating high information words recipe in Chapter 7, Text Classification, we calculated the information gain of each word in the movie_reviews corpus using a ConditionalFreqDist. Now that we have Redis, we can do the same thing using a RedisHashFreqDist and a RedisConditionalHashFreqDist, and then store the scores in a RedisOrderedDict. We can use execnet to distribute the counting in order to get better performance out of Redis.
execnet must be installed, and an instance of redis-server must be running on localhost.
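
As a point of reference, the single-process scoring that this recipe distributes can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the recipe's own code: plain Counter objects stand in for the frequency distributions, chi-square is used as the per-word score (the text only says "information gain", so the metric here is a swapped-in choice), and the names chi_sq, score_words, and toy are hypothetical.

```python
from collections import Counter, defaultdict


def chi_sq(n_ii, n_ix, n_xi, n_xx):
    """Chi-square statistic for a 2x2 word/label contingency table.

    n_ii: count of the word within the label
    n_ix: count of the word across all labels
    n_xi: count of all words within the label
    n_xx: count of all words across all labels
    """
    a = n_ii                        # word, in label
    b = n_ix - n_ii                 # word, in other labels
    c = n_xi - n_ii                 # other words, in label
    d = n_xx - n_ix - n_xi + n_ii   # other words, in other labels
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n_xx * (a * d - b * c) ** 2 / denom


def score_words(labelled_words):
    """Score each word from an iterable of (label, words) pairs."""
    word_fd = Counter()                  # stand-in for a frequency distribution
    label_word_fd = defaultdict(Counter) # stand-in for a conditional one
    for label, words in labelled_words:
        for word in words:
            word_fd[word] += 1
            label_word_fd[label][word] += 1

    n_xx = sum(word_fd.values())
    scores = Counter()
    for label, fd in label_word_fd.items():
        n_xi = sum(fd.values())
        for word, n_ii in fd.items():
            # Sum the per-label scores into one score per word.
            scores[word] += chi_sq(n_ii, word_fd[word], n_xi, n_xx)
    return scores


# Toy data (made up for illustration) in the (label, words) shape.
toy = [("pos", ["great", "fun", "the"]), ("neg", ["dull", "boring", "the"])]
scores = score_words(toy)
```

In this toy data, a word that appears equally under both labels scores zero, while a word confined to one label scores higher. The counting loop is the part that a distributed version would hand out to workers.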
We start by getting a list of (label, words) ...