Chapter 8. Distributed Processing and Handling Large Datasets
In this chapter, we will cover the following recipes:
- Distributed tagging with execnet
- Distributed chunking with execnet
- Parallel list processing with execnet
- Storing a frequency distribution in Redis
- Storing a conditional frequency distribution in Redis
- Storing an ordered dictionary in Redis
- Distributed word scoring with Redis and execnet
Introduction
NLTK is great for in-memory, single-processor natural language processing. However, there are times when you have a lot of data to process and want to take advantage of multiple CPUs, multicore CPUs, and even multiple computers. Or, you might want to store frequencies and probabilities in a persistent, shared database so multiple processes can ...