Chapter 8. Distributed Processing and Handling Large Datasets

In this chapter, we will cover the following recipes:

  • Distributed tagging with execnet
  • Distributed chunking with execnet
  • Parallel list processing with execnet
  • Storing a frequency distribution in Redis
  • Storing a conditional frequency distribution in Redis
  • Storing an ordered dictionary in Redis
  • Distributed word scoring with Redis and execnet

Introduction

NLTK is great for in-memory, single-processor natural language processing. However, there are times when you have a lot of data to process and want to take advantage of multiple CPUs, multicore CPUs, and even multiple computers. Or, you might want to store frequencies and probabilities in a persistent, shared database so multiple processes can ...
