Chapter 8. Distributed Processing and Handling Large Datasets

In this chapter, we will cover the following recipes:

Distributed tagging with execnet
Distributed chunking with execnet
Parallel list processing with execnet
Storing a frequency distribution in Redis
Storing a conditional frequency distribution in Redis
Storing an ordered dictionary in Redis
Distributed word scoring with Redis and execnet

Introduction

NLTK is great for in-memory, single-processor natural language processing. However, there are times when you have a lot of data to process and want to take advantage of multiple CPUs, multicore CPUs, and even multiple computers. Or, you might want to store frequencies and probabilities in a persistent, shared database so multiple processes can ...

Natural Language Processing: Python and NLTK by Nitin Hardeniya, Jacob Perkins, Deepti Chopra, Nisheeth Joshi, Iti Mathur
