The Importance of Partitioning the Data

I’ll let you in on a secret now. Partitioning the data is the key to parallelizing the program. process_all_years(), which we used to build the data store, is defined like this:

 
process_all_years() ->
    [process_year(I) || I <- mail_years()].

process_year(Year) processes all the data for a specific year, and mail_years/0 returns a list of years.

To parallelize the program, all you have to do is change the definition of process_all_years and call pmap, which we talked about in Parallelizing Sequential Code. With this small change, our function looks like this:

 
process_all_years() ->
    lib_misc:pmap(fun(I) -> process_year(I) end, mail_years()).
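To see why this change is enough, it helps to look at what a parallel map might do under the hood. The following is only a minimal sketch, not the actual source of lib_misc:pmap: the module name pmap_sketch is made up, and mail_years/0 and process_year/1 are hypothetical stand-ins that return dummy values so the module compiles on its own. Each list element is handed to its own process, and the results are collected in the original order.

```erlang
-module(pmap_sketch).
-export([pmap/2, process_all_years/0]).

%% Hypothetical stand-ins for the real functions (assumptions for
%% illustration only; the real ones work on the mail data).
mail_years() -> [2010, 2011, 2012].
process_year(Year) -> {Year, done}.

%% Parallel version: one process per year.
process_all_years() ->
    pmap(fun(I) -> process_year(I) end, mail_years()).

%% A sketch of a parallel map. Spawns one process per element of L,
%% then gathers the results in the same order as the input list.
pmap(F, L) ->
    Parent = self(),
    Ref = make_ref(),
    %% The catch turns a crash in F into a returned value, so the
    %% parent is not left waiting forever for a missing reply.
    Pids = [spawn(fun() -> Parent ! {self(), Ref, (catch F(I))} end)
            || I <- L],
    %% Pid and Ref are already bound here, so each receive selects
    %% the reply from one specific worker.
    [receive {Pid, Ref, Result} -> Result end || Pid <- Pids].
```

Because each year is processed independently of the others, the workers never need to talk to each other. That independence is what partitioning the data buys us.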

There is an additional and less obvious benefit. ...
