The Importance of Partitioning the Data

I’ll let you in on a secret now. Partitioning the data is the key to parallelizing the program. process_all_years(), which we used to build the data store, is defined like this:

process_all_years() ->
    [process_year(I) || I <- mail_years()].

process_year(Year) processes all the data for a specific year, and mail_years/0 returns a list of years.

To parallelize the program, all you have to do is change the definition of process_all_years() so that it calls pmap, which we talked about in Parallelizing Sequential Code. With this small change, our function looks like this:

process_all_years() ->
    lib_misc:pmap(fun(I) -> process_year(I) end, mail_years()).
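If you don't have the earlier section to hand, here is a minimal sketch of what a pmap-style function might look like. It is not the lib_misc:pmap listing itself; the module name pmap_sketch and the helpers spawn_worker/3 and collect/1 are made up for illustration. It assumes one process per list element is acceptable and that results should come back in the same order as the input list.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Apply F to every element of L, one process per element,
%% and return the results in the same order as L.
%% Illustrative sketch only; not the lib_misc:pmap implementation.
pmap(F, L) ->
    Parent = self(),
    %% Start all workers first so they run concurrently.
    Refs = [spawn_worker(Parent, F, X) || X <- L],
    %% Then collect one reply per worker, in list order.
    [collect(Ref) || Ref <- Refs].

%% Spawn a linked worker that sends {Ref, F(X)} back to Parent.
%% Linking means a crash in F takes the caller down too,
%% instead of leaving it waiting forever for a reply.
spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
    Ref.

%% Wait for the reply tagged with this worker's unique reference.
collect(Ref) ->
    receive
        {Ref, Result} -> Result
    end.
```

With something like this, pmap_sketch:pmap(fun(I) -> process_year(I) end, mail_years()) would start one process per year. Because each year's data is independent of the others, no worker has to wait for another, which is exactly why the partitioning matters.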

There is an additional and less obvious benefit. ...

