Cover by Oliver Sturm

IMPLEMENTING MAPREDUCE

The example in this chapter breaks the problem down to the point where the standard Map and Reduce functions from FCSlib can be used. No automatic parallelization takes place, but because all the functions involved are functionally pure, they could be parallelized easily enough once the issues of data exchange, distributed node management, and so forth are solved.
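To make the point about purity concrete, here is a minimal sketch that uses PLINQ from the .NET base class library rather than FCSlib. The class name PureMapSketch and its methods are hypothetical and exist only for illustration. Because ToPair depends on nothing but its argument, the parallel version is expected to produce the same result as the sequential one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration, not part of FCSlib.
public static class PureMapSketch
{
    // Pure: the result depends only on the input word and no shared state is touched.
    public static Tuple<string, int> ToPair(string word)
    {
        return Tuple.Create(word, 1);
    }

    // Sequential mapping over the words.
    public static List<Tuple<string, int>> MapSequential(IEnumerable<string> words)
    {
        return words.Select(w => ToPair(w)).ToList();
    }

    // Parallel mapping with PLINQ; AsOrdered keeps the results in source order.
    public static List<Tuple<string, int>> MapParallel(IEnumerable<string> words)
    {
        return words.AsParallel().AsOrdered().Select(w => ToPair(w)).ToList();
    }
}
```

This only parallelizes work within a single process. The data exchange and node management mentioned above remain unsolved in this sketch.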

The first sample counts words, so it starts with a piece of text:

const string hamlet = @"Though yet of Hamlet our dear brother's death
The memory be green, and that it us befitted
To bear our hearts in grief and our whole kingdom
To be contracted in one brow of woe,
...
To business with the king, more than the scope
Of these delated articles allow.
Farewell, and let your haste commend your duty.";

Step 1 is the mapping of data:

var pairs = Functional.Collect(
  text => Functional.Map(
    word => Tuple.Create(word, 1),
    text.Split(new[] { " ", Environment.NewLine },
      StringSplitOptions.RemoveEmptyEntries)),
  new[] { hamlet });

There are two different mapping calls here, which can be confusing. The outer one is Collect, an extension of the standard Map function: it assumes that each element of the source list produces not a single output element but a list of items, and it concatenates all the resulting sublists into one result before returning it. The inner call is the standard Map, which turns each word of the split text into a pair of the word and the count 1. The function Collect and the helper Concat, which together make this possible, are here:

public ...
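As a minimal sketch of the idea, and not FCSlib's actual listing, Collect can be pictured as a Map followed by a Concat. The class name CollectSketch, the WordPairs helper, and all method bodies below are assumptions made for illustration. Only the argument order (function first, then list) is taken from the calls shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration; FCSlib's real implementations may differ.
public static class CollectSketch
{
    // Map: lazily applies the function to every element of the list.
    public static IEnumerable<R> Map<T, R>(Func<T, R> function, IEnumerable<T> list)
    {
        foreach (var item in list)
            yield return function(item);
    }

    // Concat: flattens a sequence of sequences into one sequence.
    public static IEnumerable<T> Concat<T>(IEnumerable<IEnumerable<T>> lists)
    {
        foreach (var list in lists)
            foreach (var item in list)
                yield return item;
    }

    // Collect: maps each element to a sublist, then concatenates the sublists.
    public static IEnumerable<R> Collect<T, R>(
        Func<T, IEnumerable<R>> function, IEnumerable<T> list)
    {
        return Concat(Map(function, list));
    }

    // Mirrors the word-count mapping step for any number of input texts.
    public static List<Tuple<string, int>> WordPairs(params string[] texts)
    {
        return Collect(
            text => Map(
                word => Tuple.Create(word, 1),
                text.Split(new[] { " ", Environment.NewLine },
                    StringSplitOptions.RemoveEmptyEntries)),
            texts).ToList();
    }
}
```

Under these assumptions, CollectSketch.WordPairs("to be or", "not to be") yields six pairs in source order, starting with ("to", 1) and ending with ("be", 1). Repeated words are not merged at this stage, because the mapping step only emits one pair per occurrence.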
