This chapter is a guide to Saptarshi Guha’s
RHIPE package, the R and Hadoop Integrated Processing Environment. RHIPE’s
development history dates back to 2009, and it is still actively maintained
by the original author.
Compared to R+Hadoop, RHIPE
abstracts you from raw Hadoop but still requires an understanding of the
MapReduce model.
Since you covered a lot of MapReduce and Hadoop details in the previous two chapters, this chapter will have a very short route to the examples.
Motivation: You like the power of MapReduce, as explained in the previous chapter, but you want something a little more R-centric.
Solution: Use the
RHIPE R package as your Hadoop emissary. Even
though you’ll still have to understand MapReduce, you won’t have to
directly touch Hadoop.
Good because: You get Hadoop’s
power without leaving the comfy confines of R’s language and interactive
shell. (RHIPE even includes tools to
work with HDFS.) This means you can MapReduce through a mountain of data
during an interactive session of exploratory analysis.
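Since RHIPE still expects you to think in Map and Reduce steps, here is a minimal sketch of that model in plain base R. It uses neither RHIPE nor Hadoop: the input is a small in-memory vector, and the names `input.lines`, `map.fn`, and `reduce.fn` are made up for illustration. Treat it as a toy word count under those assumptions, not as how RHIPE itself is implemented.

```r
# Plain base R, no RHIPE or Hadoop: a toy word count that mimics the
# Map -> shuffle -> Reduce flow on an in-memory character vector.
input.lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: emit one (key, value) pair per word, with value 1.
map.fn <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Apply the mapper to every input record and flatten one level,
# leaving a single list of (key, value) pairs.
pairs <- unlist(lapply(input.lines, map.fn), recursive = FALSE)

# Shuffle: group the values by key. On a real cluster, Hadoop does
# this step between the Map and Reduce phases.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, integer(1))
grouped <- split(values, keys)

# Reduce: collapse all values for one key into a single result.
reduce.fn <- function(key, values) sum(values)

# Run the reducer once per key; the result is a named vector of counts.
counts <- mapply(reduce.fn, names(grouped), grouped)
print(counts)
```

The point of the sketch is the division of labor: the mapper sees one record at a time, the reducer sees one key with all of its values, and everything in between is plumbing that you do not write yourself.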
RHIPE sits between you and
Hadoop. You write your Map and Reduce functions as R code, and
RHIPE handles the scut work of invoking Hadoop.
To give you a quick example, here’s a typical RHIPE call:
    rhipe.job.def <- rhmr(
        map= ... block of R code for Mapper ... ,
        reduce= ... block of R code for Reducer ... ,
        ifolder="/path/to/input" ,
        ofolder="/path/to/output" ,
        ... a couple other RHIPE options ...
    )

    rhex( rhipe.job.def )
That’s it! There’s no ...