Time for action – summarizing the shape data

Just as we summarized the overall UFO data set earlier, let's now produce a more focused summary of the data provided for UFO shapes:

  1. Save the following to shapemapper.rb:
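    #!/usr/bin/env ruby

    # Emit "<shape>\t1" for every record that has a non-empty shape field.
    while line = gets
        parts = line.split("\t")
        # Only process records with exactly six tab-separated fields.
        if parts.size == 6
            shape = parts[3].strip    # the shape is the fourth field
            puts shape+"\t1" if !shape.empty?
        end
    end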
  2. Make the file executable:
    $ chmod +x shapemapper.rb
    
  3. Execute the job once again using the WordCount reducer:
    $ hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar -file shapemapper.rb -mapper shapemapper.rb -file wcreducer.rb -reducer wcreducer.rb -input ufo.tsv -output shapes
    
  4. Retrieve the shape info:
    $ hadoop fs -cat shapes/part-00000  
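
Before submitting a job like this to the cluster, it can help to check the mapper logic locally. The following is a minimal sketch, not part of the original scripts: `map_shape` and `count_shapes` are hypothetical helper names, the in-memory summing is only a stand-in for what a WordCount-style reducer would do, and the sample records are made up rather than taken from ufo.tsv.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper mirroring shapemapper.rb: returns "shape\t1" for a
# well-formed record with a non-empty shape field, or nil otherwise.
def map_shape(line)
  parts = line.split("\t")
  return nil unless parts.size == 6
  shape = parts[3].strip
  shape.empty? ? nil : "#{shape}\t1"
end

# Hypothetical stand-in for the reduce step: sums the emitted counts per shape.
def count_shapes(lines)
  counts = Hash.new(0)
  lines.each do |line|
    pair = map_shape(line)
    next if pair.nil?
    key, value = pair.split("\t")
    counts[key] += value.to_i
  end
  counts
end

# Made-up sample records (assumption: six tab-separated fields, shape fourth).
SAMPLE = [
  "19951009\t19951009\tIowa City, IA\tdisk\t2 min\tsample description",
  "19951010\t19951011\tMilwaukee, WI\tlight\t2 min\tsample description",
  "19950101\t19950103\tShelton, WA\t\t\tsample description",
  "19950510\t19950510\tColumbia, MO\tdisk\t2 min\tsample description",
  "malformed line with no tabs"
]

# Print one "shape<TAB>count" line per shape, sorted by shape name.
count_shapes(SAMPLE).sort.each { |shape, n| puts "#{shape}\t#{n}" }
```

With the sample above, the record with an empty shape and the malformed line are both skipped, so only the disk and light shapes are counted.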
    

What just happened? ...
