Analyzing web log data using Pig

In this recipe, we will learn how to use Pig scripts to analyze web log data.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.

How to do it...

In the previous chapter, we saw how to analyze web logs using the MapReduce program. In this recipe, we are going to take a look at how to use Pig scripts to analyze web log data. Let's consider two use cases:

Here is a sample of web log data:

106.208.17.105 - - [12/Nov/2015:21:20:32 -0800] "GET /tutorials/mapreduce/advanced-map-reduce-examples-1.html HTTP/1.1" 200 0 "https://www.google.co.in/" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36" ...

Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.