Log files and Excel

Let's consider a somewhat realistic use case in which you have been given a number of modified web log files from which you want to create some visualizations.

In Chapter 4, Addressing Big Data Quality, we will discuss data profiling (with regard to data quality), but for now we'll assume that we know the following about our data files:

  • The files are of various sizes and somewhat unstructured.
  • The data in the files contains information logged about the activity of Internet users.
  • The data includes such things as computer IP addresses, a date, a timestamp, and a web address (URL). There is more information in the files, but for our exercise here we really just want to create a graphical representation showing the number of times each web address was hit ...
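
The counting step described above can be sketched in a few lines before any charting happens. This is a minimal sketch, not the book's method: it assumes, hypothetically, that each log line is whitespace-delimited and that a URL is the first token beginning with "http://", "https://" or "/" (the real files are only described as "somewhat unstructured"). The helper names count_url_hits and write_hits_csv are illustrative. The result is written as CSV because Excel can open CSV files directly for charting.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical rule: a URL is a whitespace-delimited token that starts with
# "http://", "https://" or "/". The lookbehind requires the match to begin a
# token, so dates such as "2015/03/01" are not mistaken for paths.
URL_PATTERN = re.compile(r"(?<!\S)(?:https?://|/)\S*")


def count_url_hits(lines):
    """Count hits per URL, using the first URL-like token on each line."""
    hits = Counter()
    for line in lines:
        match = URL_PATTERN.search(line)
        if match:  # lines without a recognizable URL are skipped
            hits[match.group(0)] += 1
    return hits


def write_hits_csv(hits, out):
    """Write a two-column table (url, hits), most-hit URLs first."""
    writer = csv.writer(out)
    writer.writerow(["url", "hits"])
    for url, count in hits.most_common():
        writer.writerow([url, count])


# Usage with made-up sample lines (not taken from the book's data files).
sample = [
    "192.168.0.1 2015-03-01 10:15:32 http://example.com/index.html",
    "10.0.0.7 2015-03-01 10:16:05 http://example.com/about.html",
    "192.168.0.1 2015-03-01 10:17:44 http://example.com/index.html",
    "malformed line without an address",
]
hits = count_url_hits(sample)
buffer = io.StringIO()
write_hits_csv(hits, buffer)
print(buffer.getvalue())
```

To produce a file for Excel rather than an in-memory buffer, open the output with `open(path, "w", newline="")` and pass that file object to write_hits_csv; the csv module expects `newline=""` so that it controls line endings itself. The URL-matching rule is the part most likely to need adjusting once the actual log layout has been profiled.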
