Server Log Analysis

Individual log records can be revealing, but even greater insights often come from examining access logs over a period of time and finding patterns in the data. There is a whole industry devoted to log analysis for large news and e-commerce sites, trying to assess what visitors are most interested in, where they are coming from, how the server performs under load, and so on. I’m going to take a much simpler approach and use the tools that I have at hand to uncover some very interesting needles hidden in my haystack. Hopefully these examples will inspire you to take a closer look at your own server logs.

Googlebot Visits

Given that Google is such a powerful player in the field of Internet search, you might like to know how often it updates its index of your site. To see how often its web robot, or spider, pays you a visit, simply search through the access log for the User-Agent string Googlebot. Do this using the standard Unix command grep:

               % grep -i googlebot access_log | grep 'GET / ' | more

The first grep pulls out all Googlebot requests, and the second limits the output to requests for the site's home page, which mark the start of each visit. Here is a sample of the output from my site:

 66.249.71.9 - - [01/Feb/2005:22:33:27 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
 66.249.71.14 - - [02/Feb/2005:21:11:30 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
 66.249.64.54 - - [03/Feb/2005:22:39:17 -0800] ...
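If you want a quick tally of how often the spider comes back, one possible extension of the same pipeline uses awk and uniq to count home page requests by date. This is only a sketch: it assumes the common log format shown above, where the fourth whitespace-separated field is the timestamp, such as [01/Feb/2005:22:33:27, and it relies on the log already being in chronological order so that identical dates appear on consecutive lines:

               % grep -i googlebot access_log | grep 'GET / ' | \
                   awk '{print substr($4, 2, 11)}' | uniq -c

The awk step strips the leading bracket and the time of day, leaving just the date (01/Feb/2005), and uniq -c prints the number of visits recorded for each date. A gap of several days between entries suggests the spider is indexing your site less frequently than you might like.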
