Build Your Own Web Measurement Application: The Core Code

One thing that every web measurement application needs to deal with, regardless of price or sophistication, is stitching together multiple page views into a visit and assigning that visit to a unique visitor.
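To make the idea concrete, here is a minimal sketch of grouping page views into visits per visitor. It is an illustration only, not the code developed in this hack. The function name `group_into_visits`, the `(timestamp, visitor_id, page)` tuple layout, and the 30-minute inactivity cutoff are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed inactivity cutoff: a gap longer than this starts a new visit.
VISIT_TIMEOUT = 30 * 60  # seconds


def group_into_visits(page_views, timeout=VISIT_TIMEOUT):
    """Group (timestamp, visitor_id, page) tuples into visits per visitor.

    Returns a dict mapping visitor_id to a list of visits, where each
    visit is a list of (timestamp, page) tuples in time order.
    """
    visits = defaultdict(list)
    for timestamp, visitor_id, page in sorted(page_views, key=lambda v: v[0]):
        visitor_visits = visits[visitor_id]
        # Compare against the last page view of this visitor's latest visit.
        if visitor_visits and timestamp - visitor_visits[-1][-1][0] <= timeout:
            visitor_visits[-1].append((timestamp, page))
        else:
            visitor_visits.append([(timestamp, page)])
    return dict(visits)


# Hypothetical page views for two visitors, "A" and "B".
views = [
    (1104772080, "A", "/index.html"),
    (1104772091, "A", "/products.html"),
    (1104779000, "A", "/index.html"),
    (1104772085, "B", "/index.html"),
]
visits = group_into_visits(views)
```

Here visitor "A" ends up with two visits, because the third page view arrives well over 30 minutes after the second, and visitor "B" has one.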

In “Build Your Own Web Measurement Application: An Overview and Data Collection” [Hack #12], we saw how to write a small page tag script to record the visits to your web site. The program produced a logfile in this format:

   1104772080 192.168.17.32 /index.html?from=google http://www.google.com/search?q=widgets 192.168.17.32.85261104772101338
   1104772091 192.168.17.32 /products.html http://www.example.com/index.html?from=google 192.168.17.32.85261104772101338

In each line, the fields correspond to the time of the request, the client IP address, the page requested, the referring page, and the visitor’s cookie.
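As a rough sketch, one line of this logfile could be split into its five fields as follows. The names `PageView` and `parse_line` are hypothetical. The code assumes that fields are separated by whitespace (tabs or spaces), that no field contains whitespace, and that the first field is a time in seconds since the Unix epoch.

```python
from typing import NamedTuple


class PageView(NamedTuple):
    """One logfile record; field names are illustrative."""
    timestamp: int  # time of the request (assumed Unix epoch seconds)
    client_ip: str
    page: str
    referrer: str
    cookie: str


def parse_line(line: str) -> PageView:
    """Split one logfile line into its five whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    timestamp, client_ip, page, referrer, cookie = fields
    return PageView(int(timestamp), client_ip, page, referrer, cookie)


# The first sample record from the logfile above, joined onto one line.
sample = (
    "1104772080\t192.168.17.32\t/index.html?from=google\t"
    "http://www.google.com/search?q=widgets\t"
    "192.168.17.32.85261104772101338"
)
view = parse_line(sample)
```

Rejecting lines that do not have exactly five fields is a design choice for the sketch. A real report generator might prefer to skip and count malformed lines instead.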

Now that we have such a logfile, what should we do with it? One possibility is to analyze it by using one of the existing logfile analyzer programs, as long as the program can be configured to read data in our nonstandard format. For example, you can read the file by using the free web measurement application Analog (www.analog.cx) [Hack #10] and supplying the command:

            LOGFORMAT %U\t%S\t%r\t%f\t%u

In this and subsequent “build your own” hacks we shall build a new program to read this logfile and produce a report. This will demonstrate the basics of what web analytics programs actually do under the hood. We will write ...
