Processing many large files

Here is an example of a multiprocessing application. We'll scrape Common Log Format (CLF) lines from web log files. CLF is a widely used format for web server access logs. The lines tend to be long, but look like the following when wrapped to the book's margins:

99.49.32.197 - - [01/Jun/2012:22:17:54 -0400] "GET /favicon.ico HTTP/1.1" 200 894 "-" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5"
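As a rough illustration of what scraping one of these lines involves, here is a minimal sketch that splits a line into named fields with a regular expression. The names `Access`, `LOG_PATTERN`, and `parse_line` are assumptions made for this example, not names taken from the book's code. The pattern also assumes that the quoted referrer and user-agent fields seen in the sample line are always present.

```python
import re
from typing import NamedTuple, Optional


class Access(NamedTuple):
    """One parsed log line; every field is kept as raw text (hypothetical type)."""
    host: str
    identity: str
    user: str
    time: str
    request: str
    status: str
    size: str
    referer: str
    user_agent: str


# Assumed layout: host, identity, user, [time], "request", status, size,
# "referer", "user agent". Group names match the Access field names.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+)\s+'
    r'(?P<identity>\S+)\s+'
    r'(?P<user>\S+)\s+'
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<request>[^"]*)"\s+'
    r'(?P<status>\d{3})\s+'
    r'(?P<size>\S+)\s+'
    r'"(?P<referer>[^"]*)"\s+'
    r'"(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Access]:
    """Return an Access for a matching line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return Access(**match.groupdict())


# The sample line from the text, written on one logical line.
SAMPLE = (
    '99.49.32.197 - - [01/Jun/2012:22:17:54 -0400] '
    '"GET /favicon.ico HTTP/1.1" 200 894 "-" '
    '"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 '
    '(KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5"'
)
```

Calling `parse_line(SAMPLE)` yields an `Access` whose `host` is `'99.49.32.197'` and whose `status` is `'200'`. Returning `None` for a line that does not match lets the caller decide whether to skip or report malformed input.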

We often have large numbers of files that we'd like to analyze. Because the files are independent of one another, they can be processed concurrently, which should benefit our scraping process.
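The idea of handing independent files to separate worker processes can be sketched as follows. This is only an assumption-laden outline, not the book's design: `count_lines` and `analyze` are hypothetical names, and counting non-blank lines stands in for whatever per-file analysis is actually wanted.

```python
import multiprocessing
from typing import Iterable, Optional


def count_lines(path: str) -> int:
    """Per-file work: count the non-blank lines in one log file (placeholder analysis)."""
    with open(path, encoding="utf-8", errors="replace") as source:
        return sum(1 for line in source if line.strip())


def analyze(paths: Iterable[str], workers: Optional[int] = None) -> int:
    """Apply count_lines to each file in a pool of processes and total the results.

    workers=None lets multiprocessing pick a pool size from the CPU count.
    """
    with multiprocessing.Pool(workers) as pool:
        # Each file is independent, so result order does not matter.
        return sum(pool.imap_unordered(count_lines, paths))
```

A script would call something like `analyze(["access_1.log", "access_2.log"])` (hypothetical file names) inside an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start worker processes by spawning a fresh interpreter.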

We'll decompose the analysis into two broad areas of functionality. The first phase of ...
