Filtering the access details

We'll look at two filters for the AccessDetails objects. The first is a collection of filters that rejects overhead files that are rarely interesting. The second will be part of the analysis functions, which we'll look at later.

The path_filter() function is a combination of three functions:

  • Exclude empty paths
  • Exclude some specific filenames
  • Exclude files that have a given extension
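The three rules above can be sketched as separate predicates chained with the built-in filter() function. This is only an illustration under stated assumptions: the AccessDetails class here is a hypothetical one-field stand-in (a plain path string), the exclusion sets hold sample values rather than the author's full lists, and the names non_empty_path, non_excluded_names, non_excluded_ext, and path_filter_simple are invented for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class AccessDetails(NamedTuple):
    """Hypothetical stand-in; the real class carries more fields."""
    path: str


# Illustrative values only (assumed), not the author's full lists.
NAME_EXCLUDE = {'favicon.ico', 'robots.txt'}
EXT_EXCLUDE = {'.png', '.js', '.css'}


def non_empty_path(detail: AccessDetails) -> bool:
    """True when the path has at least one non-empty segment."""
    return any(segment for segment in detail.path.split('/'))


def non_excluded_names(detail: AccessDetails) -> bool:
    """True when the final path segment is not an excluded filename."""
    return detail.path.split('/')[-1] not in NAME_EXCLUDE


def non_excluded_ext(detail: AccessDetails) -> bool:
    """True when the final path segment lacks an excluded extension."""
    name = detail.path.split('/')[-1]
    return not any(name.endswith(ext) for ext in EXT_EXCLUDE)


def path_filter_simple(
        details: Iterable[AccessDetails]
) -> Iterator[AccessDetails]:
    """Apply the three predicates one after another, lazily."""
    non_empty = filter(non_empty_path, details)
    named = filter(non_excluded_names, non_empty)
    return filter(non_excluded_ext, named)


sample = [
    AccessDetails('/'),
    AccessDetails('/favicon.ico'),
    AccessDetails('/static/logo.png'),
    AccessDetails('/book/chapter1.html'),
]
kept = list(path_filter_simple(sample))
```

Each predicate stays small and testable on its own, at the cost of splitting the path more than once per item. A combined single-pass version avoids that repeated work.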

An optimized version of the path_filter() function looks like this:

def path_filter(
        access_details_iter: Iterable[AccessDetails]
    ) -> Iterable[AccessDetails]:
    name_exclude = {
        'favicon.ico', 'robots.txt', 'index.php', 'humans.txt',
        'dompdf.php', 'crossdomain.xml',
        '_images', 'search.html', 'genindex.html', 'searchindex.js', ...
