We'll look at several filters for the AccessDetails objects. The first is a collection of filters that rejects the many overhead files that are rarely interesting. The second filter will be part of the analysis functions, which we'll look at later.
The path_filter() function combines three filter functions, each applying one rule:
- Exclude empty paths
- Exclude some specific filenames
- Exclude files that have a given extension
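Before any optimization, the three rules can be sketched as separate generator functions applied one after another. This is only an illustration under stated assumptions: the AccessDetails class here is a hypothetical stand-in with a single path field, the two exclusion sets are shortened samples rather than the full lists, and the function names are invented for this sketch.

```python
from typing import Iterable, Iterator, NamedTuple


class AccessDetails(NamedTuple):
    """Hypothetical stand-in: only the field these filters need."""
    path: str


# Shortened, illustrative exclusion sets (assumptions, not the full lists).
NAME_EXCLUDE = {"favicon.ico", "robots.txt"}
EXT_EXCLUDE = (".png", ".js", ".css")


def non_empty_path(
        details: Iterable[AccessDetails]) -> Iterator[AccessDetails]:
    """Rule 1: drop entries whose path has no non-empty segment."""
    return (d for d in details if any(d.path.split("/")))


def non_excluded_names(
        details: Iterable[AccessDetails]) -> Iterator[AccessDetails]:
    """Rule 2: drop entries with an excluded name anywhere in the path."""
    return (
        d for d in details
        if not NAME_EXCLUDE.intersection(d.path.split("/"))
    )


def non_excluded_ext(
        details: Iterable[AccessDetails]) -> Iterator[AccessDetails]:
    """Rule 3: drop entries whose path ends with an excluded extension."""
    return (d for d in details if not d.path.endswith(EXT_EXCLUDE))


def path_filter_composed(
        details: Iterable[AccessDetails]) -> Iterator[AccessDetails]:
    """Apply the three rules in sequence; each stage stays lazy."""
    return non_excluded_ext(non_excluded_names(non_empty_path(details)))
```

Each stage wraps the previous one in another generator, so every surviving item passes through three layers. Folding the three tests into a single loop avoids that per-item overhead.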
An optimized version of the path_filter() function looks like this:
def path_filter(
        access_details_iter: Iterable[AccessDetails]
    ) -> Iterable[AccessDetails]:
    name_exclude = {
        'favicon.ico', 'robots.txt', 'index.php', 'humans.txt',
        'dompdf.php', 'crossdomain.xml',
        '_images', 'search.html', 'genindex.html',
        'searchindex.js', ...