Comparing Pages

In the case of phishing sites, the fake bank login page that the original email directs you to will have been copied from the real bank's web site. The person behind the scam will then have added an HTML form, or a link to another page, that asks for your account information. An easy way to see what has been added is to download the corresponding page from the real bank site, compare the two files, and look at the differences.
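The same idea can be sketched in Python with the standard difflib module. This is an illustration, not the book's method: the function name lines_only_in_fake and the two sample pages (including the collector.example link) are hypothetical. The function simply reports lines that occur in the suspect copy but not in the genuine page.

```python
import difflib


def lines_only_in_fake(real_html: str, fake_html: str) -> list[str]:
    """Return lines that occur in the suspect page but not in the genuine one."""
    real_lines = real_html.splitlines()
    fake_lines = fake_html.splitlines()
    # ndiff(a, b) prefixes lines found only in b with "+ ", lines only in a
    # with "- ", shared lines with "  ", and intraline hints with "? ".
    return [
        line[2:]
        for line in difflib.ndiff(real_lines, fake_lines)
        if line.startswith("+ ")
    ]


# Hypothetical sample pages: the fake copy carries one extra link.
real_page = "<html>\n<body>\n<p>Welcome</p>\n</body>\n</html>"
fake_page = (
    "<html>\n<body>\n<p>Welcome</p>\n"
    '<a href="http://collector.example/verify.htm">Verify your account</a>\n'
    "</body>\n</html>"
)

for added in lines_only_in_fake(real_page, fake_page):
    print(added)
```

Run against these two pages, it prints only the injected link line, which is exactly the kind of addition you are looking for.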

For this, you can use the standard Unix command diff, which compares two files line by line. Lines that differ are printed and identical lines are skipped. When a run of consecutive lines differs, diff prints it as two blocks, all of the lines from the first file followed by all of the lines from the second, rather than as pairs of individual lines.
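To make that output format concrete, here is a rough Python emulation of diff's default ("normal") output built on difflib.SequenceMatcher. It is a sketch under assumptions: normal_diff and fmt_range are hypothetical helpers, and SequenceMatcher uses a different matching algorithm from diff, so on complicated inputs its hunk boundaries may not match what diff prints. It does show the behavior described above: a run of changed lines comes out as one block of < lines, a --- separator, and one block of > lines.

```python
import difflib


def fmt_range(lo: int, hi: int) -> str:
    """Format the 0-based half-open slice [lo, hi) as diff's 1-based range."""
    if hi - lo == 1:
        return str(lo + 1)
    return f"{lo + 1},{hi}"


def normal_diff(a: list[str], b: list[str]) -> list[str]:
    """Approximate diff's default output for two lists of lines."""
    out: list[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical lines are not printed
        if tag == "replace":
            out.append(f"{fmt_range(i1, i2)}c{fmt_range(j1, j2)}")
        elif tag == "delete":
            # j1 is the line of b after which the deleted lines would appear.
            out.append(f"{fmt_range(i1, i2)}d{j1}")
        else:  # "insert"
            # i1 is the line of a after which the new lines are added.
            out.append(f"{i1}a{fmt_range(j1, j2)}")
        # A changed run is printed as one block from each file, not as pairs.
        out.extend("< " + line for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend("> " + line for line in b[j1:j2])
    return out


# Hypothetical pages where two consecutive lines differ.
fake = ["<html>", '<link href="http://bank.example/a.css">',
        '<link href="http://bank.example/b.css">', "</html>"]
real = ["<html>", '<link href="/a.css">', '<link href="/b.css">', "</html>"]
print("\n".join(normal_diff(fake, real)))
```

For these inputs it prints a single 2,3c2,3 hunk: both fake lines prefixed with <, then ---, then both real lines prefixed with >.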

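The amount of whitespace at the start and end of lines can vary between otherwise similar files downloaded from different sources. This may be a side effect of the browser used to save the page or of later editing of the content. Either way, it can cause diff to report nearly every line as different, which is not what you want. The -b option tells diff to ignore changes in the amount of whitespace: trailing whitespace is ignored, and any run of spaces or tabs is treated as equal to any other run. The -w option goes further and ignores whitespace altogether.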
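A minimal sketch of what -b does, again assuming Python and difflib: each line is normalized (trailing whitespace dropped, remaining whitespace runs collapsed to a single space) before the comparison. squeeze_ws and changed_lines are hypothetical names, and the regular expression \s matches a slightly broader set of characters than diff treats as whitespace, so treat this as an approximation.

```python
import difflib
import re

# Any run of whitespace characters (spaces, tabs, etc.).
_WS_RUN = re.compile(r"\s+")


def squeeze_ws(line: str) -> str:
    """Roughly mimic diff -b: drop trailing whitespace, collapse other runs to one space."""
    return _WS_RUN.sub(" ", line.rstrip())


def changed_lines(a: list[str], b: list[str], ignore_space_change: bool = True) -> list[int]:
    """Return 1-based numbers of lines in `a` that fall inside differing hunks."""
    if ignore_space_change:
        a_cmp = [squeeze_ws(line) for line in a]
        b_cmp = [squeeze_ws(line) for line in b]
    else:
        a_cmp, b_cmp = a, b
    matcher = difflib.SequenceMatcher(None, a_cmp, b_cmp, autojunk=False)
    result: list[int] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            result.extend(range(i1 + 1, i2 + 1))
    return result


# Hypothetical copies of the same page with different indentation and trailing spaces.
saved = ["<html>", "  <body>", "<p>Hi</p>   "]
fetched = ["<html>", "\t<body>", "<p>Hi</p>"]
print(changed_lines(saved, fetched, ignore_space_change=False))  # [2, 3]
print(changed_lines(saved, fetched))                             # []
```

Note that, as with -b, a line with no leading whitespace still differs from the same line with some; only the amount of existing whitespace is ignored.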

Here is an example of its output on a fake login page for http://keybank.com and the equivalent real page. The output has been edited down for the sake of readability.

    % diff -b fake.html real.html
    7c7
    < <link rel="stylesheet" href="http://accounts.keybank.com//ib2/css/kco2obi.css" type="text/css" media="all" />
    ---
    > <link rel="stylesheet" href="/ib2/css/kco2obi.css" type="text/css" media="all"/>
    46c46
    < <a href="login.htm?requester=signon" ...
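In this normal output format, a header such as 7c7 means that line 7 of the first file (fake.html) was changed into line 7 of the second (real.html); lines prefixed with < come from the first file and lines prefixed with > from the second. If you need to process such output in bulk, a small parser along these lines would do. parse_hunk_header and split_hunks are hypothetical helpers, and the sample input is the first hunk of the listing above.

```python
import re

# Normal-format hunk header, e.g. "7c7", "2,3c2,4", "5d4", "4a5,6".
HUNK_HEADER = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def parse_hunk_header(line: str):
    """Return (op, (start, end) in file 1, (start, end) in file 2), or None.

    For "a" and "d" hunks, the single number on the unchanged side is the
    line after which the addition or deletion applies.
    """
    m = HUNK_HEADER.match(line)
    if m is None:
        return None
    a1, a2, op, b1, b2 = m.groups()
    return op, (int(a1), int(a2 or a1)), (int(b1), int(b2 or b1))


def split_hunks(diff_output: str) -> list[dict]:
    """Group normal-format diff output into hunks of old (<) and new (>) lines."""
    hunks: list[dict] = []
    for line in diff_output.splitlines():
        header = parse_hunk_header(line)
        if header is not None:
            op, old_range, new_range = header
            hunks.append({"op": op, "old_range": old_range, "new_range": new_range,
                          "old": [], "new": []})
        elif hunks and line.startswith("< "):
            hunks[-1]["old"].append(line[2:])
        elif hunks and line.startswith("> "):
            hunks[-1]["new"].append(line[2:])
        # The "---" separator and any other lines are ignored.
    return hunks


sample = """7c7
< <link rel="stylesheet" href="http://accounts.keybank.com//ib2/css/kco2obi.css" type="text/css" media="all" />
---
> <link rel="stylesheet" href="/ib2/css/kco2obi.css" type="text/css" media="all"/>
"""

# Assumes diff was run as "diff -b fake.html real.html", so < is fake and > is real.
for hunk in split_hunks(sample):
    print(hunk["op"], hunk["old_range"], "->", hunk["new_range"])
    for old in hunk["old"]:
        print("  fake:", old)
    for new in hunk["new"]:
        print("  real:", new)
```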
