Statistics Hacks by Bruce Frey

Hack #65. Give Credit Where Credit Is Due

Stylometrics is a statistical procedure that identifies the underlying dimensions that define an author's style. It uses the method of factor analysis to judge who wrote what.

Professor Howe-Mutch had a problem. Two of his best students were sitting in his office, hoping to resolve a dispute. Dr. Howe-Mutch had awarded an A+ to Paul's final paper (on the historical importance of chocolate milk). The problem was that Lisa claimed to have written it. An accusation of plagiarism had been made! Both were good students who had written many quality papers for him in the past. So, settling the question of true authorship would not be simple, nor was it easy to accept that one of his favorite students was a cheat.

Fortunately, the good doctor of philosophy had many years of experience and was wiser than his adjunct position at State Community College and Trucking School might have suggested. Among other obscure statistical hobbies, Dr. Howe-Mutch dabbled in the art of stylometry, a statistical method for categorizing the style of written works. The method can also be used to identify anonymous authors. It works best when there are a couple of possibilities or suspects to choose from, and when the typical writing styles of the suspects are known and have been quantified. Let's watch as the broken-hearted professor applies these techniques to find the true author.
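Before following the professor, it may help to see what "quantifying" a writing style can look like. The Python sketch below is only an illustration, not the factor-analytic procedure this hack goes on to describe: it reduces each text to a few simple measures (average word length, average sentence length, and the rate of a handful of common function words), averages them across each suspect's known papers, and picks the suspect whose profile lies closest to the disputed paper. The feature choices, the `FUNCTION_WORDS` list, and all function names are assumptions made for this example.

```python
import re
from statistics import mean

# Hypothetical short list of function words; a real study would use many more.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that")


def style_features(text):
    """Turn one non-empty text into a small list of numeric style measures."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    features = [
        mean(len(w) for w in words),   # average word length in letters
        len(words) / len(sentences),   # average sentence length in words
    ]
    # Relative frequency of each function word.
    features += [words.count(fw) / len(words) for fw in FUNCTION_WORDS]
    return features


def author_profile(texts):
    """Average the feature lists of an author's known papers."""
    vectors = [style_features(t) for t in texts]
    return [mean(column) for column in zip(*vectors)]


def distance(a, b):
    """Plain Euclidean distance. The features are not rescaled here, so
    measures with large values (sentence length) dominate the result."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def likely_author(disputed_text, known_texts):
    """known_texts maps each suspect's name to a list of their known papers."""
    target = style_features(disputed_text)
    return min(
        known_texts,
        key=lambda name: distance(target, author_profile(known_texts[name])),
    )
```

Calling `likely_author(disputed_paper, {"Paul": paul_papers, "Lisa": lisa_papers})` returns the name whose averaged profile is nearest to the disputed paper. With only a few crude features and no rescaling, the answer is a rough hint at best; the approach depends on having enough known writing from each suspect for the profiles to be stable.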

Building a Model

First, Dr. Howe-Mutch asks Paul and Lisa to bring in all the other papers they ...
