CS 480 Python Lab

Assigned: 8 November 2017   Due: 17 November 2017

Note: This lab must be completed using Python 2.7 rather than Python 3.

In this assignment you will experiment with and extend some existing Python code to do stylometry. Stylometry is the statistical analysis of a written text to try to identify characteristics of the author's writing style and (perhaps) to identify the authorship of anonymous or disputed works.
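Here is a minimal sketch of the kind of statistics stylometry works with. It is not part of mytext.py and is not necessarily what mycode() computes. It measures three simple style markers: mean word length, mean sentence length, and type-token ratio (distinct words divided by total words). The tokenizing regular expressions and the function names tokenize, sentences and style_profile are hypothetical choices. The code uses only the standard library and avoids Python-3-only syntax, so it should run under Python 2.7 as well as Python 3.

```python
# Hypothetical helper, not from mytext.py; standard library only.
# Written to run under both Python 2.7 and Python 3.
import re


def tokenize(text):
    """Split text into lowercase words (letters, with internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def sentences(text):
    """Very rough sentence split: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"[.!?]+\s+", text) if s.strip()]


def style_profile(text):
    """Return a few simple stylometric statistics for a (non-empty) text."""
    ws = tokenize(text)
    ss = sentences(text)
    return {
        # average number of letters per word
        "mean_word_length": float(sum(len(w) for w in ws)) / len(ws),
        # average number of words per sentence
        "mean_sentence_length": float(len(ws)) / len(ss),
        # vocabulary richness: distinct words / total words
        "type_token_ratio": float(len(set(ws))) / len(ws),
    }
```

For example, style_profile(open("bleakhouse.txt").read()) returns the three numbers for Bleak House. Type-token ratio falls as a text gets longer, so compare it only across samples of similar length, such as equal-sized chunks of each novel.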

You will start with my code in the file mytext.py, which is based on the file text.py from the textbook authors. This code in turn requires the files agents.py, logic.py, probability.py, search.py and utils.py. I suggest that you download all of these files to a new directory. In the same directory, also download the text files you will explore: bleakhouse.txt by Charles Dickens, moonstone.txt by Wilkie Collins, flatland.txt by Abbott, and sense.txt (Sense and Sensibility) and pride.txt (Pride and Prejudice), both by Jane Austen. The file secret.txt contains a text whose authorship you will try to identify.

After downloading all of these files and "compiling" mytext.py, look at the function mycode() near the bottom of that file. It contains my reasonably well-documented code for several stylometric analyses of the texts Flatland, Sense and Sensibility and Pride and Prejudice. You can view my effort as an attempt to show that Pride is more likely to have been written by the author of Sense than by the author of Flatland. We will discuss this code further in class.

Your assignment is to carry out stylometric analyses comparing the works of Dickens and Collins, and then to try to identify the author of the unknown text. Write a short document that describes (in English, perhaps laced with code) the tests you ran and the conclusion you reached. I expect this document to be no more than a page or two, and it is entirely likely that you won't come to any definitive conclusions. But I want you to base your decision on evidence.
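One test you might run works like this. Measure how often each text uses a fixed set of very common function words, then see which known author's profile lies nearest to the unknown text. The sketch below does this. It is a simplified relative of Burrows's Delta, which also standardizes each word's frequency as a z-score across the corpus before comparing. The word list, the use of L1 (Manhattan) distance, and the names word_rates, distance and closest_author are all hypothetical choices. None of this is the approach taken in mycode().

```python
# Hypothetical sketch, not from mytext.py; standard library only.
# Written to run under both Python 2.7 and Python 3.
import re
from collections import Counter

# Hypothetical marker words: common function words, whose rates tend to
# reflect an author's habits more than the subject matter of the book.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it",
                  "was", "he", "she", "but", "with", "as", "had", "not",
                  "which", "upon", "on", "at"]


def word_rates(text, vocab=FUNCTION_WORDS):
    """Relative frequency (per word token) of each word in vocab."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)  # missing words count as 0
    total = float(len(tokens))
    return [counts[w] / total for w in vocab]


def distance(p, q):
    """Manhattan (L1) distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(unknown_text, known_texts):
    """Return (label, distances) for the known text nearest the unknown one.

    known_texts maps an author label to a sample of that author's writing.
    """
    target = word_rates(unknown_text)
    dists = {name: distance(target, word_rates(text))
             for name, text in known_texts.items()}
    return min(dists, key=dists.get), dists
```

For example, build known = {"Dickens": open("bleakhouse.txt").read(), "Collins": open("moonstone.txt").read()} and call closest_author(open("secret.txt").read(), known). The call returns the nearer label together with both distances. A single nearest-profile result is weak evidence on its own. Check first that the test attributes a text whose author you already know correctly, for example by treating pride.txt as the unknown text, with sense.txt and flatland.txt as the known samples.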

You should hand in a paper copy of your document that discusses your results and conclusions.