During the PAN 2012 Lab, a competition that tested various programs' ability to determine authorship on the 1.5 million Enron emails released to the public, the author of 70% of the emails tested was correctly identified.
Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of applications. It is an important problem not only in information retrieval but in many other disciplines as well, from technology to teaching and from finance to forensics. The idea that authors have a statistical "fingerprint" that can be detected by computers is a compelling one that has received a lot of research attention. Authorship Attribution surveys the history and present state of the discipline, presenting some comparative results where available. It also provides a theoretical and empirically tested basis for further work. Many modern techniques are described and evaluated, along with some insights on application for novices and experts alike. Authorship Attribution will be of particular interest to information retrieval researchers and students who want to keep up with the latest techniques and their applications. It is also a useful resource for people in other disciplines, be it the teacher interested in plagiarism detection or the historian interested in who wrote a particular document.
Today’s incoming students are more likely to be exposed to Java than ever before. Focusing on a modern architecture (the Java Virtual Machine, or JVM), this text provides a thorough treatment of the principles of computer organization in the context of today’s portable computer. Students are given simple but realistic examples to gain a complete understanding of how computation works on such a machine. Juola makes the material useful and relevant in a course that is often difficult for second-year CS students.
An asylum-seeker sought to establish his case on the basis of anonymously published news articles that he claimed to have written. Unfortunately, original manuscripts of these documents were not available for traditional (physical-document-based) analysis. Using statistical linguistics, we were able to analyze the writing style against an ad hoc collection of distractor authors and to establish, using non-parametric rank-order statistics, that the documents had indeed been written by the asylum-seeker.
Traditional document analysis can fail when there is no traditional document, as in blog posts, email, or Word files. We describe an emerging forensic discipline of “stylometry,” the analysis of writing style with an eye to identifying or profiling the writer of a document. We describe the theory, methods, strengths, and weaknesses of this important subfield of forensic science, with an eye to practical applications.
As an expert in computational and forensic linguistics, I have reviewed the alleged Heartland memo to determine who the primary author of the report is, and more specifically whether the primary author was Peter Gleick or Joseph Bast. I conclude, based on a computational analysis, that the author is more likely to be Gleick than Bast.