This is a course in stylometry, the analysis of countable linguistic features of texts. Although stylometry has usually been associated with authorship attribution, the same methods are successfully applied to more general text analysis and, more recently, even to other media such as music and images. The statistics of features such as word, word n-gram, or character n-gram frequencies are not only a highly precise tool for identifying authorship; they can also reveal patterns of similarity and difference between works by one author, between works by different authors, and between authors who differ in chronology, gender, genre, or narrative style, as well as between translations of the same author or group of authors, or between specific voices such as the idiolects of characters in novels. This provides a new opening in literary studies, since the results of a stylometric analysis can be compared and confronted with the findings of traditional stylistics and interpretation. It also raises a new set of questions about style and its transfer, as well as the nature of particular features and of language.
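To give a rough sense of what "countable features" means in practice, here is a minimal sketch in plain Python (not the 'stylo' package used in the course). It computes relative frequencies of words and character n-grams and compares texts with a simple Manhattan distance. The helper names, the toy sentences, and the choice of distance are illustrative assumptions only; real studies use much longer texts and more refined measures.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def relative_frequencies(items):
    """Map each distinct item to its share of the total count."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def manhattan_distance(freq_a, freq_b):
    """Sum absolute frequency differences over the union of features."""
    features = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(f, 0.0) - freq_b.get(f, 0.0)) for f in features)


# Toy texts invented for this illustration (assumption: not course data).
text_a = "the cat sat on the mat"
text_b = "the cat sat on the hat"
text_c = "stylometry counts features"

# Word frequencies: split on whitespace as a deliberately naive tokenizer.
word_freq_a = relative_frequencies(text_a.split())

# Character 3-gram profiles for each text.
profile_a = relative_frequencies(char_ngrams(text_a, 3))
profile_b = relative_frequencies(char_ngrams(text_b, 3))
profile_c = relative_frequencies(char_ngrams(text_c, 3))

dist_ab = manhattan_distance(profile_a, profile_b)
dist_ac = manhattan_distance(profile_a, profile_c)

print("share of 'the' in text_a:", word_freq_a["the"])
print("distance a-b:", dist_ab)
print("distance a-c:", dist_ac)
```

In this toy case the two near-identical sentences end up closer to each other than to the unrelated one, which is the basic intuition behind using frequency profiles to compare texts.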
Participants will learn the major stylometric tools and methods, from simple keyword extraction to machine-learning classification based on text features, followed by visualization techniques ranging from dendrograms to networks. They will also learn how to identify a problem, define relevant research questions, and design an experiment. We will use ‘stylo’, our own package written for the R statistical programming environment, which allows us to avoid R’s usually steep learning curve; we do not expect advanced programming skills. We will provide text corpora for training purposes, but we also hope and expect that participants will bring their own data and problems to work on.
This course combines lectures and hands-on activities. Consider this offering to build on: Fundamentals of Coding / Programming for Human(s|ists); Web Development / Project Prototyping for Beginners with Ruby on Rails; Out-of-the-Box Text Analysis for the Digital Humanities. Consider this offering to complement, and/or to be built on by: Geographical Information Systems in the Digital Humanities; Understanding Topic Modelling; Data Mining for Digital Humanists; and more!