Out of the Box Text Analysis

This class will focus on using digital tools to enhance and deepen traditional ways of reading and analyzing texts. We will explore ways of answering questions about authorship; textual, chronological, and authorial style; genre; and meaning. The first sessions will introduce some freely available tools and some widely available general software, and will address the issues of planning a project and of finding or creating texts and preparing them for analysis. We will begin with some prepared groups of texts for guided investigation as a group, so that we can concentrate on general problems, issues, and opportunities. Because my own background is in literature, the emphasis will be on literary texts. In later sessions, participants will be able to use these tools (and perhaps others, depending on their interests) to explore texts of their own choosing, or to examine some already-prepared sets of texts in greater detail and depth. The backgrounds and experiences of the participants will undoubtedly differ; therefore, we will aim for an intensely collegial and collaborative atmosphere, so as to capitalize on these differences.

Most of the tools and methods work across different languages, though there may be some problems with languages that are transliterated or that use accented characters, and the effectiveness of different techniques varies a good deal from language to language. Most also require a substantial amount of text: either one long text or at least several texts of 1,000 words or more. At the same time, this class will focus on relatively detailed and intensive analysis, and is not appropriate for those who are interested in working with huge data sets or very large numbers of very long texts. For the purposes and methods of this class, a set of 100 novels should be considered a very large amount of data.

We will meet in a computer lab where all of the software used will be available, though most of it can easily be installed and run on participants’ own computers. Much of the work will be done in Stylo and in tools that operate in Microsoft Excel. Potential participants who use Macs, or who have specific texts, groups of texts, or kinds of problems in mind that they would like to work on in the class, can contact the instructor to discuss any potential difficulties or challenges.

This is a hands-on course. Consider this offering to build on, or be built on by: Stylometry with R: Computer-Assisted Analysis of Literary Texts; Extracting Cultural Networks from Thematic Research Collections; or Wrangling Big Data for DH. Consider this offering in complement with Fundamentals of Programming/Coding for Human(s|ists); Text Analysis with Python and the Natural Language ToolKit; Geographical Information Systems in the Digital Humanities; Understanding the Pre-Digital Book; XPath for Processing XML and Managing Projects; and more!


