This course introduces techniques for processing, analyzing, and visualizing textual data with Python. You will learn the fundamental theories and methods of Natural Language Processing (NLP) by writing code. We will begin with a brief introduction to Python syntax and Jupyter Notebooks, covering what you need to be effective in the course. We will start with Python’s built-in capabilities for handling text, then move on to many of the most popular Python packages for NLP, including the Natural Language Toolkit (NLTK). NLTK is a large library of tools and resources that will allow us to conduct part-of-speech tagging, sentiment analysis, named entity recognition, and text classification. Because it is extensively documented, NLTK remains a strong choice for researchers who need methods that others can cite and reproduce. We will use other packages for Machine Learning (ML) tasks, such as Gensim for topic modeling and Stanza for multilingual capabilities and access to contemporary ML language models. We will learn to visualize our findings with packages such as NetworkX, Seaborn, and Bokeh.
Experience with Python is not strictly required for participation in the class, but a general understanding of programming methods and terms will be an asset. This class will help you think about humanities problems through computation. By the end of our time together, you will understand the kinds of questions NLP methods can answer and be ready to implement those methods in code.
This is a hands-on course with some lecture components. Other offerings can build on and/or complement this one, including Fundamentals of Programming/Coding for Human(s|ists), Wrangling Big Data for DH, Out-of-the-Box Text Analysis for the Digital Humanities, Text Processing – Techniques & Traditions, Visualizing Information: Where Data Meets Design, Web APIs with Python, Parsing and Writing XML with Python, and more!