LLMs from Prompts to Pipelines for Text & Media Analysis & Creativity (DHSI 2026)

Format
in person/face-à-face
Event Language
English
Description
This course offers a hands-on introduction to large language model (LLM) deployment and adaptation, natural language processing (NLP), text and media analysis, and network visualization and analysis of text and/or media corpora, with deliverables that participants can deploy in their own work.
We will harness the power and scale of LLMs and other computing resources to analyze individual data points as well as big data and corpora, whether text, media, or both. Skills, affordances, methods, and concepts will be paced and assembled into pipelines: we start by locating, collecting/scraping, and (pre)processing relevant datasets; continue by deploying and engineering best-fit LLMs and specialized libraries and developing algorithms for multi-feature data analysis; and culminate in fine-grained, holistic networked assemblages that model and scrutinize the datasets in depth and comparatively across corpora and media.
We will code in Python and learn how to use (and compare) transformer-based, (sub)word, text, and media modeling LLMs/frameworks (largely open-source) such as GPT (3.5, 4.1, 5, and later), Mistral, (Nomic/Meta-)LLaMa, GPT-NeoX, T5, OLMo, and (M)BERT, among many others, alongside a wide range of relevant libraries and related APIs, including scikit-learn, NLTK, Sentence-Transformers, Hugging Face-based models, fastText, and Stanza/spaCy (displaCy). This will involve embeddings with text classifiers and/or image/video/audio vectorization, e.g., deep learning architectures, CLIP, MediaPipe, TensorFlow & Keras, PyTorch, librosa, etc. In this context, we will learn how to zero/few-shot prompt, fine-tune, or train our (own) LLMs and incorporate them into our task-specific Python pipelines.
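As a rough illustration of the difference between zero-shot and few-shot prompting mentioned above, the following minimal sketch assembles a plain-text classification prompt. The function name `build_few_shot_prompt`, the toy task, and the example lines are hypothetical and chosen only for illustration; the resulting string could be sent to whichever LLM is in use, and no particular model API is assumed here.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    task: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a prompt: task description, labeled examples, then the query."""
    lines = [task.strip(), ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final block leaves the label open for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical toy task and examples, for illustration only.
TASK = "Classify each line of verse as 'nature' or 'city'."
EXAMPLES = [
    ("The river bends beneath the willow.", "nature"),
    ("Neon signs hum above the late tram.", "city"),
]
QUERY = "Fog settles on the harbor cranes."

# Zero-shot: no examples, the model relies on the task description alone.
zero_shot = build_few_shot_prompt(TASK, [], QUERY)
# Few-shot: the same task, preceded by a handful of labeled examples.
few_shot = build_few_shot_prompt(TASK, EXAMPLES, QUERY)

print(few_shot)
```

Keeping prompt construction in a small function like this makes it easy to vary the number of examples and compare how different models respond to the same task.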
After using BeautifulSoup, Selenium, and pytesseract (Python-tesseract) to automatically collect and, if needed, OCR our data, we will translate the subsequent computational analyses into networks, ranging from plain (single-layer) graphs to multiplexes to fully general multilayer networks. These will be visualized and/or analyzed with NetworkX or, in more specific or complex cases, with in-house/indie algorithms. The translation to networks will also involve correlations between various forms of vectorization applied to text (and/as inter)media as they coexist in, or are combined into, our models of the data. The LLMs called into and adapted for our scripts and environments will make the difference in critical respects such as dynamic data curation and searching, trans-quantitative and/(f)or qualitative analysis, finesse-level processing, and mega-scale coverage.
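To make the idea of translating vectorized texts into a multiplex network more concrete, here is a minimal, dependency-free sketch. It is not the course's own method: the two feature extractors (word counts and character bigrams), the toy documents, the threshold value, and all function names are assumptions made for illustration. Each layer shares the same nodes (documents) and links two documents when their cosine similarity under that layer's features reaches the threshold.

```python
import math
from collections import Counter
from itertools import combinations


def word_counts(text):
    """Layer 1 features: bag of lowercase words."""
    return Counter(text.lower().split())


def char_bigrams(text):
    """Layer 2 features: counts of adjacent character pairs."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_edges(docs, featurize, threshold):
    """Return (u, v, weight) for every document pair at or above the threshold."""
    vectors = {name: featurize(text) for name, text in docs.items()}
    edges = []
    for u, v in combinations(sorted(vectors), 2):
        weight = cosine(vectors[u], vectors[v])
        if weight >= threshold:
            edges.append((u, v, weight))
    return edges


# One feature extractor per layer of the multiplex (illustrative choice).
LAYERS = {"words": word_counts, "char_bigrams": char_bigrams}


def build_multiplex(docs, threshold=0.3):
    """Map each layer name to its own weighted edge list over the same nodes."""
    return {
        layer: similarity_edges(docs, featurize, threshold)
        for layer, featurize in LAYERS.items()
    }


# Hypothetical toy corpus.
docs = {
    "poem_a": "the sea the sea the salt wind",
    "poem_b": "the salt sea and the grey wind",
    "poem_c": "tram lines cross under neon",
}

multiplex = build_multiplex(docs)
for layer, edges in multiplex.items():
    print(layer, edges)
```

Each layer's edge list could then be loaded into a separate NetworkX graph, for instance with `Graph.add_weighted_edges_from`, for visualization and further analysis.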
On the fifth day (Friday, June 19th), everybody will have the opportunity to participate in the #GraphPoem event, an intermedia social computing and data-commoning performance drawing on the algorithms, methods, and programming presented or developed in class.
The knowledge and skills acquired, alongside our in-class applications, will be useful in education, research, and analytical-creative work involving LLM-informed coding, NLP, automated text and (mono- and multilingual) corpus analysis, network science (or graph theory) applications, inter/trans-disciplinary text (and) media studies, computational literary studies/analysis/criticism, computational linguistics, multimodal and intermedia(lity) studies and creativity, HCI & AI creative writing and experimental/intersemiotic/literary translation, digital editions, digital poetry/e-lit/digital art, social (media/network) analysis, complexity studies in/and social science, and applications in the philosophy of mathematics and computation.
Instructor(s)
Chris Tănăsescu is a poet and academic with backgrounds in English and computer science. The Graph Poem project he started 15 years ago has produced natural language processing and network science-based poetry classifiers, intermedia performances, and computationally assembled poetry anthologies. His alias MARGENTO refers to a cyber cross-artform ensemble and international coalition of poets-translators, visual artists/musicians, and coders/AI that has been staging events and launching publications online and offline on four continents since 2001 and at DHSI (#GraphPoem) since 2019. Chris is currently a DH Research Scientist at the University of Galway while continuing his affiliation as Senior Researcher in Global Literary Studies and Complex Systems at Universitat Oberta de Catalunya. Previous or ongoing positions and affiliations include Coordinator of Digital Humanities at the University of Ottawa, Altissia Chair in Digital Cultures and Ethics at Université Catholique de Louvain, and Visiting Scholar at the Electronic Textual Cultures Lab, University of Victoria. His latest publications include Literature and Computation (Routledge 2024) and A Computationally Assembled Anthology of Contemporary Belgian Poetry [MARGENTO, collective ed.] (co-edited with Raluca Tanasescu, featuring John Taylor as main translator, Peter Lang 2025).
An example of a previous syllabus and course materials (2025) is available.
