This workshop introduces the basic infrastructure for statistical text processing in R using the quanteda package. We will focus on corpus construction and the exploratory data analysis process that should precede fitting statistical models. For this workshop we will assume documents are available in text format; a later workshop will address tools for text acquisition and web scraping.
The workshop assumes basic competence in R.
This workshop is for anyone interested in working with small- to medium-sized bodies of text: collections small enough to fit in memory but too large to read document by document, e.g. hundreds of thousands of newspaper articles.
If you are not a politics graduate student, please email [email protected] to let us know you plan to attend, so that we can ensure there is enough space in the room.