Speaker
Details
This workshop shows you how to extract information from web pages programmatically in R. We start by showing how to locate information in a parsed HTML document using CSS selectors, then how to fill in simple and more complex web forms to generate lists of pages to parse, and finally how to use websites' APIs when they are provided.
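To give a flavour of the first topic, here is a minimal sketch of locating information with CSS selectors using rvest. The HTML snippet, the element ids and classes, and the variable names are all made up for illustration and are not taken from the workshop materials; the page is inlined as a string so that the example does not depend on any real website.

```r
library(rvest)

# Hypothetical page content, inlined so the example needs no network access.
page <- read_html('
  <html><body>
    <ul id="talks">
      <li class="talk"><a href="/talks/1">Scraping basics</a></li>
      <li class="talk"><a href="/talks/2">Working with forms</a></li>
    </ul>
  </body></html>')

# CSS selector: every link inside a list item of class "talk"
# within the list whose id is "talks".
links <- html_elements(page, "ul#talks li.talk a")

# Collect the visible link text and the href attribute into a data frame.
talks <- data.frame(
  title = html_text2(links),
  url   = html_attr(links, "href")
)

print(talks)
```

In practice the string passed to read_html would be replaced by the URL of the page of interest, and the selector would be chosen by inspecting that page's structure.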
Prerequisites
The workshop assumes an intermediate level of R, including some basic familiarity with writing functions and using lapply. We will use the packages rvest, httr and jsonlite, but these need not be installed beforehand.
Audience
This workshop is open to anyone interested in scraping data from the web. If you are not a politics graduate student, please email [email protected] to let us know that you plan to attend, so that we can ensure there is enough space in the room.
Materials: web-scraping.zip