Weekday | Date | Room | Start Time | End Time |
---|---|---|---|---|
Wednesday | September 10, 2025 | 555 Penn, B230 | 6:00pm | 8:00pm |
Wednesday | September 17, 2025 | 555 Penn, B230 | 6:00pm | 8:00pm |
Wednesday | September 24, 2025 | 555 Penn, B230 | 6:00pm | 8:00pm |
Wednesday | October 1, 2025 | 555 Penn, B230 | 6:00pm | 8:00pm |
Entry-level researchers in universities, government offices, think tanks, and multilateral institutions are increasingly expected to perform basic quantitative tasks using statistical software such as Stata, R, or Python. As data work has become near-ubiquitous in the policy world, so have basic tasks like aggregating, analyzing, summarizing, and visualizing data.
This course introduces you to statistical programming in the R language. It also aims to give you the foundation to keep developing your knowledge of R after the course ends.
By the end of this course, students will be able to set up their own R environment and feel comfortable using R for simple data tasks in coursework, internships, or entry-level research/data positions. They will have the foundation to continue to learn by practicing R beyond this course.
By the end of this course, students will be familiar with the instruments they are expected to use in entry-level professional research positions. They will also have resources to further their education in R past the course.
In more detail, students will be introduced to the use of:
- dplyr’s select() and mutate() functions to explore and modify datasets;
- dplyr’s group_by() and summarize() functions to generate summary statistics;
- tidyr’s pivot_longer() and pivot_wider() functions to reshape data for easier analysis;
- gt and stargazer packages to generate HTML, PNG, or LaTeX summary tables;
- ggplot2 package to generate descriptive scatter plots of data.
Students will also be taught how to clean raw data by:
Session | Description |
---|---|
I — Setting Up Your R Environment | - Introduction to Coding — Learn how to think as a coder, how to identify the basic components of data analysis - Introduction to the RStudio Interface — Learn how to set up your environment to use R and RStudio - Troubleshooting R — How to identify and address basic errors in your R setup |
II — Visualization | - Creating Plots and Graphs — Learn how to create scatter and bar plots using ggplot2 - Creating HTML and PDF Tables — Learn how to create shareable tables using gt |
III — Transformation | - The Building Blocks of R — Explore scalars, vectors, lists, and tibbles in R - The Basic Verbs of R — Learn how to use mutate(), select(), filter(), group_by(), and summarize() - Tidy Data — Introduction to tidy datasets, pivot_longer(), and pivot_wider() |
IV — Programming and Communication | - Programming in R — Learn about functions and iteration using purrr in R - Using R Markdown — Introduction to communicating process and results using R Markdown |
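
To give a flavor of the Session III material, here is a minimal sketch of the basic verbs and pivoting functions listed above. The dataset, its column names, and its values are invented purely for illustration and are not taken from the course materials; the sketch assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy dataset: one row per country, one column per year
aid_wide <- tibble(
  country = c("A", "B", "C"),
  region  = c("North", "North", "South"),
  `2023`  = c(10, 20, 30),
  `2024`  = c(12, 18, 36)
)

# pivot_longer(): reshape to one row per country-year
aid_long <- aid_wide |>
  pivot_longer(cols = c(`2023`, `2024`), names_to = "year", values_to = "aid")

# mutate(), filter(), select(): create, subset, and keep columns
aid_clean <- aid_long |>
  mutate(year = as.integer(year), aid_thousands = aid * 1000) |>
  filter(aid >= 12) |>
  select(country, region, year, aid_thousands)

# group_by() and summarize(): summary statistics by region
aid_summary <- aid_long |>
  group_by(region) |>
  summarize(mean_aid = mean(aid), n_obs = n())

# pivot_wider(): reshape back to one column per year
aid_wide_again <- aid_long |>
  pivot_wider(names_from = year, values_from = aid)

print(aid_summary)
```

Each step takes a data frame and returns a new one, which is why the verbs chain naturally with the pipe. The native pipe `|>` used here requires R 4.1 or later; the `%>%` pipe would work the same way in this sketch.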
Note — The course slides, syllabus, and other resources will be shared at the following link: https://mfiorina.github.io/sais_r_course/.
Each two-hour session will be split into two halves. The first half (approx. one hour) will consist of an interactive lecture using slides and live coding. The second half (approx. one hour) will consist of practical exercises that the students will complete with my support.
The last two sessions will each begin with a multiple-choice quiz on the previous week’s content. At the end of the course, there will be an optional open-ended assignment in which students can write a script, which I will then review and provide feedback on.
As there will be no time for this in class, YOU NEED TO DO THE FOLLOWING BEFORE THE FIRST SESSION:
- R-4.5.1-win.exe if you use Windows.
- R-4.5.1-arm64.pkg if you have a more recent MacBook with an M1 chip.
- R-4.5.1.pkg if you have an older MacBook with an Intel chip or run an older version of macOS.
Note — Readings and resources below are optional and are provided for context and use after the course is finished. The session slides will cover everything needed for the course.
Hadley Wickham & Garrett Grolemund, R for Data Science. This is the foundational textbook for use of the “Tidyverse” package suite in R.
RStudio, RStudio Cheatsheets. Cheatsheets to help perform basic data tasks in R.
Thomas Mock’s The Mockup Blog has a great array of tutorials for all levels. You’ll see posts from there below.
The World Bank DIME Wiki. A wiki with open-source articles on how to be a research assistant with the World Bank. Great insights into collaborative data work, reproducibility, and the responsibilities of an entry-level data researcher.
ggplot2 package, the main instrument for plot creation in R.
Thomas Mock, “gt - a (G)rammar of (T)ables”. Introduction to the gt package, a more flexible instrument to export tables in PNG, PDF, or HTML formats.
Marek Hlavac, “stargazer: beautiful LATEX, HTML and ASCII tables from R statistical output”. Vignette for the stargazer package, the main tool to export regression tables to LaTeX.
Dominic Royé, “A very short introduction to Tidyverse”. Blog post covering the basics of Tidyverse use in R.
Wickham and Grolemund, R for Data Science Chapter 12 — Tidy Data. How to structure (“tidy”) your dataset for flexible use in data analysis.
tidyr, “Pivoting”. Vignette explaining how to reshape datasets using pivot_longer and pivot_wider.
Hadley Wickham, “dplyr 1.0.0: working across columns”. Explains the basics for flexible column-wise operations using across in R.
Rebecca Barter, “Learn to purrr”. Blog post covering the basics of using iterative functions with the purrr package in R.
Garrett Grolemund, “Introduction to R Markdown”. Blog post covering the basics of using R Markdown in R.
purrr’s map function is your friend. I recommend Thomas Mock, “Functional programming in R with Purrr” to get you started.
xaringan, a package that allows you to create slide decks using R. Also explore the xaringanExtra package.
bookdown or a blog using blogdown
sf package, commonly used for geospatial work in R.
sf and ggplot2 to visualize data using maps.
For those interested in conducting data work in the development world: Kristoffer Bjarkefur, Luiza Cardoso de Andrade, Benjamin Daniels, and Maria Ruth Jones, Development Research in Practice — The DIME Analytics Data Handbook. A comprehensive account of tools and instruments to conduct quantitative development research.
For those looking for more hands-on, real-world data work: Ben Baldwin, “A beginner’s guide to nflfastR”. How to download and explore NFL play-by-play data. This is how I learnt how to use R. Further tutorials using this data can be found at the “Open Source Football” blog.
My name is Marc-Andrea Fiorina, and I am a public policy research analyst, most recently at OpenResearch. Over the past six years, I have worked as an intern, research assistant, and analyst using R for impact evaluations and economic research programs with Development Impact (DIME) at the World Bank. I hold a Bachelor of Arts (Hons.) in Philosophy, Politics, and Economics from the University of Oxford (2017) and a Master of Arts in International Politics and Economics (Bologna 2018, DC 2019) from Johns Hopkins University SAIS.