Weekday | Date | Room | Start Time | End Time |
---|---|---|---|---|
Thursday | March 27, 2025 | 555 Penn, 656 | 6:00pm | 8:00pm |
Thursday | April 3, 2025 | 555 Penn, 656 | 6:00pm | 8:00pm |
Thursday | April 10, 2025 | 555 Penn, 656 | 6:00pm | 8:00pm |
Thursday | April 17, 2025 | 555 Penn, 656 | 6:00pm | 8:00pm |
Entry-level research and analysis positions in universities, government offices and contractors, think tanks, and multilateral institutions are increasingly expected to perform basic quantitative tasks using statistical software such as Stata, R, or Python. As data work has become near-ubiquitous in the policy world, so have basic tasks like aggregating, analyzing, summarizing, and visualizing data.
This course introduces you to statistical programming in R, an open-source statistical programming language used widely across industries. It also aims to give you the foundation to keep developing your knowledge and experience of R after the course ends.
By the end of this course, students will be able to set up their own R environment and feel comfortable using R for simple data tasks in coursework, internships, or entry-level research/data positions. They will have the foundation to continue to learn by practicing R beyond this course.
In more detail, students will be introduced to the use of:
- `dplyr`’s `select()` and `mutate()` functions to explore and modify datasets;
- `dplyr`’s `group_by()` and `summarize()` functions to generate summary statistics;
- `tidyr`’s `pivot_longer()` and `pivot_wider()` functions to reshape data for easier analysis;
- `purrr`’s `map()` and `dplyr`’s `across()` functions to perform iterative coding;
- the `gt` and `stargazer` packages to generate HTML, PNG, or LaTeX summary tables;
- the `ggplot2` package to generate descriptive scatter plots of data.

Session | Description |
---|---|
I: Setting Up Your R Environment | - Introduction to Coding: Learn how to think as a coder and how to identify the basic components of data analysis - Introduction to the RStudio Interface: Learn how to set up your environment to use R and RStudio - Troubleshooting R: How to identify and address basic errors in your R setup |
II: Visualization | - Creating Plots and Graphs: Learn how to create scatter and bar plots using `ggplot2` - Creating HTML and PDF Tables: Learn how to create shareable tables using `gt` |
III: Transformation | - The Building Blocks of R: Explore scalars, vectors, lists, and tibbles in R - The Basic Verbs of R: Learn how to use `mutate()`, `select()`, `filter()`, `group_by()`, and `summarize()` - Tidy Data: Introduction to tidy datasets, `pivot_longer()`, and `pivot_wider()` |
IV: Programming and Communication | - Programming in R: Learn about functions and iteration using `map()` and `across()` in R |
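
As a rough preview of Sessions III and IV, the sketch below strings together the functions named in the table above on a tiny made-up dataset. The `scores` tibble, its column names, and the object names are invented for illustration and are not course material. The code assumes that the `dplyr`, `tidyr`, and `purrr` packages are installed and that your R version is 4.1 or later (needed for the `|>` pipe).

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: scores for four students over two weeks
scores <- tibble(
  student = c("A", "B", "C", "D"),
  group   = c("x", "x", "y", "y"),
  week1   = c(10, 12, 8, 14),
  week2   = c(11, 15, 9, 13)
)

# select() keeps the listed columns; mutate() adds a new one
scores_change <- scores |>
  select(student, group, week1, week2) |>
  mutate(change = week2 - week1)

# group_by() + summarize(): one row of summary statistics per group
group_means <- scores_change |>
  group_by(group) |>
  summarize(mean_change = mean(change), .groups = "drop")

# pivot_longer(): reshape to one row per student-week
scores_long <- scores |>
  pivot_longer(cols = c(week1, week2), names_to = "week", values_to = "score")

# pivot_wider(): reshape back to one row per student
scores_wide <- scores_long |>
  pivot_wider(names_from = week, values_from = score)

# across(): apply the same function to several columns at once
col_means <- scores |>
  summarize(across(c(week1, week2), mean))

# map_dbl(): iterate over columns and return a numeric vector
col_max <- map_dbl(scores[c("week1", "week2")], max)

print(group_means)
print(col_means)
print(col_max)
```

Each step stores its result under a new name, so you can inspect the intermediate tables one at a time in RStudio instead of reading one long pipeline.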
Each two-hour session will be split into two halves. The first half (approx. one hour) will consist of an interactive lecture using slides and live coding. The second half (approx. one hour) will consist of practical exercises that students will complete with my support.
The last two sessions will begin with multiple-choice questionnaires on the previous week’s content. At the end of the course, there will be an open-ended assignment in which students will have the option to create a script, which I will then review and provide feedback on.
As there will be no time for this in class, YOU NEED TO DO THE FOLLOWING BEFORE THE FIRST SESSION:
- `R-4.4.3-win.exe` (the Windows installer).
- `R-4.4.3-arm64.pkg` if you have a more recent MacBook with an M1, M2, … chip.
- `R-4.4.3.pkg` if you have an older MacBook with an Intel chip.

Note: Readings and resources below are optional and are provided for context and use after the course is finished. The session slides will cover everything needed for the course.
Hadley Wickham, Mine Çetinkaya-Rundel & Garrett Grolemund, R for Data Science (2e). This is the foundational textbook for use of the “Tidyverse” package suite in R.
RStudio, RStudio Cheatsheets. Cheatsheets to help perform basic data tasks in R.
Thomas Mock’s The Mockup Blog has a great array of tutorials for all levels. You’ll see posts from there below.
The World Bank DIME Wiki. A wiki with open-source articles on how to be a research assistant with the World Bank. Great insights into collaborative data work, reproducibility, and the responsibilities of an entry-level data researcher.
Thomas Mock, “gt - a (G)rammar of (T)ables”. Introduction to the gt package, a more flexible instrument to export tables in PNG, PDF, or HTML formats.
Marek Hlavac, “stargazer: beautiful LaTeX, HTML and ASCII tables from R statistical output”. Vignette for the stargazer package, the main tool for exporting regression tables to LaTeX.
Dominic Royé, “A very short introduction to Tidyverse”. Blog post covering the basics of Tidyverse use in R.
Wickham, Çetinkaya-Rundel & Grolemund, R for Data Science (2e) Chapter 5 — Data Tidying. How to structure (“tidy”) your dataset for flexible use in data analysis.
tidyr, “Pivoting”. Vignette explaining how to reshape datasets using pivot_longer and pivot_wider.
Rebecca Barter, “Learn to purrr”. Blog post covering the basics of using iterative functions with the purrr package in R.
Hadley Wickham, “dplyr 1.0.0: working across columns”. Explains the basics for flexible column-wise operations using across in R.
- `xaringan`, a package that allows you to create slide decks using R. Also explore the `xaringanExtra` package.
- `bookdown`, or a blog using `blogdown`.
- The `sf` package, commonly used for geospatial work in R.
- `sf` and `ggplot2` to visualize data using maps.

For those interested in conducting data work in the development world: Kristoffer Bjarkefur, Luiza Cardoso de Andrade, Benjamin Daniels, and Maria Ruth Jones, Development Research in Practice: The DIME Analytics Data Handbook. A comprehensive account of tools and instruments to conduct quantitative development research.
For those looking for more hands-on, real-world data work: Ben Baldwin, “A beginner’s guide to nflfastR”. How to download and explore NFL play-by-play data. This is how I learnt how to use R. Further tutorials using this data can be found at the “Open Source Football” blog.
My name is Marc-Andrea Fiorina, and I am a research analyst at OpenResearch. Over the past six years, I worked as an intern, research assistant, and analyst using R for impact evaluations and economic research programs with Development Impact (DIME) at the World Bank. I hold a Bachelor of Arts (Hons.) in Philosophy, Politics, and Economics from the University of Oxford (2017) and a Master of Arts in International Politics and Economics (Bologna 2018, DC 2019) from Johns Hopkins University SAIS.
As a research assistant, I learnt how to work with data in a collaborative space and how to keep improving my command of a programming language through continuous use and good practices. I hope to share those practices and resources with you through this course.