class: center, middle

# Programming for Professional Research Using R

## Session 2

### April 3, 2025

---
class: middle

## Today

- Learn how to:
  - Create a scatter plot, density plot, and bar chart using the `ggplot2` package
  - Create flexible and easy-to-read tables of any dataset using the `gt` package
  - Create simple academic-standard regression output tables using the `stargazer` package
- Practice the above!

---
class: center, middle

## Data Visualization — Descriptive Statistics — Plots

---

## Descriptive Stats Plots

`ggplot2` is the gold standard for data visualization, and it's one of the main reasons people use R over other programming languages. The syntax is very simple and lets you add elements to a plot one layer at a time. You can use `ggplot2` to create almost any type of plot you can think of.

I've included a lot of links at the end of these slides to explore the possibilities of `ggplot2` further. I strongly recommend you use them, or at least save them somewhere.

<img src="pics/plot_example2.png" width="60%" style="display: block; margin: auto;" />

---

### The Magic of `ggplot2`

Using `ggplot2` to create plots is great because the **structure** it sets up makes plot creation intuitive.

.pull-left-wide[

``` r
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>
  ) +
  <SCALE_FUNCTION> +
  <FACET_FUNCTION> +
  <THEME_FUNCTION>
```

]

.pull-right-wide[

1. `Data`: The data that you want to visualize
2. `Layers`: geom\_ and stat\_ → The geometric shapes and statistical summaries representing the data
3. `Aesthetics`: aes() → Aesthetic mappings of the geometric and statistical objects
4. `Scales`: scale_ → Maps between the data and the aesthetic dimensions
5. `Facets`: facet_ → The arrangement of the data into a grid of plots
6. 
`Visual themes`: theme() and theme_ → The overall visual defaults of a plot

]

---

### Scatter Plot — Step-by-Step

.panelset[
.panel[.panel-name[Dataset]

Start with a dataset you want to visualize

``` r
head(mtcars)
```

```
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

]

.panel[.panel-name[Convert to Plot]

.pull-left-wide[

``` r
ggplot(mtcars)
```

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

]

]

.panel[.panel-name[Add Something]

.pull-left-wide[

``` r
ggplot(mtcars) +
  geom_point(
    aes(x = mpg, y = wt)
  )
```

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

]

]

]

---

### Scatter Plot — Make It Better

.pull-left-wide[

``` r
ggplot(mtcars) +
  geom_point(
    aes(
      x = mpg,
      y = wt,
      color = factor(cyl)
    ),
    size = 6
  ) +
  xlab("Miles/Gallon") +
  ylab("Weight") +
  scale_color_discrete(
    name = "# of Cylinders"
  ) +
  theme_minimal(base_size = 24) +
  theme(
    legend.position = "bottom"
  )
```

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

]

---

### Bar Plot — Step-by-Step

.panelset[
.panel[.panel-name[Dataset]

Start with a dataset you want to visualize

``` r
mtcars_summary
```

```
## # A tibble: 3 × 2
##     cyl   mpg
##   <dbl> <dbl>
## 1     4  26.7
## 2     6  19.7
## 3     8  15.1
```

]

.panel[.panel-name[Convert to Plot]

.pull-left-wide[

``` r
ggplot(mtcars_summary)
```

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

]

]

.panel[.panel-name[Add Something]

.pull-left-wide[

``` r
ggplot(mtcars_summary) +
  geom_col(
    aes(
      x = cyl,
      y = mpg
    )
  )
```

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

]

]

.panel[.panel-name[Fix Class Issue]

.pull-left-wide[

``` r
ggplot(mtcars_summary) +
  geom_col(
    aes(
      x = factor(cyl),
      y = mpg
    )
  )
```

`cyl` categorizes cars by number of cylinders. Although the values are numbers, it is a **categorical** variable. We communicate this to `ggplot()` using the `factor()` function.

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

]

]

]

---

### Bar Plot — Make It Better

.pull-left-wide[

``` r
ggplot(mtcars_summary) +
  geom_col(
    aes(
      x = factor(cyl),
      y = mpg,
      fill = factor(cyl)
    )
  ) +
  xlab("# of Cylinders") +
  ylab("Miles/Gallon") +
  scale_y_continuous(
    limits = c(0, 30)
  ) +
  theme_minimal(base_size = 24) +
  theme(
    legend.position = "none"
  )
```

]

.pull-right-wide[

<img src="session_2_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

]

---
class: middle
name: plot-standards

## Plot Standards

.content-box-blue[

1. Your plot should be <font color = "red">**properly labeled**</font>:
    - The plot should have a title describing its content
    - Axes should be labeled
    - Legend (if any) should have a title and labels
2. Your plot should be <font color = "red">**properly formatted**</font>:
    - Axis dimensions should be appropriate. What is appropriate varies depending on context, but usually you should aim to fill the plot space with data
    - Text size should be large enough for text to be legible
3. Your plot should be <font color = "red">**self-contained**</font>. People should be able to understand your plot and its data without any other context or explanatory text.
That means:
    - A caption note that includes data source and any important data construction notes
    - Title and subtitle that deliver the plot's *message*

]

---
class: center, middle

## Data Visualization — Descriptive Statistics — Tables

---

## Descriptive Statistics Tables

Thankfully, not every RA position requires academic-standard tables or use of LaTeX. It is still useful, however, to be able to communicate descriptive statistics about data.

There are countless R packages to help do this. Today, we're looking at the `gt` package. It's simple to use and makes it very easy to create good-looking tables.

`gt` exports to .png, .pdf, or .html. You can also add interactive elements or plots within columns.

<img src="pics/gt_table_example.png" width="50%" style="display: block; margin: auto;" />

---

### Descriptive Statistics Table — Step-by-Step

We will mainly use the example in the script for this. To summarize, the steps are:

- Create a dataset you want to export
- Run the dataset through the `gt()` function to create a gt object
- Customize the table using functions from the `gt` package (see the online documentation for more options). Examples of what you can do include:
  - Modify column names — `cols_label()`
  - Modify borders — `tab_style()`, `cell_borders()`
  - Add colors conditional on cell value — `data_color()`
  - Add title/subtitle — `tab_header()`
- Export the table using `gtsave()`

---
class: middle, center

## Data Visualization — Simple Regression Table

---

### Regression Tables

Regression tables are very common in economic/policy analysis. They're very simple to create using R and a typesetting system called **LaTeX** (pronounced "lah-tek" or "lay-tek").

Unless you're getting into academic research, you don't need to know how to properly use LaTeX. You only need to know enough to:

- Export the LaTeX script from R
- Copy/paste it into a LaTeX editor, e.g. 
Overleaf
- Export the PDF or PNG to share

<img src="session_2_files/figure-html/unnamed-chunk-21-1.png" width="50%" style="display: block; margin: auto;" />

---

### Regression Table — Step by Step

.panelset[
.panel[.panel-name[Run Regression in R]

``` r
# Simplest regression format in R
reg_example <- lm(
  outcome_variable ~ independent_variable + control_variables,
  data = dataset
)

# Observe results
summary(reg_example)
```

]

.panel[.panel-name[Convert to Exportable Table]

Simply do one of these!

``` r
reg_example_ht <- huxtable::huxreg(reg_example)
```

OR

``` r
reg_example_sg <- stargazer::stargazer(reg_example) # Many options are available to make the table prettier
```

]

]

---

### Regression Table — Step by Step

.panelset[
.panel[.panel-name[Export Huxtable Table]

Some simple options for the Huxtable table:

``` r
# Export as a LaTeX script
huxtable::quick_latex(
  reg_example_ht,
  file = "filepath/filepath/filepath/reg_example_ht.tex"
)

# Export as a PDF
huxtable::quick_pdf(
  reg_example_ht,
  file = "filepath/filepath/filepath/reg_example_ht.pdf"
)

# Export as an HTML page
huxtable::quick_html(
  reg_example_ht,
  file = "filepath/filepath/filepath/reg_example_ht.html"
)
```

]

.panel[.panel-name[Export Stargazer Table]

``` r
# You can export a LaTeX script using the 'writeLines' function
writeLines(
  reg_example_sg,
  "filepath/filepath/filepath/reg_example_sg.tex"
)
```

To visualize your table, the easiest solution is to:

- Create a free Overleaf account on [overleaf.com](https://www.overleaf.com)
- Open a new document
- Copy/paste your .tex output in between the `\begin{document}` and `\end{document}` lines
- Click compile and then save!

You can also install the `tinytex` package and use its `pdflatex()` function to compile a .tex file into a PDF.

]

]

---
class: center, middle

## Practical Exercise — Using the World Values Survey Dataset

---

<font size='+3'><b>World Values Survey</b></font>

<font size='+2'><b>Background</b></font>
<br>
<br>
*"The survey, which started in 1981, seeks to use the most rigorous, high-quality research designs in each country.
The WVS consists of nationally representative surveys conducted in almost 100 countries which contain almost 90 percent of the world’s population, using a common questionnaire. [...] WVS seeks to help scientists and policy makers understand changes in the beliefs, values and motivations of people throughout the world."*

<font size='+2'><b>Survey Contents</b></font>

.pull-left[

- Social values, attitudes & stereotypes
- Societal well-being
- Social capital, trust and organizational membership
- Economic values
- Corruption
- Migration
- Post-materialist index

]

.pull-right[

- Science & technology
- Religious values
- Security
- Ethical values & norms
- Political interest and political participation
- Political culture and political regimes
- Demography

]

---

### Today's practical component

1. Download the required data for this session from [this Dropbox folder](https://www.dropbox.com/scl/fo/6m5hzlrc82i04oi0qoam7/h?rlkey=ctf6b0stve3vgbck9ka7mj5ia&st=x9y9ce88&dl=0)
2. Successfully run the code in the `session_2.R` script
3. Attempt the challenges at the bottom of the script!

---
class: middle

## Links

<ins>**Tables**</ins>

Marek Hlavac, **[“stargazer: beautiful LATEX, HTML and ASCII tables from R statistical output”](https://cran.rproject.org/web/packages/stargazer/vignettes/stargazer.pdf)**

Thomas Mock, **[“gt - a (G)rammar of (T)ables”](https://themockup.blog/posts/2020-05-16-gt-a-grammer-of-tables/)**

<ins>**Plots**</ins>

Alicia Horsch, **[“A quick introduction to ggplot2”](https://towardsdatascience.com/a-quick-introduction-to-ggplot2-d406f83bb9c9)**

RStudio, **[RStudio Cheatsheets](https://www.rstudio.com/resources/cheatsheets/)**
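---

### Appendix: A `gt` Table Sketch

The `gt` steps listed in the descriptive statistics section (create a dataset, run it through `gt()`, customize, export) can be sketched roughly as follows. This is a minimal illustration, not the code from `session_2.R`: the summary dataset built with `aggregate()`, the title and labels, the `fmt_number()` call, and the output file name are all assumptions chosen for the example.

```r
library(gt)

# Assumed example data: average miles per gallon by number of cylinders
mtcars_avg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Run the dataset through gt(), then customize the table step by step
mpg_table <- gt(mtcars_avg) |>
  # Add a title and subtitle
  tab_header(
    title = "Fuel Efficiency by Engine Size",
    subtitle = "Average miles per gallon in the mtcars dataset"
  ) |>
  # Replace raw column names with readable labels
  cols_label(
    cyl = "# of Cylinders",
    mpg = "Miles/Gallon"
  ) |>
  # Round the averages to one decimal place
  fmt_number(columns = mpg, decimals = 1) |>
  # Draw a border under every body cell
  tab_style(
    style = cell_borders(sides = "bottom", color = "grey40", weight = px(2)),
    locations = cells_body()
  )

# Export the table (hypothetical file name, written to a temporary folder)
gtsave(mpg_table, filename = "mpg_by_cyl.html", path = tempdir())
```

The sketch exports to .html because that needs no extra tooling; saving to .png or .pdf with `gtsave()` may require additional packages to be installed.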