Installation

Having R and R Studio on your laptop will allow you to work on problem sets and explore the magnificent functionality of R outside the lab. R is the language and R Studio helps us interact with R. It is important that you install R before you install R Studio.

Install R

In your web browser,

  • Go to r-project.org

  • Click download R.

  • At this point, you’ll be directed to a page with a list of institutions that host the Comprehensive R Archive Network (CRAN). The idea is to pick an institution near you. Scroll down to USA and click the link for OSU.

Windows Instructions: If you have a Windows machine,

  • Click Download R for Windows

  • Click install R for the first time

  • Click Download R 4.1.2 for Windows

  • To complete installation, run the .exe file you downloaded.

Mac Instructions: If you have a Mac,

  • Click Download R for (Mac) OS X

  • Under “latest release”, click R-4.1.2.pkg if your mac has Intel chip

  • Click R-4.1.2-arm64.pkg if your mac has Apple silicon chip

  • To complete installation, run the .pkg file you downloaded.

Make sure you download different package depending on the type of chip your Mac has.

Linux Instructions: If you run a Linux distro, note that installation instructions vary by distro. That said, you probably know what you’re doing.

Install RStudio

In your web browser, go to rstudio.com/products/rstudio/, scroll down to R Studio Desktop, and then click Download RStudio Desktop under “Open Source Edition.” Scroll down to “Installers for Supported Platforms” and click the link that corresponds with your operating system. To complete installation, run the installer you downloaded.

R Basics

Mechanism that runs RStudio

RStudio is an integrated development environment (IDE) for R. In other words, RStudio is an environment that provides a set of necessary tools for programmers to easily write and execute codes. Because RStudio is just an environment, it won’t run any codes if R is not installed prior to installation of RStudio. Note that technically, all the codes written in scripts run on R, and not RStudio.

Using the console

When you open R Studio for the first time, you should notice three panels. The large panel to the left is the console. This is where you run code that tells R what to do. You can also use the console as a calculator. For example, if you type 5+5*2-1 in the console and hit Enter, then R will return

## [1] 14

in the console.

Everything is an object

The upper-right panel is the global environment. This is where R Studio stores datasets, user-defined functions, and other objects.

To define an object, you use the assignment operator <- or simply =.1 For example, suppose that you want to assign the number 5 to an object called a. In the console, you would type

a <- 5

which reads “a gets 5.” When you execute this code (by hitting Enter), a will show up in the global environment. Hovering your cursor over a in the global environment tells you that a is a numeric object.

There are other kinds of objects, too. For example,

b <- "I Love Metrics"

is a character object, and

mat <- matrix(c(1, 2, 3, 4),
              nrow = 2)

is a matrix.

Packages

R functions come in packages. When you open a fresh R session in RStudio, a number of packages come pre-loaded. These include packages with common math and statistics functions and are known collectively as “base R.” Base R is wonderful, but non-default packages offer a great deal of flexibility and functionality.

You could consider a package to be a combined set of multiple packages, or multiple data sets and functions defined by other coders in the R community. For example, one common package that we will use often later in the class is tidyverse package. In this package, multiple functions that are useful for data analysis are predefined so that we could simply call a specific function name to use it.

Install a package: install.packages("package.name.here")

  • Replace package.name.here with the name of the package you want to install.
  • Make sure the name of the package is in quotes.
  • You only have to install a package once.

Alternatively, you can click on the Packages tab of the bottom-right panel:

Load a package: library(package.name.here)

  • To use functions from a non-base package, you need to load the package at the beginning of each new R session.
  • Quotes around the package name are no longer needed.

pacman

We will often need to load several packages in a single session. One way to do this is to execute library(package.1), then library(package.2), then library(package.3), and so forth. A less cumbersome way to load multiple packages is to use the p_load function from the pacman package.

  • Use the instructions above to install pacman.
  • Load the pacman package with library(pacman).
  • When you want to load additional packages, you can then execute p_load(package.1, package.2, package.3).
  • Added bonus: p_load first checks to see if the packages are installed. If they aren’t, then it will install them for you.

R scripts

To produce reproducible2 R code, it is best to use scripts. Open a new R script file with the .R extension by clicking File then New File then R Script. We will write our first script to generate a histogram and scatter plot using ggplot2.

For your exercises, you are going to create and submit the knitted version of R scripts.

ggplot2

Start by writing code to install and load ggplot2.

library(pacman)
p_load(ggplot2)
  • To learn more about the package or the function of your using, simply put ? in front of the name of the package/function with no space in between the question mark and the name of the package/function.
  • To execute one line of your script, click the line you want to run and then click Run at the upper-right corner of your R script. A quicker alternative is to click the line you want to run and then use the keyboard shortcut Ctrl Enter.
  • To execute the entire script, click Source at the upper-right corner of your R script or use the keyboard shortcut Ctrl Alt R. You could adjust these hot keys per your taste by selecting Tools>Modify Keyboard Shortcuts....

Aside: It is useful to leave comments in your code to explain to your future self what your code is doing and why. You can leave a comment by typing a hash #. Notice also that when you put # in front of a code, R would skip this line and not run that code. This is why when you want to leave a comment, you would want to put a hash # in front of it so that R would keep it from running. :

# This is a comment. R will ignore it.

Check out the example dataset midwest from ggplot2. You can view the first few rows of the dataset with variable names by using the head function.

head(midwest)
## # A tibble: 6 × 28
##     PID county   state  area poptotal popdensity popwhite popblack popamerindian
##   <int> <chr>    <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>
## 1   561 ADAMS    IL    0.052    66090      1271.    63917     1702            98
## 2   562 ALEXAND… IL    0.014    10626       759      7054     3496            19
## 3   563 BOND     IL    0.022    14991       681.    14477      429            35
## 4   564 BOONE    IL    0.017    30806      1812.    29344      127            46
## 5   565 BROWN    IL    0.018     5836       324.     5264      547            14
## 6   566 BUREAU   IL    0.05     35688       714.    35157       50            65
## # … with 19 more variables: popasian <int>, popother <int>, percwhite <dbl>,
## #   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
## #   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
## #   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
## #   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
## #   percelderlypoverty <dbl>, inmetro <int>, category <chr>

Next, make a histogram of county poverty rates (measured by the variable percbelowpoverty) using the ggplot function. You will need to tell ggplot

  • The name of the dataset.
  • The name of the variable.
  • The type of graph (in this case, geom_histogram()).
ggplot(data = midwest, aes(x = percbelowpoverty)) + 
  geom_histogram()

To visualize relationships between variables, you can make a scatter plot. Do poverty rates appear positively or negatively correlated with race, as measured by the variable percblack?

ggplot(data = midwest, aes(x = percblack, y = percbelowpoverty)) + 
  geom_point()

Knitting

Knitting a document means that you could turn all the texts and codes written in R script into a nicely formatted document. For now, we are going to turn R script into .html document. You could knit your R script into .html document by clicking File>Knit Document or simply use shortcut, Ctrl Shift K. For those of you who cannot see Knit Document under File tab, you may find something similar to Compile Document. Click it and it’ll do the same trick.

Now your turn

Please open up the 01-Exercise.R and fill out your answer for each question.


  1. Pro tip: You can type the assignment operator with the keyboard shortcut Alt -.↩︎

  2. Reproducible = fewer headaches later.↩︎