Having R and RStudio on your laptop lets you work on problem sets and explore the magnificent functionality of R outside the lab. R is the language, and RStudio helps us interact with R. It is important that you install R before you install RStudio.
In your web browser,
Go to r-project.org
Click download R.
At this point, you’ll be directed to a page with a list of institutions that host the Comprehensive R Archive Network (CRAN). The idea is to pick an institution near you. Scroll down to USA and click the link for OSU.
Windows Instructions: If you have a Windows machine,
Click Download R for Windows
Click install R for the first time
Click Download R 4.1.2 for Windows
To complete installation, run the .exe file you downloaded.
Mac Instructions: If you have a Mac,
Click Download R for (Mac) OS X
Under “latest release”, click R-4.1.2.pkg if your Mac has an Intel chip
Click R-4.1.2-arm64.pkg if your Mac has an Apple silicon chip
To complete installation, run the .pkg file you downloaded.
Make sure you download the package that matches the type of chip your Mac has.
Linux Instructions: If you run a Linux distro, note that installation instructions vary by distro. That said, you probably know what you’re doing.
In your web browser, go to rstudio.com/products/rstudio/, scroll down to RStudio Desktop, and then click Download RStudio Desktop under “Open Source Edition.” Scroll down to “Installers for Supported Platforms” and click the link that corresponds to your operating system. To complete installation, run the installer you downloaded.
RStudio is an integrated development environment (IDE) for R. In other words, RStudio provides a set of tools that make it easier to write and execute code. Because RStudio is just an environment, it cannot run any code unless R is installed. Technically, all the code you write in scripts runs in R, not in RStudio.
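One quick way to confirm that R itself is installed and is what executes your code is to ask it for its version in the console. The sketch below uses two base R calls; the comparison against 4.1.2 simply reuses the version number from the download steps and is only an illustration.

```r
# Print the version of R that is executing this code.
R.version.string

# TRUE if the running R is at least the version named in the download steps.
getRversion() >= "4.1.2"
```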
When you open RStudio for the first time, you should notice three panels. The large panel to the left is the console. This is where you run code that tells R what to do. You can also use the console as a calculator. For example, if you type 5+5*2-1 in the console and hit Enter, then R will return
## [1] 14
in the console.
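The result above follows the usual order of operations: multiplication is evaluated before addition and subtraction. A few more calculator-style lines, offered as a small illustration, show how parentheses and other base R arithmetic operators behave.

```r
5 + 5 * 2 - 1     # multiplication first: 5 + 10 - 1 = 14
(5 + 5) * 2 - 1   # parentheses first: 10 * 2 - 1 = 19
2^3               # exponentiation: 8
7 %/% 2           # integer division: 3
7 %% 2            # remainder: 1
```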
The upper-right panel is the global environment. This is where RStudio stores datasets, user-defined functions, and other objects.
To define an object, you use the assignment operator <- or simply =. For example, suppose that you want to assign the number 5 to an object called a. In the console, you would type
a <- 5
which reads “a gets 5.” When you execute this code (by hitting Enter), a will show up in the global environment. Hovering your cursor over a in the global environment tells you that a is a numeric object.
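You can also check an object's type from the console instead of hovering over it. The short sketch below assigns with both operators and inspects the result; the second object name, a2, is made up for this illustration.

```r
a <- 5            # assignment with <-
a2 = 5            # assignment with = gives the same result
identical(a, a2)  # TRUE

class(a)          # "numeric"
a * 2             # objects can be used in later calculations: 10
```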
There are other kinds of objects, too. For example,
b <- "I Love Metrics"
is a character object, and
mat <- matrix(c(1, 2, 3, 4), nrow = 2)
is a matrix.
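To see how these objects differ, you can inspect them with a few base R functions. This is a small sketch that recreates b and mat from above so it runs on its own.

```r
b <- "I Love Metrics"
mat <- matrix(c(1, 2, 3, 4), nrow = 2)

class(b)        # "character"
nchar(b)        # number of characters, including spaces: 14

is.matrix(mat)  # TRUE
dim(mat)        # 2 rows and 2 columns
mat[1, 2]       # matrix() fills column by column, so row 1, column 2 holds 3
```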
R functions come in packages. When you open a fresh R session in RStudio, a number of packages come pre-loaded. These include packages with common math and statistics functions and are known collectively as “base R.” Base R is wonderful, but non-default packages offer a great deal of flexibility and functionality.
You can think of a package as a bundle of functions and data sets written by other coders in the R community (some packages are themselves collections of several packages). For example, one package that we will use often later in the class is the tidyverse package. It predefines many functions that are useful for data analysis, so we can simply call a function by name to use it.
Install a package: install.packages("package.name.here"). Replace package.name.here with the name of the package you want to install. Alternatively, you can click on the Packages tab of the bottom-right panel.
Load a package: library(package.name.here)
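Installing happens once per machine, while loading happens once per session. As a rough illustration, the base R calls below show which packages are currently attached and how a function can be called without library() by naming its package with the :: operator. The stats package is used only because it ships with R.

```r
# Packages attached in the current session (the base R packages appear by default).
search()

# Call a function from a specific package without attaching it first.
stats::median(c(1, 3, 10))   # 3
```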
pacman
We will often need to load several packages in a single session. One way to do this is to execute library(package.1), then library(package.2), then library(package.3), and so forth. A less cumbersome way to load multiple packages is to use the p_load function from the pacman package.
Install pacman.
Load the pacman package with library(pacman).
Load the other packages with p_load(package.1, package.2, package.3).
p_load first checks to see if the packages are installed. If they aren’t, then it will install them for you.
To produce reproducible R code, it is best to use scripts. Open a new R script file with the .R extension by clicking File then New File then R Script. We will write our first script to generate a histogram and scatter plot using ggplot2.
For your exercises, you are going to create and submit the knitted versions of your R scripts.
ggplot2
Start by writing code to install and load ggplot2.
library(pacman)
p_load(ggplot2)
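To make the "check first, install only if needed" behavior of p_load more concrete, here is a minimal base R sketch of the checking step. The helper missing_packages is hypothetical (it is not part of pacman), and the second package name is deliberately fake so that the check has something to report.

```r
# Hypothetical helper (not part of pacman): return the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R; the second name is made up and should be reported as missing.
missing_packages(c("stats", "surely.not.a.real.package"))

# A function like p_load would then install whatever is reported missing
# and call library() on every package in the list.
```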
To get help with a package or function, type ? in front of its name, with no space between the question mark and the name.
To run a single line, click Run at the upper-right corner of your R script. A quicker alternative is to click the line you want to run and then use the keyboard shortcut Ctrl+Enter.
To run the whole script, click Source at the upper-right corner of your R script or use the keyboard shortcut Ctrl+Alt+R. You can adjust these hot keys to your taste by selecting Tools > Modify Keyboard Shortcuts...
Aside: It is useful to leave comments in your code to explain to your future self what your code is doing and why. To leave a comment, type a hash # in front of it. R skips everything on a line after the #, so a comment is never run as code. Putting # in front of a line of code likewise keeps that line from running:
# This is a comment. R will ignore it.
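Comments can sit on their own line or at the end of a line of code, and a hash can also switch off a line temporarily. The following sketch shows all three uses; the object names radius and area are made up for this illustration.

```r
# Compute the area of a circle (a full-line comment).
radius <- 2             # an inline comment after the code
area <- pi * radius^2

# area <- 0             # this line is "commented out", so R never runs it

area                    # still pi * 2^2, because the line above was skipped
```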
Check out the example dataset midwest from ggplot2. You can view the first few rows of the dataset with variable names by using the head function.
head(midwest)
## # A tibble: 6 × 28
##     PID county   state  area poptotal popdensity popwhite popblack popamerindian
##   <int> <chr>    <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>
## 1   561 ADAMS    IL    0.052    66090      1271.    63917     1702            98
## 2   562 ALEXAND… IL    0.014    10626       759      7054     3496            19
## 3   563 BOND     IL    0.022    14991       681.    14477      429            35
## 4   564 BOONE    IL    0.017    30806      1812.    29344      127            46
## 5   565 BROWN    IL    0.018     5836       324.     5264      547            14
## 6   566 BUREAU   IL    0.05     35688       714.    35157       50            65
## # … with 19 more variables: popasian <int>, popother <int>, percwhite <dbl>,
## # percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
## # popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
## # poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
## # percchildbelowpovert <dbl>, percadultpoverty <dbl>,
## # percelderlypoverty <dbl>, inmetro <int>, category <chr>
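Beyond head, a few other functions give a quick overview of a dataset before you plot it. The sketch below assumes ggplot2 is installed and loads it with library so that it runs on its own; the output is omitted here.

```r
library(ggplot2)   # provides the midwest dataset

dim(midwest)                        # number of rows and columns
names(midwest)[1:5]                 # first five variable names
summary(midwest$percbelowpoverty)   # minimum, quartiles, mean, and maximum
```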
Next, make a histogram of county poverty rates (measured by the variable percbelowpoverty) using the ggplot function. You will need to tell ggplot which dataset to use, which variable to plot, and what kind of plot to draw (here, geom_histogram()).
ggplot(data = midwest, aes(x = percbelowpoverty)) +
  geom_histogram()
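By default, geom_histogram() splits the data into 30 bins and prints a message suggesting that you pick a better value. As one possible refinement, the sketch below sets the bin width explicitly and adds axis labels. The bin width of 2 percentage points, the label text, and the object name p_hist are arbitrary choices made for this illustration.

```r
library(ggplot2)

# Same histogram, with an explicit bin width and readable axis labels.
p_hist <- ggplot(data = midwest, aes(x = percbelowpoverty)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Percent below poverty line", y = "Number of counties")

p_hist   # printing the object draws the plot
```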
To visualize relationships between variables, you can make a scatter plot. Do poverty rates appear positively or negatively correlated with the percentage of the population that is Black, as measured by the variable percblack?
ggplot(data = midwest, aes(x = percblack, y = percbelowpoverty)) +
  geom_point()
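If the direction of the relationship is hard to judge by eye, two optional additions can help: a correlation coefficient and a fitted straight line drawn over the points. This is only a sketch of one way to check; the object names r and p_scatter are made up here, and the numeric result is left for you to run.

```r
library(ggplot2)

# Correlation coefficient between the two variables (between -1 and 1).
# use = "complete.obs" drops any rows with missing values before computing it.
r <- cor(midwest$percblack, midwest$percbelowpoverty, use = "complete.obs")
r

# Scatter plot with a straight-line fit added on top of the points.
p_scatter <- ggplot(data = midwest, aes(x = percblack, y = percbelowpoverty)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_scatter
```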
Knitting a document means turning all the text and code written in an R script into a nicely formatted document. For now, we are going to turn R scripts into .html documents. You can knit your R script into an .html document by clicking File > Knit Document or by using the shortcut Ctrl+Shift+K. If you cannot see Knit Document under the File tab, you may find something similar, such as Compile Document. Click it and it will do the same thing.
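As far as I understand it, RStudio compiles an R script by passing it through knitr's "spin" convention, in which lines that begin with #' are treated as text and everything else is treated as code. The script below is a minimal sketch written under that assumption; the title and the object x are made up, and the commented-out render call shows a console alternative that assumes the rmarkdown package is installed.

```r
#' ---
#' title: "Lab 1 practice"
#' output: html_document
#' ---

#' Lines that start with `#'` become ordinary text in the knitted document.

# Regular comments stay inside the code block.
x <- c(2, 4, 6)
mean(x)

# Console alternative to the menu (assumes the script is saved as 01-Exercise.R):
# rmarkdown::render("01-Exercise.R")
```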
Please open 01-Exercise.R and fill in your answer for each question.