pacman::p_load(tidyverse, data.table, purrr)
This set of notes is a total rip of Grant McDermott’s lecture notes on functions here. I don’t want to take any credit here. I wrote this up to get at the core ideas that are best to start with. If you would like to learn more, you can go through Grant’s entire Data Science for Economists (PhD) lecture notes on this GitHub repo.
Basic function syntax
function_name(ARGUMENTS)
This syntax is R coding in a nutshell. Most of the time, all we are writing is variations of this syntax.
However, sometimes writing our own functions can be extremely useful. This is easy to do using the function() function (I know, lol).
The basic syntax of function()
my_func = function(ARGUMENTS) {
  OPERATIONS
  return(VALUE)
}
square(x)
Write a function that outputs the square of any number
square = function(x) { # name of the function
  double = x^2 # operation
  return(double) # output we want returned
}
Test it
square(3)
## [1] 9
Note: We can write this function in several different ways. Generally I follow the above format, explicitly using return().
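As a sketch of what those alternatives can look like (the names square_implicit and square_short are made up for illustration): R returns the value of the last evaluated expression, so return() is optional, and a single-expression body does not need braces.

```r
# Explicit return, as in the notes
square = function(x) {
  double = x^2
  return(double)
}

# Implicit return: the last evaluated expression is the function's value
square_implicit = function(x) {
  x^2
}

# One-liner: braces are optional when the body is a single expression
square_short = function(x) x^2

square(3)
square_implicit(3)
square_short(3)
```

All three calls give the same value; which form to use is mostly a matter of taste and readability.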
We can also give an argument a default value:
square = function(x = 2) { # name of the function, with a default for x
  double = x^2 # operation
  return(double) # output we want returned
}
Thus, without an input, the function falls back on the default:
square()
## [1] 4
square = function(x = 2) {
  if (class(x) == 'numeric' | class(x) == 'integer') {
    double = x^2
    return(double)
  } else {
    print('put in a number you dummy')
  }
}
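One possible variation, not from the original notes: print() only shows the message and the function still hands that string back to the caller, so downstream code keeps running with a bad value. A hedged sketch using base R's is.numeric() (TRUE for both 'numeric' and 'integer' input) and stop() signals an error instead. The name square_strict is hypothetical.

```r
square_strict = function(x = 2) {
  # is.numeric() is TRUE for both double and integer input
  if (!is.numeric(x)) {
    stop("x must be a number, not an object of class '", class(x)[1], "'")
  }
  x^2
}

square_strict(3L)  # integers work

# tryCatch() lets us look at the error message without halting the script
result = tryCatch(square_strict("three"), error = function(e) conditionMessage(e))
result
```

Whether to print a message or raise an error is a design choice; an error is usually safer inside loops, because a failed input cannot slip through unnoticed.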
Now let’s iterate
Like I mentioned in class last time, there are several ways to write loops in R:
for () {}
*apply family from base
map* family from purrr
for loops
The standard for loop syntax is similar to other dynamic programming languages:
# create an empty object to store the results
square_list = NULL
# for loop
for (i in 1:10) { ## state your index
  square_list[i] = square(i) ## state the function you would like to loop over
}
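Growing an object from NULL works, but R has to extend it on every pass. A common alternative, sketched here, is to create the container at its final length first and loop over seq_along(). square() is redefined so the snippet runs on its own, and the names inputs and square_vec are illustrative.

```r
square = function(x) x^2

inputs = 1:10

# preallocate a numeric vector with one slot per input
square_vec = numeric(length(inputs))

for (i in seq_along(inputs)) {
  square_vec[i] = square(inputs[i])
}

square_vec
```

For ten elements the difference is negligible; preallocating starts to matter once loops run over many thousands of iterations.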
Let’s check out base::LETTERS, a built-in character vector of the 26 upper-case letters (it is a constant, not a function)
for (i in 1:10) {
print(LETTERS[i])
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "F"
## [1] "G"
## [1] "H"
## [1] "I"
## [1] "J"
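The loop index does not have to be a number. As a small sketch, a for loop can walk over the elements of a vector directly, which skips the indexing step (first_three is just an illustrative name):

```r
# LETTERS is a character vector, so it is subset with [ ] rather than called with ( )
first_three = LETTERS[1:3]

for (letter in first_three) {
  print(letter)
}
```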
lapply
I’m only going to go over lapply from the *apply family. If you want to learn about the whole family, see ?apply and/or this blog
The syntax for the same two loops above is as follows:
lapply(1:10, function(i){
square(i)
})
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
##
## [[4]]
## [1] 16
##
## [[5]]
## [1] 25
##
## [[6]]
## [1] 36
##
## [[7]]
## [1] 49
##
## [[8]]
## [1] 64
##
## [[9]]
## [1] 81
##
## [[10]]
## [1] 100
lapply(1:10, function(i){
print(LETTERS[i])
})
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "F"
## [1] "G"
## [1] "H"
## [1] "I"
## [1] "J"
## [[1]]
## [1] "A"
##
## [[2]]
## [1] "B"
##
## [[3]]
## [1] "C"
##
## [[4]]
## [1] "D"
##
## [[5]]
## [1] "E"
##
## [[6]]
## [1] "F"
##
## [[7]]
## [1] "G"
##
## [[8]]
## [1] "H"
##
## [[9]]
## [1] "I"
##
## [[10]]
## [1] "J"
Notice that the returned object is a list. This is where the l in lapply comes from. If you would like a more simplified output, use sapply (the s stands for simplify): it tries to collapse the result into a vector or matrix when it can, and otherwise returns a list.
sapply(1:10, function(i){
square(i)
})
## [1] 1 4 9 16 25 36 49 64 81 100
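Because sapply() picks its output shape from whatever the function happens to return, the result can change type unexpectedly. A sketch of two base R alternatives: unlist() applied to the lapply() result, and vapply(), which makes us declare the expected type of each result up front. square() is redefined so the snippet is self-contained.

```r
square = function(x) x^2

# flatten the list returned by lapply() into a plain vector
squares_unlisted = unlist(lapply(1:10, square))

# vapply() raises an error if any result is not a single numeric value
squares_checked = vapply(1:10, square, FUN.VALUE = numeric(1))

squares_checked
```

Note that square is passed by name here; wrapping it in function(i) { square(i) } as above is equivalent.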
map
We’ll get to map another day lols
Basically the syntax is the same, but there are some differences that we will talk about next time. The same two loops as before can be written as
map(1:10, function(i){
square(i)
})
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
##
## [[4]]
## [1] 16
##
## [[5]]
## [1] 25
##
## [[6]]
## [1] 36
##
## [[7]]
## [1] 49
##
## [[8]]
## [1] 64
##
## [[9]]
## [1] 81
##
## [[10]]
## [1] 100
and
map(1:10, function(i){
print(LETTERS[i])
})
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "F"
## [1] "G"
## [1] "H"
## [1] "I"
## [1] "J"
## [[1]]
## [1] "A"
##
## [[2]]
## [1] "B"
##
## [[3]]
## [1] "C"
##
## [[4]]
## [1] "D"
##
## [[5]]
## [1] "E"
##
## [[6]]
## [1] "F"
##
## [[7]]
## [1] "G"
##
## [[8]]
## [1] "H"
##
## [[9]]
## [1] "I"
##
## [[10]]
## [1] "J"
With this simple introduction, we can already make some serious improvements to our workflow. Let’s go back to project 001 for a minute. When we want to perform k-fold cross validation, loops can make our code much cleaner.
Recall the useful functions that we used in project 002:
sample_frac(), sample_n(), or sample()
setdiff()
lm()
predict()
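As a reminder of how these fit together, here is a minimal base R sketch of a single training/validation split. It uses the built-in mtcars data and an arbitrary formula as stand-ins, since the project data and models are not reproduced here; sample() and setdiff() are applied to row numbers, and the 80/20 split is an assumption.

```r
set.seed(123)  # make the random split reproducible

n = nrow(mtcars)

# draw 80% of the row numbers for training; the rest form the validation set
train_rows = sample(seq_len(n), size = floor(0.8 * n))
valid_rows = setdiff(seq_len(n), train_rows)

# fit on the training rows only
fit = lm(mpg ~ wt + hp, data = mtcars[train_rows, ])

# predict on the held-out rows and compute the root mean squared error
pred = predict(fit, newdata = mtcars[valid_rows, ])
rmse = sqrt(mean((mtcars$mpg[valid_rows] - pred)^2))
rmse
```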
Task:
(i.) Load the election_2016.csv data into memory.
(ii.) From part 1, create a function that automates subparts 03 and 04, outputting an rmse for 3 different models. The function should take the data as its argument: function(data).
(iii.) Now make the function more general by allowing it to flexibly change the lm formula. Create a second argument, function(data, formula), where formula is the model you want the function to use.
(iv.) Write a loop over all three formulas, returning the rmse for each model.
(v.) Now let’s generalize your function to add cross validation. Add a third argument, function(data, formula, v), where v is the group you want to omit as the validation group.
(vi.) Now write a nested (double) loop that loops over (1) each model and (2) each v. Your output should return an rmse for each combination of formula and v.
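One possible shape for steps (v.) and (vi.), sketched on the built-in mtcars data rather than election_2016.csv. The function name rmse_cv, the fold column, the number of folds k, and the three formulas are all assumptions made for illustration; they are not the project's models.

```r
set.seed(123)

# assign each row to one of k folds at random (stand-in data: mtcars)
k = 4
cv_data = mtcars
cv_data$fold = sample(rep(1:k, length.out = nrow(cv_data)))

# fit `formula` on every fold except v, then return the RMSE on fold v
rmse_cv = function(data, formula, v) {
  train = data[data$fold != v, ]
  valid = data[data$fold == v, ]
  fit = lm(formula, data = train)
  pred = predict(fit, newdata = valid)
  outcome = all.vars(formula)[1]  # name of the left-hand-side variable
  sqrt(mean((valid[[outcome]] - pred)^2))
}

# hypothetical candidate models
formulas = list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)

# nested loop: outer over models, inner over validation folds
results = lapply(seq_along(formulas), function(m) {
  fold_rmse = sapply(1:k, function(v) rmse_cv(cv_data, formulas[[m]], v))
  data.frame(model = m, v = 1:k, rmse = fold_rmse)
})
results = do.call(rbind, results)
results
```

The result has one row per combination of model and v; averaging rmse within each model gives a cross-validated error to compare the models on.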