2019-02-27
Almost two decades of experience in Derivatives markets
Visiting Professor of Finance Roosevelt University
Organizer of The Chicago R User Group
Committee member R/Finance 2019
Author of the wooldridge R package, the official R package for “Introductory Econometrics” by Jeffrey M. Wooldridge.
Author of the neverhpfilter R package, implementing James Hamilton’s alternative to the Hodrick-Prescott filter.
Google summer of code mentor 2019, R and Finance applications?
library(data.table)
concise syntax: fast to type, fast to read
fast speed
memory efficient
careful API lifecycle management
community
feature rich
Problem: Model every single equity traded in the United States since 1926.
34000+ securities
Daily observations from 1926-to-present
Survivorship Bias free
## Import Data Actual
library(data.table)
CRSP_2659 <- fread("/media/jshea/data/CRSPdailyValue-1926-1959.csv")
CRSP_6080 <- fread("/media/jshea/data/CRSPdailyValue-1960-1979.csv")
CRSP_8000 <- fread("/media/jshea/data/CRSPdailyValue-1980-1999.csv")
CRSP_0019 <- fread("/media/jshea/data/CRSPdailyValue-2000-2019.csv")
CRSP <- rbind(CRSP_2659, CRSP_6080, CRSP_8000, CRSP_0019)
rm(CRSP_2659, CRSP_6080, CRSP_8000, CRSP_0019)
## Format Data types
CRSP$date <- as.Date(CRSP$date, "%Y/%m/%d")
CRSP$RET <- as.numeric(CRSP$RET)
# Remove missing values
sum(is.na(CRSP))
CRSP <- na.omit(CRSP)
CRSP_Betas <- matrix(c(9999, 0, 1, 0, 1, 100, 1, 23625), nrow = 1)
for(permno in unique(CRSP$PERMNO)) {
factor4 <- glm(RET ~ 0 + Q1 + Q2 + Q3 + Q4 + RmQ1 + RmQ2 + RmQ3 + RmQ4,
data = CRSP[permno])
betarow <- matrix(c(permno, alpha, beta, summary(factor4)$sigma,
summary(factor4)$adj.r.squared,
summary(factor4)$fstatistic), nrow = 1)
if(dim(betarow)[2] == 8) CRSP_Betas <- rbind(CRSP_Betas, betarow)
}
colnames(CRSP_Betas) <- c("PERMNO", "Alpha", "Beta", "sigma", "Adj.R2", "Fstat",
"numeratorDF", "denominatorDF")
From setindex documentation
binary search and joins are faster when they detect they can use an existing key
grouping by a leading subset of the key columns is faster because the groups are already gathered contiguously in RAM
simpler shorter syntax; e.g. DT[“id”,] finds the group “id” in the first column of DT’s key using binary search. It may be helpful to think of a key as super-charged rownames: multi-column and multi-type rownames
setindex doesn’t physically reorder the entire data.table in RAM. Instead, it only computes the order for the set of columns provided and stores that order vector in an additional attribute called index.
With one line of code and a bracket!
setindex(CRSP, PERMNO)
CRSP_Betas <- matrix(c(9999, 0, 1, 0, 1, 100, 1, 23625), nrow = 1)
for(permno in unique(CRSP$PERMNO)) {
factor4 <- glm(RET ~ 0 + Q1 + Q2 + Q3 + Q4 + RmQ1 + RmQ2 + RmQ3 + RmQ4,
data = CRSP[.(permno)])
betarow <- matrix(c(permno, alpha, beta, summary(factor4)$sigma,
summary(factor4)$adj.r.squared,
summary(factor4)$fstatistic), nrow = 1)
if(dim(betarow)[2] == 8) CRSP_Betas <- rbind(CRSP_Betas, betarow)
}
colnames(CRSP_Betas) <- c("PERMNO", "Alpha", "Beta", "sigma", "Adj.R2", "Fstat",
"numeratorDF", "denominatorDF")