Setup

Load the required packages and set the seed for replication.

library(glmnet)   # for estimating ridge and lasso
library(MASS)     # for simulating multivariate normal
library(tidyverse) # for wrangling and plotting
set.seed(1203)

Set parameters

Before we start, let’s set the parameters for the simulation. Specifically, we will set the sample size, the number of features, and the number of features with non-zero coefficients. Note that we will simulate two scenarios: one with independent features and one with dependent features.

n <- 100 # sample size
p <- 100 # number of features
s <- 3   # number of features with non-zero coefficients

Generate data

Independent features

First, we will simulate data with independent features. Specifically, we will generate a sample of size n with p features, of which s have non-zero coefficients. The response is generated as a linear combination of the features plus some noise.
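
In symbols, the data-generating model is \(Y = X\beta + \varepsilon\) with \(\varepsilon_i \sim N(0, 1)\), where only the first \(s\) entries of \(\beta\) are non-zero (here all equal to 5).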

X_ind <- matrix(rnorm(n * p), ncol = p)  # generate independent features
beta  <- c(rep(5, s), rep(0, p - s))     # true coefficients
Y_ind <- X_ind %*% beta + rnorm(n)       # generate response

Dependent features

For the dependent features scenario, we will generate a sample of size n with p features, of which s have non-zero coefficients. The features are generated from a multivariate normal distribution with a specific covariance matrix Sigma. The response is generated as a linear combination of the features plus some noise.

Sigma <- matrix(
  c(1,    0.9, 0.2,
    0.9,  1,   0,
    0.2,  0,   1  ),
  nrow = s,
  ncol = s
)  # covariance matrix (positive definite)

cov2cor(Sigma)  # convert covariance to correlation
     [,1] [,2] [,3]
[1,]  1.0  0.9  0.2
[2,]  0.9  1.0  0.0
[3,]  0.2  0.0  1.0
X_1   <- mvrnorm(n = n, rep(0, s), Sigma = Sigma)  # generate dependent features
X_2   <- matrix(rnorm(n * (p - s)), ncol = p - s)  # generate independent features
X_dep <- cbind(X_1, X_2)                           # combine dependent and independent features
beta  <- c(rep(5, s), rep(0, p - s))               # true coefficients
Y_dep  <- X_dep %*% beta + rnorm(n)                # generate response
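
As a quick sanity check, we can look at the empirical correlations among the first three features; they should be close to the values specified in Sigma (a small sketch, and the exact numbers depend on the seed).

round(cor(X_dep[, 1:3]), 2)  # empirical correlations; roughly 0.9, 0.2, and 0 off the diagonal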

Estimate regularization path

Independent features

In the following code chunk, we estimate the regularization path for the independent features scenario. Specifically, we estimate the regularization path for both lasso and ridge regression.

fit_lasso <- glmnet(X_ind, Y_ind, alpha = 1)
plot(fit_lasso, xvar = "lambda", label = TRUE)

fit_ridge <- glmnet(X_ind, Y_ind, alpha = 0)
plot(fit_ridge, xvar = "lambda", label = TRUE)

As we can see, the lasso path is much sparser than the ridge path. This is because lasso performs feature selection by setting some coefficients exactly to zero, whereas ridge regression keeps all features in the model while shrinking their coefficients towards zero.
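
To make this concrete, we can count how many coefficients are exactly zero somewhere in the middle of each path. This is only a sketch: the helper objects mid_lasso and mid_ridge simply pick a \(\lambda\) halfway along each fitted sequence.

mid_lasso <- fit_lasso$lambda[floor(length(fit_lasso$lambda) / 2)]  # a lambda mid-path
mid_ridge <- fit_ridge$lambda[floor(length(fit_ridge$lambda) / 2)]
sum(as.numeric(coef(fit_lasso, s = mid_lasso))[-1] == 0)  # lasso: most coefficients are exactly zero
sum(as.numeric(coef(fit_ridge, s = mid_ridge))[-1] == 0)  # ridge: typically none are exactly zero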

Now, we will tune the hyperparameter \(\lambda\) for both lasso and ridge regression.

cv_lasso <- cv.glmnet(X_ind, Y_ind, alpha = 1)
plot(cv_lasso)

cv_ridge <- cv.glmnet(X_ind, Y_ind, alpha = 0)
plot(cv_ridge)
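
After cross-validation we can inspect the selected penalties and the coefficients they imply. As a sketch (the helper object b_cv is just for illustration): lambda.min is the \(\lambda\) minimizing the cross-validated error, lambda.1se the most regularized value within one standard error of it, and the non-zero coefficients at lambda.min should roughly recover the three true signals.

cv_lasso$lambda.min                                       # lambda minimizing CV error
cv_lasso$lambda.1se                                       # largest lambda within 1 SE of the minimum
b_cv <- as.numeric(coef(cv_lasso, s = "lambda.min"))[-1]  # coefficients at lambda.min (intercept dropped)
which(b_cv != 0)                                          # indices of the selected features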

Dependent features

We now do the same for the dependent features scenario.

fit_lasso <- glmnet(X_dep, Y_dep, alpha = 1)
plot(fit_lasso, xvar = "lambda", label = TRUE)

fit_ridge <- glmnet(X_dep, Y_dep, alpha = 0)
plot(fit_ridge, xvar = "lambda", label = TRUE)

Interestingly, the lasso path now looks quite different from the independent-features scenario: because features 1 and 2 are highly correlated, lasso tends to select only one feature from a group of correlated features and shrink the others towards zero, which makes its selection unstable. Ridge regression, on the other hand, keeps all features in the model and shrinks the coefficients of correlated features towards each other as well as towards zero.
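
We can make the contrast concrete by looking at the estimated coefficients for the correlated pair (features 1 and 2) at a mid-path value of \(\lambda\). Again only a sketch, reusing the same mid-path helper objects as above; the exact values depend on the simulated data.

mid_lasso <- fit_lasso$lambda[floor(length(fit_lasso$lambda) / 2)]
mid_ridge <- fit_ridge$lambda[floor(length(fit_ridge$lambda) / 2)]
as.numeric(coef(fit_lasso, s = mid_lasso))[2:3]  # lasso: often one of the correlated pair dominates
as.numeric(coef(fit_ridge, s = mid_ridge))[2:3]  # ridge: coefficient mass spread more evenly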

Tuning \(\lambda\)

cv_lasso <- cv.glmnet(X_dep, Y_dep, alpha = 1)
plot(cv_lasso)

cv_ridge <- cv.glmnet(X_dep, Y_dep, alpha = 0)
plot(cv_ridge)
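
Finally, we can compare which features each method effectively relies on at the cross-validated penalty (again just a sketch; b_lasso_cv and b_ridge_cv are illustrative helper objects). Lasso should zero out most noise features and may drop one of the correlated pair, while ridge retains all \(p\) coefficients, just heavily shrunk.

b_lasso_cv <- as.numeric(coef(cv_lasso, s = "lambda.min"))[-1]  # lasso coefficients at lambda.min
b_ridge_cv <- as.numeric(coef(cv_ridge, s = "lambda.min"))[-1]  # ridge coefficients at lambda.min
which(b_lasso_cv != 0)  # features selected by lasso
sum(b_ridge_cv != 0)    # ridge keeps (essentially) all features
b_lasso_cv[1:3]         # estimates for the three true signals under lasso
b_ridge_cv[1:3]         # estimates for the three true signals under ridge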