Abstract
This document demonstrates the behavior of ridge and lasso regression in the presence of independent and dependent features. We will simulate data with independent and dependent features, estimate the regularization path, and tune the hyperparameter \(\lambda\).
Load required packages and set seed for replication
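A minimal setup sketch, assuming the glmnet package for the lasso and ridge fits and MASS for mvrnorm(); the seed value below is an arbitrary choice, set only for replication.

library(glmnet) # glmnet() and cv.glmnet() for lasso/ridge paths and tuning
library(MASS) # mvrnorm() for the dependent features scenario

set.seed(42) # arbitrary seed, set only so results can be replicated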
Before we start, let’s set the parameters for the simulation. Specifically, we will set the sample size, the number of features, and the number of features with non-zero coefficients. Note that we will simulate two scenarios: one with independent features and one with dependent features.
n <- 100 # sample size
p <- 100 # number of features
s <- 3 # number of features with non-zero coefficients
First, we will simulate data with independent features. Specifically, we will generate a sample of size \(n\) with \(p\) features, of which \(s\) have non-zero coefficients. The response is generated as a linear combination of the features plus some noise.
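A minimal sketch of this step; the object names X_indep and Y_indep are our own, chosen to mirror X_dep and Y_dep below.

X_indep <- matrix(rnorm(n * p), nrow = n, ncol = p) # independent standard normal features
beta <- c(rep(5, s), rep(0, p - s)) # true coefficients: s non-zero, the rest zero
Y_indep <- X_indep %*% beta + rnorm(n) # generate response: linear signal plus noise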
For the dependent features scenario, we will generate a sample of size \(n\) with \(p\) features, of which \(s\) have non-zero coefficients. The features are generated from a multivariate normal distribution with a specific covariance matrix Sigma. The response is generated as a linear combination of the features plus some noise.
Sigma <- matrix(
  c(1, 0.9, 0.2,
    0.9, 1, 0,
    0.2, 0, 1),
  nrow = s,
  ncol = s
) # covariance matrix (positive definite)
cov2cor(Sigma) # convert covariance to correlation
[,1] [,2] [,3]
[1,] 1.0 0.9 0.2
[2,] 0.9 1.0 0.0
[3,] 0.2 0.0 1.0
X_1 <- mvrnorm(n = n, rep(0, s), Sigma = Sigma) # generate dependent features
X_2 <- matrix(rnorm(n * (p - s)), ncol = p - s) # generate independent features
X_dep <- cbind(X_1, X_2) # combine dependent and independent features
beta <- c(rep(5, s), rep(0, p - s)) # true coefficients
Y_dep <- X_dep %*% beta + rnorm(n) # generate response
In the following code chunk, we estimate the regularization path for the independent features scenario. Specifically, we estimate the regularization path for both lasso and ridge regression.
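A sketch of that estimation, assuming glmnet with alpha = 1 for the lasso and alpha = 0 for ridge; the object names are our own.

fit_lasso_indep <- glmnet(X_indep, Y_indep, alpha = 1) # lasso path
fit_ridge_indep <- glmnet(X_indep, Y_indep, alpha = 0) # ridge path

par(mfrow = c(1, 2)) # plot both paths side by side
plot(fit_lasso_indep, xvar = "lambda", main = "Lasso") # coefficients vs log(lambda)
plot(fit_ridge_indep, xvar = "lambda", main = "Ridge")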
As we can see, the lasso path is sparser than the ridge path. This is because lasso performs feature selection by setting some coefficients exactly to zero, whereas ridge regression keeps all features in the model while shrinking their coefficients towards zero.
Now, we will tune the hyperparameter \(\lambda\) for both lasso and ridge regression.
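One way to tune \(\lambda\), assuming 10-fold cross-validation via cv.glmnet; the fold count and object names are our choices.

cv_lasso_indep <- cv.glmnet(X_indep, Y_indep, alpha = 1, nfolds = 10) # CV for lasso
cv_ridge_indep <- cv.glmnet(X_indep, Y_indep, alpha = 0, nfolds = 10) # CV for ridge

cv_lasso_indep$lambda.min # lambda with minimal CV error
cv_lasso_indep$lambda.1se # largest lambda within one SE of the minimum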
We now do the same for the dependent features scenario.
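The same sketch applied to the dependent data generated above.

fit_lasso_dep <- glmnet(X_dep, Y_dep, alpha = 1) # lasso path, correlated features
fit_ridge_dep <- glmnet(X_dep, Y_dep, alpha = 0) # ridge path, correlated features

par(mfrow = c(1, 2))
plot(fit_lasso_dep, xvar = "lambda", main = "Lasso")
plot(fit_ridge_dep, xvar = "lambda", main = "Ridge")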
Interestingly, the lasso path now differs from the independent scenario: because the features are correlated, lasso tends to select only one feature from a group of correlated features and drop the others, so not all of the true features enter the model. Ridge regression, on the other hand, keeps all features in the model while shrinking the coefficients towards zero; for correlated features it also shrinks their coefficients towards each other.
Tuning \(\lambda\)
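A sketch of the tuning step for the dependent scenario, again assuming cv.glmnet with 10 folds.

cv_lasso_dep <- cv.glmnet(X_dep, Y_dep, alpha = 1, nfolds = 10) # CV for lasso
cv_ridge_dep <- cv.glmnet(X_dep, Y_dep, alpha = 0, nfolds = 10) # CV for ridge

par(mfrow = c(1, 2))
plot(cv_lasso_dep) # CV error curve; dotted lines mark lambda.min and lambda.1se
plot(cv_ridge_dep)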