MMiDS 4.5: Self-Assessment Quiz

What is the goal of principal components analysis (PCA)?

a) To find clusters in the data.
b) To find a low-dimensional representation of the data that captures the maximum variance.
c) To find the mean of each feature in the data.
d) To find the correlation between features in the data.

In PCA, what is the relationship between the first principal component and the data points?

a) It is the line that minimizes the sum of squared distances to the data points.
b) It is the direction of maximum variance in the data.
c) It is the average of all data points.
d) It is the line that passes through the most data points.

Formally, the first principal component is the linear combination of features \( t_{i1} = \sum_{j=1}^p \phi_{j1} x_{ij} \) whose loading vector \( \phi_1 = (\phi_{11}, \ldots, \phi_{p1}) \) solves which optimization problem?

a) \( \max \left\{ \frac{1}{n-1} \|X\phi_1\|^2 : \|\phi_1\|^2 = 1\right\} \)
b) \( \min \left\{ \frac{1}{n-1} \|X\phi_1\|^2 : \|\phi_1\|^2 = 1\right\} \)
c) \( \max \left\{ \frac{1}{n-1} \|X\phi_1\|^2 : \|\phi_1\|^2 \leq 1\right\} \)
d) \( \min \left\{ \frac{1}{n-1} \|X\phi_1\|^2 : \|\phi_1\|^2 \leq 1\right\} \)

The second principal component is uncorrelated with the first principal component, which means that:

a) \( \frac{1}{n-1}\sum_{i=1}^n t_{i1}t_{i2} = 0 \)
b) \( \frac{1}{n-1}\sum_{i=1}^n t_{i1}t_{i2} = 1 \)
c) \( \sum_{j=1}^p \phi_{j1}\phi_{j2} = 0 \)
d) \( \sum_{j=1}^p \phi_{j1}\phi_{j2} = 1 \)

According to the lemma on uncorrelated principal components, the condition \( \frac{1}{n-1}\sum_{i=1}^n t_{i1}t_{i2} = 0 \) is equivalent to:

a) \( \langle t_1, t_2 \rangle = 0 \)
b) \( \langle X\phi_1, X\phi_2 \rangle = 0 \)
c) \( \langle \phi_1, \phi_2 \rangle = 0 \)
d) All of the above.

What is the relationship between the loadings in PCA and the singular vectors of the data matrix?

a) The loadings are the left singular vectors.
b) The loadings are the right singular vectors.
c) The loadings are the singular values.
d) There is no direct relationship between loadings and singular vectors.

What are the dimensions of the matrix \( T \) in the principal component transformation \( T = XV^{(l)} \)?

a) \( n \times p \)
b) \( n \times l \)
c) \( l \times p \)
d) \( p \times l \)

If the data matrix \( X \) is \( n \times p \), what is the maximum number of principal components that can be computed?

a) \( n \)
b) \( p \)
c) \( \min(n, p) \)
d) \( \max(n, p) \)

In the numerical implementation of PCA, what is the purpose of the line ‘Y = X - mean’?

a) To standardize the data by dividing each column by its standard deviation.
b) To center the data by subtracting the mean of each column.
c) To normalize the data by dividing each row by its Euclidean norm.
d) To apply the SVD to the data matrix.

What is the purpose of centering the data in PCA?

a) To make the calculations easier.
b) To ensure the first principal component describes the direction of maximum variance.
c) To normalize the data.
d) To remove outliers.
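
To check answers numerically after attempting the quiz, the following is a minimal sketch of PCA via the SVD, assuming NumPy and a synthetic data matrix. The function name `pca_scores`, the random data, and the choice \( l = 2 \) are illustrative assumptions, not the text's own implementation; the centered matrix `Y` plays the role of \( X \) in \( T = XV^{(l)} \).

```python
import numpy as np

def pca_scores(X, l):
    """Return the scores T and loadings V_l for the top-l principal components.

    X is assumed to be an n x p array with one observation per row.
    """
    mean = X.mean(axis=0)   # column means, shape (p,)
    Y = X - mean            # center each column
    # Thin SVD of the centered data: Y = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    V_l = Vt[:l].T          # p x l, columns are the first l right singular vectors
    T = Y @ V_l             # n x l matrix of scores
    return T, V_l

# Synthetic data (an assumption for illustration): n = 50 rows, p = 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4)) + 3.0

T, V = pca_scores(X, 2)
n = X.shape[0]

print("shape of T:", T.shape)
print("shape of V:", V.shape)

# Sample covariance between the first two score columns.
cov12 = (T[:, 0] @ T[:, 1]) / (n - 1)
print("covariance of t_1 and t_2 is numerically zero:", abs(cov12) < 1e-8)

# Inner product of the first two loading vectors.
print("<phi_1, phi_2> is numerically zero:", abs(V[:, 0] @ V[:, 1]) < 1e-8)
```

Removing the centering step (passing `X` directly to the SVD) and comparing the resulting first loading vector is one way to explore the last question empirically.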