MMiDS 8.4: Self-Assessment Quiz

In stochastic gradient descent (SGD), how is the gradient estimated at each iteration?

a) By computing the gradient over the entire dataset.
b) By using the gradient from the previous iteration.
c) By randomly selecting a subset of samples and computing the gradient over that subset.
d) By averaging the gradients of all samples in the dataset.
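Here is a minimal sketch of the idea, assuming NumPy and a synthetic least-squares problem. At each iteration it draws a random subset of sample indices and computes the gradient over that subset only, never over the whole dataset. The function `sgd_least_squares` and its default step size, batch size, and iteration count are hypothetical choices made for illustration.

```python
import numpy as np

def sgd_least_squares(A, b, alpha=0.01, batch_size=8, num_iters=2000, seed=0):
    """Minimize f(x) = (1/n) * sum_i (a_i^T x - b_i)^2 with mini-batch SGD (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(num_iters):
        # Draw a random subset of sample indices (uniformly, without replacement)
        idx = rng.choice(n, size=batch_size, replace=False)
        A_b, b_b = A[idx], b[idx]
        # Gradient of the squared loss averaged over the mini-batch only
        grad = 2 * A_b.T @ (A_b @ x - b_b) / batch_size
        x = x - alpha * grad
    return x

# Synthetic data: b = A x_true + small noise
rng = np.random.default_rng(42)
A = rng.normal(size=(200, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=200)

x_hat = sgd_least_squares(A, b)
print(np.round(x_hat, 2))
```

With `batch_size=1` this reduces to single-sample SGD, and with `batch_size=n` every iteration uses all samples, so it becomes full gradient descent.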

What is the key advantage of using mini-batch SGD over standard (single-sample) SGD?

a) It guarantees faster convergence to the optimal solution.
b) It reduces the variance of the gradient estimate at each iteration.
c) It eliminates the need for computing gradients altogether.
d) It increases the computational cost per iteration.
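The variance reduction can be checked empirically. The sketch below assumes NumPy and a least-squares loss. It compares how far single-sample and 32-sample gradient estimates deviate, on average, from the full gradient at a fixed point. Under uniform sampling this deviation shrinks roughly in proportion to 1/B, where B is the batch size. The helper `estimate_variance` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x = np.zeros(d)  # evaluate all gradient estimates at this fixed point

# Per-sample gradients of (a_i^T x - b_i)^2, one row per sample
per_sample = 2 * A * (A @ x - b)[:, None]
full_grad = per_sample.mean(axis=0)

def estimate_variance(batch_size, trials=5000):
    """Mean squared distance between the mini-batch gradient and the full gradient."""
    errs = []
    for _ in range(trials):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = per_sample[idx].mean(axis=0)
        errs.append(np.sum((g - full_grad) ** 2))
    return np.mean(errs)

var_1 = estimate_variance(1)
var_32 = estimate_variance(32)
print(var_1, var_32, var_1 / var_32)  # the ratio should be close to 32
```

The lower variance has a price: each iteration processes B samples instead of one. On its own, it does not guarantee faster convergence.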

Which of the following statements is true about the expected update step in stochastic gradient descent?

a) It is always equal to the full gradient descent update.
b) It is always in the opposite direction of the full gradient descent update.
c) It is, on average, equivalent to the full gradient descent update.
d) It has no relationship to the full gradient descent update.
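A one-line derivation makes "on average" precise. Assume the objective is an average of per-sample losses, \(f(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^n \ell_i(\mathbf{w})\), and that the index \(I\) is drawn uniformly at random from \(\{1,\ldots,n\}\). The notation \(\ell_i\) is introduced here for illustration.

```latex
\mathbb{E}\left[\nabla \ell_I(\mathbf{w})\right]
  = \sum_{i=1}^n \frac{1}{n}\,\nabla \ell_i(\mathbf{w})
  = \nabla f(\mathbf{w}),
\qquad\text{hence}\qquad
\mathbb{E}\left[\mathbf{w} - \alpha\,\nabla \ell_I(\mathbf{w})\right]
  = \mathbf{w} - \alpha\,\nabla f(\mathbf{w}).
```

The same holds for a uniformly random mini-batch \(B\) of fixed size, because each index lands in \(B\) with probability \(|B|/n\). The averaged mini-batch gradient is therefore also unbiased, even though any single realization generally differs from the full gradient step.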

In multinomial logistic regression, what is the role of the softmax function (\(\gamma\))?

a) To compute the gradient of the loss function.
b) To normalize the input features.
c) To transform scores into a probability distribution over labels.
d) To update the model parameters during gradient descent.
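Here is a minimal NumPy sketch of the softmax map, written \(\gamma\) in the text and named `softmax` here. It exponentiates the scores and normalizes them, so the outputs are positive and sum to one. Subtracting the maximum score first is a common numerical-stability trick, and it leaves the result unchanged.

```python
import numpy as np

def softmax(z):
    """Map a vector of real scores to a probability distribution over labels."""
    z_shift = z - np.max(z)   # avoids overflow in exp; softmax is invariant to shifts
    exp_z = np.exp(z_shift)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, -1.0])   # hypothetical scores for 3 labels
probs = softmax(scores)
print(probs, probs.sum())
```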

What is the Kullback-Leibler (KL) divergence used for in multinomial logistic regression?

a) To measure the discrepancy between the predicted probability distribution and the true label distribution.
b) To normalize the input features.
c) To update the model parameters during gradient descent.
d) To compute the gradient of the loss function.
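Encode the true label as a one-hot vector y and let p be the vector of predicted probabilities. Then the KL divergence KL(y || p) reduces to -log p_k, where k is the true label, which is the familiar cross-entropy loss. The sketch below checks this numerically, assuming NumPy and hypothetical example values.

```python
import numpy as np

def kl_divergence(y, p):
    """KL(y || p) = sum_i y_i * log(y_i / p_i), with the convention 0 * log 0 = 0."""
    mask = y > 0  # terms with y_i = 0 contribute nothing
    return np.sum(y[mask] * np.log(y[mask] / p[mask]))

p = np.array([0.7, 0.2, 0.1])   # predicted probabilities (e.g., a softmax output)
y = np.array([0.0, 1.0, 0.0])   # one-hot encoding of the true label (class 1)
print(kl_divergence(y, p), -np.log(p[1]))  # both equal -log(0.2)
```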

Which of the following is NOT a component of the forward pass in the backpropagation algorithm for multinomial logistic regression?

a) Computing the affine transformation of the input data.
b) Applying the softmax function to obtain probabilities.
c) Calculating the gradient of the loss function.
d) Initializing the input data.
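The following NumPy sketch separates the forward pass from the gradient computation. It works on a single sample with weight matrix W and omits the bias for brevity. `forward` starts from the input x, applies the affine map and the softmax, and returns the cross-entropy loss. `backward` computes the gradient (p - y) x^T, which is then checked against a finite difference. The function names and the single-sample layout are illustrative assumptions, not the textbook's exact implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(W, x, label):
    """Forward pass: affine transformation, softmax, then cross-entropy loss."""
    z = W @ x                  # affine transformation of the input (no bias)
    p = softmax(z)             # probabilities over labels
    loss = -np.log(p[label])   # cross-entropy with a one-hot target
    return loss, p

def backward(p, x, label):
    """Gradient of the loss with respect to W: (p - y) x^T."""
    y = np.zeros_like(p)
    y[label] = 1.0
    return np.outer(p - y, x)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
label = 2
loss, p = forward(W, x, label)
grad_W = backward(p, x, label)

# Compare one gradient entry with a central finite difference
eps = 1e-6
E = np.zeros_like(W)
E[1, 2] = eps
fd = (forward(W + E, x, label)[0] - forward(W - E, x, label)[0]) / (2 * eps)
print(grad_W[1, 2], fd)
```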

In the analysis of the MNIST dataset, what is the purpose of the Flatten layer in PyTorch?

a) To reduce the dimensionality of the input data.
b) To reshape a multi-dimensional input (like an image) into a vector.
c) To normalize the input data.
d) To apply an activation function to the input data.
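Here is a short PyTorch sketch, assuming a batch of 28x28 grayscale images shaped (batch, channels, height, width) as for MNIST. The batch size and the `Linear` layer that follows are illustrative choices. `nn.Flatten()` keeps the batch dimension and reshapes each image into a 784-dimensional vector without changing any of its values.

```python
import torch
from torch import nn

# Hypothetical batch of 64 random "images": (batch, channels, height, width)
images = torch.randn(64, 1, 28, 28)

flatten = nn.Flatten()   # default start_dim=1: keep the batch dimension
vectors = flatten(images)
print(vectors.shape)     # torch.Size([64, 784])

# The flattened vectors can feed a linear layer, e.g. 10 scores per image
linear = nn.Linear(28 * 28, 10)
scores = linear(vectors)
print(scores.shape)      # torch.Size([64, 10])
```

The number of entries per image stays the same (1 × 28 × 28 = 784). Only the shape changes.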

What is the purpose of the zero_grad() method in PyTorch optimizers?

a) To reset the gradients to zero before backpropagation.
b) To initialize the model parameters.
c) To evaluate the model's performance on a test dataset.
d) To update the model parameters based on the computed gradients.
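PyTorch adds new gradients to each parameter's `.grad` attribute on every call to `backward()` instead of overwriting them. For that reason, `zero_grad()` is normally called before each backward pass. The sketch below uses a toy quadratic loss and hypothetical starting values to show the accumulation and the usual zero_grad, backward, step order.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

def loss_fn(w):
    return (w ** 2).sum()   # gradient is 2 * w

# Two backward() calls without resetting: the gradients add up in w.grad
loss_fn(w).backward()
loss_fn(w).backward()
accumulated = w.grad.clone()
print(accumulated)          # values [4, 8], i.e. twice 2 * w

# Typical training step: reset, backpropagate, then update
optimizer.zero_grad()        # clear the stale gradient
loss_fn(w).backward()
fresh = w.grad.clone()
print(fresh)                 # values [2, 4]
optimizer.step()             # w <- w - lr * w.grad, giving values [0.8, 1.6]
print(w)
```

In PyTorch 2.0 and later, `zero_grad()` sets gradients to `None` by default (`set_to_none=True`) rather than filling them with zeros. The effect on the next backward pass is the same.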