In gradient descent, the update rule for moving from \(\mathbf{x}_t\) to \(\mathbf{x}_{t+1}\) is given by:
In the gradient descent update rule \(\mathbf{x}_{t+1} = \mathbf{x}_t - \alpha_t \nabla f(\mathbf{x}_t)\), what does \(\alpha_t\) represent?
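The update rule above can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation; the least-squares objective \(f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2\) and the specific matrices are hypothetical choices made only for the example.

```python
import numpy as np

def gradient_descent(grad, x0, alpha, steps):
    """Apply the update x_{t+1} = x_t - alpha * grad(x_t) for a fixed number of steps."""
    x = x0.copy()
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

# Hypothetical least-squares objective f(x) = 0.5 * ||A x - b||^2.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)  # gradient of the objective

# The smoothness constant here is the largest eigenvalue of A^T A, namely 4,
# so a fixed step size alpha = 1/L = 0.25 is safe.
x_final = gradient_descent(grad, np.zeros(2), alpha=0.25, steps=200)
```

With this step size the iterates converge to the minimizer \(\mathbf{x}^* = (1, 1)\), where \(\mathbf{A}\mathbf{x}^* = \mathbf{b}\).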
A function \(f : \mathbb{R}^d \to \mathbb{R}\) is said to be \(L\)-smooth if:
What is the step size used in the convergence theorem for gradient descent in the smooth case?
Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and bounded from below. According to the "Convergence of Gradient Descent: Smooth Case" theorem, gradient descent with step size \(\alpha_t = \frac{1}{L}\) started from any \(\mathbf{x}_0\) produces a sequence \(\{\mathbf{x}_t\}\) such that:
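The smooth-case guarantee can be checked numerically. One standard form of the result: since \(f(\mathbf{x}_{t+1}) \le f(\mathbf{x}_t) - \frac{1}{2L}\|\nabla f(\mathbf{x}_t)\|^2\) with step \(\frac{1}{L}\), summing over \(S\) steps gives \(\min_{t < S} \|\nabla f(\mathbf{x}_t)\|^2 \le \frac{2L(f(\mathbf{x}_0) - f^*)}{S}\). The test function below is an assumption made for illustration: \(f(x) = \sqrt{1 + x^2}\) is \(1\)-smooth and bounded below by \(f^* = 1\).

```python
import numpy as np

# Assumed test function: f(x) = sqrt(1 + x^2), which is 1-smooth
# (its second derivative is at most 1) and bounded below by f* = 1.
f = lambda x: np.sqrt(1.0 + x ** 2)
grad = lambda x: x / np.sqrt(1.0 + x ** 2)
L = 1.0

# Run gradient descent with the theorem's step size alpha = 1/L,
# recording the gradient norm at every iterate.
x = 3.0
grad_norms = []
for _ in range(1000):
    g = grad(x)
    grad_norms.append(abs(g))
    x = x - (1.0 / L) * g

# Smooth-case bound: min_t ||grad f(x_t)||^2 <= 2 L (f(x_0) - f*) / S.
S = len(grad_norms)
best = min(g ** 2 for g in grad_norms)
bound = 2.0 * L * (f(3.0) - 1.0) / S
```

The smallest squared gradient norm seen up to step \(S\) falls well inside the \(O(1/S)\) bound.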
Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and \(m\)-strongly convex with a global minimizer at \(\mathbf{x}^*\). According to the "Convergence of Gradient Descent: Strongly Convex Case" theorem, gradient descent with step size \(\alpha = \frac{1}{L}\) started from any \(\mathbf{x}_0\) produces a sequence \(\{\mathbf{x}_t\}\) such that after \(S\) steps:
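The strongly convex guarantee (linear, i.e. geometric, convergence of the iterates) can also be demonstrated numerically. One common form of the bound is \(\|\mathbf{x}_S - \mathbf{x}^*\|^2 \le \left(1 - \frac{m}{L}\right)^S \|\mathbf{x}_0 - \mathbf{x}^*\|^2\). The quadratic below is a hypothetical test case: \(f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{H} \mathbf{x}\) with \(\mathbf{H} = \mathrm{diag}(m, L)\) is \(m\)-strongly convex and \(L\)-smooth, with minimizer \(\mathbf{x}^* = \mathbf{0}\).

```python
import numpy as np

# Assumed test problem: f(x) = 0.5 * x^T H x with H = diag(m, L),
# which is m-strongly convex and L-smooth; the minimizer is x* = 0.
m, L = 1.0, 10.0
H = np.diag([m, L])
grad = lambda x: H @ x

x = np.array([5.0, 5.0])
x0_dist = np.linalg.norm(x)  # ||x_0 - x*||

# Run S steps of gradient descent with step size alpha = 1/L.
S = 50
for _ in range(S):
    x = x - (1.0 / L) * grad(x)

# Strongly convex bound: ||x_S - x*||^2 <= (1 - m/L)^S * ||x_0 - x*||^2.
rate = (1.0 - m / L) ** S
```

The contraction factor \(1 - \frac{m}{L}\) per step is what distinguishes this geometric rate from the slower \(O(1/S)\) rate of the merely smooth case.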
If a function \(f\) is \(m\)-strongly convex, what can we say about its global minimizer?
What is the key property of strongly convex functions that allows us to establish a faster convergence rate for gradient descent compared to the smooth case?
What mathematical concept does the Descent Guarantee for Smooth Functions primarily rely on?