MMiDS 3.5: Self-Assessment Quiz

What is the gradient descent update rule for moving from \(x_t\) to \(x_{t+1}\)?

In the gradient descent update rule \(x_{t+1} = x_t - \alpha_t \nabla f(x_t)\), what does \(\alpha_t\) represent?
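For concreteness, the update rule can be sketched in a few lines of NumPy; the quadratic objective, step size, and iteration count below are illustrative choices, not taken from the text:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_steps=100):
    """Iterate x_{t+1} = x_t - alpha * grad_f(x_t); alpha is the step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - alpha * grad_f(x)  # move against the gradient
    return x

# Example: f(x) = ||x||^2 has gradient 2x and minimizer at the origin.
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0])
```

With a constant step size \(\alpha_t = \alpha\), each iterate moves against the gradient, i.e., in the direction of steepest descent.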

A function \(f : \mathbb{R}^d \to \mathbb{R}\) is said to be \(L\)-smooth if:

What is the step size used in the convergence theorem for gradient descent in the smooth case?

Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and bounded from below. According to the "Convergence of Gradient Descent: Smooth Case" theorem, gradient descent with step size \(\alpha_t = \frac{1}{L}\) started from any \(x_0\) produces a sequence \(\{x_t\}\) such that
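One common form of that conclusion is \(\min_{t < S} \|\nabla f(x_t)\|^2 \le \frac{2L\,(f(x_0) - \inf f)}{S}\); the exact statement in the text may differ. A small numerical check of this bound, using the illustrative choice \(f(x) = \log(1 + e^x)\), which is \(\tfrac{1}{4}\)-smooth and bounded below by \(0\):

```python
import math

L = 0.25                                        # f''(x) = s(x)(1 - s(x)) <= 1/4
f = lambda x: math.log1p(math.exp(x))           # smooth, bounded below by 0
grad = lambda x: 1.0 / (1.0 + math.exp(-x))     # the logistic (sigmoid) function

x, S = 2.0, 100
best_sq = float("inf")                          # smallest squared gradient seen
for _ in range(S):
    g = grad(x)
    best_sq = min(best_sq, g * g)
    x -= (1.0 / L) * g                          # step size alpha = 1/L

bound = 2 * L * (f(2.0) - 0.0) / S              # 2L (f(x_0) - inf f) / S
```

Note that the gradient norm need not shrink monotonically; the theorem only controls the best iterate seen so far.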

Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and \(m\)-strongly convex with a global minimizer at \(x^*\). According to the "Convergence of Gradient Descent: Strongly Convex Case" theorem, gradient descent with step size \(\alpha = \frac{1}{L}\) started from any \(x_0\) produces a sequence \(\{x_t\}\) such that after \(S\) steps:
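One common form of that conclusion is the linear (geometric) rate \(f(x_S) - f(x^*) \le \left(1 - \tfrac{m}{L}\right)^S \left(f(x_0) - f(x^*)\right)\); the exact constant may differ in the text. A quick check on a decoupled quadratic with the illustrative values \(m = 1\), \(L = 10\):

```python
import numpy as np

m, L = 1.0, 10.0   # strong convexity / smoothness constants of f below
f = lambda x: 0.5 * (m * x[0] ** 2 + L * x[1] ** 2)   # minimizer x* = 0, f(x*) = 0
grad = lambda x: np.array([m * x[0], L * x[1]])

x = np.array([5.0, 5.0])
gap0 = f(x)                          # initial optimality gap f(x_0) - f(x*)
S = 50
for _ in range(S):
    x = x - (1.0 / L) * grad(x)      # step size alpha = 1/L

rate_bound = (1 - m / L) ** S * gap0
```

On this quadratic the second coordinate is eliminated in a single step and the first contracts by a factor \(1 - m/L\) per step, so the geometric bound holds with room to spare.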

If a function \(f\) is \(m\)-strongly convex, what can we say about its global minimizer?

What is the key property of strongly convex functions that allows us to establish a faster convergence rate for gradient descent compared to the smooth case?
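As a reminder, one standard way to state this property: \(m\)-strong convexity implies a Polyak–Łojasiewicz-type inequality relating the gradient norm to the optimality gap,

\[
\|\nabla f(x)\|^2 \ge 2m\,\big(f(x) - f(x^*)\big) \quad \text{for all } x \in \mathbb{R}^d,
\]

so the per-step decrease guaranteed by smoothness can be converted into a geometric contraction of the gap \(f(x_t) - f(x^*)\).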

What mathematical concept does the Descent Guarantee for Smooth Functions primarily rely on?
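For reference, the descent guarantee is usually derived from the quadratic upper bound that \(L\)-smoothness provides via a Taylor-type expansion:

\[
f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2}\|y - x\|^2,
\]

and substituting \(y = x - \frac{1}{L}\nabla f(x)\) yields the per-step decrease

\[
f\!\left(x - \tfrac{1}{L}\nabla f(x)\right) \le f(x) - \frac{1}{2L}\|\nabla f(x)\|^2.
\]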