In gradient descent, the update rule for moving from \(x_t\) to \(x_{t+1}\) is given by:
In the gradient descent update rule \(x_{t+1} = x_t - \alpha_t \nabla f(x_t)\), what does \(\alpha_t\) represent?
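The update rule in the questions above can be illustrated with a minimal sketch. Here \(\alpha_t\) is taken to be a constant step size, and the toy objective \(f(x) = \tfrac{1}{2}\|x\|^2\) (with \(\nabla f(x) = x\)) is an illustrative choice, not part of the original questions:

```python
import numpy as np

def gradient_descent(grad, x0, alpha, steps):
    """Run `steps` iterations of x_{t+1} = x_t - alpha * grad(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad(x)  # subtract the gradient scaled by the step size
    return x

# Toy objective f(x) = 0.5 * ||x||^2, so grad f(x) = x; minimizer is the origin.
x_final = gradient_descent(grad=lambda x: x, x0=[4.0, -2.0], alpha=0.1, steps=100)
print(x_final)  # approaches the minimizer at the origin
```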
A function \(f : \mathbb{R}^d \to \mathbb{R}\) is said to be \(L\)-smooth if:
What is the step size used in the convergence theorem for gradient descent in the smooth case?
Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and bounded from below. According to the "Convergence of Gradient Descent: Smooth Case" theorem, gradient descent with constant step size \(\alpha = \frac{1}{L}\) started from any \(x_0\) produces a sequence \(\{x_t\}\) such that
Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and \(m\)-strongly convex with a global minimizer at \(x^*\). According to the "Convergence of Gradient Descent: Strongly Convex Case" theorem, gradient descent with step size \(\alpha = \frac{1}{L}\) started from any \(x_0\) produces a sequence \(\{x_t\}\) such that after \(S\) steps:
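The linear (geometric) rate in the strongly convex theorem can be observed numerically. This hypothetical sketch uses the quadratic \(f(x) = \tfrac{1}{2} x^\top A x\) with \(A = \operatorname{diag}(m, L)\), \(m = 1\), \(L = 10\), so \(f\) is \(L\)-smooth, \(m\)-strongly convex, and \(x^* = 0\); the step size \(\alpha = \frac{1}{L}\) matches the theorem statement:

```python
import numpy as np

m, L = 1.0, 10.0
A = np.diag([m, L])     # Hessian with eigenvalues m and L
alpha = 1.0 / L

x = np.array([1.0, 1.0])
errors = []
for _ in range(50):
    errors.append(np.linalg.norm(x))  # distance to the minimizer x* = 0
    x = x - alpha * (A @ x)           # gradient step: grad f(x) = A x

# For this quadratic, each step contracts the error by at most (1 - m/L).
ratios = [e_next / e for e, e_next in zip(errors, errors[1:])]
print(max(ratios))  # bounded by 1 - m/L = 0.9
```

The per-step contraction factor never exceeds \(1 - \frac{m}{L}\) here, which is the hallmark of the faster, strongly convex rate compared to the smooth case.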
If a function \(f\) is \(m\)-strongly convex, what can we say about its global minimizer?
What is the key property of strongly convex functions that allows us to establish a faster convergence rate for gradient descent compared to the smooth case?
What mathematical concept does the Descent Guarantee for Smooth Functions primarily rely on?