In gradient descent, the update rule for moving from \(\mathbf{x}_t\) to \(\mathbf{x}_{t+1}\) is given by:
In the gradient descent update rule \(\mathbf{x}_{t+1} = \mathbf{x}_t - \alpha_t \nabla f(\mathbf{x}_t)\), what does \(\alpha_t\) represent?
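The update rule above can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation; the least-squares objective \(f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2\) and the specific matrices are hypothetical choices made only for the example.

```python
import numpy as np

def gradient_descent(grad, x0, alpha, steps):
    """Apply the update x_{t+1} = x_t - alpha * grad(x_t) for a fixed number of steps."""
    x = x0.copy()
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

# Hypothetical least-squares objective f(x) = 0.5 * ||A x - b||^2.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)  # gradient of the objective

# The smoothness constant here is the largest eigenvalue of A^T A, namely 4,
# so a fixed step size alpha = 1/L = 0.25 is safe.
x_final = gradient_descent(grad, np.zeros(2), alpha=0.25, steps=200)
```

With this step size the iterates converge to the minimizer \(\mathbf{x}^* = (1, 1)\), where \(\mathbf{A}\mathbf{x}^* = \mathbf{b}\).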
A function \(f : \mathbb{R}^d \to \mathbb{R}\) is said to be \(L\)-smooth if:
What is the step size used in the convergence theorem for gradient descent in the smooth case?
Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and bounded from below. According to the "Convergence of Gradient Descent: Smooth Case" theorem, gradient descent with step size \(\alpha_t = \frac{1}{L}\) started from any \(\mathbf{x}_0\) produces a sequence \(\{\mathbf{x}_t\}\) such that:
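The smooth-case guarantee can be checked numerically. One standard form of the result: since \(f(\mathbf{x}_{t+1}) \le f(\mathbf{x}_t) - \frac{1}{2L}\|\nabla f(\mathbf{x}_t)\|^2\) with step \(\frac{1}{L}\), summing over \(S\) steps gives \(\min_{t < S} \|\nabla f(\mathbf{x}_t)\|^2 \le \frac{2L(f(\mathbf{x}_0) - f^*)}{S}\). The test function below is an assumption made for illustration: \(f(x) = \sqrt{1 + x^2}\) is \(1\)-smooth and bounded below by \(f^* = 1\).

```python
import numpy as np

# Assumed test function: f(x) = sqrt(1 + x^2), which is 1-smooth
# (its second derivative is at most 1) and bounded below by f* = 1.
f = lambda x: np.sqrt(1.0 + x ** 2)
grad = lambda x: x / np.sqrt(1.0 + x ** 2)
L = 1.0

# Run gradient descent with the theorem's step size alpha = 1/L,
# recording the gradient norm at every iterate.
x = 3.0
grad_norms = []
for _ in range(1000):
    g = grad(x)
    grad_norms.append(abs(g))
    x = x - (1.0 / L) * g

# Smooth-case bound: min_t ||grad f(x_t)||^2 <= 2 L (f(x_0) - f*) / S.
S = len(grad_norms)
best = min(g ** 2 for g in grad_norms)
bound = 2.0 * L * (f(3.0) - 1.0) / S
```

The smallest squared gradient norm seen up to step \(S\) falls well inside the \(O(1/S)\) bound.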
Suppose \(f : \mathbb{R}^d \to \mathbb{R}\) is \(L\)-smooth and \(m\)-strongly convex with a global minimizer at \(\mathbf{x}^*\). According to the "Convergence of Gradient Descent: Strongly Convex Case" theorem, gradient descent with step size \(\alpha = \frac{1}{L}\) started from any \(\mathbf{x}_0\) produces a sequence \(\{\mathbf{x}_t\}\) such that after \(S\) steps:
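The strongly convex guarantee (linear, i.e. geometric, convergence of the iterates) can also be demonstrated numerically. One common form of the bound is \(\|\mathbf{x}_S - \mathbf{x}^*\|^2 \le \left(1 - \frac{m}{L}\right)^S \|\mathbf{x}_0 - \mathbf{x}^*\|^2\). The quadratic below is a hypothetical test case: \(f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{H} \mathbf{x}\) with \(\mathbf{H} = \mathrm{diag}(m, L)\) is \(m\)-strongly convex and \(L\)-smooth, with minimizer \(\mathbf{x}^* = \mathbf{0}\).

```python
import numpy as np

# Assumed test problem: f(x) = 0.5 * x^T H x with H = diag(m, L),
# which is m-strongly convex and L-smooth; the minimizer is x* = 0.
m, L = 1.0, 10.0
H = np.diag([m, L])
grad = lambda x: H @ x

x = np.array([5.0, 5.0])
x0_dist = np.linalg.norm(x)  # ||x_0 - x*||

# Run S steps of gradient descent with step size alpha = 1/L.
S = 50
for _ in range(S):
    x = x - (1.0 / L) * grad(x)

# Strongly convex bound: ||x_S - x*||^2 <= (1 - m/L)^S * ||x_0 - x*||^2.
rate = (1.0 - m / L) ** S
```

The contraction factor \(1 - \frac{m}{L}\) per step is what distinguishes this geometric rate from the slower \(O(1/S)\) rate of the merely smooth case.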
If a function \(f\) is \(m\)-strongly convex, what can we say about its global minimizer?
What is the key property of strongly convex functions that allows us to establish a faster convergence rate for gradient descent compared to the smooth case?
What mathematical concept does the Descent Guarantee for Smooth Functions primarily rely on?