.. _alpha_max:

==========================================================
Critical regularization strength above which solution is 0
==========================================================

This tutorial shows that for :math:`\lambda \geq \lambda_{\text{max}} = || \nabla f(0) ||_{\infty}`, the solution to
:math:`\min_x f(x) + \lambda || x ||_1` is 0.

In skglm, we thus frequently use

.. code-block:: python

    alpha_max = np.max(np.abs(gradient0))

and choose for the regularization strength :math:`\alpha` a fraction of this critical value, e.g. ``alpha = 0.01 * alpha_max``.
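
For instance, with a least-squares datafit (as worked out in the example below), ``gradient0`` can be computed explicitly; ``X``, ``y``, and the fraction ``0.01`` here are purely illustrative:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))  # design matrix
    y = rng.standard_normal(50)        # target vector

    # gradient of f(w) = ||Xw - y||^2 / (2 n) at w = 0
    gradient0 = -X.T @ y / len(y)

    alpha_max = np.max(np.abs(gradient0))
    alpha = 0.01 * alpha_max  # a small fraction of the critical value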

Problem setup
=============

Consider the optimization problem:

.. math::
    \min_x f(x) + \lambda || x ||_1

where:

- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a convex differentiable function,
- :math:`|| x ||_1` is the L1 norm of :math:`x`,
- :math:`\lambda > 0` is the regularization parameter.

We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.

Theoretical background
======================

Let

.. math::

    g(x) = f(x) + \lambda || x ||_1

According to Fermat's rule, 0 is a minimizer of :math:`g` if and only if 0 belongs to the subdifferential of :math:`g` at 0.
The subdifferential of :math:`|| x ||_1` at 0 is the L-infinity unit ball:

.. math::
    \partial || \cdot ||_1 (0) = \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \}
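
For completeness, this characterization follows directly from the definition of the subdifferential, the last step using that the dual norm of the L1 norm is the L-infinity norm:

.. math::
    :nowrap:

    \begin{equation}
    \begin{aligned}
    u \in \partial || \cdot ||_1 (0)
    &\Leftrightarrow \forall x \in \mathbb{R}^d, ~ || x ||_1 \geq || 0 ||_1 + \langle u, x - 0 \rangle \\
    &\Leftrightarrow \forall x \in \mathbb{R}^d, ~ \langle u, x \rangle \leq || x ||_1 \\
    &\Leftrightarrow || u ||_{\infty} \leq 1
    \end{aligned}
    \end{equation}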

Thus, using the sum rule for subdifferentials (valid here since :math:`f` is differentiable),

.. math::
    :nowrap:

    \begin{equation}
    \begin{aligned}
    0 \in \text{argmin} ~ g(x)
    &\Leftrightarrow 0 \in \partial g(0) \\
    &\Leftrightarrow
    0 \in \nabla f(0) + \lambda \partial || \cdot ||_1 (0) \\
    &\Leftrightarrow - \nabla f(0) \in \lambda \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \} \\
    &\Leftrightarrow || \nabla f(0) ||_\infty \leq \lambda
    \end{aligned}
    \end{equation}

We have just shown that 0 is a minimizer of :math:`g = f + \lambda || \cdot ||_1` if and only if :math:`\lambda \geq ||\nabla f(0)||_{\infty}`.
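
This can be illustrated numerically. In the minimal sketch below, ``grad0`` is a stand-in for :math:`\nabla f(0)` and ``soft_threshold`` is the proximal operator of the scaled L1 norm: with :math:`\lambda = \lambda_{\text{max}}`, one proximal gradient step from :math:`x = 0` returns 0, so 0 is a fixed point, hence a minimizer.

.. code-block:: python

    import numpy as np

    def soft_threshold(z, tau):
        # proximal operator of tau * ||.||_1, applied entrywise
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    rng = np.random.default_rng(0)
    grad0 = rng.standard_normal(5)  # stand-in for grad f(0)
    lmbda = np.max(np.abs(grad0))   # lambda = lambda_max
    step = 0.5                      # any positive step size

    # one proximal gradient step starting from x = 0
    x_new = soft_threshold(np.zeros(5) - step * grad0, step * lmbda)
    print(x_new)  # [0. 0. 0. 0. 0.]: 0 is a fixed point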

Example
=======

Consider the loss function for Ordinary Least Squares :math:`f(x) = \frac{1}{2n} ||Ax - b||_2^2`, where :math:`n` is the number of samples. We have:

.. math::
    \nabla f(x) = \frac{1}{n}A^T (Ax - b)

At :math:`x = 0`:

.. math::
    \nabla f(0) = -\frac{1}{n}A^T b

The infinity norm of the gradient at 0 is:

.. math::
    ||\nabla f(0)||_{\infty} = \frac{1}{n}||A^T b||_{\infty}

For :math:`\lambda \geq \frac{1}{n}||A^T b||_{\infty}`, the solution to :math:`\min_x \frac{1}{2n} ||Ax - b||_2^2 + \lambda || x ||_1` is :math:`x = 0`.
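
This can be checked numerically: scikit-learn's ``Lasso`` minimizes exactly this objective, :math:`\frac{1}{2n} ||Ax - b||_2^2 + \alpha ||x||_1`, so fitting it with ``alpha = alpha_max`` must return an all-zero coefficient vector (``A`` and ``b`` below are simulated for illustration):

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, d = 50, 10
    A = rng.standard_normal((n, d))
    b = rng.standard_normal(n)

    alpha_max = np.max(np.abs(A.T @ b)) / n

    clf = Lasso(alpha=alpha_max, fit_intercept=False).fit(A, b)
    print(clf.coef_)  # all zeros

    # just below alpha_max, 0 is no longer optimal
    clf = Lasso(alpha=0.99 * alpha_max, fit_intercept=False).fit(A, b)
    print(np.any(clf.coef_ != 0))  # True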


References
==========

Refer to Section 3.1 of [1], and in particular Proposition 4, for more details.

.. _1:

[1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671–4703.