
Commit 776afdf

wassimmazouz authored
DOC L1-regularization parameter tutorial (#264)
Co-authored-by: mathurinm <[email protected]>
Co-authored-by: Badr-MOUFAD <[email protected]>
1 parent 63277c0 commit 776afdf

File tree

2 files changed, +105 -2 lines changed

doc/tutorials/alpha_max.rst

+98
@@ -0,0 +1,98 @@
.. _alpha_max:

==========================================================
Critical regularization strength above which solution is 0
==========================================================

This tutorial shows that for :math:`\lambda \geq \lambda_{\text{max}} = || \nabla f(0) ||_{\infty}`, the solution to
:math:`\min f(x) + \lambda || x ||_1` is 0.

In skglm, we thus frequently use

.. code-block:: python

    alpha_max = np.max(np.abs(gradient0))

and choose the regularization strength :math:`\alpha` as a fraction of this critical value, e.g. ``alpha = 0.01 * alpha_max``.
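For a concrete sense of where ``gradient0`` comes from, here is a minimal sketch assuming a least-squares datafit :math:`f(x) = \frac{1}{2n} ||Ax - b||_2^2`; the matrix ``A``, vector ``b``, and their shapes are made up for illustration:

```python
import numpy as np

# Hypothetical least-squares datafit: f(x) = ||Ax - b||^2 / (2 * n).
# A and b are random, for illustration only.
rng = np.random.default_rng(0)
n, d = 20, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Gradient of f at x = 0 is -A.T @ b / n
gradient0 = -A.T @ b / n
alpha_max = np.max(np.abs(gradient0))

# A fraction of the critical value, as in the snippet above
alpha = 0.01 * alpha_max
```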
Problem setup
=============

Consider the optimization problem:

.. math::
    \min_x f(x) + \lambda || x ||_1

where:

- :math:`f: \mathbb{R}^d \to \mathbb{R}` is a convex differentiable function,
- :math:`|| x ||_1` is the L1 norm of :math:`x`,
- :math:`\lambda > 0` is the regularization parameter.

We aim to determine the conditions under which the solution to this problem is :math:`x = 0`.
Theoretical background
======================

Let

.. math::
    g(x) = f(x) + \lambda || x ||_1

According to Fermat's rule, 0 is the minimizer of :math:`g` if and only if 0 is in the subdifferential of :math:`g` at 0.
The subdifferential of :math:`|| x ||_1` at 0 is the L-infinity unit ball:

.. math::
    \partial || \cdot ||_1 (0) = \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \}

Thus,

.. math::
    :nowrap:

    \begin{equation}
    \begin{aligned}
    0 \in \text{argmin} ~ g(x)
    &\Leftrightarrow 0 \in \partial g(0) \\
    &\Leftrightarrow 0 \in \nabla f(0) + \lambda \partial || \cdot ||_1 (0) \\
    &\Leftrightarrow - \nabla f(0) \in \lambda \{ u \in \mathbb{R}^d : ||u||_{\infty} \leq 1 \} \\
    &\Leftrightarrow || \nabla f(0) ||_\infty \leq \lambda
    \end{aligned}
    \end{equation}

We have just shown that the minimizer of :math:`g = f + \lambda || \cdot ||_1` is 0 if and only if :math:`\lambda \geq ||\nabla f(0)||_{\infty}`.
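This result can be checked numerically. Below is a small self-contained sketch using proximal gradient descent (ISTA) on a least-squares :math:`f`; the solver, data, and step size are illustrative and not skglm's implementation. At :math:`\lambda = \lambda_{\text{max}}` the iterate starting from 0 never moves, while below it the solution is nonzero:

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1: shrink each entry toward 0 by tau
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(grad_f, step, lam, d, n_iter=200):
    # Proximal gradient descent on f(x) + lam * ||x||_1, starting from 0
    x = np.zeros(d)
    for _ in range(n_iter):
        x = soft_threshold(x - step * grad_f(x), step * lam)
    return x

# Illustrative least-squares datafit: f(x) = ||Ax - b||^2 / (2 * n)
rng = np.random.default_rng(0)
n, d = 50, 30
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
grad_f = lambda x: A.T @ (A @ x - b) / n
step = n / np.linalg.norm(A, ord=2) ** 2  # 1 / Lipschitz constant of grad_f

lam_max = np.max(np.abs(grad_f(np.zeros(d))))

x_at = ista(grad_f, step, lam_max, d)           # lam >= lam_max: stays exactly 0
x_below = ista(grad_f, step, 0.5 * lam_max, d)  # lam < lam_max: nonzero solution
```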
Example
=======

Consider the loss function for Ordinary Least Squares, :math:`f(x) = \frac{1}{2n} ||Ax - b||_2^2`, where :math:`n` is the number of samples. We have:

.. math::
    \nabla f(x) = \frac{1}{n} A^T (Ax - b)

At :math:`x = 0`:

.. math::
    \nabla f(0) = -\frac{1}{n} A^T b

The infinity norm of the gradient at 0 is:

.. math::
    ||\nabla f(0)||_{\infty} = \frac{1}{n} ||A^T b||_{\infty}

For :math:`\lambda \geq \frac{1}{n} ||A^T b||_{\infty}`, the solution to :math:`\min_x \frac{1}{2n} ||Ax - b||_2^2 + \lambda || x ||_1` is :math:`x = 0`.
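This closed-form threshold can be checked with any Lasso solver whose objective uses the :math:`\frac{1}{2n}` scaling, such as scikit-learn's ``Lasso``. A minimal sketch on random data (we test slightly above :math:`\lambda_{\text{max}}` to avoid floating-point ties at the boundary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 50, 30
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Critical value for the least-squares datafit
alpha_max = np.max(np.abs(A.T @ b)) / n

# Just above alpha_max: all coefficients are zero
coef_at = Lasso(alpha=1.001 * alpha_max, fit_intercept=False).fit(A, b).coef_

# Below alpha_max: some coefficients become nonzero
coef_below = Lasso(alpha=0.9 * alpha_max, fit_intercept=False).fit(A, b).coef_
```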
References
==========

Refer to Section 3.1, and in particular Proposition 4, of [1]_ for more details.

.. [1] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. 2017. Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18, 1 (January 2017), 4671-4703.

doc/tutorials/tutorials.rst

+7-2
@@ -25,11 +25,16 @@ Explore how ``skglm`` fits an unpenalized intercept.

 :ref:`Mathematics behind Cox datafit <maths_cox_datafit>`
------------------------------------------------------------------
+---------------------------------------------------------

 Get details about Cox datafit equations.

 :ref:`Details on the group Lasso <prox_nn_group_lasso>`
------------------------------------------------------------------
+-------------------------------------------------------

 Mathematical details about the group Lasso, in particular with nonnegativity constraints.
+
+:ref:`Critical regularization strength above which solution is 0 <alpha_max>`
+-----------------------------------------------------------------------------
+
+How to choose the regularization strength in L1-regularization?
