Lupi #151
Conversation
Great review! I corrected some typos.
For the propositions extracted from the article, I did not get their meaning (and I do not think it is mandatory), but the examples are great and illustrate the idea.
> # Highlights
>
> * This paper introduces to a new leanrning paradigm called LUPI.
Suggested change:
- * This paper introduces to a new leanrning paradigm called LUPI.
+ * This paper introduces a new learning paradigm called LUPI.
> # Highlights
>
> * This paper introduces to a new leanrning paradigm called LUPI.
> * The addition of privileged information during the training even if it is not during the inference improves the performances.
Suggested change:
- * The addition of privileged information during the training even if it is not during the inference improves the performances.
+ * The addition of privileged information during the training even if it is not present during the inference improves the performances.
> $$(x_1,y_1),(x_2, y_2),...,(x_n,y_n) \quad where \quad x_i \in X, \quad y_i \in \{-1,1\} $$
>
> following an unknown probability measure $$P(x,y)$$ and the goal is to find among a collection of functions $$f(x, \alpha), \alpha \in \Lambda$$ the function $$y=f(x, \alpha*)$$ that minimizes the number of incorrect classifications.
I'm puzzled by $$\Lambda$$, and by the $$y$$ in $$y=f(x, \alpha*)$$.
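The quoted setup can be made concrete with a deliberately tiny function class: pick, from $$f(x, \alpha), \alpha \in \Lambda$$, the function with the fewest incorrect classifications on the sample. Everything below (the threshold family, the finite index set `Lambda`, the toy sample) is a hypothetical illustration, not anything from the paper.

```python
import random

# Toy illustration (hypothetical, not from the paper): choose, from a
# small family f(x, alpha) indexed by alpha in Lambda, the function
# with the fewest incorrect classifications on the sample (x_i, y_i).
random.seed(0)

xs = [random.uniform(0, 1) for _ in range(200)]
sample = [(x, 1 if x > 0.5 else -1) for x in xs]    # y_i in {-1, 1}

def f(x, alpha):
    """Threshold classifier f(x, alpha) = sign(x - alpha)."""
    return 1 if x > alpha else -1

Lambda = [i / 20 for i in range(21)]                # finite index set
errors = {a: sum(f(x, a) != y for x, y in sample) for a in Lambda}
alpha_star = min(errors, key=errors.get)            # empirical risk minimizer
```

Here $$\Lambda$$ is just the index set of the function class, and $$y=f(x, \alpha^*)$$ is the decision rule given by the selected index.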
> $$(x_1, x^*_1, y_1),(x_2, x^*_2, y_2),...,(x_n, x^*_n, y_n) \quad where \quad x_i \in X, \quad x^*_i \in X^*, \quad y_i \in \{-1,1\} $$
>
> following the unknow probabilty measure $$P(x, x^*, y)$$ and the goal is to find among a collection of functions $$f(x, \alpha), \alpha \in \Lambda$$ the function $$y=f(x, \alpha*)$$ that minimizes the number of incorrect classifications.
Suggested change:
- following the unknow probabilty measure $$P(x, x^*, y)$$ and the goal is to find among a collection of functions $$f(x, \alpha), \alpha \in \Lambda$$ the function $$y=f(x, \alpha*)$$ that minimizes the number of incorrect classifications.
+ following the unknown probability measure $$P(x, x^*, y)$$ and the goal is to find among a collection of functions $$f(x, \alpha), \alpha \in \Lambda$$ the function $$y=f(x, \alpha*)$$ that minimizes the number of incorrect classifications.
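A hedged sketch of the triplet protocol: training consumes $$(x_i, x^*_i, y_i)$$, but the learned decision rule uses $$x$$ alone. Treating $$x^*$$ as a low-noise view of $$x$$, and using it only to select "easy" training examples, are assumptions of this toy example, not the paper's SVM+ construction.

```python
import random

# Hedged sketch of the LUPI triplet protocol: training consumes
# (x_i, x*_i, y_i), but the learned decision rule uses x alone.
random.seed(1)

def make_triplet():
    x_star = random.uniform(0, 1)            # privileged: clean value
    x = x_star + random.gauss(0, 0.2)        # ordinary: noisy value
    y = 1 if x_star > 0.5 else -1            # label driven by the clean value
    return x, x_star, y

train = [make_triplet() for _ in range(300)]

# x* is used ONLY here, at training time: keep examples whose ordinary
# feature x falls on the same side of 0.5 as the privileged x*.
easy = [(x, y) for x, x_star, y in train if (x > 0.5) == (x_star > 0.5)]
threshold = sum(x for x, _ in easy) / len(easy)   # crude threshold estimate

def predict(x):
    """Test-time rule: the privileged x* is no longer available."""
    return 1 if x > threshold else -1
```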
> > $$ P(y[(w_l,x)+b_l]<0) \leq P(1-\xi^0 <0) + A \frac{h ln\frac{l}{h}-ln (\eta)}{l}$$
> >
> > where $$P(y[(w_l,x)+b_l]<0)$$ is the probability of error for the Oracle SVM solution on the training set of size $$l$$, $$P(1-\xi^0 <0)$$ is the probability of error for the best solution in the asmissible set of functions, $$h$$ is the VC dimension of the admissible set of hyperplanes, ans $$A$$ is a constant.
Suggested change:
- > where $$P(y[(w_l,x)+b_l]<0)$$ is the probability of error for the Oracle SVM solution on the training set of size $$l$$, $$P(1-\xi^0 <0)$$ is the probability of error for the best solution in the asmissible set of functions, $$h$$ is the VC dimension of the admissible set of hyperplanes, ans $$A$$ is a constant.
+ > where $$P(y[(w_l,x)+b_l]<0)$$ is the probability of error for the Oracle SVM solution on the training set of size $$l$$, $$P(1-\xi^0 <0)$$ is the probability of error for the best solution in the admissible set of functions, $$h$$ is the VC dimension of the admissible set of hyperplanes, and $$A$$ is a constant.
> Some propositions are demontrated introducing oracle SVM.
>
> > Proposition 1: If any vector $$x \in X$$ belongs to one and only one of the classes and there exists an Oracle function with respect to the best decision rule in the asmissible set of hyperparameters, then with the probablity $$-\eta$$ the following bound holds true
Suggested change:
- > Proposition 1: If any vector $$x \in X$$ belongs to one and only one of the classes and there exists an Oracle function with respect to the best decision rule in the asmissible set of hyperparameters, then with the probablity $$-\eta$$ the following bound holds true
+ > Proposition 1: If any vector $$x \in X$$ belongs to one and only one of the classes and there exists an Oracle function with respect to the best decision rule in the admissible set of hyperparameters, then with the probability $$-\eta$$ the following bound holds true
> > That is the Oracle solution converges to the best possible solution in the admissible set of solutions with the rate $$O(h/l)$$.
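To see what the $$O(h/l)$$ rate means numerically, here is a small sketch that evaluates the bound's capacity term $$A \frac{h \ln(l/h) - \ln(\eta)}{l}$$ for growing $$l$$. The values of $$A$$, $$h$$ and $$\eta$$ are arbitrary assumptions, chosen only to make the decay visible.

```python
import math

# Numeric look at the capacity term A * (h * ln(l/h) - ln(eta)) / l
# from the bound above, which drives the O(h/l) convergence rate.
# A, h and eta are arbitrary assumed values.
A, h, eta = 1.0, 10, 0.05

def capacity_term(l):
    return A * (h * math.log(l / h) - math.log(eta)) / l

# Term shrinks (up to the log factor) as the sample size l grows.
terms = [capacity_term(l) for l in (100, 1000, 10000, 100000)]
```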
> But in reality a teacher does not know the oracle function, but can supply privileged information instead with the admissible set of the correcting functions $$\phi(x^*, \delta), \delta \in \Delta$$ whiche defines that values of the oracle function $$ \xi^0(x_i) = \phi(x^*_i, \delta_0), \forall (x_i, x^*_i, y_i)$$
Suggested change:
- But in reality a teacher does not know the oracle function, but can supply privileged information instead with the admissible set of the correcting functions $$\phi(x^*, \delta), \delta \in \Delta$$ whiche defines that values of the oracle function $$ \xi^0(x_i) = \phi(x^*_i, \delta_0), \forall (x_i, x^*_i, y_i)$$
+ But in reality a teacher does not know the oracle function, but can supply privileged information instead with the admissible set of the correcting functions $$\phi(x^*, \delta), \delta \in \Delta$$ which defines the values of the oracle function $$ \xi^0(x_i) = \phi(x^*_i, \delta_0), \forall (x_i, x^*_i, y_i)$$
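A minimal sketch of the correcting-function idea, assuming $$\phi$$ is linear in $$x^*$$ and fitting $$\delta = (\delta_0, \delta_1)$$ by least squares on made-up slack values; the paper's actual SVM+ optimization is not reproduced here.

```python
# Model the oracle slack xi^0(x_i) as phi(x*_i, delta) in the
# privileged space. phi is assumed linear:
# phi(x*, delta) = delta0 + delta1 * x*, fit by least squares
# on toy (x*_i, xi_i) pairs.
x_star = [0.0, 0.5, 1.0, 1.5, 2.0]         # privileged inputs x*_i
xi = [2 * v + 1 for v in x_star]           # toy "oracle" slack values

n = len(x_star)
mean_x = sum(x_star) / n
mean_y = sum(xi) / n
delta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_star, xi)) \
    / sum((x - mean_x) ** 2 for x in x_star)
delta0 = mean_y - delta1 * mean_x

def phi(x):
    """Fitted correcting function phi(x*, delta)."""
    return delta0 + delta1 * x
```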
> ## Future events as privileged information: Mackey-Glass series
>
> There are two parametrisations of such problem: quantitatively when with the value at time $$t$$ we predict the value at $$t+Dt$$ or qualitatively if with the value at time $$t$$ we estimate if it is greater of lower at time $$t+Dt$$.
Suggested change:
- There are two parametrisations of such problem: quantitatively when with the value at time $$t$$ we predict the value at $$t+Dt$$ or qualitatively if with the value at time $$t$$ we estimate if it is greater of lower at time $$t+Dt$$.
+ There are two parametrisations of such problem: quantitatively, when with the value at time $$t$$ we predict the value at $$t+Dt$$, or qualitatively, when with the value at time $$t$$ we estimate if it is greater or lower at time $$t+Dt$$.
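The qualitative parametrisation can be sketched by generating a Mackey-Glass series and labelling each time $$t$$ with whether the value at $$t+Dt$$ is greater or lower. The delay equation and the parameter values ($$\beta=0.2$$, $$\gamma=0.1$$, $$n=10$$, $$\tau=17$$, Euler step $$dt=1$$) are the usual benchmark choices, assumed here rather than taken from the review.

```python
# Mackey-Glass delay equation, Euler-integrated:
# dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t)
beta, gamma, n, tau = 0.2, 0.1, 10, 17
dt, steps, Dt = 1.0, 600, 5

x = [1.2] * (tau + 1)              # constant history for t <= 0
for _ in range(steps):
    x_tau = x[-tau - 1]            # delayed value x(t - tau)
    x.append(x[-1] + dt * (beta * x_tau / (1 + x_tau ** n) - gamma * x[-1]))

# Qualitative targets: +1 if the series is greater Dt steps ahead.
labels = [1 if x[t + Dt] > x[t] else -1 for t in range(len(x) - Dt)]
```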
> Some propositions are demontrated introducing oracle SVM.
>
> > Proposition 1: If any vector $$x \in X$$ belongs to one and only one of the classes and there exists an Oracle function with respect to the best decision rule in the admissible set of hyperparameters, then with the probablity $$-\eta$$ the following bound holds true
A negative probability? Should this be $$1-\eta$$?
Interesting review. Just some minor typos to fix.
Co-authored-by: ThierryJudge <[email protected]>
The paper is quite complicated and I didn't understand a great part of it, so I limited the maths part. Still, I copied two propositions that are demonstrated in the paper, but they might not be relevant without the rest, which I certainly do not present; let me know if I should keep them.
Also, it will never be merged to main in the end, because the paper is not available even though it is old.