Lupi #151

Open: JulietteMoreau wants to merge 12 commits into main from LUPI

Commits:

* 7d7b104 creation of the files
* dc7537b completed review
* 5244963 completed review
* 0a1dce8 corrections proposed by Sophie
* 385b901 Update collections/_posts/2023-10-27-LUPI.md
* 912a925 Update collections/_posts/2023-10-27-LUPI.md
* 095b901 Update collections/_posts/2023-10-27-LUPI.md
* ccbb871 Update collections/_posts/2023-10-27-LUPI.md
* 8daca5a Update collections/_posts/2023-10-27-LUPI.md
* ad9f2ca Update collections/_posts/2023-10-27-LUPI.md
* 1a6b180 Update collections/_posts/2023-10-27-LUPI.md
* 2d1f2fd Update collections/_posts/2023-10-27-LUPI.md

collections/_posts/2023-10-27-LUPI.md (new file, 103 lines added):

---
layout: review
title: "LUPI: Learning Using Privileged Information"
tags: machine learning, SVM, privileged information
author: "Juliette Moreau"
cite:
  authors: "Vladimir Vapnik and Akshay Vashist"
  title: "A new learning paradigm: learning using privileged information. doi: https://doi.org/10.1016/j.neunet.2009.06.042"
  venue: "Neural Networks, 2009"
---

# Highlights

* This paper introduces a new learning paradigm called LUPI.
* Adding privileged information during training improves performance, even though it is not available at inference.
* The paradigm is tested on three different applications.

# Introduction

Unlike classical machine learning, where the role of the teacher is minor, human learning relies heavily on the teacher, who provides additional information hidden in explanations, comments or comparisons. The paper explores a new paradigm in which specific forms of privileged information are taken into account during training but are absent at test time, and demonstrates its superiority over the classical paradigm. It is called LUPI, for Learning Using Privileged Information.

# Problematization

Usually, training is based on a set of pairs

$$(x_1,y_1),(x_2, y_2),...,(x_n,y_n) \quad \text{where} \quad x_i \in X, \quad y_i \in \{-1,1\}$$

drawn from an unknown probability measure $$P(x,y)$$. The goal is to find, among a collection of functions $$f(x, \alpha), \alpha \in \Lambda$$, the function $$y=f(x, \alpha^*)$$ that minimizes the number of incorrect classifications.

In the LUPI paradigm we have triplets

$$(x_1, x^*_1, y_1),(x_2, x^*_2, y_2),...,(x_n, x^*_n, y_n) \quad \text{where} \quad x_i \in X, \quad x^*_i \in X^*, \quad y_i \in \{-1,1\}$$

drawn from the unknown probability measure $$P(x, x^*, y)$$, and the goal is again to find, among a collection of functions $$f(x, \alpha), \alpha \in \Lambda$$, the function $$y=f(x, \alpha^*)$$ that minimizes the number of incorrect classifications.

The objective is exactly the same, but during training there is additional information that is not available at test time; this is why it is called privileged information.
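
To make this train/test asymmetry concrete, here is a minimal sketch (not from the paper; the shapes and array names are purely illustrative) of how a LUPI dataset is organized:

```python
import numpy as np

# Illustrative sizes: n training examples, d standard features,
# d_star privileged features that exist only for the training set.
n, d, d_star = 200, 20, 5

X_train = np.random.randn(n, d)               # x_i  in X,  available at train and test time
X_star_train = np.random.randn(n, d_star)     # x*_i in X*, available during training only
y_train = np.random.choice([-1, 1], size=n)   # labels in {-1, 1}

# At test time the privileged space X* is simply absent:
X_test = np.random.randn(50, d)
# y_test must be predicted from X_test alone, with a decision rule f(x, alpha).
```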

They imagine several types of privileged information, illustrated with examples from the medical domain. For temporal prediction, information from intermediate steps can be added (knowing the patient's symptoms at 3, 6 and 9 months to predict the evolution at 12 months). For images, holistic descriptions can be used (a doctor's description of a biopsy to classify it as healthy or cancerous).

# Privileged information in SVM-type algorithms: the SVM+ method

They apply this paradigm to SVM; the goal is to show that the convergence rate is significantly improved with this method.

Several propositions are demonstrated by introducing an Oracle SVM.

> Proposition 1: If any vector $$x \in X$$ belongs to one and only one of the classes and there exists an Oracle function with respect to the best decision rule in the admissible set of hyperplanes, then with probability $$1-\eta$$ the following bound holds true
>
> $$ P(y[(w_l,x)+b_l]<0) \leq P(1-\xi^0 <0) + A \frac{h \ln\frac{l}{h}-\ln (\eta)}{l}$$
>
> where $$P(y[(w_l,x)+b_l]<0)$$ is the probability of error for the Oracle SVM solution on the training set of size $$l$$, $$P(1-\xi^0 <0)$$ is the probability of error for the best solution in the admissible set of functions, $$h$$ is the VC dimension of the admissible set of hyperplanes, and $$A$$ is a constant.
>
> That is, the Oracle solution converges to the best possible solution in the admissible set of solutions with the rate $$O(h/l)$$.

In reality a teacher does not know the Oracle function, but can instead supply privileged information together with an admissible set of correcting functions $$\phi(x^*, \delta), \delta \in \Delta$$, which defines the values of the Oracle function: $$\xi^0(x_i) = \phi(x^*_i, \delta_0), \forall (x_i, x^*_i, y_i)$$.
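
Concretely (this is a sketch of the SVM+ training problem in the linear case, written in the review's notation rather than quoted verbatim from the paper), the slack of each training example is modelled by the correcting function evaluated on its privileged vector:

$$\min_{w, b, w^*, b^*} \quad \frac{1}{2}(w,w) + \frac{\gamma}{2}(w^*,w^*) + C \sum_{i=1}^{l} \left[ (w^*, x^*_i) + b^* \right]$$

$$\text{subject to} \quad y_i\left[(w, x_i) + b\right] \geq 1 - \left[(w^*, x^*_i) + b^*\right] \quad \text{and} \quad (w^*, x^*_i) + b^* \geq 0, \quad i = 1, \ldots, l.$$

Here $$C$$ weights the sum of correcting values and $$\gamma$$ controls the regularity of the correcting function.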

![SVM+ algorithm](/article/images/LUPI/algorithm.jpg)

> Proposition 2: Under the conditions of Proposition 1, with probability $$1-\eta$$ the following bound holds true
>
> $$ P(y[(w_l,x)+b_l]<0) \leq P(1-\phi(x^*, \delta_l)<0)+A \frac{(h+h^*) \ln\frac{2l}{h+h^*}-\ln (\eta)}{l}$$
>
> where $$P(y[(w_l,x)+b_l]<0)$$ is the probability of error for the solution of the problem $$R(w,b,\delta) = \sum_{i=1}^{l} \theta [\phi(x^*_i, \delta)-1]$$ (where $$\theta(u)=1$$ if $$u>0$$) subject to the constraints $$y_i[(w,x_i)+b] \geq 1-\phi(x^*_i, \delta), i=1,\ldots,l$$ on the training set of size $$l$$, $$P(1-\phi(x^*, \delta_l)<0)$$ is the probability of the event $$\{\phi(x^*, \delta_l)>1\}$$, $$h$$ is the VC dimension of the admissible set of hyperplanes and $$h^*$$ is the VC dimension of the admissible set of correcting functions.

For the rest of the paper, two models of the set of correcting functions are considered: the general $$X^*$$SVM+ model (an admissible set of non-negative correcting functions defined in the multi-dimensional $$X^*$$-space) and the particular dSVM+ model (a set of admissible non-negative correcting functions defined in a special one-dimensional d-space). The "plus" denotes the use of privileged information.
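
Below is a minimal numerical sketch of the linear $$X^*$$SVM+ idea, following the formulation sketched above (this is our own illustration, not the authors' code; the use of `cvxpy`, the function name and the default hyperparameters are assumptions):

```python
import numpy as np
import cvxpy as cp

def fit_linear_svm_plus(X, X_star, y, C=1.0, gamma=1.0):
    """Linear SVM+ sketch: the slack of each training example is modelled
    by a linear correcting function of its privileged vector x*."""
    n, d = X.shape
    d_star = X_star.shape[1]
    w, b = cp.Variable(d), cp.Variable()
    w_star, b_star = cp.Variable(d_star), cp.Variable()

    xi = X_star @ w_star + b_star  # correcting function xi_i = (w*, x*_i) + b*
    objective = (0.5 * cp.sum_squares(w)
                 + 0.5 * gamma * cp.sum_squares(w_star)
                 + C * cp.sum(xi))
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,  # margin corrected by xi
                   xi >= 0]                              # correcting values stay non-negative
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value

# At inference only (w, b) is used: y_pred = sign((w, x) + b).
# The privileged vectors x* influence which hyperplane is selected,
# but are never needed to apply it.
```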

# Three examples of privileged information

## Advanced technical model as privileged information: protein folding

Proteins are organized in a human-made hierarchical classification according to their 3D structure, in order to know how they are connected through evolution. This 3D structure is complicated and time-consuming to obtain, whereas the amino acid sequence is easy to obtain; however, deriving the former from the latter is not straightforward.

The aim is to place proteins in this classification using only the amino acid sequence, thanks to a pattern recognition technique. SVMs are considered one of the best techniques for constructing this type of decision rule.

They focus on finding homologies between amino acid sequences across superfamilies, in 80 binary classification cases built from 80 superfamilies that are representative of protein diversity and have enough sequences to be significant. Similarity between sequences is computed with the profile-kernel, and similarity between 3D structures with MAMMOTH. The set of vectors $$x \in X$$ contains the sequence similarities used for training, while $$x^* \in X^*$$ contains the 3D-structure similarities used as privileged information.

![Results on protein classification](/article/images/LUPI/error_rates.jpg)

The classical paradigm is never superior to the LUPI paradigm: in 3 cases there is no error for either paradigm, in 11 cases there is no difference between them, and in all other cases LUPI improves the results. dSVM+ is better than $$X^*$$SVM+, but classification based directly on the 3D structure remains better.

## Future events as privileged information: Mackey-Glass series

There are two parametrizations of such a problem: quantitative, when from the value at time $$t$$ we predict the value at $$t+\Delta t$$, or qualitative, when from the value at time $$t$$ we estimate whether the series is greater or lower at time $$t+\Delta t$$.

In the first case LUPI can be used with a regression model; in the second, with a pattern recognition model. The latter case is studied in the article using chaotic series generated by the Mackey-Glass equation. This equation is used in several benchmark models, and one in particular, which showed that SVMs are better than many other techniques for predicting time series, is used as a comparison for the LUPI paradigm.

The goal is to predict whether $$x(t+T)>x(t)$$ from a 4-dimensional vector of time series observations, predicting only one step ahead, or also 5 and 8 steps ahead of the time of interest.
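
As an illustration of how such triplets can be built, here is a sketch (our own; the Mackey-Glass parameters are the usual textbook values, the Euler step is coarse, and the choice of future values used as $$x^*$$ is an assumption about the setup):

```python
import numpy as np

def mackey_glass(n_steps, beta=0.2, gamma_=0.1, n=10, tau=17, dt=1.0, x0=1.2):
    """Coarse Euler discretization of the Mackey-Glass delay equation
    dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma_ * x(t)."""
    x = np.full(n_steps + tau, x0)
    for t in range(tau, n_steps + tau - 1):
        x[t + 1] = x[t] + dt * (beta * x[t - tau] / (1 + x[t - tau] ** n) - gamma_ * x[t])
    return x[tau:]

series = mackey_glass(2000)
T = 5  # prediction horizon in steps; 1, 5 and 8 steps ahead are considered in the paper

X, X_star, y = [], [], []
for t in range(3, len(series) - T):
    X.append(series[t - 3:t + 1])        # 4 past observations: available at test time
    X_star.append(series[t + 1:t + T])   # future values between t and t+T: privileged
    y.append(1 if series[t + T] > series[t] else -1)
X, X_star, y = np.array(X), np.array(X_star), np.array(y)
```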

![Results on Mackey-Glass series](/article/images/LUPI/mackey_glass.jpg)

The Bayesian decision rule is approximated in order to evaluate how close the Oracle SVM error is to the Bayesian error, given that the SVM error converges to the Bayesian error.

## Holistic description as privileged information: digit classification

As an example of holistic description as privileged information, they use MNIST classification between 5s and 8s, with the images resized to 10x10 pixels. A poetic description of each digit is produced and translated into a 21-dimensional feature vector describing its characteristics.
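
A data-preparation sketch of this setup is given below (our own illustration: the use of `fetch_openml` and `zoom` for resizing is an assumption, and the 21-dimensional holistic vectors come from human descriptions, so they are left as a placeholder):

```python
import numpy as np
from scipy.ndimage import zoom
from sklearn.datasets import fetch_openml

# Load MNIST and keep only the 5 vs 8 problem.
mnist = fetch_openml("mnist_784", as_frame=False)
mask = np.isin(mnist.target, ["5", "8"])
images = mnist.data[mask].reshape(-1, 28, 28) / 255.0
y = np.where(mnist.target[mask] == "5", -1, 1)

# Downsample each digit to 10x10 pixels (100-dimensional decision space X).
X = np.stack([zoom(img, 10 / 28) for img in images]).reshape(len(images), -1)

# Privileged space X*: in the paper, a 21-dimensional vector derived from a
# "poetic" human description of each training digit. This cannot be generated
# automatically, so it is only a placeholder with the right shape here.
X_star = np.zeros((len(X), 21))
```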

![Results on MNIST](/article/images/LUPI/numbers.jpg)

# Discussion

The 3D information is very strong, so classification based on it remains better, but there are numerous cases where the LUPI paradigm gives remarkable results, such as with time series. Finding an even better space $$X^*$$ for the privileged information could improve the results even further.

The LUPI paradigm is useful to speed up convergence to the Bayesian solution, especially when few training data are available.

Review comment:

> I'm perturbed by $$\Lambda$$ and y in $$y=f(x, \alpha*)$$