Skip to content

Conversation

@gdudziuk
Copy link

@gdudziuk gdudziuk commented Feb 10, 2020

I have implemented a simple form of hierarchical optimization in D2D and I would like to propose to pull this feature upstream.

The method of hierarchical optimization is described in detail in (Weber et al., 2011) and (Loos et al.,2018). Let me however illustrate this method for the following simplified parameter estimation problem:

(Eq. 1) \min_{p} J_1(p) = \sum_{j=1}^{J} (Y_j - y(p,t_j))^2/{std}_j^2

where Y_j and {std}_j represent experimental measurements and associated standard deviations given or arbitrary scale and y(p,t_j) is the model observable corresponding to the measurements Y_j. Assume that the observable y is of form y(p,t)=s*x(r,t) where x(r,t) denote model species (or some derived variable) corresponding to the measurements Y_j, s is a scale parameter to be estimated along with the dynamic parameters r and p=(r,s). Then, the above optimization problem becomes

(Eq. 2) \min_{r,s} J_2(r,s) = \sum_{j=1}^{J} (Y_j - s*x(r,t_j))^2/{std}_j^2

The core observation is that the function J_2 is quadratic and positive definite w.r.t. the scale s, assuming that at least some of x(r,t_j) are nonzero. Thus, given parameters r, the optimal scale s=s(r) is uniquely determined and the optimization problem (Eq. 2) is equivalent to

(Eq. 3) \min_{r} J_3(r) = \sum_{j=1}^{J} (Y_j - s(r)*x(r,t_j))^2/{std}_j^2

The optimal scales s(r) may be expressed explicitly. To do so, it is enough to resolve the condition \grad_{s} J_2(r,s)=0 w.r.t. s, which gives:

(Eq. 4) s(r) = \frac{\sum_{j} x_(r,t_j)*Y_j/{std}_j^2}
                    {\sum_{j} x_(r,t_j)^2/{std}_j^2}

Having this, we may also explicitly compute the gradient of J_3 w.r.t. r (assuming that we can compute the sensitivities of x w.r.t r) and use it to feed any gradient-based optimization algorithm.

To sum up, thanks to the specific form of the observable y, namely y=s*x, we can replace the optimization problem (Eq. 2) with the equivalent problem (Eq. 3) which has less parameters to optimize. This is the method of hierarchical optimization. From my experience (and as one could expect), this can significantly improve the optimization speed. The trade off is that we need to assume a specific form of the observable y. Nevertheless, I think that the case of y=s*x is common enough to implement hierarchical optimization for it.

Theoretically, the method outlined above may be generalized to more complex observables, like e.g. y=s*x+b where b denotes an offset parameter. Such observables perhaps cover vast majority of practical applications. Having y=s*x+b we proceed analogously as above, namely instead of (Eq. 2) we have

(Eq. 2a) \min_{r,s,b} J_2(r,s,b) = \sum_{j=1}^{J} (Y_j - s*x(r,t_j) - b)^2/{std}_j^2

and we need to compute both the optimal scale s=s(r) and the optimal offset b=b(r). The problem with such generalization is very pragmatic - the formulas for s(r) and b(r) become much more complex than in the simple case of y=s*x discussed above. For this reason I have implemented the hierarchical optimization just for the latter simple case and to be honest I think I will not implement the generalization to y=s*x+b soon.

Usage

There is function arInitHierarchical which detects all observables of form y=s*x end enables the "hierarchical mode" of optimization. To get it work, just put arInitHierarchical before arFit (or arFitLHS or anything like that). There is also function arDisableHierarchical which switches back to the "normal mode". So an example setup with hierarchical optimization may look like this:

arInit
arLoadModel(.....)
arLoadData(.....) % Assume that at least some data has observables of form y=s*x
arCompileAll

arFit % Here we optimize in the "normal mode", i.e. we optimize J_2(r,s) as in (Eq. 2)
arInitHierarchical % Here we enter the "hierarchial mode"...
arFit  % ... and fine-tune the fit by optimizing J_3(r) as in (Eq. 3)
arDisableHierarchical % Switch back to the "normal mode"

The optimal scales s(r) are computed by function arCalcHierarchical, which is called from arCalcRes and arCalcSimu. As a result, any D2D function that internally calls arSimu can benefit from the "hierarchical mode". For example, in the following snippet

arInitHierarchical
arPlot

the scale parameters (if any) will be automatically fine-tuned prior to generating the plots.

Limitations

The implementation in this pull request strongly relies on the specific form (Eq. 2) of the optimization problem. Consequently, any D2D feature that modifies J_2 as in (Eq. 2) into something else cannot be used with the hierarchical optimization, at least until suitable generalizations are implemented. Up to what I know, these interfering features are:

  • Interpreting observables and data in the log10 scale
  • Error fitting (Edit: Even more, it is required that errors does not depend on r nor s, even implicitly; cf. the comment below)
  • Detection limits
  • The features handled in arCollectRes, namely:
    • Intercondition constraints and steady state conditions
    • Priors other than standard flat box
    • Random effects
    • User-defined residuals

With this features enabled, the current implementation of the hierarchical optimization works incorrectly since the optimal scales s(r) are in such case different than in (Eq. 4). Function arCalcHierarchical raises an error if any of these features is in use. I have done my best to make the above list complete. However, I'm a newcomer to D2D so there may be some other features that I don't know yet and that may interfere with the hierarchical optimization by modifying the merit function J_2. So if anybody finds that there are some more features like that in D2D, please let me know.

Further development

The possible directions of generalization of the method are:

  1. Add support for the hierarchical optimization for observables with offsets, y=s*x+b, possibly with some restricting assumptions.
  2. Add support for the hierarchical optimization in combination with the features listed in the preceding section. In this respect, some features are more cumbersome than others. Specifically, support for priors should be simple to implement.

 - Improve the docstring in arCalcHierarchical
 - Remove a redundant line in arInitHierarchical
Previously, this argument was accepet but unused by arCalcHierarchical.
Now it is used to avoid computing scale gradients and sensitivities of
observables when not necessary.
Previously, arInitHierarchical was using isSymType, the latter being a
function introduced in R2019a. This commit removes this dependence and
instead implements a small custom function which does the trick.
The scale parameters suitable for hierarchical optimization are those
which appear _only_ in observables of form `s*xz` where `s` is the
subject scale parameter and `xz` is a model species or a derived
variable. Until now, arInitHierarchical has been detecting parameters
which appear _at least in one_ observable of form `s*xz` (meaning that
they were allowed to appear also in obervables of other form).
Consequently, parameters not suitable for the hierarchical optimization
could be detected by arInitHierarchical and the results of the
hierarchical optimization could be not correct. This commit fixes it by
adding extra checks in arInitHierarchical to make the latter function
detect only the right parameters.
The implementation of hierarchical optimization depends on the specific
form of the merit function. We currently assume the following one:

    J(s,p) = \sum_j (data_j - s*x(p,t_j))^2/std_j^2

Each feature that modifies this expression also breaks the method of
hierarchical optimization, making its results incorrect. To avoid this,
in arCalcHierarchical we try to check if d2d features affecting the form
of the merit function are enabled. This commit adds extra checks of this
kind in arCalcHierarchical, hopefully making the checklist complete (or
close to complete).
The removed settings changes are redundant since these settings are
tested anyway in arCalcHierarchical.
Currently arCalcHierarchical requires no error fitting, meaning that
there can be no dependence of errors on any fittable parameters. Until
now, arCalcHierarchical has detected error fitting incorrectly,
excluding some possible settings which actually had no error fitting. In
other words, the test for error fitting has been to strict. This commit
fixes the issue by implementing a better test for error fitting in
arCalcHierarchical.
This commit updates the help text of arInitHierarchical to reflect the
changes in its functionality introduced in commit c3bbe5a. This should
had been done in that commit. Additionally, the present commit fixes a
typo in the subject help text.
Hierarchical optimization is not feasible if the solution of the model
is zero for all measurement times. This commit adds the respective
assertion in arCalchierarchical.

This extra assertion is not strictly necessary. Degeneration of the
solution to zero would result in NaN scales, subsequently resulting in
NaN residuals, which would trigger an error anyway. Nevertheless, with
the extra assertion we get a more specific error message which may make
it easier to debug.
@gdudziuk
Copy link
Author

gdudziuk commented Feb 11, 2020

An update. As long as this implementation relies on the form (Eq. 2) of the target function, it is important that {std}_j do not depend on the dynamic parameters r nor on the scale parameter s, neither explicitly nor implicitly, e.g. via model observables. This excludes error models with relative measurement noise, e.g. std(q1,q2,s,r) = sqrt(q1^2 + q2^2*y(s,r)^2), even if the error parameters q1 and q2 are constant.

So I need to write another commit to fix this. I will enforce the use of experimental error if the hierarchical optimization is enabled. Such a solution seems restrictive but:

  1. Error models that don't have any dependence on s nor r and which error parameters are constant always can be represented as concrete error values for particular replicates, i.e. can be represented as experimental errors. So in fact such a solution is not more restrictive than necessary.
  2. It is also simpler to implement than detecting error definitions that has any dependence on s or r.

I will commit in a moment. (Edit: Done, cf. commit 1b7fb04)

@clemenskreutz
Copy link
Collaborator

Dear gdudziuk.
Thanks for your contribution. At a first glance, it looks very valuable. I also deeply acknowledge your efforts in documentation. We check the code during the next days.

Currently arCalcHierarchical requires no dependence of errors on any
fittable parameters, neither explicitly nor implicitly. Until now,
arCalcHierarchical has been asserting that there is no fitting of error
parameters, but at the same time has been allowing implicit dependence
of errors on fittable parameters of the model via observables. This is
a bug since with such implicit dependence of errors on fittable
parameters the results of arCalcHierarchical are incorrect. This commit
fixes it by asserting the use of experimental errors, which are
independent of the fittable parameters by definition.
@clemenskreutz
Copy link
Collaborator

Dear
Sorry for my delayed response.
I startet testing and found the following problem:
You (sometimes) do not consider that there might be several models, i.e. ar.model needs an index ar.model(m). So you always need a loop for m=1:length(ar.model), ar.model(m) ...

Example: Lines 24-25 in arInitHierarchical.m:
xSyms = cellfun(@sym, ar.model.x, 'UniformOutput', false);
zSyms = cellfun(@sym, ar.model.z, 'UniformOutput', false);

Could you please update the code accordingly?
Thanks in advance.
Best, Clemens.

@clemenskreutz
Copy link
Collaborator

clemenskreutz commented Feb 28, 2020

Another question: I did the following:

arCopyBenchmarkModels('Swa')
cd Swameye_PNAS2003\
Setup
arInitHierarchical
arDisableHierarchical % Switch back to the "normal mode"

There is a warning that I don't understand and obviously no scaling parameters were found. Why?

` arInitHierarchical
Warning: Unable to prove 'offset_tSTAT == 0'.
In symengine
In sym/isAlways (line 42)
In arInitHierarchical (line 80)
Warning: Unable to prove 'offset_pSTAT == 0'.
In symengine
In sym/isAlways (line 42)
In arInitHierarchical (line 80)
Warning: No scale parameters found for hierarchical optimization.
In arInitHierarchical (line 224)

arDisableHierarchical % Switch back to the "normal mode"
Error using arDisableHierarchical (line 18)
Hierarchical optimization is not enabled. You have not called
arInitHierarchical or there are no scale parameters suitable for
hierarchical optimization in your observables.`

gdudziuk added 2 commits March 2, 2020 13:45
At the moment, arCalcHierarchical allows experimental errors only, so
the lines which handle the case of model-based errors are redundant and
obscure the code. This commit removes such lines.

See also commit 1b7fb04.
In arInitHierarchical, the cell arrays xSyms and zSyms has been
created assuming that there is only one model loaded (i.e. the length of
the ar.model structure is 1). This commit corrects the support for
multiple models in arInitHierarchical.
@gdudziuk
Copy link
Author

gdudziuk commented Mar 2, 2020

You (sometimes) do not consider that there might be several models, i.e. ar.model needs an index ar.model(m).

Right. Fixed in commit a35790f. Thank you for pointing this out.

In addition I have found another small issue - I will fix it this week. I will also take a look at the model of Swameye and let you know what's going on there.

gdudziuk added 8 commits March 3, 2020 16:05
For certain purposes, in arCalcHierarchical we have been testing if all
elements ar.model(im).data(id).useHierarchical are NaNs. But this cannot
happen since the array useHierarchical is constructed such that its
elements may be either false or true (and not NaN). So we should rather
test if the elements of useHierarchical are all false. This commit fixes
this issue.
Until now, arCalcHierarchical has been working correctly only for data
series with no duplicate time points (or, more generally, predictor
values), despite D2D allows such duplicates in the data. This commit
fixes this issue.
There has been a code snippet in a loop over observables which has been
independent of the obervables. This has resulted in unnecessary repeated
assignments inside this snippet. So this commit moves this snippet to a
more suitable location in the body of arInitHierarchical.
Function arInitHierarchical calls function isAlways which by default
rasises a warning when testing a condition like `sym('a')==0`. Such
conditions appear in run of normal operation of arInitHierarchical so
the warning in question is unnecessary. This commit switches it off.
Some of the assertion tests in arCalcHierarchical should be required
only for those observables which have parameters suitable for
hierarchical optimization. Until now, arCalcHierarchical has been
testing such assertions for all observables, thus being overly
restrictive. This commit fixes this issue.
Until now, arCalcHierarchical has been working correctly only for data
series with no missing observations, despite D2D allows such data. This
commit fixes this issue.
@gdudziuk
Copy link
Author

gdudziuk commented Mar 5, 2020

I have played a bit more with example models shipped with D2D and discovered some issues in my code. They are already fixed. From the model "Becker_Science2010" I have learned that D2D allows repeated measurements in the data sheets (which I should had figured out earlier; see commit 7feb387) and from the model "Swameye_PNAS2003" I have learned that there also may be missing measurements in the data sheets (see commit 14319ec). I hope that now hierarchical optimization works correctly with all possible data.

Concerning the warning "Unable to prove 'some_parameter == 0'.", this is a warning raised by the symbolic function isAlways, which is used in arInitHierarchical. Since this warning was occurring during normal operation of arInitHierarchical, I have decided to switch it off (see commit 050f114).

Concerning the warning "No scale parameters found for hierarchical optimization.", this is correct since there are no such parameters in the model "Swameye_PNAS2003". At the moment, my implementation supports hierarchical optimization only for observables of form scale*xz and the model of Swameye has observables of form scale*xz + offset, which is a different case.

I suspect that this may be a problem for you. I can see that many D2D examples involve not only scales but also offsets so you probably would like to have the hierarchical optimization generalized to cover the setups involving offsets. But if this is acceptable for you to have that simple implementation as it is now, I can provide some toy model which demonstrates the use of the hierarchical optimization in the present state of development.

Of course, for the test purposes you can get hierarchical optimization work with the model of Swameye by manually removing the offset parameters from that model.

@gdudziuk
Copy link
Author

So, what do you think, @clemenskreutz?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants