-
Notifications
You must be signed in to change notification settings - Fork 541
Implementing multistart version of theta_est using multiple sampling methods #3575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@djlaky @adowling2 Please provide early feedback. |
Dynamic saving using flush, add. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes from our in-person discussion/informal code review
pyomo/contrib/parmest/parmest.py
Outdated
# # If only one restart, return an empty list | ||
# return [] | ||
|
||
# return {theta_names[i]: initial_theta[i] for i in range(len(theta_names))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed adding a "dataframe" sampling method that uses multistart points defined by the user. This is helpful if we want to try the same set of multistart points for multiple experiments.
"Multistart is not supported in the deprecated parmest interface") | ||
) | ||
|
||
assert isinstance(n_restarts, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also check that this is > 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look at other Pyomo code fgor exampels of throwing exceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @adowling2 here, you need to throw an exception so you can test the exception is caught.
pyomo/contrib/parmest/parmest.py
Outdated
) | ||
|
||
|
||
results = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to create a dataframe and then add rows as you go. Or you could preallocate the dataframe size because you know how many restarts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could even have your generate_samples function generate this empty dataframe.
Extend existing tests for parmest to include multistart, add. |
Models provided need to include bounds, add exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some more comments for you to consider are you continue to refine this.
upper_bound = np.array([parmest_model.find_component(name).ub for name in theta_names]) | ||
# Check if the lower and upper bounds are defined | ||
if np.any(np.isnan(lower_bound)) or np.any(np.isnan(upper_bound)): | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably already know this, but you will need to check all the errors are raised when expected.
) | ||
|
||
if self.method == "random": | ||
np.random.seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to skip setting the random seed if seed=None (default)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is none for all the functions I use that set seed, so if it receives seed = None, it would work as expected. Would skipping it still be best practice?
pyomo/contrib/parmest/parmest.py
Outdated
elif self.method == "latin_hypercube": | ||
# Generate theta values using Latin hypercube sampling | ||
sampler = scipy.stats.qmc.LatinHypercube(d=len(theta_names), seed=seed) | ||
samples = sampler.random(n=self.n_restarts+1)[1:] # Skip the first sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you skipping the first sample? Please explain in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a comment in code to explain as well. The first sample generated using qmc.sobol is always the origin (zero vector). I thought logic applied to all qmc methods, but no only sobol. So to get nonzero points, you need to skip first sample
@@ -921,6 +1020,116 @@ def theta_est( | |||
cov_n=cov_n, | |||
) | |||
|
|||
def theta_est_multistart( | |||
self, | |||
buffer=10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to explain the buffer in the doc string.
pyomo/contrib/parmest/parmest.py
Outdated
"Multistart is not supported in the deprecated parmest interface" | ||
) | ||
|
||
assert isinstance(self.n_restarts, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace all of these with more descriptive error messages. Remember that we need tests for each error message.
# optimization. It will take the theta names and the initial theta values | ||
# and return a dictionary of theta names and their corresponding values. | ||
def _generate_initial_theta(self, parmest_model, seed=None, n_restarts=None, multistart_sampling_method=None, user_provided=None): | ||
if n_restarts == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like just sending a warning, and not returning. For example, n_restarts might be 1 by default. You should check if n_restarts is an int
as well. Then, if n_restarts
is 1, you should send a warning that the tool is intended for this number to be greater than one and solve as normal.
|
||
# Get the theta names and initial theta values | ||
theta_names = self._return_theta_names() | ||
initial_theta = [parmest_model.find_component(name)() for name in theta_names] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to use the suffix for this? The suffix value shouldn't change, but the theta value may if the model has been solved for some reason. I don't know if this is a potential issue but I think that grabbing these values from the suffixes would be more dummy-proof.
|
||
# Add the output info values to the dataframe, starting values as nan | ||
for i in range(len(theta_names)): | ||
df_multistart[f'converged_{theta_names[i]}'] = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all string characters legal for pandas names? I would think so but this line seems (maybe) dangerous? For instance, what if we start getting into block structuring with multi-index parameters? We should make sure there is a test to ensure the system is robust. I believe I have this on the back burner to make one for Pyomo.DoE as well.
Fixes # .
Summary/Motivation:
Currently, the optimization is only done from a single initial value. This implementation adds the ability to specify multiple initial values using selected sampling techniques: from a random uniform distribution, using Latin Hypercube Sampling, or using Sobol Quasi-Monte Carlo sampling.
Changes proposed in this PR:
TODO before converting from draft:
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: