Add experiment planner #2
Reviewer's Guide

This PR integrates a new experimental design planner into the project by adding utilities for dynamic experiment loading, instrument noise simulation, and a Bayesian optimization engine (ExperimentDesigner) for neutron reflectometry, along with example models and tests; it also updates project dependencies to support new components.

Class diagram for ExperimentDesigner and related types

```mermaid
classDiagram
class ExperimentDesigner {
- experiment: Experiment
- problem: FitProblem
- parameters_of_interest: Optional[List[str]]
- parameters: Dict[str, SampleParameter]
- all_model_parameters: Dict[str, Parameter]
+ __init__(experiment, parameters_of_interest)
+ __repr__()
+ set_parameter_to_optimize(param_name, value)
+ get_parameters()
+ prior_entropy()
+ perform_mcmc(q_values, noisy_reflectivity, errors, mcmc_steps, burn_steps)
+ _extract_marginal_samples(mcmc_samples)
+ _calculate_posterior_entropy_mvn(mcmc_samples)
+ _calculate_posterior_entropy_kdn(mcmc_samples)
+ _calculate_posterior_entropy(mcmc_samples, method)
+ optimize(param_to_optimize, param_values, parameters_of_interest, realizations, mcmc_steps, entropy_method, counting_time)
}
class SampleParameter {
+ name: str
+ value: float
+ bounds: Tuple[float, float]
+ is_of_interest: bool = True
+ h_prior: float = 0
+ h_posterior: float = 0
}
ExperimentDesigner --> SampleParameter
ExperimentDesigner --> FitProblem
FitProblem --> Experiment
SampleParameter <|-- BaseModel
```

Class diagram for add_instrumental_noise utility

```mermaid
classDiagram
class add_instrumental_noise {
+ q_values: np.ndarray
+ reflectivity_curve: np.ndarray
+ counting_time: float = 1.0
+ returns: Tuple[np.ndarray, np.ndarray]
}
```

Class diagram for create_fit_experiment model loader

```mermaid
classDiagram
class create_fit_experiment {
+ q: np.ndarray
+ dq: np.ndarray
+ data: np.ndarray
+ errors: np.ndarray
+ returns: Experiment
}
```
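To show how the pieces in these diagrams fit together, here is a minimal usage sketch. The import paths, the model file name, and the Q grid are assumptions inferred from the file locations and the test script quoted later in this review, not a verified API.

```python
# Hypothetical usage sketch of the planner API summarized in the diagrams above.
# Import paths, the model file name, and the Q grid are assumed, not taken from the PR.
import numpy as np

from analyzer_tools.utils.model_utils import expt_from_model_file
from analyzer_tools.planner.experiment_design import ExperimentDesigner

# Illustrative Q grid with a 2.5% relative resolution
q_values = np.linspace(0.008, 0.2, 150)
dq_values = q_values * 0.025

# The model file is expected to define create_fit_experiment(q, dq, data, errors)
experiment = expt_from_model_file("models/example_model.py", q_values, dq_values)

designer = ExperimentDesigner(experiment, parameters_of_interest=["THF rho"])
designer.set_parameter_to_optimize("THF rho", 5.0)

# Scan candidate values of the chosen setting and compare expected information gain
results = designer.optimize(
    param_to_optimize="THF rho",
    param_values=[2.0, 4.0, 6.0],
    realizations=5,
)
```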
Hey there - I've reviewed your changes - here's some feedback:
- In expt_from_model_file, avoid injecting into sys.path—use importlib.util.spec_from_file_location to load the module directly by path, which is safer and doesn’t pollute the import namespace.
- The ExperimentDesigner.optimize (and related methods) use print statements for progress and results—consider switching to the logger or exposing a verbosity flag so output can be captured or silenced consistently (see the sketch after this list).
- test_planner.py has no assertions and relies on random noise—add proper assert-based checks and seed the RNG (or allow injecting a seed) to make tests deterministic and verifiable.
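A minimal sketch of the logger-based approach suggested in the second point above; the logger name, helper function, and verbose flag are illustrative assumptions rather than the module's actual interface.

```python
# Sketch only: route progress output through a module-level logger instead of print().
# The logger name, helper function, and verbose flag are illustrative assumptions.
import logging

logger = logging.getLogger("analyzer_tools.planner.experiment_design")


def report_progress(step: int, total: int, verbose: bool = True) -> None:
    """Emit optimization progress; callers can silence it via logging config or verbose=False."""
    if verbose:
        logger.info("Optimization step %d/%d", step, total)
```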
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In expt_from_model_file, avoid injecting into sys.path—use importlib.util.spec_from_file_location to load the module directly by path, which is safer and doesn’t pollute the import namespace.
- The ExperimentDesigner.optimize (and related methods) use print statements for progress and results—consider switching to the logger or exposing a verbosity flag so output can be captured or silenced consistently.
- test_planner.py has no assertions and relies on random noise—add proper assert-based checks and seed the RNG (or allow injecting a seed) to make tests deterministic and verifiable.
## Individual Comments
### Comment 1
<location> `analyzer_tools/planner/experiment_design.py:204-205` </location>
<code_context>
+ """
+ # Prepare FitProblem
+
+ probe = QProbe(q_values, q_values * 0.025, R=noisy_reflectivity, dR=errors)
+ expt = Experiment(sample=self.experiment.sample, probe=probe)
+ problem = FitProblem(expt)
+ problem.model_update()
</code_context>
<issue_to_address>
**suggestion:** Hardcoded Q resolution scaling may not generalize to all instruments.
Consider making the Q resolution scaling factor configurable to accommodate different experimental setups.
Suggested implementation:
```python
"""
# Prepare FitProblem
probe = QProbe(q_values, q_values * q_resolution_scale, R=noisy_reflectivity, dR=errors)
expt = Experiment(sample=self.experiment.sample, probe=probe)
problem = FitProblem(expt)
problem.model_update()
```
```python
        errors: Error bars
        mcmc_steps: Number of MCMC steps
        burn_steps: Number of burn-in steps
        q_resolution_scale: Scaling factor for Q resolution (default: 0.025)

    Returns:
        Tuple of (2D numpy array of MCMC samples, list of parameter names)
    """
```
```python
def your_function_name(self, q_values, noisy_reflectivity, errors, mcmc_steps, burn_steps, q_resolution_scale=0.025):
```
- Replace `your_function_name` with the actual function name in your code.
- Update all calls to this function elsewhere in your codebase to pass the appropriate `q_resolution_scale` if a value other than the default is needed.
</issue_to_address>
### Comment 2
<location> `analyzer_tools/planner/experiment_design.py:297-306` </location>
<code_context>
+ def _calculate_posterior_entropy_kdn(self, mcmc_samples: np.ndarray) -> float:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** KDE fallback to MVN may mask underlying issues.
Consider logging exception details or improving error handling to ensure issues with KDE or sample data are visible.
Suggested implementation:
```python
import logging


def _calculate_posterior_entropy_kdn(self, mcmc_samples: np.ndarray) -> float:
    """
    Calculate posterior entropy using kernel density estimation.

    Args:
        mcmc_samples: 2D array of MCMC samples

    Returns:
        Posterior entropy in bits
    """
    if mcmc_samples.ndim != 2 or mcmc_samples.shape[0] < 2:
```
```python
try:
    kde = gaussian_kde(mcmc_samples.T)
    log_probs = kde.logpdf(mcmc_samples.T)
    entropy_nats = -np.mean(log_probs)
    return entropy_nats / np.log(2)
except Exception as e:
    # Fallback to multivariate normal entropy if KDE fails
    logging.error("KDE failed in _calculate_posterior_entropy_kdn: %s", str(e), exc_info=True)
    print("Warning: KDE failed, falling back to MVN entropy. Exception: {}".format(e))
    cov_matrix = np.cov(mcmc_samples, rowvar=False)
    cov_matrix += 1e-10 * np.eye(cov_matrix.shape[0])  # Regularize
    entropy_nats = multivariate_normal.entropy(cov=cov_matrix)
    return entropy_nats / np.log(2)
```
</issue_to_address>
### Comment 3
<location> `analyzer_tools/planner/instrument.py:23-24` </location>
<code_context>
+ Tuple of (noisy_reflectivity, errors)
+ """
+ # Simple noise model: relative error inversely proportional to sqrt(R * counting_time)
+ min_relative_error = 0.01 # 1% minimum
+ base_relative_error = 0.05 # 5% base error
+
+ # Calculate relative errors - more realistic model
</code_context>
<issue_to_address>
**suggestion:** Hardcoded error parameters may limit flexibility.
Consider making min_relative_error and base_relative_error configurable via function arguments to support varying instrument requirements.
Suggested implementation:
```python
    Args:
        q_values: Q values
        reflectivity_curve: Noise-free reflectivity
        counting_time: Relative counting time (affects noise level)
        min_relative_error: Minimum relative error (default 0.01)
        base_relative_error: Base relative error (default 0.05)

    Returns:
        Tuple of (noisy_reflectivity, errors)
    """
    # Simple noise model: relative error inversely proportional to sqrt(R * counting_time)
    # Calculate relative errors - more realistic model
    relative_errors = np.maximum(
        min_relative_error,
        base_relative_error
        / np.sqrt(np.maximum(reflectivity_curve * counting_time, 1e-10)),
    )
```
You will also need to update the function definition to include `min_relative_error=0.01` and `base_relative_error=0.05` as keyword arguments. For example:
```python
def your_function_name(q_values, reflectivity_curve, counting_time, min_relative_error=0.01, base_relative_error=0.05):
```
And update any calls to this function elsewhere in your codebase if you want to specify these parameters.
</issue_to_address>
### Comment 4
<location> `analyzer_tools/planner/instrument.py:34-35` </location>
<code_context>
+ )
+
+ # Add Q-dependent component (higher Q = higher relative error)
+ q_factor = 1 + 0.5 * (q_values / np.max(q_values)) ** 2
+ relative_errors *= q_factor
+
+ # Absolute errors
</code_context>
<issue_to_address>
**issue (bug_risk):** Division by zero risk if q_values contains only zeros.
Add a condition to handle cases where np.max(q_values) is zero to prevent division by zero errors.
</issue_to_address>
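A minimal sketch of the suggested guard; the helper function name is hypothetical, and the scaling expression and variable roles follow the quoted diff.

```python
import numpy as np


def q_dependent_factor(q_values: np.ndarray) -> np.ndarray:
    """Q-dependent error scaling with a guard for all-zero Q (hypothetical helper name)."""
    q_max = np.max(q_values)
    if q_max > 0:
        # Higher Q -> higher relative error, as in the quoted diff
        return 1 + 0.5 * (q_values / q_max) ** 2
    # Degenerate case: all Q values are zero, so skip the Q-dependent scaling
    return np.ones_like(q_values, dtype=float)


# relative_errors *= q_dependent_factor(q_values)
```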
### Comment 5
<location> `analyzer_tools/utils/model_utils.py:296-302` </location>
<code_context>
+ -------
+ Experiment
+ """
+ model_path = Path(model_file).absolute()
+ model_file = model_path.stem
+
+ # Add project root to path to allow importing from 'models'
+ model_dir = str(model_path.parent.resolve())
+ if model_dir not in sys.path:
+ sys.path.insert(0, model_dir)
+
</code_context>
<issue_to_address>
**suggestion:** Modifying sys.path at runtime can have unintended side effects.
This change may impact module imports elsewhere, particularly in persistent or multi-threaded contexts. Prefer importlib or context management to limit scope and avoid global effects.
```suggestion
    import importlib.util

    model_path = Path(model_file).absolute()
    model_file = model_path.stem

    # Dynamically import the model module from the given file path
    spec = importlib.util.spec_from_file_location(model_file, str(model_path))
    model_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(model_module)
```
</issue_to_address>
### Comment 6
<location> `test_planner.py:23` </location>
<code_context>
+
+
+# Create base experiment
+# TODO: write a test for this.
+experiment = expt_from_model_file(model_name, q_values, dq_values)
+
</code_context>
<issue_to_address>
**issue (testing):** Missing actual test assertions and structure.
Please refactor this file to use a test framework and add assertions to validate expected outputs and error conditions.
</issue_to_address>
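One possible pytest-style layout for test_planner.py, sketched under the assumption that the existing names (expt_from_model_file, ExperimentDesigner, "THF rho") remain available; the import paths, model file name, fixture values, seed, and assertions are illustrative and would need to match the actual API.

```python
# Sketch of a pytest-based layout for test_planner.py; the import paths, model file
# name, fixture values, seed, and assertions are illustrative assumptions.
import numpy as np
import pytest

from analyzer_tools.planner.experiment_design import ExperimentDesigner
from analyzer_tools.utils.model_utils import expt_from_model_file


@pytest.fixture
def designer():
    # Deterministic inputs: fixed Q grid and a seeded RNG for any simulated noise
    np.random.seed(42)
    q_values = np.linspace(0.008, 0.2, 150)
    dq_values = q_values * 0.025
    experiment = expt_from_model_file("models/example_model.py", q_values, dq_values)
    return ExperimentDesigner(experiment)


def test_set_parameter_to_optimize(designer):
    designer.set_parameter_to_optimize("THF rho", 5.0)
    params = designer.get_parameters()
    assert "THF rho" in params


def test_set_unknown_parameter_raises(designer):
    with pytest.raises((ValueError, KeyError)):
        designer.set_parameter_to_optimize("nonexistent_param", 1.0)
```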
### Comment 7
<location> `test_planner.py:40-42` </location>
<code_context>
+# TODO: write a test for this.
+print(designer._model_parameters_to_dict())
+
+designer.set_parameter_to_optimize("THF rho", 5.0)
+
+results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
</code_context>
<issue_to_address>
**suggestion (testing):** No test for invalid parameter names in set_parameter_to_optimize.
Add a test that sets a non-existent parameter name and checks for the expected exception to verify error handling in ExperimentDesigner.
```suggestion
designer.set_parameter_to_optimize("THF rho", 5.0)
# Test: setting a non-existent parameter should raise an exception
import pytest
with pytest.raises((ValueError, KeyError)):
    designer.set_parameter_to_optimize("nonexistent_param", 1.0)
results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
```
</issue_to_address>
### Comment 8
<location> `test_planner.py:42-44` </location>
<code_context>
+
+designer.set_parameter_to_optimize("THF rho", 5.0)
+
+results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
+
+# Print results
</code_context>
<issue_to_address>
**suggestion (testing):** No test for edge cases in optimize (e.g., empty param_values, invalid param_to_optimize).
Add tests for cases like empty param_values, invalid param_to_optimize, and realizations=0 to verify proper error handling and output.
```suggestion
results = designer.optimize(param_to_optimize="THF rho", param_values=[1.5, 2.5,3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5], realizations=5)
# Edge case tests for optimize
import pytest
# Test: empty param_values
try:
    designer.optimize(param_to_optimize="THF rho", param_values=[], realizations=5)
    print("FAILED: optimize did not raise error for empty param_values")
except Exception as e:
    print(f"PASSED: optimize raised error for empty param_values: {e}")
# Test: invalid param_to_optimize
try:
    designer.optimize(param_to_optimize="INVALID_PARAM", param_values=[1.0, 2.0], realizations=5)
    print("FAILED: optimize did not raise error for invalid param_to_optimize")
except Exception as e:
    print(f"PASSED: optimize raised error for invalid param_to_optimize: {e}")
# Test: realizations=0
try:
    designer.optimize(param_to_optimize="THF rho", param_values=[1.0, 2.0], realizations=0)
    print("FAILED: optimize did not raise error for realizations=0")
except Exception as e:
    print(f"PASSED: optimize raised error for realizations=0: {e}")
# Print results
```
</issue_to_address>
### Comment 9
<location> `test_planner.py:24` </location>
<code_context>
+
+# Create base experiment
+# TODO: write a test for this.
+experiment = expt_from_model_file(model_name, q_values, dq_values)
+
+designer = ExperimentDesigner(experiment)
</code_context>
<issue_to_address>
**suggestion (testing):** No test for error handling in expt_from_model_file.
Add a test for error cases in expt_from_model_file, including non-existent model files and invalid q/dq arrays, to verify error handling.
Suggested implementation:
```python
# Create base experiment
# TODO: write a test for this.
experiment = expt_from_model_file(model_name, q_values, dq_values)
designer = ExperimentDesigner(experiment)
import pytest
def test_expt_from_model_file_errors():
    # Test non-existent model file
    with pytest.raises(FileNotFoundError):
        expt_from_model_file("models/non_existent_model", q_values, dq_values)
    # Test invalid q_values (empty array)
    with pytest.raises(ValueError):
        expt_from_model_file(model_name, np.array([]), dq_values)
    # Test invalid dq_values (wrong shape)
    with pytest.raises(ValueError):
        expt_from_model_file(model_name, q_values, np.array([1, 2, 3]))
    # Test q_values and dq_values of different lengths
    with pytest.raises(ValueError):
        expt_from_model_file(model_name, np.array([0.1, 0.2]), np.array([0.1]))
```
Make sure that `expt_from_model_file` raises appropriate exceptions (`FileNotFoundError`, `ValueError`) for these error cases. If it does not, you will need to update its implementation accordingly.
</issue_to_address>
### Comment 10
<location> `analyzer_tools/planner/experiment_design.py:145-152` </location>
<code_context>
def get_parameters(self):
    """
    Get the variable parameters and their prior.
    """
    # Extract parameters from the problem
    parameters: Dict[str, SampleParameter] = {}
    for param in self.problem.parameters:
        is_of_interest = (
            self.parameters_of_interest is None
            or param.name in self.parameters_of_interest
        )
        p = SampleParameter(
            name=param.name,
            value=param.value,
            bounds=param.bounds,
            is_of_interest=is_of_interest,
        )
        parameters[param.name] = p

    # Warn if parameters of interest not found
    if self.parameters_of_interest:
        missing_params = [
            param
            for param in self.parameters_of_interest
            if param not in self.base_params
        ]
        if len(missing_params) > 0:
            logger.warning(f"Parameters {missing_params} not found in model")
            logger.info(f"Available parameters: {list(self.base_params.keys())}")
    return parameters
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Simplify sequence length comparison ([`simplify-len-comparison`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/simplify-len-comparison/))
</issue_to_address>
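For reference, the two refactorings combined look roughly like the following; the helper function and argument names are hypothetical stand-ins for the quoted method body.

```python
import logging

logger = logging.getLogger(__name__)


def warn_missing_parameters(parameters_of_interest, base_params):
    """Illustrates the two Sourcery suggestions on the quoted block (names are hypothetical)."""
    if parameters_of_interest:
        # use-named-expression: bind and test the list in one step;
        # simplify-len-comparison: rely on list truthiness instead of len(...) > 0
        if missing_params := [p for p in parameters_of_interest if p not in base_params]:
            logger.warning(f"Parameters {missing_params} not found in model")
            logger.info(f"Available parameters: {list(base_params.keys())}")
```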
### Comment 11
<location> `analyzer_tools/utils/model_utils.py:308-309` </location>
<code_context>
def expt_from_model_file(
    model_file: str,
    q: np.ndarray,
    dq: np.ndarray,
    reflectivity: np.ndarray = None,
    errors: np.ndarray = None,
) -> Experiment:
    """
    Load an Experiment from a model file.

    Parameters
    ----------
    model_file : str
        Python file containing a create_fit_experiment(q, dq, data, errors) function
    q : np.ndarray
        Q values
    dq : np.ndarray
        Q resolution values (1 sigma)
    reflectivity : np.ndarray
        Reflectivity values
    errors : np.ndarray
        Reflectivity error values

    Returns
    -------
    Experiment
    """
    model_path = Path(model_file).absolute()
    model_file = model_path.stem

    # Add project root to path to allow importing from 'models'
    model_dir = str(model_path.parent.resolve())
    if model_dir not in sys.path:
        sys.path.insert(0, model_dir)

    model_module = importlib.import_module(model_file)
    create_experiment = model_module.create_fit_experiment
    experiment = create_experiment(q, dq, reflectivity, errors)
    return experiment
</code_context>
<issue_to_address>
**suggestion (code-quality):** Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
```suggestion
    return create_experiment(q, dq, reflectivity, errors)
```
</issue_to_address>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main       #2      +/-   ##
==========================================
+ Coverage   47.71%   55.96%   +8.24%
==========================================
  Files          12       17       +5
  Lines        1203     1703     +500
==========================================
+ Hits          574      953     +379
- Misses        629      750     +121
```
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
…error handling, importlib usage Co-authored-by: mdoucet <[email protected]>
…t structure with comprehensive assertions and edge case testing Co-authored-by: mdoucet <[email protected]>
…f-c821d45457a6 Address PR review comments: improve configurability, error handling, and testing
Summary by Sourcery
Add a new experiment planning feature set: dynamic model loading, a Bayesian optimizer with information‐gain calculations, noise simulation, an example model, and a basic test script