Commit cb93fb0
* Tune preference comparison example hyperparameters
The preference comparison example previously did not show significant
learning: it usually ended with a reward below -1000, which can be
considered a failure in the Pendulum environment. This commit updates
the hyperparameters to avoid that. One could argue that hyperparameter
optimization for the examples is misleading, since it gives a skewed
impression of the library. I think this is acceptable as long as we
acknowledge that the parameters were optimized, and a working example
is a much nicer starting point.
I tuned the hyperparameters with a mix of syne_tune [1] and manual
tuning. Since training can have very high variance, I repeated each
training run multiple (up to 100) times and used multi-fidelity
optimization (PASHA and ASHA) to find a good configuration. The
optimization objective was the 90% upper confidence bound of the mean
final-evaluation reward over all training runs.
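The upper-confidence-bound objective can be sketched as follows. This is a normal-approximation version; the exact formula used during tuning is not recorded in this commit, so treat it as illustrative:

```python
import math
import statistics


def ucb90(rewards):
    """One-sided 90% upper confidence bound of the mean reward.

    Normal approximation: mean + z * standard error of the mean,
    where z = 1.2816 is the 90% quantile of the standard normal.
    """
    mean = statistics.fmean(rewards)
    sem = statistics.stdev(rewards) / math.sqrt(len(rewards))
    return mean + 1.2816 * sem
```

Aggregating the repeated runs into a single confidence bound makes the objective less sensitive to a single lucky or unlucky seed than the raw mean would be.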
Unfortunately the optimization process was a bit messy since I was just
getting started with syne_tune, so it is difficult to provide a full
script to cleanly reproduce the results. I used something akin to this
configuration space:
```py
import syne_tune.config_space as cs
config_space = {
    # tuned hyperparameters
    "reward_epochs": cs.randint(1, 20),
    "ppo_clip_range": cs.uniform(0.0, 0.3),
    "ppo_ent_coef": cs.uniform(0.0, 0.01),
    "ppo_gae_lambda": cs.uniform(0.9, 0.99),
    "ppo_n_epochs": cs.randint(5, 25),
    "discount_factor": cs.uniform(0.9, 1.0),
    "use_sde": cs.choice(["true", "false"]),
    "sde_sample_freq": cs.randint(1, 5),
    "ppo_lr": cs.loguniform(1e-4, 5e-3),
    "exploration_frac": cs.uniform(0, 0.1),
    "num_iterations": cs.randint(5, 100),
    "initial_comparison_frac": cs.uniform(0.05, 0.25),
    "initial_epoch_multiplier": cs.randint(1, 4),
    "query_schedule": cs.choice(["constant", "hyperbolic", "inverse_quadratic"]),
    # fixed settings passed through unchanged (not tuned)
    "total_timesteps": 50_000,
    "total_comparisons": 200,
    "max_evals": 100,
}
```
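For context, the multi-fidelity schedulers mentioned above (ASHA and its variant PASHA) build on the successive-halving idea: evaluate many configurations at a small budget and promote only the best fraction to the next, larger budget. A toy synchronous sketch follows; the real schedulers are asynchronous and more sophisticated, and `evaluate` here is a hypothetical stand-in for a training run:

```python
def successive_halving(configs, evaluate, budgets=(1, 3, 9), keep=1 / 3):
    """Toy synchronous successive halving.

    `evaluate(config, budget)` is assumed to return a score (higher is
    better, e.g. mean reward) after training `config` for `budget` units.
    At each budget, only the top `keep` fraction of configs survives.
    """
    survivors = list(configs)
    for budget in budgets:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, int(len(ranked) * keep))]
    return survivors[0]
```

Because most configurations are discarded after only a cheap low-budget evaluation, this spends far less compute than training every candidate to completion.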
and the configuration I ultimately selected is this one:
```py
{
    "reward_epochs": 10,
    "ppo_clip_range": 0.1,
    "ppo_ent_coef": 0.01,
    "ppo_gae_lambda": 0.90,
    "ppo_n_epochs": 15,
    "discount_factor": 0.97,
    "use_sde": "false",
    "sde_sample_freq": 1,
    "ppo_lr": 2e-3,
    "exploration_frac": 0.05,
    "num_iterations": 60,
    "initial_comparison_frac": 0.10,
    "initial_epoch_multiplier": 4,
    "query_schedule": "hyperbolic",
}
```
Here are the (rounded) final-evaluation rewards of the 100 runs with
this configuration:
```
[ -155, -100, -132, -150, -164, -110, -195, -194, -168,
-148, -177, -113, -176, -205, -106, -169, -123, -104,
-151, -169, -157, -184, -130, -151, -108, -111, -202,
-142, -198, -138, -178, -104, -174, -149, -113, -107,
-122, -198, -428, -221, -217, -141, -192, -158, -139,
-219, -230, -209, -141, -173, -118, -176, -108, -290,
-810, -182, -159, -178, -247, -205, -165, -672, -250,
-138, -166, -282, -133, -147, -111, -145, -148, -116,
-436, -140, -190, -137, -194, -177, -193, -1043, -243,
-183, -156, -183, -184, -186, -141, -144, -194, -112,
-178, -146, -140, -130, -143, -618, -402, -236, -171,
-163]
```
Mean (before rounding): -196.49
Fraction of runs < -800: 2/100
Fraction of runs > -200: 79/100
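As a sanity check, the summary statistics can be recomputed from the rounded list above (the mean of the rounded values differs slightly from the pre-rounding mean):

```python
rewards = [
    -155, -100, -132, -150, -164, -110, -195, -194, -168,
    -148, -177, -113, -176, -205, -106, -169, -123, -104,
    -151, -169, -157, -184, -130, -151, -108, -111, -202,
    -142, -198, -138, -178, -104, -174, -149, -113, -107,
    -122, -198, -428, -221, -217, -141, -192, -158, -139,
    -219, -230, -209, -141, -173, -118, -176, -108, -290,
    -810, -182, -159, -178, -247, -205, -165, -672, -250,
    -138, -166, -282, -133, -147, -111, -145, -148, -116,
    -436, -140, -190, -137, -194, -177, -193, -1043, -243,
    -183, -156, -183, -184, -186, -141, -144, -194, -112,
    -178, -146, -140, -130, -143, -618, -402, -236, -171,
    -163,
]

mean = sum(rewards) / len(rewards)
below_800 = sum(r < -800 for r in rewards)
above_200 = sum(r > -200 for r in rewards)
print(f"mean of rounded values: {mean:.2f}")
print(f"runs < -800: {below_800}/100")
print(f"runs > -200: {above_200}/100")
```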
This is far from perfect: I did not include all parameters in the
optimization, and the 50,000 timesteps and 200 comparisons are likely
overkill. Still, it significantly improves the example that users see
first.
I only changed the example on the main documentation page, not the
notebooks. Those are already out of sync with the main example, so I am
not sure how best to proceed with them.
[1] https://github.com/awslabs/syne-tune
* Add changes to notebook
* Change number notation in cell
* Clear outputs from notebook
* Remove empty code cell
* Fix variable name in preference_comparison
* Run black
* Remove whitespace
---------
Co-authored-by: Timo Kaufmann <[email protected]>
1 parent 5c85ebf commit cb93fb0
File tree: 2 files changed (+55, -31 lines) under docs/algorithms and docs/tutorials.