A follow-up issue for #967
Algorithm development depends on a specific, stable benchmark suite to ensure repeatable behavior. This issue tracks making sure the canonical benchmark is well defined and repeatable. Such a benchmark may already exist; this issue would validate it and document it clearly.
Additional info: GKE runs internal benchmarks nightly; this issue would also ensure that those nightly runs use the same benchmarks we publish.