Description
Background
Looking at the Bayesian `r2_sample`. We then return the mean and standard deviation of the distribution with `r2_score`.
Similarly, the supplement linked above introduces an approach for estimating a LOO-
For multivariate models,
Proposal
I propose the following improvements:
- Rename `r2_sample` to `r2_score` so that we return the MCMC draws (a user can trivially pass these to `summarize` to get whichever summary statistics they'd like). Note that this differs from Python ArviZ.
- Generalize `r2_score` to support multiple outputs.
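To illustrate both points, here is a minimal NumPy sketch that returns one R² value per posterior draw and accepts either a single output or several. It uses the usual Bayesian R² form (variance of the fit divided by the variance of the fit plus the variance of the residuals). The name `r2_draws`, the array layouts, and the choice to compute a separate R² for each output column are assumptions made for illustration; they are not an existing API, and other multivariate conventions are possible.

```python
import numpy as np


def r2_draws(y_true, y_pred):
    """Per-draw Bayesian R^2 (hypothetical helper, not an existing API).

    y_true: shape (n_obs,) or (n_obs, n_outputs)
    y_pred: shape (n_draws, n_obs) or (n_draws, n_obs, n_outputs)
    Returns an array of shape (n_draws,) or (n_draws, n_outputs).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_pred.shape[1:] != y_true.shape:
        raise ValueError("y_pred must have shape (n_draws, *y_true.shape)")
    # Variances are taken over observations (axis 1), separately for each draw
    # and, in the multi-output case, for each output column (an assumption).
    var_fit = y_pred.var(axis=1)
    var_res = (y_true - y_pred).var(axis=1)
    return var_fit / (var_fit + var_res)


# Synthetic example: 200 draws, 50 observations, 1 or 3 outputs.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
pred = y + rng.normal(scale=0.5, size=(200, 50))
draws = r2_draws(y, pred)  # shape (200,)

y_multi = rng.normal(size=(50, 3))
pred_multi = y_multi + rng.normal(scale=0.5, size=(200, 50, 3))
draws_multi = r2_draws(y_multi, pred_multi)  # shape (200, 3)

# The caller picks the summaries, e.g. mean and standard deviation.
print(draws.mean(), draws.std())
print(draws_multi.mean(axis=0), draws_multi.std(axis=0))
```

Returning the draws keeps the function free of any summary choice: the mean and standard deviation become one line for the caller, and so do quantiles or intervals.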
Misc
Something like `loo_r2_score` would be nice to have as well, though it's worth thinking carefully about the API here and the core primitives we need to support similar functionality for other predictive metrics. Some metrics just compare the LOO-predictive mean with the data, so they only require posterior predictions, data, and log-likelihood evaluations, and could accept an arbitrary user-provided binary metric (see e.g. https://mc-stan.org/loo/reference/loo_predictive_metric.html). Others, like CRPS (https://mc-stan.org/loo/reference/crps.html), are computed pointwise and may require additional inputs (e.g. CRPS requires 2 independent predictive draws per posterior draw), so it's difficult to unify these into a single function. In each of these cases, there's a way to compute the metric both on the posterior and on the LOO posteriors.
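For the first family of metrics, the core primitive could look roughly like the NumPy sketch below: form a LOO-predictive mean per observation from importance weights, then hand it to a user-supplied binary metric. Everything here is an assumption for illustration. The name `loo_metric_sketch` and its signature are hypothetical, and the weights are plain self-normalized importance weights proportional to the inverse likelihood, without the Pareto smoothing a real implementation would apply.

```python
import numpy as np


def loo_metric_sketch(y_true, y_pred, log_lik, metric):
    """Apply `metric(y_true, loo_mean)` to the LOO-predictive mean.

    Hypothetical helper. Uses raw importance weights (no Pareto smoothing).
    y_true: shape (n_obs,)
    y_pred: shape (n_draws, n_obs), posterior predictions
    log_lik: shape (n_draws, n_obs), pointwise log-likelihood evaluations
    metric: callable taking (y_true, point_predictions)
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    log_lik = np.asarray(log_lik, dtype=float)
    # LOO importance weights are proportional to 1 / p(y_i | theta_s).
    log_w = -log_lik
    log_w = log_w - log_w.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(log_w)
    w = w / w.sum(axis=0, keepdims=True)  # normalize over draws, per observation
    loo_mean = (w * y_pred).sum(axis=0)
    return metric(y_true, loo_mean)


def rmse(y, yhat):
    """Root mean squared error, used here as an example binary metric."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


# Synthetic inputs: 100 draws, 20 observations, unit-variance normal likelihood
# (additive constant dropped, which does not change the normalized weights).
rng = np.random.default_rng(1)
y = rng.normal(size=20)
pred = y + rng.normal(scale=0.3, size=(100, 20))
ll = -0.5 * (y - pred) ** 2
print(loo_metric_sketch(y, pred, ll, rmse))
```

A pointwise metric such as CRPS would not fit this signature, since it needs more than a single point prediction per observation; that is the unification difficulty noted above.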