
Conversation

@hfrick (Member) commented Oct 13, 2025

I've written about how we make the automatic calibration splits. The article covers the guiding principles of our approach, both the how and the why (although not in academic-paper depth).

The goal is also to let people understand in detail what happens for the sliding resamples. It has gotten relatively lengthy, though. I'm wondering if it should stay here or, e.g., go into a separate article, similar to how we split out the details on how we deal with censoring for the dynamic survival metrics. I think there's value in working through those details somewhere (other than the source code directly), but we could also experiment with collapsible text. Do you have any preferences or suggestions? Or do you think the length is fine as it is?

@EmilHvitfeldt (Member) left a comment


I think this is very valuable and high-quality work.

I think this would be a good place for this content. It would also fit in the rsample pkgdown site, for that matter.

There are small comments and nitpicks, but the overall structure and style I find very nice.

I'm also fine with the length. It is a complicated topic, and without the prose and diagrams it would be hard to understand.


While preprocessing is the transformation of the predictors prior to a model fit, post-processing is the transformation of the predictions after the model fit. This can range from something as straightforward as limiting predictions to a certain range of values to something as complicated as transforming them based on a separate calibration model.
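
To make the simpler end of that spectrum concrete, here is a minimal sketch of a range-limiting post-processor in R; the function name `clamp` and the bounds are made up for illustration:

```r
# Clamp predictions to a fixed range, e.g., probabilities to [0, 1].
clamp <- function(pred, lower = 0, upper = 1) {
  pmin(pmax(pred, lower), upper)
}

clamp(c(-0.2, 0.5, 1.3))
#> [1] 0.0 0.5 1.0
```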


Below, we are using the term "primary model," which we just started using. I like the term, but I think it would be nice to properly define it in terms of the pre/model/post diagram and terminology.


I agree. I would add text to this paragraph after the first sentence and start the next paragraph with "An additional..."


A calibration model is used to model the relationship between the predictions based on the primary model and the true outcomes. An additional model means an additional chance to accidentally overfit. So when working with calibration, this is crucial: we cannot use the same data to fit our calibration model as we use to assess the combination of primary and calibration model. Using the same data to fit the primary model and the calibration model means the predictions used to fit the calibration model are re-predictions of the same observations used to fit the primary model. Hence they are closer to the true values than predictions on new data would be, and the calibration model doesn't have accurate information to estimate the right trends (so that they can then be removed).
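
As a quick illustration of that last point, here is a small simulated example (not from the article) showing that re-predictions on the training data sit closer to the truth than predictions on new data:

```r
# Simulate training data and fresh data from the same process.
set.seed(1)
train <- data.frame(x = runif(100))
train$y <- train$x + rnorm(100, sd = 0.2)
new <- data.frame(x = runif(100))
new$y <- new$x + rnorm(100, sd = 0.2)

# A flexible primary model that can overfit a little.
fit <- lm(y ~ poly(x, 5), data = train)

# In-sample errors are smaller on average than out-of-sample errors, so a
# calibration model fit on re-predictions would see an overly optimistic trend.
mean(abs(train$y - predict(fit, train)))
mean(abs(new$y - predict(fit, new)))
```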

rsample provides a collection of functions to make resamples for empirical validation of prediction models. So far, the assumption has been that the prediction model is the only model that needs fitting, i.e., that a resample consists of an analysis set and an assessment set.

Do we have an rsample doc page for the analysis set and assessment set?


Let's start with the row-based splitting done by `sliding_window()`. We'll use a very small example dataset. This will make it easier to illustrate how the different subsets of the data are created, but note that it is too small for real-world purposes. Let's use a data frame with 11 rows and say we want to use 5 rows for the analysis set, 3 for the assessment set, and leave a gap of 2 in between those two sets. We can make two such resamples from our data frame.
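
For reference, a minimal sketch of this setup with rsample's `sliding_window()`: an analysis set of 5 corresponds to `lookback = 4`, and the gap of 2 plus an assessment set of 3 correspond to `assess_start = 3` and `assess_stop = 5`.

```r
library(rsample)

df <- data.frame(x = 1:11)

# Two resamples: analysis rows 1-5 with assessment rows 8-10,
# and analysis rows 2-6 with assessment rows 9-11.
sliding_window(df, lookback = 4, assess_start = 3, assess_stop = 5)
```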

![](images/calibration-split-window.jpg)

Could we do 4 for the assessment set and a gap of 1?

Right now there is very little air inside the assessment set with regard to the text.


I'm realizing that this would be a huge undertaking.


![](images/calibration-split-index.jpg)

We still get two resamples; however, the analysis set contains only 4 rows because only those fall into the window defined by the index.
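
A sketch of what such an index-based split can look like with `sliding_index()`; the index values here are made up for illustration, with one value missing:

```r
library(rsample)

# Index value 4 is not observed, so an analysis window spanning
# 5 index values (lookback = 4) across that gap contains only 4 rows.
df <- data.frame(index = c(1, 2, 3, 5, 6, 7, 8, 9, 10, 11))

sliding_index(df, index, lookback = 4, assess_start = 3, assess_stop = 5)
```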

Would it be beneficial to add the missing value at 1 or 5 so that the analysis sets have different lengths?


Or, I guess, that isn't interesting at all, because then we'd do the same as in the previous section.

```r
analysis(r_split)
```

The sliding splits slide over _the data_, meaning they slide over observed values of the index, and they slide only within the boundaries of the observed index values. So here, we can only slide within [3, 6] and thus cannot fit an inner analysis set of three and a calibration set of two into it. As established earlier, we fall back on an empty calibration set in such a situation.
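
A toy version of that arithmetic (not rsample's actual code):

```r
# The outer analysis set covers the observed index values 3 to 6,
# i.e., a span of 4 index values.
window_span <- 6 - 3 + 1

# An inner analysis set of 3 plus a calibration set of 2 needs a span of 5.
window_span >= 3 + 2
#> [1] FALSE  (so we fall back to an empty calibration set)
```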

We mention a couple of times that we fall back. Should we mention that we fall back with a warning?


![](images/calibration-split-period.jpg)

The principle of how to contruct a calibration split on the (outer) analysis set remains the same. The challenges of abstracting away from the rows, as illustrated for sliding over observed instances of an index also remain. Here, we slide over observed periods. We observe a period, if we observe an index within that period.

Suggested change
The principle of how to contruct a calibration split on the (outer) analysis set remains the same. The challenges of abstracting away from the rows, as illustrated for sliding over observed instances of an index also remain. Here, we slide over observed periods. We observe a period, if we observe an index within that period.
The principle of how to construct a calibration split on the (outer) analysis set remains the same. The challenges of abstracting away from the rows, as illustrated for sliding over observed instances of an index also remain. Here, we slide over observed periods. We observe a period, if we observe an index within that period.


> We observe a period, if we observe an index within that period.

I'd reword here, too. Maybe it is the comma placement, but it doesn't make immediate sense to me.
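
For reference, a sketch of period-based sliding with `sliding_period()`; the dates are made up for illustration, with no observations in March so that March is not an observed period:

```r
library(rsample)

# Daily index values in Jan, Feb, Apr, and May 2025; March is unobserved.
df <- data.frame(
  index = as.Date(c("2025-01-01", "2025-01-06", "2025-02-10",
                    "2025-04-11", "2025-05-11"))
)

# Slide over observed months: two months in the analysis set (lookback = 1)
# and the next observed month in the assessment set.
sliding_period(df, index, period = "month",
               lookback = 1, assess_start = 1, assess_stop = 1)
```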


```r
#| echo: false
#| out.height: 350
#| fig.align: "center"
knitr::include_graphics("images/analysis-calibration-assessment.jpg")
```

On Chrome, I get:

[image]


If you compare a model with calibration to one without, and you use the same resamples, you are also using the same assessment sets.

"Taking data from the analysis set" means splitting up the analysis set to end up with ... an analysis set and a calibration set. Now we have two sets called analysis set, that's confusing. If we need to distinguish them, we'll refer to them as "outer" and "inner" analysis set for "before" and "after" the split for a calibration set.

I found the notion of "two sets called analysis set" confusing. I think it should say that we "further split our initial analysis set into two partitions..." or something similar. I still like the inner/outer notation, but I think the process could be better worded here.

- If we can't make a calibration split based on these basic principles, we skip the calibration.

For sliding splits of ordered data, applying those principles is a bit more complex than for other types of splits as the outer split into analysis and assessment is already a bit more complex. We've laid out the details of this here for reference.
For bootstrap splits, we don't directly split the (outer) analysis set but rather sample the (inner) analysis set from the unique rows in the (outer) analysis set to avoid data leakage between (inner) analysis and calibration set.

Were these meant to be two different sentences (based on the line break)?
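
For reference, a toy sketch of that bootstrap rule (not rsample's implementation):

```r
# The outer analysis set is a bootstrap sample, so it can contain
# duplicated rows.
set.seed(1)
outer_analysis <- sample(1:10, size = 10, replace = TRUE)

# Sample the inner analysis set from the *unique* rows only, so no row
# can end up in both the inner analysis set and the calibration set.
unique_rows <- unique(outer_analysis)
inner_analysis <- sample(unique_rows, size = length(unique_rows), replace = TRUE)
calibration <- setdiff(unique_rows, inner_analysis)
```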
