Description
I recently updated my dplyr version (late to the party), and I'm hitting some increased memory usage. I've traced it back to how warnings are handled. Beginning in dplyr 1.1.1, I get the following output:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
identity <- function(x, warn) {
  if (warn) {
    warning("fake warning")
  }
  x
}
df <- tibble::tibble(e = rep(1, 1e8))
print(gc())
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 620941 33.2 1306337 69.8 1306337 69.8
#> Vcells 101049658 771.0 148096356 1129.9 101084366 771.3
df <- df %>% mutate(e = identity(e, warn = TRUE))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `e = identity(e, warn = TRUE)`.
#> Caused by warning in `identity()`:
#> ! fake warning
print(gc())
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 729780 39.0 1306337 69.8 1306337 69.8
#> Vcells 101287706 772.8 148096356 1129.9 102369359 781.1
rm(df)
print(gc())
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 729742 39.0 1306337 69.8 1306337 69.8
#> Vcells 101287654 772.8 148096356 1129.9 102369359 781.1
Created on 2025-01-31 with reprex v2.0.2
If I restart R and rerun with warn = FALSE, the final memory usage reported by gc() is only 7.9 MB rather than 772.8 MB, so the data frame is still being held somewhere after rm(df) when the warning fires. Two other changes also avoid the problem. If I rewrite the mutate without a pipe, as df <- mutate(df, e = identity(e, warn = TRUE)), the final memory usage is only 8.8 MB. Switching the pipe from %>% to |> also yields low memory usage. Under dplyr 1.1.0, the reprex above yields 18.8 MB.
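For anyone comparing these variants, reading the Vcells row of the gc() output by hand gets tedious. A minimal sketch of a helper that returns just that number is below; `vcells_mb` is a hypothetical name, not part of base R or dplyr, and the demonstration uses a plain numeric vector rather than the reprex so that it runs without any packages.

```r
# Hypothetical helper: Mb currently used by vector storage, read from the
# second column of the "Vcells" row that gc() returns after collecting.
vcells_mb <- function() {
  unname(gc()["Vcells", 2])
}

before <- vcells_mb()
x <- numeric(1e7)      # 1e7 doubles, roughly 76 Mb
during <- vcells_mb()
rm(x)
after <- vcells_mb()

cat("held while x exists:", during - before, "Mb\n")
cat("released by rm(x):  ", during - after, "Mb\n")
```

Calling `vcells_mb()` before the mutate, after it, and after rm(df) in a fresh session for each variant should give numbers that are directly comparable with the ones quoted above.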
I don't fully understand whether a warning can capture the environment it was raised in, but I suspect that is what is happening here, either in base R or in dplyr's own record of warnings.
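To make the suspicion concrete, here is a minimal base R sketch of one way a stored warning could keep a large object alive. It is not dplyr's code. `warning_store` and `run_step` are hypothetical names, and storing the calling frame next to the condition is an assumption made purely for illustration.

```r
# Hypothetical store standing in for any long-lived record of warnings.
warning_store <- new.env()

run_step <- function() {
  big <- numeric(1e6)        # about 8 MB, local to this call
  frame <- environment()     # the frame that owns `big`
  withCallingHandlers(
    {
      warning("fake warning")
      invisible(NULL)
    },
    warning = function(cnd) {
      # Assumption for illustration: the handler keeps the frame together
      # with the condition, so the frame outlives the function call.
      warning_store$last <- list(condition = cnd, frame = frame)
      invokeRestart("muffleWarning")
    }
  )
}

run_step()
invisible(gc())

# `big` was local to run_step(), yet it is still reachable through the store.
kept <- warning_store$last$frame
exists("big", envir = kept, inherits = FALSE)
length(kept$big)
```

In this sketch the memory is only released once the stored entry is dropped, for example with `rm("last", envir = warning_store)` followed by `gc()`. Whether something comparable happens in the reprex is exactly the open question.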