Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

232 functions used in formula= argument in defdata not available for functions defined out of global namespace #233

Conversation

kgoldfeld
Copy link
Owner

No description provided.

Keith Goldfeld and others added 7 commits July 18, 2024 21:57
…available-for-functions-defined-out-of-global-namespace' of https://github.com/kgoldfeld/simstudy into 232-functions-used-in-formula=-argument-in-defdata-not-available-for-functions-defined-out-of-global-namespace
@kgoldfeld
Copy link
Owner Author

@assignUser Everything seems to be good, except there is this very strange issue with one of the vignettes (double_dot_extension.Rmd) that fails to build during R CMD check. The message is kind of odd, because there is no times argument anywhere to be found:

Quitting from lines 157-159 [unnamed-chunk-11] (double_dot_extension.Rmd)
   Error: processing vignette 'double_dot_extension.Rmd' failed with diagnostics:
   invalid 'times' argument

When I knit the rmd file, everything is fine.

And I can run the code inside a function, so clearly the environment issue has been fixed here:

library(simstudy)

environment()
#> <environment: R_GlobalEnv>

test <- function() {
  
  print(environment())
  
  mult <- function(x) {
    2*x
  }
  
  defblk <- defData(varname = "blksize", 
                    formula = "mult(..sizes[1]) | .5 + mult(..sizes[2]) | .5", dist = "mixture")
  
  sizes <- c(2, 4)
  genData(1000, defblk)  
}

set.seed(123)
test()
#> <environment: 0x13b8f30e8>
#> Key: <id>
#>          id blksize
#>       <int>   <num>
#>    1:     1       4
#>    2:     2       8
#>    3:     3       4
#>    4:     4       8
#>    5:     5       8
#>   ---              
#>  996:   996       8
#>  997:   997       8
#>  998:   998       4
#>  999:   999       8
#> 1000:  1000       4

But, everything I have read indicates that this is possibly an environment issue with the devtools::check() environment. This makes sense since my changes have all been environment-related. But, I can't really understand the specific error message, nor have I been able to see where the environment is properly getting passed.

I'll keep looking, but this is the last remaining piece.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 75053fd is merged into main:

  • ✔️define_data: 69.9ms -> 70ms [-5.59%, +5.7%]
  • ✔️dist_beta: 227ms -> 229ms [-1.26%, +2.81%]
  • ✔️dist_binary: 13.7ms -> 13.7ms [-3.17%, +3.55%]
  • ❗🐌dist_binomial: 18ms -> 18.1ms [+0.1%, +1.04%]
  • ✔️dist_categorical: 75.4ms -> 83.1ms [-12.71%, +33.38%]
  • ✔️dist_exponential: 13.5ms -> 13.7ms [-0.71%, +3.92%]
  • ❗🐌dist_gamma: 21.9ms -> 22.1ms [+0.28%, +1.46%]
  • ✔️dist_mixture: 244ms -> 246ms [-1.5%, +3.3%]
  • ✔️dist_normal: 15.8ms -> 16.4ms [-4.77%, +12.09%]
  • ✔️gen_all_dists: 73.4ms -> 74.2ms [-0.68%, +2.82%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f3a06bb is merged into main:

  • ✔️define_data: 62.7ms -> 63.3ms [-3.43%, +5.16%]
  • ✔️dist_beta: 228ms -> 229ms [-0.85%, +2.44%]
  • ✔️dist_binary: 13.8ms -> 14ms [-0.61%, +3.35%]
  • ✔️dist_binomial: 18.6ms -> 18.4ms [-3.76%, +2.19%]
  • ✔️dist_categorical: 76.1ms -> 76ms [-1.1%, +0.84%]
  • ✔️dist_exponential: 13.7ms -> 13.6ms [-1.61%, +0.91%]
  • ✔️dist_gamma: 22.4ms -> 22.3ms [-1.51%, +0.9%]
  • ✔️dist_mixture: 245ms -> 244ms [-1.98%, +1.21%]
  • ✔️dist_normal: 16ms -> 16.2ms [-1.18%, +3.07%]
  • ✔️gen_all_dists: 71.5ms -> 72.3ms [-0.32%, +2.48%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@kgoldfeld
Copy link
Owner Author

I think I found the problem - there was some conflict with using the variable name n in .evalWith since we were using list2evn. There must be a a variable with that name in the devtools::check() environment. In any case,things checked OK on my laptop - hoping the tests here show the same thing. If they do, you should feel free to take a look and approve it seems OK. Thank goodness for all the tests.

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good!

Thank goodness for all the tests.

Right? Tests give peace of mind :D In that regard it probably makes sense to add a test to make sure evaluation in non-global envs works for vars and functions?

@kgoldfeld
Copy link
Owner Author

Yes - I'll add those tests.

And also, I realize I am not totally comfortable with pulling in the entire environment - I would rather just pull in the functions. This way, if someone happens to have a variable that is the same as one that is used in the function, there is no conflict (as I experienced with the check environment). I am working on that, and will update .evalWith.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c69acec is merged into main:

  • ✔️define_data: 69.8ms -> 69.8ms [-6.6%, +6.55%]
  • ✔️dist_beta: 228ms -> 228ms [-2.68%, +2.84%]
  • ❗🐌dist_binary: 14.1ms -> 14.5ms [+1.08%, +4.84%]
  • ✔️dist_binomial: 18.7ms -> 18.9ms [-2.15%, +4%]
  • ✔️dist_categorical: 75.9ms -> 76.6ms [-0.31%, +2.16%]
  • ✔️dist_exponential: 13.9ms -> 14ms [-1.87%, +2.93%]
  • ✔️dist_gamma: 22.7ms -> 22.8ms [-0.91%, +1.48%]
  • ✔️dist_mixture: 247ms -> 246ms [-2.03%, +0.9%]
  • ✔️dist_normal: 16.5ms -> 16.5ms [-3.12%, +3.55%]
  • ❗🐌gen_all_dists: 73.4ms -> 76.6ms [+1.28%, +7.52%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@kgoldfeld
Copy link
Owner Author

@assignUser A few things:

  • I updated .evalWith to pass only function objects from the parent environment. This seems a little safer. Unless you think this is problematic, I will go ahead and merge.
  • I received a message from the creator of package gtsummary that I need to "suggest" package broom.helper in order to use the package, which I do in at least one vignette. Indeed, a couple of days later, I received an e-mail from CRAN indicating that simstudy now has errors on CRAN Package Check Results. And those errors are related to calls to gtsummary. Which I guess highlights the downsides of using other packages that are evolving over time. Granted, this is only the second time in like 10 years (or however how old simstudy is) where I needed to make a change as a result of an external package change.
  • Finally, the error was generated on r-devel-linux-x86_64-fedora-clang. This led me to wonder how we determined which servers to check on the GitHub workflow. Shouldn't we be checking all the servers flavors that CRAN checks?

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good. One minor suggestion.

R/internal_utility.R Outdated Show resolved Hide resolved
@assignUser
Copy link
Collaborator

Which I guess highlights the downsides of using other packages that are evolving over time.

Yep, it is always good to think about dependencies but you seem to have a good hand for that ;)

This led me to wonder how we determined which servers to check on the GitHub workflow

I looked at the recommendations from the r-lib people: https://github.com/r-lib/actions/blob/v2-branch/examples/check-standard.yaml and modified those a bit.

Testing all flavors is not really worth it for a package without any complex build dependencies like simstudy as it can produce frequent errors that just vanish after a few days (for devel) and are a waste to investigate. In addition there is no real way to accuratly reproduce the cran environments as they don't provide any form of reproducible script and a lot of things are manually tweaked, so even if you try to test on these there is no gurantee that it will accuratley reflect the results you will get on CRAN (yes, this is from painful experience...).

And even if some real issue pops up on CRAN you are not constrained by any release process (compared to arrow for example) that slows you down in deploying a fix.

Looking at the current error that happens on multiple versions and flavors I have no idea, haven't seen that one before. Looks like something broken with knitr? As it wasn't the case when initially submitted I am guessing this is on CRANs end.

@kgoldfeld
Copy link
Owner Author

Looking at the current error that happens on multiple versions and flavors I have no idea, haven't seen that one before. Looks like something broken with knitr? As it wasn't the case when initially submitted I am guessing this is on CRANs end.

I am quite sure that the error is related to the changes in package gtsummary - which should go away since I have modified the suggests as they have recommended:

I am preparing the submission of gtsummary v2.0 to CRAN, and I wanted to alert you to a change you may need to incorporate in simstudy.

The gtsummary package utilizes the broom.helpers package to prepare regression model results. In the updated version of the package, the broom.helpers package has moved from Imports to Suggests. As a result, you may also need to add broom.helpers to the Suggests field so the package is available when your vignettes are created.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if cc418dd is merged into main:

  • ✔️define_data: 66.4ms -> 68.2ms [-6.27%, +11.69%]
  • ✔️dist_beta: 229ms -> 229ms [-2.36%, +1.98%]
  • ✔️dist_binary: 13.9ms -> 14.7ms [-1.26%, +13.28%]
  • ✔️dist_binomial: 18.8ms -> 18.9ms [-1.53%, +3.06%]
  • ✔️dist_categorical: 76.9ms -> 76.3ms [-3.84%, +2.23%]
  • ✔️dist_exponential: 13.9ms -> 13.8ms [-2.57%, +1.72%]
  • ❗🐌dist_gamma: 22.1ms -> 22.5ms [+0.27%, +3.94%]
  • ✔️dist_mixture: 246ms -> 245ms [-2.18%, +1.61%]
  • ✔️dist_normal: 15.8ms -> 16.1ms [-0.24%, +3.97%]
  • ✔️gen_all_dists: 72.9ms -> 78.6ms [-4.84%, +20.38%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@kgoldfeld kgoldfeld merged commit 89e852a into main Jul 29, 2024
9 checks passed
@kgoldfeld kgoldfeld deleted the 232-functions-used-in-formula=-argument-in-defdata-not-available-for-functions-defined-out-of-global-namespace branch July 29, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

functions used in formula= argument in defData not available for functions defined out of global namespace
2 participants