Merge pull request #76 from kgoldfeld/joss-submission

Joss submission
kgoldfeld · Oct 26, 2020 · 529ee03
2 parents 75b5d8d + b2d3382

Showing 12 changed files with 794 additions and 16 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -18,3 +18,5 @@
 ^tests/\.lintr$
 ^File_management$
 ^simstudy\.code-workspace$
+^codemeta\.json$
+^paper$
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,8 +1,8 @@
 Type: Package
 Package: simstudy
 Title: Simulation of Study Data
-Version: 0.2.1.9000
-Date: 2020-10-07
+Version: 0.2.2
+Date: 2020-10-26
 Authors@R: 
     c(person(given = "Keith",
              family = "Goldfeld",

diff --git a/NEWS.md b/NEWS.md
@@ -1,4 +1,5 @@
-# simstudy (development version)
+# simstudy 0.2.2
+* Improve documentation and vignettes.
 
 # simstudy 0.2.1
 * Add 'backports' for compatibility with R < 4.0 

diff --git a/R/add_correlated_data.R b/R/add_correlated_data.R
@@ -292,13 +292,15 @@ addCorFlex <- function(dt, defs, rho = 0, tau = NULL, corstr = "cs",
 #' @param method Two methods are available to generate correlated data. (1) "copula" uses
 #' the multivariate Gaussian copula method that is applied to all other distributions; this
 #' applies to all available distributions. (2) "ep" uses an algorithm developed by
-#' Emrich and Piedmonte.
+#' Emrich and Piedmonte (1991).
 #' @param formSpec The formula (as a string) that was used to generate the binary
 #' outcome in the `defDataAdd` statement. This is only necessary when method "ep" is
 #' requested.
 #' @param periodvar A string value that indicates the name of the field that indexes
 #' the repeated measurement for an individual unit. The value defaults to "period".
 #' @return Original data.table with added column(s) of correlated data
+#' @references Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional
+#' Multivariate Binary Variates. The American Statistician 1991;45:302-4.
 #' @examples
 #' # Wide example
 #'

diff --git a/R/generate_correlated_data.R b/R/generate_correlated_data.R
@@ -250,10 +250,12 @@ genCorFlex <- function(n, defs, rho = 0, tau = NULL, corstr = "cs", corMatrix =
 #' @param method Two methods are available to generate correlated data. (1) "copula" uses
 #' the multivariate Gaussian copula method that is applied to all other distributions; this
 #' applies to all available distributions. (2) "ep" uses an algorithm developed by
-#' Emrich and Piedmonte.
+#' Emrich and Piedmonte (1991).
 #' @param idname Character value that specifies the name of the id variable.
 #'
 #' @return data.table with added column(s) of correlated data
+#' @references Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional
+#' Multivariate Binary Variates. The American Statistician 1991;45:302-4.
 #' @examples
 #' set.seed(23432)
 #' l <- c(8, 10, 12)

diff --git a/README.Rmd b/README.Rmd
@@ -16,6 +16,7 @@ knitr::opts_chunk$set(
 <!-- badges: start -->
 [![R build status](https://github.com/kgoldfeld/simstudy/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/kgoldfeld/simstudy/actions){target="_blank"}
 [![CRAN status](https://www.r-pkg.org/badges/version/simstudy)](https://CRAN.R-project.org/package=simstudy){target="_blank"}
+[![status](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424/status.svg)](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424){target="_blank"}
 [![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/simstudy)](https://CRAN.R-project.org/package=simstudy){target="_blank"}
 [![codecov](https://codecov.io/gh/kgoldfeld/simstudy/branch/main/graph/badge.svg)](https://codecov.io/gh/kgoldfeld/simstudy){target="_blank"}
 [![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://www.tidyverse.org/lifecycle/#stable){target="_blank"}
@@ -25,7 +26,8 @@ The `simstudy` package is a collection of functions that allow users to generate
 
 Simulation using `simstudy` has two fundamental steps. The user (1) **defines** the data elements of a data set and (2) **generates** the data based on these definitions. Additional functionality exists to simulate observed or randomized **treatment assignment/exposures**, to create **longitudinal/panel** data, to create **multi-level/hierarchical** data, to create datasets with **correlated variables** based on a specified covariance structure, to **merge** datasets, to create data sets with **missing** data, and to create non-linear relationships with underlying **spline** curves.
 
-The overarching philosophy of `simstudy` is to create data generating processes that mimic the typical models used to fit those types of data. So, the parameterization of some of the data generating processes may not follow the standard parameterizations for the specific distributions. For example, in `simstudy` *gamma*-distributed data are generated based on the specification of a mean &mu; (or log(&mu;)) and a dispersion $d$, rather than shape &alpha; and rate &beta; parameters that more typically characterize the *gamma* distribution. When we estimate the parameters, we are modeling &mu; (or some function of &mu;), so we should explicitly recover the `simstudy` parameters used to generate the model, thus illuminating the relationship between the underlying data generating processes and the models.
+The overarching philosophy of `simstudy` is to create data generating processes that mimic the typical models used to fit those types of data. So, the parameterization of some of the data generating processes may not follow the standard parameterizations for the specific distributions. For example, in `simstudy` *gamma*-distributed data are generated based on the specification of a mean &mu; (or log(&mu;)) and a dispersion $d$, rather than shape &alpha; and rate &beta; parameters that more typically characterize the *gamma* distribution. When we estimate the parameters, we are modeling &mu; (or some function of &mu;), so we should explicitly recover the `simstudy` parameters used to generate the model, thus illuminating the relationship between the underlying data generating processes and the models. For more details on the
+package, use cases, examples, and function reference see the [documentation page](https://kgoldfeld.github.io/simstudy/articles/simstudy.html).
 
 
 ## Installation

diff --git a/README.md b/README.md
@@ -9,6 +9,7 @@ simstudy
 status](https://github.com/kgoldfeld/simstudy/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/kgoldfeld/simstudy/actions)
 [![CRAN
 status](https://www.r-pkg.org/badges/version/simstudy)](https://CRAN.R-project.org/package=simstudy)
+[![status](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424/status.svg)](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424)
 [![CRAN
 downloads](https://cranlogs.r-pkg.org/badges/grand-total/simstudy)](https://CRAN.R-project.org/package=simstudy)
 [![codecov](https://codecov.io/gh/kgoldfeld/simstudy/branch/main/graph/badge.svg)](https://codecov.io/gh/kgoldfeld/simstudy)
@@ -48,7 +49,9 @@ typically characterize the *gamma* distribution. When we estimate the
 parameters, we are modeling μ (or some function of μ), so we should
 explicitly recover the `simstudy` parameters used to generate the model,
 thus illuminating the relationship between the underlying data
-generating processes and the models.
+generating processes and the models. For more details on the package,
+use cases, examples, and function reference see the [documentation
+page](https://kgoldfeld.github.io/simstudy/articles/simstudy.html).
 
 ## Installation
 
@@ -83,16 +86,16 @@ dd <- trtAssign(dd, nTrt = 4, grpName = "grp", balanced = TRUE)
 dd
 #>       id         x        y grp
 #>   1:   1 11.191960 8.949389   4
-#>   2:   2 10.418375 7.372060   2
-#>   3:   3  8.512109 6.925844   4
+#>   2:   2 10.418375 7.372060   4
+#>   3:   3  8.512109 6.925844   3
 #>   4:   4 11.361632 9.850340   4
-#>   5:   5  9.928811 6.515463   2
+#>   5:   5  9.928811 6.515463   4
 #>  ---                           
-#> 246: 246  8.220609 7.898416   4
-#> 247: 247  8.531483 8.681783   4
-#> 248: 248 10.507370 8.552350   4
+#> 246: 246  8.220609 7.898416   2
+#> 247: 247  8.531483 8.681783   2
+#> 248: 248 10.507370 8.552350   3
 #> 249: 249  8.621339 6.652300   1
-#> 250: 250  9.508164 7.083845   4
+#> 250: 250  9.508164 7.083845   3
 ```
 
 ## Contributing & Support