We have two studies with CDEs that we can use to start implementing this:
Note that these are strictly study/CRF relationships. We could instead include the CRF entirely within the data dictionary, but if a study has both VLMD and CRFs, we would expect to duplicate information between the two.
We should add the ability to pull study/CRF and variable/CDE mappings from other sources, such as a Google Sheet or the private GitHub repo. We will need this until all the mappings are in the GitHub repository (https://github.com/uc-cdis/heal-data-dictionaries/issues/381).
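As a rough illustration of pulling mappings from more than one source, here is a minimal sketch that assumes each source can be exported as CSV text (for example, a Google Sheet downloaded as CSV). The column names `study_id` and `crf_id` and both function names are hypothetical, not part of any existing tool.

```python
import csv
import io
from collections import defaultdict


def parse_study_crf_csv(csv_text):
    """Parse CSV text into {study_id: {crf_id, ...}}.

    Assumes hypothetical columns `study_id` and `crf_id`.
    """
    mappings = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        study_id = row["study_id"].strip()
        crf_id = row["crf_id"].strip()
        if study_id and crf_id:  # skip incomplete rows
            mappings[study_id].add(crf_id)
    return dict(mappings)


def merge_mappings(*sources):
    """Union study/CRF mappings from several sources without mutating them."""
    merged = defaultdict(set)
    for source in sources:
        for study_id, crf_ids in source.items():
            merged[study_id] |= crf_ids
    return dict(merged)


# Example: one mapping file from a sheet export, one from a repository.
sheet_csv = "study_id,crf_id\nSTUDY_A,CRF_X\nSTUDY_A,CRF_Y\n"
repo_csv = "study_id,crf_id\nSTUDY_A,CRF_X\nSTUDY_B,CRF_Z\n"
combined = merge_mappings(
    parse_study_crf_csv(sheet_csv), parse_study_crf_csv(repo_csv)
)
```

Taking the union means a mapping present in either source is kept, which seems appropriate while the sources are still being consolidated. Variable/CDE mappings could be handled the same way with a different pair of columns.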
These mappings could show up in the output dbGaP XML files in three ways:
- We could put in a top-level property storing the list of HDPCDEs for the study (the study/CRF mappings).
- We could provide individual mappings at the variable level.
- We could provide entire data dictionaries reproduced from the CRFs (i.e. if we know study A with 100 variables uses CRF X with 5 variables, we could add those five variables to the 100 variables in study A).
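The three options above can be sketched on a plain in-memory representation. This does not produce dbGaP XML. The dictionary layout, the field names (`crf_ids`, `cde`, `source_crf`) and the function `annotate_study` are all assumptions made for illustration.

```python
def annotate_study(study, crfs, study_crf_ids, variable_cde_map,
                   include_crf_variables=False):
    """Return a copy of `study` annotated with CRF/CDE information.

    All field names here are hypothetical placeholders.
    """
    annotated = {
        "study_id": study["study_id"],
        # Option 1: top-level list of CRFs used by the study.
        "crf_ids": sorted(study_crf_ids),
        # Option 2: variable-level mapping, added only where one is known.
        "variables": [
            {**var, "cde": variable_cde_map[var["name"]]}
            if var["name"] in variable_cde_map
            else dict(var)
            for var in study["variables"]
        ],
    }
    # Option 3: append every variable from each CRF to the study's variables.
    # Nothing is deduplicated here, so overlapping information is repeated.
    if include_crf_variables:
        for crf_id in annotated["crf_ids"]:
            for var in crfs[crf_id]["variables"]:
                annotated["variables"].append({**var, "source_crf": crf_id})
    return annotated


study = {
    "study_id": "STUDY_A",
    "variables": [{"name": "age"}, {"name": "pain_score"}],
}
crfs = {"CRF_X": {"variables": [{"name": "crf_q1"}, {"name": "crf_q2"}]}}
result = annotate_study(
    study, crfs, {"CRF_X"}, {"pain_score": "CDE_1"}, include_crf_variables=True
)
```

Keeping the third option behind a flag reflects that it is the one most likely to duplicate information when a study already has variable-level metadata.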
Unless we want to do more work in Dug (or replace dbGaP XML with something else?), we will need to produce a KGX file with all the CRFs (#17).