This issue covers some exploration and experimentation around how we could use data from ORCiD with the names vocabulary.
The overall goal is to provide a good UX when adding creatibutor data to records that makes it easy to add researchers whilst providing comprehensive metadata. To this end we'd like the names vocabulary to be populated with data such that it:
covers relevant Imperial staff and students (fed directly from the Imperial directory).
covers the research community more broadly (likely fed from ORCiD data).
includes ORCiDs and affiliation metadata for as many entries as possible.
The challenge in integrating our internal data feed with the data from ORCiD, however, is that the two overlap: many researchers at Imperial of course already have an ORCiD.
Possible approaches
Some thoughts (including some assumptions that could be checked).
Naively Combining Both Data Sources
Will likely lead to duplicate entries for any Imperial researcher that has an ORCiD. Depending on the visibility of data in their ORCiD profile, the vocabulary entry from ORCiD may be missing affiliation data whilst the Imperial entry will be missing the ORCiD. So whichever of the two entries is picked, some metadata will be missing.
Combining Both Data Sources with Duplicates Removed
Essentially the same as above except we try to remove duplicate entries between the two. As far as I'm aware the only way to unambiguously cross-link entries would be based on email address. The challenge here is that email addresses associated with ORCiDs may not be up to date and may not have been made publicly visible. In general I think only a minority of ORCiDs have a publicly visible email associated, so I would expect the number of entries we could cross-link to be relatively small and most duplicates to remain. We could improve our ability to cross-link records if we allow/encourage/require users to link their ORCiD accounts.
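To make the email-based cross-linking concrete, here is a minimal sketch. It assumes both feeds have already been reduced to plain dicts with an optional `email` key; that shape, the field names and the function names are made up for illustration and are not the InvenioRDM names schema. Where an email matches, the Imperial values are kept and the ORCiD entry only fills gaps.

```python
def normalise_email(email):
    """Lower-case and strip an email so trivially different spellings compare equal."""
    return email.strip().lower() if email else None


def merge_by_email(imperial_entries, orcid_entries):
    """Combine both sources, merging entries that share an email address.

    Entries are hypothetical flat dicts; ORCiD entries without a public
    email can never be matched and are simply appended.
    """
    by_email = {}
    merged = []
    for entry in imperial_entries:
        record = dict(entry)  # copy so the input is left untouched
        merged.append(record)
        key = normalise_email(record.get("email"))
        if key:
            by_email[key] = record

    for entry in orcid_entries:
        key = normalise_email(entry.get("email"))
        match = by_email.get(key) if key else None
        if match is None:
            merged.append(dict(entry))
            continue
        # Same person: Imperial values win, ORCiD only fills missing fields.
        for field, value in entry.items():
            if not match.get(field):
                match[field] = value
    return merged
```

Anything without a public email on the ORCiD side falls straight through to the "append" branch, which is exactly the leftover-duplicates problem described above.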
Use the Imperial Data Feed Enriched with Data from ORCiD
We'd only add name entries based on our internal data feed, but whenever we add one we try to look for an associated ORCiD and include it. This is probably subject to the same set of caveats as cross-linking entries above, in that it will probably only be possible for a minority of entries. This approach should prevent duplicate entries but will obviously limit the ease with which researchers outside Imperial can be added to deposits, and will likely lead to lower quality metadata for those.
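A rough sketch of this enrichment, assuming we can somehow build a lookup from email address to ORCiD (from the data dump or the API; how is left open). The `orcid_by_email` mapping, the `email` key and the function name are hypothetical; the `identifiers` list only loosely mirrors the scheme/identifier pairs used for vocabulary entries.

```python
def enrich_with_orcid(imperial_entries, orcid_by_email):
    """Return copies of the Imperial entries, adding an ORCiD where one is known.

    orcid_by_email is an assumed mapping of lower-cased email -> ORCiD iD.
    Entries with no match are kept unchanged, so nobody is dropped.
    """
    enriched = []
    for entry in imperial_entries:
        record = dict(entry)
        email = (record.get("email") or "").strip().lower()
        orcid = orcid_by_email.get(email)
        if orcid:
            record["identifiers"] = [{"scheme": "orcid", "identifier": orcid}]
        enriched.append(record)
    return enriched
```

Note that the output only ever contains people from the Imperial feed, which is what avoids duplicates and also what leaves external researchers out.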
Things to Try
Use the built-in functionality to load the full ORCiD data dump. Does affiliation data get populated? Sample some Imperial researchers and see if Imperial is present as an affiliation.
Load both the full ORCiD dump and the Imperial one at the same time. Do we get duplicates?
Does cross-linking entries by email address work OK? Is it possible using the ORCiD data dump and/or the ORCiD API? Can use my ORCiD as an example of one with a public Imperial email (made public recently so may not appear in the data dump).
ORCiD has a concept of "verified email domains". Can we use this in some way if the actual email address for an ORCiD isn't available?
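For the last point, one idea is to fall back to name plus verified domain when no email is visible. The sketch below assumes ORCiD entries reduced to dicts with `given_name`, `family_name`, `orcid` and a list of `email_domains`; those field names and the function are hypothetical and I haven't checked how the domains actually appear in the dump or API.

```python
def candidates_by_domain(person, orcid_entries, domain="imperial.ac.uk"):
    """List ORCiD iDs whose name matches `person` and which have `domain` verified.

    This is a heuristic, not an unambiguous link: two people can share a
    name and a domain, so anything other than exactly one hit needs a
    manual check or should be skipped.
    """
    wanted = (person["given_name"].casefold(), person["family_name"].casefold())
    matches = []
    for entry in orcid_entries:
        name = (
            entry.get("given_name", "").casefold(),
            entry.get("family_name", "").casefold(),
        )
        domains = {d.lower() for d in entry.get("email_domains", [])}
        if name == wanted and domain in domains:
            matches.append(entry["orcid"])
    return matches
```

We'd probably only want to accept a link automatically when this returns a single candidate.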
InvenioRDM provides some basic support for importing data from the annual data dump that ORCiD provides (see names vocab docs).