Integration of ORCiD data into names vocabulary #158

Open
cc-a opened this issue Feb 26, 2025 · 1 comment

@cc-a
Collaborator

cc-a commented Feb 26, 2025

This issue covers some exploration and experimentation with regard to how we could use data from ORCiD with the names vocabulary.

The overall goal is to provide a good UX when adding creatibutor data to records that makes it easy to add researchers whilst providing comprehensive metadata. To this end we'd like the names vocabulary to be populated with data such that it:

  • covers relevant Imperial staff and students (fed directly from the Imperial directory).
  • covers the research community more broadly (likely fed from ORCiD data).
  • includes ORCiDs and affiliation metadata for as many entries as possible.

InvenioRDM provides some basic support for importing data from the annual data dump that ORCiD provides (see names vocab docs).

The challenge in integrating our internal data feed with the data from ORCiD, however, is that the two overlap: many researchers at Imperial already have an ORCiD.

Possible approaches

Some thoughts (including some assumptions that could be checked).

Naively Combining Both Data Sources

Will likely lead to duplicate entries for any Imperial researcher that has an ORCiD. Depending on the visibility of data in their ORCiD profile, the vocabulary entry from ORCiD may be missing affiliation data whilst the Imperial entry will be missing the ORCiD. So whichever of the two entries is picked, some metadata will be missing.

Combining Both Data Sources with Duplicates Removed

Essentially the same as above except we try to remove duplicate entries between the two. As far as I'm aware the only way to unambiguously cross-link entries would be based on email address. The challenge here is that email addresses associated with ORCiDs may not be up to date and may not have been made publicly visible. In general I think only a minority of ORCiDs have a publicly visible email associated, so I would expect the number of entries we could cross-link to be relatively small and most of the duplicates would remain. We could improve our ability to cross-link records if we allow/encourage/require users to link their ORCiD accounts.
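
As a rough illustration of the email-based cross-linking, here is a minimal sketch. It assumes both feeds have already been converted into dicts shaped like names vocabulary entries (`given_name`, `family_name`, `identifiers`, `affiliations`), plus a hypothetical `email` key that is used for matching only. The function name `merge_by_email` and the `email` key are assumptions for illustration, not part of InvenioRDM.

```python
from copy import deepcopy


def _email_key(email):
    """Normalise an email address for matching (trim and lowercase)."""
    return email.strip().lower() if email else None


def merge_by_email(imperial_entries, orcid_entries):
    """Merge two lists of name entries, cross-linking on email address.

    The "email" key is a hypothetical field used for matching only.
    """
    # Index ORCiD entries by email; entries with no public email cannot match.
    orcid_by_email = {}
    for entry in orcid_entries:
        key = _email_key(entry.get("email"))
        if key:
            orcid_by_email[key] = entry

    merged, matched = [], set()
    for entry in imperial_entries:
        result = deepcopy(entry)
        key = _email_key(entry.get("email"))
        match = orcid_by_email.get(key) if key else None
        if match is not None:
            matched.add(id(match))
            # Keep the Imperial affiliation, add the identifiers from ORCiD.
            result.setdefault("identifiers", []).extend(
                deepcopy(match.get("identifiers", []))
            )
        merged.append(result)

    # Unmatched ORCiD entries are kept as-is: external researchers, or
    # Imperial researchers whose duplicate we could not detect.
    merged.extend(deepcopy(e) for e in orcid_entries if id(e) not in matched)
    return merged
```

Any Imperial researcher whose ORCiD email is hidden or out of date falls through to the unmatched branch and still appears twice, which is the limitation described above.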

Use the Imperial Data Feed Enriched with Data from ORCiD

We'd only add name entries based on our internal data feed, but whenever we add one we try to look for an associated ORCiD and include it. This is probably subject to the same set of caveats as cross-linking entries above, in that it will probably only be possible for a minority of entries. This approach should prevent duplicate entries but will obviously limit the ease with which researchers outside Imperial can be added to deposits and will likely lead to lower quality metadata for those researchers.

Things to Try

  • Use the built-in functionality to load the full ORCiD data dump. Does affiliation data get populated? Sample some Imperial researchers and see if Imperial is present as an affiliation.
  • Load both the full ORCiD dump and the Imperial one at the same time. Do we get duplicates?
  • Does cross-linking entries by email address work OK? Is it possible using the ORCiD data dump and/or the ORCiD API? Can use my ORCiD as an example of one with a public Imperial email (made public recently so may not appear in the data dump).
  • ORCiD has a concept of "verified email domains". Can we use this in some way if the actual email address for an ORCiD isn't available?
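
For the first two checks, a small helper along these lines could be run over entries exported from the vocabulary. This is only a sketch: it assumes entries are plain dicts with `given_name`, `family_name` and `affiliations` keys, and that affiliations carry a `name` (they may instead reference an affiliation `id`, which this does not handle). The function names are hypothetical. Matching on name alone only yields candidates, since unrelated people can share a name.

```python
import unicodedata
from collections import defaultdict


def name_key(entry):
    """Build a crude matching key from family and given name."""

    def norm(value):
        # Strip accents, surrounding whitespace and case differences.
        decomposed = unicodedata.normalize("NFKD", value or "")
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.strip().casefold()

    return (norm(entry.get("family_name")), norm(entry.get("given_name")))


def candidate_duplicates(entries):
    """Group entries sharing a normalised name; return groups of two or more."""
    groups = defaultdict(list)
    for entry in entries:
        groups[name_key(entry)].append(entry)
    return [group for group in groups.values() if len(group) > 1]


def has_affiliation(entry, name):
    """True if any affiliation on the entry carries the given name."""
    return any(a.get("name") == name for a in entry.get("affiliations", []))
```

Running `candidate_duplicates` after loading both feeds gives a rough count of likely duplicates, and `has_affiliation` can be applied to a sample of known Imperial researchers from the ORCiD load to see how often the affiliation is populated.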
@richard-jones
Collaborator

@npapantonis @cc-a - to take this back to the working group for a view on importance of ORCID vs Imperial record
