Merged
38 commits
91a61a2
Initial support for DOI parsing from different registration agencies
whomingbird Oct 20, 2025
7c82f7a
add an author class
whomingbird Oct 20, 2025
e6ade5e
implement CrossrefParser to fetch DOI metadata from DOI.org in CSL JS…
whomingbird Oct 20, 2025
5c40155
correct module definition in DataciteParser for Zeitwerk autoloading
whomingbird Oct 20, 2025
c25d0cb
Extracts publication dates robustly (issued, published, published-pri…
whomingbird Oct 20, 2025
9b7824f
add missing VCRs
whomingbird Oct 20, 2025
0f07613
First try to use `Seek::Doi::Parser` to replace `DOI::Query.new` from…
whomingbird Oct 20, 2025
45710ed
Refactor DOI parsers to centralize OpenStruct creation
whomingbird Oct 21, 2025
9e6ad72
Handle DOI author edge case when only "name" is provided
whomingbird Oct 21, 2025
fe58523
update tests in publications_controller_test.rb
whomingbird Oct 21, 2025
40b178b
Merge branch 'seek-1.17' into support-datacite-doi
whomingbird Oct 21, 2025
90e2082
use Crossref API for richer metadata and improve book-chapter citation
whomingbird Oct 22, 2025
6c21a9f
add/update tests for the previous commit
whomingbird Oct 22, 2025
42d67f2
twist citations support for 'book' and 'journal-article' type
whomingbird Oct 22, 2025
0a4fa7d
improve journal-article citation formatting
whomingbird Oct 22, 2025
689f00b
improve proceeding article citation formatting
whomingbird Oct 23, 2025
bab8549
improve proceeding citation formatting
whomingbird Oct 23, 2025
bf68493
improve preprint citation formatting
whomingbird Oct 23, 2025
9af0468
monograph is treated as book
whomingbird Oct 23, 2025
7e019f7
rename test
whomingbird Oct 24, 2025
3f37419
typo
whomingbird Oct 24, 2025
73f6051
add Datacite DOI parser, add tests
whomingbird Oct 28, 2025
496d23d
editors should return as a string type
whomingbird Oct 28, 2025
469b79d
better error handling in CrossrefParser
whomingbird Oct 29, 2025
3f2464e
Refactor DOI parsers: centralize shared logic
whomingbird Oct 29, 2025
b8f61fa
remove gem `doi_query_tool`
whomingbird Oct 30, 2025
faa4b56
create a base_exception.rb
whomingbird Oct 30, 2025
558ecdd
Better error handling and more informative error messages.
whomingbird Oct 30, 2025
9b99823
changing code style
whomingbird Oct 30, 2025
3db2d7b
Better error handling
whomingbird Oct 30, 2025
b78f4d8
update publication.rb to use new exceptions
whomingbird Oct 31, 2025
24d51c2
update VCR files for publication controller.
whomingbird Oct 31, 2025
11bfdac
Merge branch 'seek-1.17' into support-datacite-doi
whomingbird Oct 31, 2025
b4fabd3
remove require 'doi/record'
whomingbird Oct 31, 2025
81c9208
clean test
whomingbird Oct 31, 2025
829d13f
fix a test
whomingbird Oct 31, 2025
33de501
update according to Copilot's review
whomingbird Nov 6, 2025
8a17a0a
improve test name clarity
whomingbird Nov 11, 2025
1 change: 0 additions & 1 deletion Gemfile
@@ -30,7 +30,6 @@ gem 'csv'
gem 'daemons'
gem 'delayed_job_active_record'
gem 'docsplit', git: 'https://github.com/documentcloud/docsplit.git'
gem 'doi_query_tool', git: 'https://github.com/seek4science/DOI-query-tool.git'
gem 'doorkeeper'
gem 'dotenv-rails'
gem 'equivalent-xml'
8 changes: 0 additions & 8 deletions Gemfile.lock
@@ -16,13 +16,6 @@ GIT
specs:
docsplit (0.7.6)

GIT
remote: https://github.com/seek4science/DOI-query-tool.git
revision: 7c2ba934885ad404d964cd371e236d3348efb63c
specs:
doi_query_tool (1.0.1)
libxml-ruby (>= 2.6.0)

GIT
remote: https://github.com/seek4science/zenodo-client.git
revision: 72c19d105ec2aab5298bf6c5bd50d30c798f169e
@@ -1076,7 +1069,6 @@ DEPENDENCIES
database_cleaner
delayed_job_active_record
docsplit!
doi_query_tool!
doorkeeper
dotenv-rails
equivalent-xml
11 changes: 5 additions & 6 deletions app/helpers/publications_helper.rb
@@ -1,5 +1,3 @@
require 'doi/record'

module PublicationsHelper
def author_to_person_options(selected_id, suggestion)
projects = Project.includes(:people)
@@ -14,13 +12,14 @@ def author_to_person_options(selected_id, suggestion)
end

def publication_registered_mode(mode)
if mode == Publication::REGISTRATION_BY_PUBMED
case mode
when Publication::REGISTRATION_BY_PUBMED
'by PubMed ID'
elsif mode == Publication::REGISTRATION_BY_DOI
when Publication::REGISTRATION_BY_DOI
'by DOI'
elsif mode == Publication::REGISTRATION_MANUALLY
when Publication::REGISTRATION_MANUALLY
'manually'
elsif mode == Publication::REGISTRATION_FROM_BIBTEX
when Publication::REGISTRATION_FROM_BIBTEX
'imported from a bibtex file'
else
'unknown'
101 changes: 53 additions & 48 deletions app/models/publication.rb
@@ -1,4 +1,5 @@
require 'libxml'
require 'seek/doi/base_exception'

class Publication < ApplicationRecord
include Seek::Rdf::RdfGeneration
@@ -48,7 +49,7 @@ class Publication < ApplicationRecord
has_one :content_blob, ->(r) { where('content_blobs.asset_version =? AND deleted=?', r.version, false) }, as: :asset, foreign_key: :asset_id

explicit_versioning(:version_column => "version", sync_ignore_columns: ['license','other_creators']) do
acts_as_versioned_resource
acts_as_versioned_resource
has_one :content_blob, -> (r) { where('content_blobs.asset_version =? AND content_blobs.asset_type =?', r.version, r.parent.class.name) },
:primary_key => :publication_id,:foreign_key => :asset_id

@@ -188,11 +189,11 @@ def extract_metadata(pubmed_id, doi)
end

if reference.respond_to?(:pubmed)
result = extract_pubmed_metadata(reference)
extract_pubmed_metadata(reference)
else
result = extract_doi_metadata(reference)
extract_doi_metadata(reference)
end

reference.authors.each_with_index do |author, index|
publication_authors.build(first_name: author.first_name,
last_name: author.last_name,
@@ -230,7 +231,7 @@ def extract_doi_metadata(doi_record)
self.citation = doi_record.citation
self.publisher = doi_record.publisher
self.booktitle = doi_record.booktitle
self.editor = doi_record.editors.map(&:name).join(" and ")
self.editor = doi_record.editors
end

# @param bibtex_record BibTeX entity from bibtex-ruby gem
@@ -239,7 +240,7 @@ def extract_bibtex_metadata(bibtex_record)
self.publication_type_id = PublicationType.get_publication_type_id(bibtex_record)
self.title = bibtex_record[:title].try(:to_s).gsub /{|}/, '' unless bibtex_record[:title].nil?
self.title = bibtex_record[:chapter].try(:to_s).gsub /{|}/, '' if (self.title.nil? && !bibtex_record[:chapter].nil?)
self.title += ( ":"+ (bibtex_record[:subtitle].try(:to_s).gsub /{|}/, '')) unless bibtex_record[:subtitle].nil?
self.title += ( ":#{(bibtex_record[:subtitle].try(:to_s).gsub /{|}/, '')}") unless bibtex_record[:subtitle].nil?

if check_bibtex_file (bibtex_record)
self.abstract = bibtex_record[:abstract].try(:to_s)
@@ -345,31 +346,31 @@ def generate_citation(bibtex_record)

if publication_type.is_journal?
self.citation += self.journal.nil? ? '':self.journal
self.citation += volume.blank? ? '': ' '+volume
self.citation += number.nil? ? '' : '('+ number+')'
self.citation += pages.blank? ? '' : (':'+pages)
self.citation += volume.blank? ? '': " #{volume}"
self.citation += number.nil? ? '' : "(#{number})"
self.citation += pages.blank? ? '' : (":#{pages}")
=begin
unless year.nil?
self.citation += year.nil? ? '' : (' '+year)
end
=end
elsif publication_type.is_booklet?
self.citation += howpublished.blank? ? '': ''+ howpublished
self.citation += address.nil? ? '' : (', '+ address)
self.citation += howpublished.blank? ? '': "#{howpublished}"
self.citation += address.nil? ? '' : (", #{address}")
=begin
unless year.nil?
self.citation += year.nil? ? '' : (' '+year)
end
=end
elsif publication_type.is_inbook?
self.citation += self.booktitle.nil? ? '' : ('In '+ self.booktitle)
self.citation += volume.blank? ? '' : (', volume '+ volume)
self.citation += series.blank? ? '' : (' of '+series)
self.citation += pages.blank? ? '' : (', '+ page_or_pages + ' '+pages)
self.citation += self.editor.blank? ? '' : (', Eds: '+ self.editor)
self.citation += self.publisher.blank? ? '' : (', '+ self.publisher)
self.citation += self.booktitle.nil? ? '' : ("In #{self.booktitle}")
self.citation += volume.blank? ? '' : (", volume #{volume}")
self.citation += series.blank? ? '' : (" of #{series}")
self.citation += pages.blank? ? '' : (", #{page_or_pages} #{pages}")
self.citation += self.editor.blank? ? '' : (", Eds: #{self.editor}")
self.citation += self.publisher.blank? ? '' : (", #{self.publisher}")
unless address.nil? || (self.booktitle.try(:include?, address))
self.citation += address.nil? ? '' : (', '+ address)
self.citation += address.nil? ? '' : (", #{address}")
end
=begin
unless self.booktitle.try(:include?, year)
@@ -380,14 +381,14 @@ def generate_citation(bibtex_record)
=end
elsif publication_type.is_inproceedings? || publication_type.is_incollection? || publication_type.is_book?
# InProceedings / InCollection
self.citation += self.booktitle.nil? ? '' : ('In '+ self.booktitle)
self.citation += volume.blank? ? '' : (', vol. '+ volume)
self.citation += series.blank? ? '' : (' of '+series)
self.citation += pages.blank? ? '' : (', '+ page_or_pages + ' '+pages)
self.citation += self.editor.blank? ? '' : (', Eds: '+ self.editor)
self.citation += self.publisher.blank? ? '' : (', '+ self.publisher)
self.citation += self.booktitle.nil? ? '' : ("In #{self.booktitle}")
self.citation += volume.blank? ? '' : (", vol. #{volume}")
self.citation += series.blank? ? '' : (" of #{series}")
self.citation += pages.blank? ? '' : (", #{page_or_pages} #{pages}")
self.citation += self.editor.blank? ? '' : (", Eds: #{self.editor}")
self.citation += self.publisher.blank? ? '' : (", #{self.publisher}")
unless address.nil? || (self.booktitle.try(:include?, address))
self.citation += address.nil? ? '' : (', '+ address)
self.citation += address.nil? ? '' : (", #{address}")
end
=begin
unless self.booktitle.try(:include?, year)
@@ -398,19 +399,19 @@ def generate_citation(bibtex_record)
=end
elsif publication_type.is_phd_thesis? || publication_type.is_masters_thesis? || publication_type.is_bachelor_thesis?
#PhD/Master Thesis
self.citation += school.nil? ? '' : (' '+ school)
self.citation += school.nil? ? '' : (" #{school}")
self.errors.add(:base,'A thesis need to have a school') if school.nil?
self.citation += year.nil? ? '' : (', '+ year)
self.citation += tutor.nil? ? '' : (', '+ tutor+'(Tutor)')
self.citation += tutorhits.nil? ? '' : (', '+ tutorhits+'(HITS Tutor)')
self.citation += url.nil? ? '' : (', '+ url)
self.citation += year.nil? ? '' : (", #{year}")
self.citation += tutor.nil? ? '' : (", #{tutor}(Tutor)")
self.citation += tutorhits.nil? ? '' : (", #{tutorhits}(HITS Tutor)")
self.citation += url.nil? ? '' : (", #{url}")
elsif publication_type.is_proceedings?
# Proceedings are conference proceedings, it has no authors but editors
# Book
self.journal = self.title
self.citation += volume.blank? ? '' : ('vol. '+ volume)
self.citation += series.blank? ? '' : (' of '+series)
self.citation += self.publisher.blank? ? '' : (', '+ self.publisher)
self.citation += volume.blank? ? '' : ("vol. #{volume}")
self.citation += series.blank? ? '' : (" of #{series}")
self.citation += self.publisher.blank? ? '' : (", #{self.publisher}")
=begin
unless month.nil? && year.nil?
self.citation += self.citation.blank? ? '' : ','
@@ -420,20 +421,20 @@ def generate_citation(bibtex_record)
=end
elsif publication_type.is_tech_report?
self.citation += institution.blank? ? ' ': institution
self.citation += type.blank? ? ' ' : (', '+type)
self.citation += type.blank? ? ' ' : (", #{type}")
elsif publication_type.is_unpublished?
self.citation += note.blank? ? ' ': note
end

if self.doi.blank? && self.citation.blank?
self.citation += archivePrefix unless archivePrefix.nil?
self.citation += (self.citation.blank? ? primaryClass : (','+primaryClass)) unless primaryClass.nil?
self.citation += (self.citation.blank? ? eprint : (','+eprint)) unless eprint.nil?
self.citation += (self.citation.blank? ? primaryClass : (",#{primaryClass}")) unless primaryClass.nil?
self.citation += (self.citation.blank? ? eprint : (",#{eprint}")) unless eprint.nil?
self.journal = self.citation if self.journal.blank?
end

if self.doi.blank? && self.citation.blank?
self.citation += url.blank? ? '': url
self.citation += url.blank? ? '': url
end
self.citation = self.citation.try(:to_s).strip.gsub(/^,/,'').strip
end
@@ -453,20 +454,24 @@ def fetch_pubmed_or_doi_result(pubmed_id, doi)
end
elsif !doi.blank?
begin
query = DOI::Query.new(Seek::Config.crossref_api_email)
result = query.fetch(doi)

result = Seek::Doi::Parser.parse(doi)

@error = 'Unable to get result' if result.blank?
@error = 'Unable to get DOI' if result.title.blank?
rescue DOI::MalformedDOIException
@error = 'The DOI you entered appears to be malformed.'
rescue DOI::NotFoundException
@error = 'The DOI you entered could not be resolved.'
rescue DOI::RecordNotSupported
rescue Seek::Doi::RANotSupported => e
@error = "#{e.message} Please enter the publication in another way."
rescue Seek::Doi::FetchException => e
@error = e.message
rescue Seek::Doi::MalformedDOIException => e
@error = e.message
rescue Seek::Doi::NotFoundException => e
@error = e.message
rescue Seek::Doi::RecordNotSupported
@error = 'The DOI resolved to an unsupported resource type.'
rescue RuntimeError => exception
@error = 'There was a problem contacting the DOI query service. Please try again later'
Seek::Errors::ExceptionForwarder.send_notification(exception, data: {message: "Problem accessing crossref using DOI #{doi}"})
rescue RuntimeError => e
@error = 'There was a problem contacting the DOI query service. Please add the publication manually instead.'
Seek::Errors::ExceptionForwarder.send_notification(e, data: {message: "Problem fetching DOI #{doi} : #{e.message}"})
end
else
@error = 'Please enter either a DOI or a PubMed ID for the publication.'
@@ -620,7 +625,7 @@ def check_bibtex_file (bibtex_record)
end

if (%w[InCollection InProceedings].include? self.publication_type.title) && (bibtex_record[:booktitle].blank?)
errors.add(:base, "An #{self.publication_type.title} needs to have a booktitle.")
errors.add(:base, "An #{self.publication_type.title} needs to have a booktitle.")
return false
end

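Review note on the rescue chain in `fetch_pubmed_or_doi_result`: every `Seek::Doi` exception inherits from `RuntimeError`, so the specific `rescue` clauses only fire if they are listed before the generic `rescue RuntimeError`. The following is a minimal sketch of that ordering, not the code in this PR. The helper `doi_error_message` is hypothetical, and the exception classes are trimmed standalone copies so the snippet runs outside SEEK.

```ruby
# Trimmed standalone copies of the exception hierarchy, for illustration only.
module Seek
  module Doi
    class BaseException < RuntimeError; end
    class RANotSupported < BaseException; end
    class NotFoundException < BaseException; end
  end
end

# Hypothetical helper: runs the block and maps any failure to a user-facing message.
# Ruby tries rescue clauses top to bottom, so subclasses must come first.
def doi_error_message
  yield
  nil
rescue Seek::Doi::RANotSupported => e
  "#{e.message} Please enter the publication in another way."
rescue Seek::Doi::NotFoundException => e
  e.message
rescue RuntimeError
  'There was a problem contacting the DOI query service. Please add the publication manually instead.'
end

puts doi_error_message { raise Seek::Doi::RANotSupported, "DOI registration agency 'mEDRA' is not supported." }
puts doi_error_message { raise Seek::Doi::NotFoundException, 'DOI does not exist: 10.1000/missing.' }
puts doi_error_message { raise 'connection reset' }
```

If `rescue RuntimeError` were moved to the top, all three calls would return the generic message.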
19 changes: 19 additions & 0 deletions lib/seek/doi/author.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
module Seek
module Doi
class Author

# todo: add more fields if needed (e.g., affiliation, ORCID)
# e.g.{"ORCID"=>"https://orcid.org/0000-0002-5263-5070", "authenticated-orcid"=>false, "given"=>"K. Jarrod", "family"=>"Millman", "sequence"=>"additional", "affiliation"=>[]}
attr_accessor :first_name, :last_name

def initialize(first_name:, last_name:)
@first_name = first_name
@last_name = last_name
end

def full_name
[first_name, last_name].compact.join(' ')
end
end
end
end
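Usage sketch for the new `Author` class, assuming a parser builds it from the `given` and `family` keys of a CSL-JSON author hash like the one in the comment above. The mapping shown here is an assumption for illustration, and the class is copied so the snippet runs standalone.

```ruby
# Standalone copy of Seek::Doi::Author from this diff.
module Seek
  module Doi
    class Author
      attr_accessor :first_name, :last_name

      def initialize(first_name:, last_name:)
        @first_name = first_name
        @last_name = last_name
      end

      def full_name
        [first_name, last_name].compact.join(' ')
      end
    end
  end
end

# Hypothetical CSL-JSON author entry, shaped like the example in the class comment.
csl_author = { 'given' => 'K. Jarrod', 'family' => 'Millman' }
author = Seek::Doi::Author.new(first_name: csl_author['given'], last_name: csl_author['family'])
puts author.full_name

# When only a family name is known, compact drops the nil first name.
puts Seek::Doi::Author.new(first_name: nil, last_name: 'Consortium').full_name
```

Note that `compact` only removes `nil`, so an empty-string first name would still leave a leading space in `full_name`.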
21 changes: 21 additions & 0 deletions lib/seek/doi/base_exception.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
module Seek
module Doi
class BaseException < RuntimeError
def initialize(msg = 'A DOI exception occurred')
super(msg)
end

def backtrace
cause ? cause.backtrace : super
end
end

class UnrecognizedTypeException < BaseException; end
class FetchException < BaseException; end
class ParseException < BaseException; end
class MalformedDOIException < BaseException; end
class NotFoundException < BaseException; end
class RecordNotSupported < BaseException; end
class RANotSupported < BaseException; end
end
end
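Sketch of what the `backtrace` override in `BaseException` does when a lower-level error is wrapped. The helpers `low_level_failure` and `wrapped_failure` are hypothetical, and only two of the classes are copied so the snippet runs standalone.

```ruby
# Standalone copies of two classes from this diff.
module Seek
  module Doi
    class BaseException < RuntimeError
      def initialize(msg = 'A DOI exception occurred')
        super(msg)
      end

      # Prefer the backtrace of the underlying error, if there is one.
      def backtrace
        cause ? cause.backtrace : super
      end
    end

    class FetchException < BaseException; end
  end
end

# Hypothetical stand-in for a low-level network failure.
def low_level_failure
  raise IOError, 'socket closed'
end

# Hypothetical helper: wraps the failure and returns the wrapped exception.
def wrapped_failure
  begin
    low_level_failure
  rescue IOError => e
    # Raising inside a rescue block sets `cause` to the IOError automatically.
    raise Seek::Doi::FetchException, "Error resolving DOI: #{e.message}"
  end
rescue Seek::Doi::BaseException => wrapped
  wrapped
end

error = wrapped_failure
puts error.message          # the wrapper's message
puts error.cause.class      # the original error class
puts error.backtrace.first  # points into low_level_failure, not the re-raise site
puts Seek::Doi::FetchException.new.message # default message from BaseException
```

The effect is that error reports point at the place the original failure happened. The trade-off is that the location of the re-raise is no longer visible through `backtrace`.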
49 changes: 49 additions & 0 deletions lib/seek/doi/parser.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
require 'open-uri'
require 'json'

module Seek
module Doi
class Parser
DOI_ENDPOINT = 'https://doi.org'.freeze

def self.parse(doi)

agency = get_doi_ra(doi)

case agency
when 'DataCite'
Seek::Doi::Parsers::DataciteParser.new.parse(doi)
when 'Crossref'
Seek::Doi::Parsers::CrossrefParser.new.parse(doi)
else
raise Seek::Doi::RANotSupported, "DOI registration agency '#{agency}' is not supported."
end
rescue OpenURI::HTTPError => e
# Handle RA resolution issues
raise Seek::Doi::FetchException, "Error resolving DOI #{doi}: #{e.message}"
rescue Seek::Doi::BaseException
raise # Re-raise already handled domain exceptions
rescue StandardError => e
# Fallback for truly unexpected errors
raise Seek::Doi::FetchException, "Unexpected error resolving DOI #{doi}: #{e.message}"
end

private_class_method def self.get_doi_ra(doi)
url = "#{DOI_ENDPOINT}/ra/#{doi}"
response = URI.open(url).read
data = JSON.parse(response)
ra = data.dig(0, 'RA')
status = data.dig(0, 'status')

case status
when 'Invalid DOI'
raise Seek::Doi::MalformedDOIException, "Invalid DOI format: #{doi}."
when 'DOI does not exist'
raise Seek::Doi::NotFoundException, "DOI does not exist: #{doi}."
end
raise Seek::Doi::NotFoundException, "No registration agency found for DOI #{doi}" if ra.blank?
ra
end
end
end
end
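Network-free sketch of the JSON handling in `get_doi_ra`, useful for reasoning about the branches without hitting `https://doi.org/ra/...`. The helper `classify_ra_response` is hypothetical and returns symbols instead of raising. The sample payloads are assumptions about the response shape, not captured responses, and plain Ruby checks stand in for ActiveSupport's `blank?`.

```ruby
require 'json'

# Hypothetical helper mirroring get_doi_ra's branching, minus the HTTP call.
# Returns [outcome, registration_agency].
def classify_ra_response(body)
  data = JSON.parse(body)
  ra = data.dig(0, 'RA')
  status = data.dig(0, 'status')

  return [:malformed, nil] if status == 'Invalid DOI'
  return [:not_found, nil] if status == 'DOI does not exist'
  return [:not_found, nil] if ra.nil? || ra.strip.empty?

  [:ok, ra]
end

# Assumed sample payloads (illustrative DOIs only).
crossref = '[{"DOI": "10.1000/example", "RA": "Crossref"}]'
invalid  = '[{"DOI": "not-a-doi", "status": "Invalid DOI"}]'
missing  = '[{"DOI": "10.1000/missing", "status": "DOI does not exist"}]'
empty    = '[]'

p classify_ra_response(crossref)
p classify_ra_response(invalid)
p classify_ra_response(missing)
p classify_ra_response(empty)
```

In the PR the `:malformed` and `:not_found` outcomes correspond to raising `MalformedDOIException` and `NotFoundException`, and an `:ok` agency other than `DataCite` or `Crossref` ends in `RANotSupported`.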