Skip to content

Conversation

@whomingbird
Copy link
Contributor

  • Removed external dependency on the doi-query-tool gem
    DOI fetching and parsing are now handled entirely within SEEK.

  • Introduced modular parser architecture:

    Seek::Doi::Parser – central entry point that detects the DOI registration agency (RA).
    Seek::Doi::Parsers::BaseParser – shared base class with standard request, error, and metadata-handling logic.
    Seek::Doi::Parsers::CrossrefParser – rewritten Crossref parser (replaces the gem-based version).
    Seek::Doi::Parsers::DataciteParser – newly added to support DOIs registered under DataCite.

  • Extensible design for future RA support
    Adding new DOI sources (e.g. mEDRA, JaLC) will only require inheriting from BaseParser and implementing a single metadata extraction method.

  • Unified exception handling
    More informative, user-facing error messages in the UI.

  • Updated Publication model
    Now fetches and parses DOI metadata via the new parser system.
    Displays detailed error feedback for unsupported or malformed DOIs.

  • Improved test coverage

- Introduced BaseParser, inherited by CrossrefParser and DataCiteParser
- Allows easy extension for future DOI registration agencies (e.g., mEDRA)
…nt).

Handles subtitle if present.

Cleans the abstract by removing all <jats:*> tags and other XML markers
- Use abbreviated journal title when available
- Add support for articles identified by article number (no volume, issue, or page).
- HTML decoding
- clean up code
- add more tests
(HTTP requests, JSON parsing, decoding, author/editor extraction)
@stuzart stuzart requested a review from Copilot November 4, 2025 09:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors the DOI metadata retrieval system by replacing the legacy XML-based Crossref implementation with modern JSON APIs for both Crossref and DataCite. The implementation introduces a modular parser architecture with comprehensive error handling and extensive test coverage using VCR cassettes.

Key changes:

  • Replaces XML parsing with JSON-based APIs for both Crossref and DataCite
  • Implements a Registration Agency (RA) detection mechanism to route DOIs to the appropriate parser
  • Introduces a structured exception hierarchy for DOI processing errors
  • Adds comprehensive VCR test cassettes for various DOI types and error scenarios

Reviewed Changes

Copilot reviewed 39 out of 40 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib/seek/doi/parser.rb Main entry point implementing RA detection and parser routing
lib/seek/doi/parsers/base_parser.rb Abstract base class with shared parsing logic and utilities
lib/seek/doi/parsers/crossref_parser.rb Crossref-specific parser with citation formatting
lib/seek/doi/parsers/datacite_parser.rb DataCite-specific parser implementation
lib/seek/doi/base_exception.rb Custom exception hierarchy for DOI operations
lib/seek/doi/author.rb Author model object for structured metadata
test/unit/doi/* New test files with VCR cassettes for various DOI types
test/unit/publication_test.rb Updated tests to use new parser API
test/vcr_cassettes/doi/* VCR cassettes capturing real API responses
test/vcr_cassettes/publications/fairdom_by_doi.yml Deleted obsolete XML-based test cassette
Comments suppressed due to low confidence (1)

lib/seek/doi/parsers/crossref_parser.rb:1

  • The regex pattern for stripping JATS tags uses a simple approach that may not handle nested tags or edge cases correctly. Consider using a proper HTML/XML parser like Nokogiri for more robust tag removal, or at minimum document why the simple regex is sufficient for JATS content.
module Seek

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 12 to 14
def message
cause ? "#{super} (#{cause.class.name}: #{cause.message})" : super
end
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message method could potentially cause infinite recursion if the cause chain is circular or if cause.message itself triggers an error. Consider adding safeguards or limiting the depth of cause message extraction.

Copilot uses AI. Check for mistakes.
@error = e.message
rescue Seek::Doi::NotFoundException => e
@error = e.message
rescue Seek::Doi::RecordNotSupported => e
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to e is useless, since its value is never read.

Suggested change
rescue Seek::Doi::RecordNotSupported => e
rescue Seek::Doi::RecordNotSupported

Copilot uses AI. Check for mistakes.
assert_nil publication.doi
assert_equal Publication::REGISTRATION_BY_DOI, publication.registered_mode
test 'create publication from metadata via Seek::Doi::Parser' do
doi = '10.1234/fake-doi-for-test'
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to doi is useless, since its value is never read.

Suggested change
doi = '10.1234/fake-doi-for-test'

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants