Skip to content

Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID#38

Merged
caufieldjh merged 7 commits intomainfrom
37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty
Feb 6, 2026
Merged

Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID#38
caufieldjh merged 7 commits intomainfrom
37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty

Conversation

@caufieldjh
Copy link
Contributor

No description provided.

@claude
Copy link
Contributor

claude bot commented Jan 31, 2026

Claude finished @caufieldjh's task —— View job


PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

  • Read and understand the changed files
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide constructive feedback

Summary

This PR addresses RuntimeError exceptions when fetching PMID entries that don't have associated PMCIDs. The changes add robust error handling and type safety to prevent crashes from malformed API responses.

Code Quality & Best Practices ✅

Excellent improvements:

  • Type safety: Added proper type annotations (dict[str, Any]) and runtime type checking
  • Defensive programming: Added isinstance() checks before accessing dictionary keys
  • Error handling: Wrapped risky operations with try/catch and proper logging
  • Consistent patterns: Applied the same defensive approach across both fetch() and _get_pmcid() methods

Lines of particular note:

  • src/linkml_reference_validator/etl/sources/pmid.py:121-127: Excellent addition of record type validation
  • src/linkml_reference_validator/etl/sources/pmid.py:300-307: Much more robust PMCID extraction logic

Bug Fixes ✅

Critical issues resolved:

  1. RuntimeError on missing PMCID: The original code would crash when result[0]["LinkSetDb"][0] didn't exist
  2. Type assumptions: Code now validates that record is actually a dict before accessing keys
  3. AttributeError prevention: Fixed CURIE converter to use .get() instead of attribute access

Performance Considerations ✅

No performance concerns:

  • Added type checks are minimal overhead
  • Early returns prevent unnecessary processing when data is malformed
  • No new API calls or blocking operations introduced

Security Concerns ✅

No security issues identified:

  • All external API interactions remain properly contained
  • No user input directly passed to external services
  • Error logging doesn't expose sensitive information

Test Coverage ✅

Well-tested changes:

  • Added specific test case test_get_pmcid_handles_entrez_error() in tests/test_sources.py:287-305
  • Test validates the exact error scenario (RuntimeError from Entrez.read)
  • Existing tests continue to pass with the defensive changes
  • Minor test fixes for integration tests show attention to detail

Specific Feedback

🟢 Excellent Additions

  1. Type narrowing in pmid.py:121-127:

    if not isinstance(record, dict):
        logger.warning(
            "Unexpected record format for PMID:%s: %s", pmid, type(record))
        return None

    This prevents crashes when the API returns unexpected data structures.

  2. Robust PMCID parsing in pmid.py:300-307:

    if isinstance(result, list) and result and isinstance(result[0], dict):
        link_set_db = result[0].get("LinkSetDb", [])
        if isinstance(link_set_db, list) and link_set_db:
            # ... more defensive checks

    Each level of nesting is validated before access.

  3. Error handling with cleanup in pmid.py:283-298:
    The finally block ensures resources are cleaned up even when exceptions occur.

🟡 Minor Suggestions

  1. Consider adding more specific exception types: Instead of catching all Exception, you could catch more specific types like urllib.error.URLError or xml.etree.ElementTree.ParseError if those are the expected failure modes.

  2. Test edge case coverage: Consider adding a test for the case where record is not a dict in the main fetch() method.

🟢 Code Follows Best Practices

  • Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
  • Uses proper logging levels (WARNING for recoverable issues)
  • Maintains existing docstring format and examples
  • Follows type annotation conventions

Overall Assessment

This is a high-quality defensive programming improvement that addresses a real runtime issue. The changes are:

  • Focused and surgical - only touches problematic areas
  • Well-tested - includes regression test for the specific error
  • Type-safe - adds proper runtime validation
  • Maintainable - follows existing code patterns

Recommendation: Approve and merge. This PR significantly improves robustness without introducing any risks.


@claude
Copy link
Contributor

claude bot commented Feb 6, 2026

Claude finished @caufieldjh's task —— View job


PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

  • Read and understand the changed files
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide constructive feedback

Summary

This PR addresses RuntimeError exceptions when fetching PMID entries that don't have associated PMCIDs. The changes add robust error handling and type safety to prevent crashes from malformed API responses.

Code Quality & Best Practices ✅

Excellent improvements:

  • Type safety: Added proper type annotations (dict[str, Any]) and runtime type checking
  • Defensive programming: Added isinstance() checks before accessing dictionary keys
  • Error handling: Wrapped risky operations with try/catch and proper logging
  • Consistent patterns: Applied the same defensive approach across both fetch() and _get_pmcid() methods

Lines of particular note:

  • src/linkml_reference_validator/etl/sources/pmid.py:121-127: Excellent addition of record type validation
  • src/linkml_reference_validator/etl/sources/pmid.py:300-307: Much more robust PMCID extraction logic

Bug Fixes ✅

Critical issues resolved:

  1. RuntimeError on missing PMCID: The original code would crash when result[0]["LinkSetDb"][0] didn't exist
  2. Type assumptions: Code now validates that record is actually a dict before accessing keys
  3. AttributeError prevention: Fixed CURIE converter to use .get() instead of attribute access

Performance Considerations ✅

No performance concerns:

  • Added type checks are minimal overhead
  • Early returns prevent unnecessary processing when data is malformed
  • No new API calls or blocking operations introduced

Security Concerns ✅

No security issues identified:

  • All external API interactions remain properly contained
  • No user input directly passed to external services
  • Error logging doesn't expose sensitive information

Test Coverage ✅

Well-tested changes:

  • Added specific test case test_get_pmcid_handles_entrez_error() in tests/test_sources.py:287-305
  • Test validates the exact error scenario (RuntimeError from Entrez.read)
  • Existing tests continue to pass with the defensive changes
  • Minor test fixes for integration tests show attention to detail

Specific Feedback

🟢 Excellent Additions

  1. Type narrowing in pmid.py:121-127:

    if not isinstance(record, dict):
        logger.warning(
            "Unexpected record format for PMID:%s: %s", pmid, type(record))
        return None

    This prevents crashes when the API returns unexpected data structures.

  2. Robust PMCID parsing in pmid.py:300-307:

    if isinstance(result, list) and result and isinstance(result[0], dict):
        link_set_db = result[0].get("LinkSetDb", [])
        if isinstance(link_set_db, list) and link_set_db:
            # ... more defensive checks

    Each level of nesting is validated before access.

  3. Error handling with cleanup in pmid.py:283-298:
    The finally block ensures resources are cleaned up even when exceptions occur.

🟡 Minor Suggestions

  1. Consider adding more specific exception types: Instead of catching all Exception, you could catch more specific types like urllib.error.URLError or xml.etree.ElementTree.ParseError if those are the expected failure modes.

  2. Test edge case coverage: Consider adding a test for the case where record is not a dict in the main fetch() method.

🟢 Code Follows Best Practices

  • Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
  • Uses proper logging levels (WARNING for recoverable issues)
  • Maintains existing docstring format and examples
  • Follows type annotation conventions

Overall Assessment

This is a high-quality defensive programming improvement that addresses a real runtime issue. The changes are:

  • Focused and surgical - only touches problematic areas
  • Well-tested - includes regression test for the specific error
  • Type-safe - adds proper runtime validation
  • Maintainable - follows existing code patterns

Recommendation: Approve and merge. This PR significantly improves robustness without introducing any risks.


@claude
Copy link
Contributor

claude bot commented Feb 6, 2026

Claude finished @caufieldjh's task —— View job


PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

  • Read and understand the changed files
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide constructive feedback

Summary

This PR addresses a critical RuntimeError that occurs when fetching PMID entries that don't have associated PMCIDs. The fix implements comprehensive defensive programming patterns with proper error handling and type safety.

Code Quality & Best Practices ✅

Excellent improvements:

  1. Type Safety Enhancement (pmid.py:121-127):

    if not isinstance(record, dict):
        logger.warning("Unexpected record format for PMID:%s: %s", pmid, type(record))
        return None
    record_dict: dict[str, Any] = record
    • Validates record type before dictionary access
    • Explicit type annotation with dict[str, Any]
    • Clear warning message for debugging
  2. Robust PMCID Extraction (pmid.py:300-308):

    if isinstance(result, list) and result and isinstance(result[0], dict):
        link_set_db = result[0].get("LinkSetDb", [])
        if isinstance(link_set_db, list) and link_set_db:
            links = link_set_db[0].get("Link", [])
            if isinstance(links, list) and links:
                first_link = links[0]
                if isinstance(first_link, dict) and "Id" in first_link:
                    return str(first_link["Id"])
    • Validates each level of nested dictionary access
    • Uses .get() with defaults instead of direct key access
    • Type checking at every level prevents AttributeErrors
  3. Comprehensive Error Handling (pmid.py:283-298):

    try:
        handle = Entrez.elink(...)
    except Exception as exc:
        logger.warning("Failed to link PMID:%s to PMC: %s", pmid, exc)
        return None
    try:
        result = Entrez.read(handle)
    except Exception as exc:
        logger.warning("Failed to read PMC link for PMID:%s: %s", pmid, exc)
        return None
    finally:
        handle.close()
    • Separate try/catch blocks for different failure modes
    • Proper resource cleanup with finally block
    • Informative error logging
  4. Plugin Robustness (reference_validation_plugin.py:296-316):

    • Fixed CURIE converter to handle different schema prefix formats
    • Uses getattr() with defensive checks instead of assuming attributes exist
    • Better type safety with explicit type annotations

Bug Fixes ✅

Critical issues resolved:

  1. RuntimeError: "Couldn't resolve #exLinkSrv2, the address table is empty" - now caught and handled gracefully
  2. AttributeError: Fixed assumption that record is always a dict
  3. KeyError: Prevented crashes from missing dictionary keys in nested API responses
  4. Type mismatch: Plugin now handles different prefix map formats in LinkML schemas

Security Considerations ✅

No security concerns:

  • All external API calls properly contained with error handling
  • No user input directly passed to external APIs without validation
  • Error messages don't expose sensitive information
  • No new attack vectors introduced

Performance Considerations ✅

Minimal impact:

  • Type checking overhead is negligible
  • Early returns prevent unnecessary processing on malformed data
  • No additional API calls introduced
  • Resource cleanup ensures no handle leaks

Test Coverage ✅

Comprehensive testing:

  1. New regression test (test_sources.py:286-305):

    def test_get_pmcid_handles_entrez_error(self, mock_elink, mock_read, source, config):
        """Should return None when Entrez.read raises an error."""
        handle = MagicMock()
        mock_elink.return_value = handle
        mock_read.side_effect = RuntimeError(
            "Couldn't resolve #exLinkSrv2, the address table is empty."
        )
        result = source._get_pmcid("12112053", config)
        assert result is None
        handle.close.assert_called_once()
    • Tests the exact error scenario mentioned in PR title
    • Verifies proper resource cleanup
    • Uses realistic error message from actual bug
  2. Integration test improvements (test_plugin_integration.py):

    • Added null check for file content validation
    • Minor formatting improvements for consistency

Code Style & Standards ✅

Follows project conventions:

  • Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
  • Uses proper logging levels (WARNING for recoverable failures)
  • Maintains existing docstring format with examples
  • Consistent type annotation patterns
  • Proper line formatting and PEP 8 compliance

Areas of Excellence

  1. Defensive Programming: Every level of API response parsing is validated
  2. Error Recovery: System continues operating when individual references fail
  3. Logging Quality: Error messages include context (PMID) and specific failure details
  4. Type Safety: Runtime validation matches static type hints
  5. Resource Management: Proper cleanup with try/finally patterns

Minor Observations

  1. Exception Handling Scope: Uses broad Exception catching which is appropriate for external API interactions but could be more specific for better debugging
  2. Test Coverage: Could benefit from testing the record type validation path in fetch() method

Overall Assessment

This is high-quality defensive programming that addresses a real production issue. The implementation is:

  • Surgical and focused - only changes problematic code paths
  • Well-tested - includes specific regression test
  • Type-safe - adds comprehensive runtime validation
  • Maintainable - follows existing patterns and conventions
  • Robust - handles various API failure modes gracefully

Recommendation: Approve and merge. This PR significantly improves system stability and error resilience without introducing risks.


@caufieldjh caufieldjh merged commit 57d9077 into main Feb 6, 2026
5 checks passed
@caufieldjh caufieldjh deleted the 37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty branch February 6, 2026 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RuntimeError: Couldn't resolve #exLinkSrv2, the address table is empty

1 participant