Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID by caufieldjh · Pull Request #38 · linkml/linkml-reference-validator

caufieldjh · 2026-01-31T15:16:57Z

No description provided.

claude · 2026-01-31T15:17:11Z

Claude finished @caufieldjh's task —— View job

PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

Read and understand the changed files
Analyze code quality and best practices
Check for potential bugs or issues
Evaluate performance considerations
Review security concerns
Assess test coverage
Provide constructive feedback

Summary

This PR addresses RuntimeError exceptions when fetching PMID entries that don't have associated PMCIDs. The changes add robust error handling and type safety to prevent crashes from malformed API responses.

Code Quality & Best Practices ✅

Excellent improvements:

Type safety: Added proper type annotations (dict[str, Any]) and runtime type checking
Defensive programming: Added isinstance() checks before accessing dictionary keys
Error handling: Wrapped risky operations with try/catch and proper logging
Consistent patterns: Applied the same defensive approach across both fetch() and _get_pmcid() methods

Lines of particular note:

src/linkml_reference_validator/etl/sources/pmid.py:121-127: Excellent addition of record type validation
src/linkml_reference_validator/etl/sources/pmid.py:300-307: Much more robust PMCID extraction logic

Bug Fixes ✅

Critical issues resolved:

RuntimeError on missing PMCID: The original code would crash when result[0]["LinkSetDb"][0] didn't exist
Type assumptions: Code now validates that record is actually a dict before accessing keys
AttributeError prevention: Fixed CURIE converter to use .get() instead of attribute access

Performance Considerations ✅

No performance concerns:

Added type checks are minimal overhead
Early returns prevent unnecessary processing when data is malformed
No new API calls or blocking operations introduced

Security Concerns ✅

No security issues identified:

All external API interactions remain properly contained
No user input directly passed to external services
Error logging doesn't expose sensitive information

Test Coverage ✅

Well-tested changes:

Added specific test case test_get_pmcid_handles_entrez_error() in tests/test_sources.py:287-305
Test validates the exact error scenario (RuntimeError from Entrez.read)
Existing tests continue to pass with the defensive changes
Minor test fixes for integration tests show attention to detail

Specific Feedback

🟢 Excellent Additions

Type narrowing in pmid.py:121-127:

if not isinstance(record, dict):
    logger.warning(
        "Unexpected record format for PMID:%s: %s", pmid, type(record))
    return None

This prevents crashes when the API returns unexpected data structures.

Robust PMCID parsing in pmid.py:300-307:

if isinstance(result, list) and result and isinstance(result[0], dict):
    link_set_db = result[0].get("LinkSetDb", [])
    if isinstance(link_set_db, list) and link_set_db:
        # ... more defensive checks

Each level of nesting is validated before access.

Error handling with cleanup in pmid.py:283-298:
The finally block ensures resources are cleaned up even when exceptions occur.

🟡 Minor Suggestions

Consider adding more specific exception types: Instead of catching all Exception, you could catch more specific types like urllib.error.URLError or xml.etree.ElementTree.ParseError if those are the expected failure modes.
Test edge case coverage: Consider adding a test for the case where record is not a dict in the main fetch() method.

🟢 Code Follows Best Practices

Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
Uses proper logging levels (WARNING for recoverable issues)
Maintains existing docstring format and examples
Follows type annotation conventions

Overall Assessment

This is a high-quality defensive programming improvement that addresses a real runtime issue. The changes are:

✅ Focused and surgical - only touches problematic areas
✅ Well-tested - includes regression test for the specific error
✅ Type-safe - adds proper runtime validation
✅ Maintainable - follows existing code patterns

Recommendation: Approve and merge. This PR significantly improves robustness without introducing any risks.

claude · 2026-02-06T22:20:11Z

Claude finished @caufieldjh's task —— View job

PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

Read and understand the changed files
Analyze code quality and best practices
Check for potential bugs or issues
Evaluate performance considerations
Review security concerns
Assess test coverage
Provide constructive feedback

Summary

This PR addresses RuntimeError exceptions when fetching PMID entries that don't have associated PMCIDs. The changes add robust error handling and type safety to prevent crashes from malformed API responses.

Code Quality & Best Practices ✅

Excellent improvements:

Type safety: Added proper type annotations (dict[str, Any]) and runtime type checking
Defensive programming: Added isinstance() checks before accessing dictionary keys
Error handling: Wrapped risky operations with try/catch and proper logging
Consistent patterns: Applied the same defensive approach across both fetch() and _get_pmcid() methods

Lines of particular note:

src/linkml_reference_validator/etl/sources/pmid.py:121-127: Excellent addition of record type validation
src/linkml_reference_validator/etl/sources/pmid.py:300-307: Much more robust PMCID extraction logic

Bug Fixes ✅

Critical issues resolved:

RuntimeError on missing PMCID: The original code would crash when result[0]["LinkSetDb"][0] didn't exist
Type assumptions: Code now validates that record is actually a dict before accessing keys
AttributeError prevention: Fixed CURIE converter to use .get() instead of attribute access

Performance Considerations ✅

No performance concerns:

Added type checks are minimal overhead
Early returns prevent unnecessary processing when data is malformed
No new API calls or blocking operations introduced

Security Concerns ✅

No security issues identified:

All external API interactions remain properly contained
No user input directly passed to external services
Error logging doesn't expose sensitive information

Test Coverage ✅

Well-tested changes:

Added specific test case test_get_pmcid_handles_entrez_error() in tests/test_sources.py:287-305
Test validates the exact error scenario (RuntimeError from Entrez.read)
Existing tests continue to pass with the defensive changes
Minor test fixes for integration tests show attention to detail

Specific Feedback

🟢 Excellent Additions

Type narrowing in pmid.py:121-127:

if not isinstance(record, dict):
    logger.warning(
        "Unexpected record format for PMID:%s: %s", pmid, type(record))
    return None

This prevents crashes when the API returns unexpected data structures.

Robust PMCID parsing in pmid.py:300-307:

if isinstance(result, list) and result and isinstance(result[0], dict):
    link_set_db = result[0].get("LinkSetDb", [])
    if isinstance(link_set_db, list) and link_set_db:
        # ... more defensive checks

Each level of nesting is validated before access.

Error handling with cleanup in pmid.py:283-298:
The finally block ensures resources are cleaned up even when exceptions occur.

🟡 Minor Suggestions

Consider adding more specific exception types: Instead of catching all Exception, you could catch more specific types like urllib.error.URLError or xml.etree.ElementTree.ParseError if those are the expected failure modes.
Test edge case coverage: Consider adding a test for the case where record is not a dict in the main fetch() method.

🟢 Code Follows Best Practices

Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
Uses proper logging levels (WARNING for recoverable issues)
Maintains existing docstring format and examples
Follows type annotation conventions

Overall Assessment

This is a high-quality defensive programming improvement that addresses a real runtime issue. The changes are:

✅ Focused and surgical - only touches problematic areas
✅ Well-tested - includes regression test for the specific error
✅ Type-safe - adds proper runtime validation
✅ Maintainable - follows existing code patterns

Recommendation: Approve and merge. This PR significantly improves robustness without introducing any risks.

claude · 2026-02-06T22:28:30Z

Claude finished @caufieldjh's task —— View job

PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

Read and understand the changed files
Analyze code quality and best practices
Check for potential bugs or issues
Evaluate performance considerations
Review security concerns
Assess test coverage
Provide constructive feedback

Summary

This PR addresses a critical RuntimeError that occurs when fetching PMID entries that don't have associated PMCIDs. The fix implements comprehensive defensive programming patterns with proper error handling and type safety.

Code Quality & Best Practices ✅

Excellent improvements:

Type Safety Enhancement (pmid.py:121-127):

if not isinstance(record, dict):
    logger.warning("Unexpected record format for PMID:%s: %s", pmid, type(record))
    return None
record_dict: dict[str, Any] = record

Validates record type before dictionary access
Explicit type annotation with dict[str, Any]
Clear warning message for debugging

Robust PMCID Extraction (pmid.py:300-308):

if isinstance(result, list) and result and isinstance(result[0], dict):
    link_set_db = result[0].get("LinkSetDb", [])
    if isinstance(link_set_db, list) and link_set_db:
        links = link_set_db[0].get("Link", [])
        if isinstance(links, list) and links:
            first_link = links[0]
            if isinstance(first_link, dict) and "Id" in first_link:
                return str(first_link["Id"])

Validates each level of nested dictionary access
Uses .get() with defaults instead of direct key access
Type checking at every level prevents AttributeErrors

Comprehensive Error Handling (pmid.py:283-298):

try:
    handle = Entrez.elink(...)
except Exception as exc:
    logger.warning("Failed to link PMID:%s to PMC: %s", pmid, exc)
    return None
try:
    result = Entrez.read(handle)
except Exception as exc:
    logger.warning("Failed to read PMC link for PMID:%s: %s", pmid, exc)
    return None
finally:
    handle.close()

Separate try/catch blocks for different failure modes
Proper resource cleanup with finally block
Informative error logging

Plugin Robustness (reference_validation_plugin.py:296-316):
- Fixed CURIE converter to handle different schema prefix formats
- Uses getattr() with defensive checks instead of assuming attributes exist
- Better type safety with explicit type annotations

Bug Fixes ✅

Critical issues resolved:

RuntimeError: "Couldn't resolve #exLinkSrv2, the address table is empty" - now caught and handled gracefully
AttributeError: Fixed assumption that record is always a dict
KeyError: Prevented crashes from missing dictionary keys in nested API responses
Type mismatch: Plugin now handles different prefix map formats in LinkML schemas

Security Considerations ✅

No security concerns:

All external API calls properly contained with error handling
No user input directly passed to external APIs without validation
Error messages don't expose sensitive information
No new attack vectors introduced

Performance Considerations ✅

Minimal impact:

Type checking overhead is negligible
Early returns prevent unnecessary processing on malformed data
No additional API calls introduced
Resource cleanup ensures no handle leaks

Test Coverage ✅

Comprehensive testing:

New regression test (test_sources.py:286-305):

def test_get_pmcid_handles_entrez_error(self, mock_elink, mock_read, source, config):
    """Should return None when Entrez.read raises an error."""
    handle = MagicMock()
    mock_elink.return_value = handle
    mock_read.side_effect = RuntimeError(
        "Couldn't resolve #exLinkSrv2, the address table is empty."
    )
    result = source._get_pmcid("12112053", config)
    assert result is None
    handle.close.assert_called_once()

Tests the exact error scenario mentioned in PR title
Verifies proper resource cleanup
Uses realistic error message from actual bug

Integration test improvements (test_plugin_integration.py):
- Added null check for file content validation
- Minor formatting improvements for consistency

Code Style & Standards ✅

Follows project conventions:

Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
Uses proper logging levels (WARNING for recoverable failures)
Maintains existing docstring format with examples
Consistent type annotation patterns
Proper line formatting and PEP 8 compliance

Areas of Excellence

Defensive Programming: Every level of API response parsing is validated
Error Recovery: System continues operating when individual references fail
Logging Quality: Error messages include context (PMID) and specific failure details
Type Safety: Runtime validation matches static type hints
Resource Management: Proper cleanup with try/finally patterns

Minor Observations

Exception Handling Scope: Uses broad Exception catching which is appropriate for external API interactions but could be more specific for better debugging
Test Coverage: Could benefit from testing the record type validation path in fetch() method

Overall Assessment

This is high-quality defensive programming that addresses a real production issue. The implementation is:

✅ Surgical and focused - only changes problematic code paths
✅ Well-tested - includes specific regression test
✅ Type-safe - adds comprehensive runtime validation
✅ Maintainable - follows existing patterns and conventions
✅ Robust - handles various API failure modes gracefully

Recommendation: Approve and merge. This PR significantly improves system stability and error resilience without introducing risks.

caufieldjh added 5 commits January 31, 2026 10:09

Add more error capture for PMID and PMCID retrieval

5dd4081

More hardening for PMID retrieval

e78b3de

Another fix to ensure valid PMCID

5a0c121

Small fix for CURIE converter

cf47ade

Small fix for test_fetch_relative_path_with_base_dir

6917688

caufieldjh linked an issue Jan 31, 2026 that may be closed by this pull request

RuntimeError: Couldn't resolve #exLinkSrv2, the address table is empty #37

Closed

Fix for ReferenceValidationConfig cache_dir

f97edda

Type fix for _get_converter

f21c833

caufieldjh merged commit 57d9077 into main Feb 6, 2026
5 checks passed

caufieldjh deleted the 37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty branch February 6, 2026 22:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID#38

Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID#38
caufieldjh merged 7 commits intomainfrom
37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty

caufieldjh commented Jan 31, 2026

Uh oh!

claude bot commented Jan 31, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 6, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

caufieldjh commented Jan 31, 2026

Uh oh!

claude bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Summary

Code Quality & Best Practices ✅

Bug Fixes ✅

Performance Considerations ✅

Security Concerns ✅

Test Coverage ✅

Specific Feedback

🟢 Excellent Additions

🟡 Minor Suggestions

🟢 Code Follows Best Practices

Overall Assessment

Uh oh!

claude bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Summary

Code Quality & Best Practices ✅

Bug Fixes ✅

Performance Considerations ✅

Security Concerns ✅

Test Coverage ✅

Specific Feedback

🟢 Excellent Additions

🟡 Minor Suggestions

🟢 Code Follows Best Practices

Overall Assessment

Uh oh!

claude bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Review Checklist:

Summary

Code Quality & Best Practices ✅

Bug Fixes ✅

Security Considerations ✅

Performance Considerations ✅

Test Coverage ✅

Code Style & Standards ✅

Areas of Excellence

Minor Observations

Overall Assessment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Jan 31, 2026 •

edited

Loading

claude bot commented Feb 6, 2026 •

edited

Loading

claude bot commented Feb 6, 2026 •

edited

Loading