Add configuration options for handling unsupported reference prefixes #29

github-actions · 2026-01-14T03:18:29Z

Summary

Implements #28 by adding two new configuration options to control behavior for unsupported or unfetchable reference types:

skip_prefixes: List of reference prefixes to skip during validation
unknown_prefix_severity: Control severity level for unfetchable references

Changes

Core Implementation

models.py: Added skip_prefixes and unknown_prefix_severity fields to ReferenceValidationConfig
supporting_text_validator.py: Implemented prefix checking logic in validate() method
- Checks skip_prefixes before attempting to fetch reference
- Returns is_valid=True with INFO severity for skipped prefixes
- Uses configured severity for unfetchable references
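
The control flow listed above can be sketched as follows. This is a minimal, self-contained illustration rather than the project's code: `Severity`, `Config`, `Result`, and the `fetch` callable are hypothetical stand-ins for ValidationSeverity, ReferenceValidationConfig, ValidationResult, and the real reference fetcher, and the severity returned for a successfully fetched reference is a placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Severity(Enum):
    ERROR = "ERROR"
    WARNING = "WARNING"
    INFO = "INFO"


@dataclass
class Config:
    # Hypothetical stand-in for ReferenceValidationConfig
    skip_prefixes: list[str] = field(default_factory=list)
    unknown_prefix_severity: Severity = Severity.ERROR


@dataclass
class Result:
    # Hypothetical stand-in for ValidationResult
    is_valid: bool
    severity: Severity
    message: str


def validate(reference_id: str, config: Config,
             fetch: Callable[[str], Optional[str]]) -> Result:
    # 1. The skip check runs before any fetch attempt (case-insensitive).
    prefix = reference_id.split(":")[0].upper() if ":" in reference_id else ""
    if prefix and prefix in {p.upper() for p in config.skip_prefixes}:
        return Result(True, Severity.INFO,
                      f"Skipping validation for reference with prefix '{prefix}'")
    # 2. Unfetchable references get the configured severity.
    content = fetch(reference_id)
    if content is None:
        return Result(False, config.unknown_prefix_severity,
                      f"Could not fetch reference: {reference_id}")
    # 3. Supporting-text matching would happen here; omitted in this sketch.
    return Result(True, Severity.INFO, "Reference fetched")
```

Passing the fetcher in as a callable keeps the sketch testable without network access; the real validator may wire this differently.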

Testing

Added comprehensive test coverage in test_supporting_text_validator.py:

✅ Skip single and multiple prefixes
✅ Case-insensitive prefix matching
✅ Severity configuration (ERROR, WARNING, INFO)
✅ Precedence rules (skip_prefixes > unknown_prefix_severity)
✅ Combined configuration scenarios

Test Results: All 406 tests passing (including 11 new tests)

Documentation

README.md: Added detailed configuration section with examples
models.py: Added doctests demonstrating new configuration options

Configuration Examples

Example 1: Skip unsupported prefixes

# .linkml-reference-validator.yaml
validation:
  skip_prefixes:
    - SRA
    - MGNIFY
    - BIOPROJECT

Result:

$ linkml-reference-validator validate text "some text" SRA:PRJNA290729
✓ Valid: True (INFO) - Skipping validation for reference with prefix 'SRA'

Example 2: Downgrade severity for unknown prefixes

validation:
  unknown_prefix_severity: WARNING  # Default: ERROR

Result:

$ linkml-reference-validator validate text "some text" UNKNOWN:12345
✗ Valid: False (WARNING) - Could not fetch reference: UNKNOWN:12345

Example 3: Combined configuration

validation:
  skip_prefixes:
    - SRA              # Completely skip SRA references
  unknown_prefix_severity: WARNING  # Other unfetchable refs get WARNING

Use Cases

This feature is particularly useful for the dismech knowledge base and other projects that:

Have references from multiple sources with varying support
Want to keep unsupported reference IDs in their data
Need validation to pass without removing legitimate references
Want to distinguish between data errors and unsupported sources

Implementation Notes

Backward compatible: Default behavior unchanged (ERROR for unfetchable references)
Case-insensitive: Prefix matching is case-insensitive for convenience
Precedence: skip_prefixes takes precedence over unknown_prefix_severity
TDD approach: Tests written first, then implementation

This commit adds two new configuration options to ReferenceValidationConfig:

1. skip_prefixes: List of prefixes to skip during validation
   - References with these prefixes return is_valid=True with INFO severity
   - Useful for unsupported reference types (SRA, MGNIFY, BIOPROJECT)
   - Case-insensitive matching

2. unknown_prefix_severity: Control severity for unfetchable references
   - Options: ERROR (default), WARNING, INFO
   - Applied when references cannot be fetched
   - skip_prefixes takes precedence over this setting

Changes:
- Added skip_prefixes field to ReferenceValidationConfig (models.py)
- Added unknown_prefix_severity field to ReferenceValidationConfig (models.py)
- Implemented prefix checking logic in validate() method (supporting_text_validator.py)
- Added comprehensive test coverage (test_supporting_text_validator.py)
- Updated documentation with configuration examples (README.md)
- Added doctests for new configuration options

Resolves #28

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

cmungall · 2026-01-14T03:22:58Z

@dragon-ai-agent please add something to docs/. Everything must be completely clear to someone setting up a system, with illustrative examples

This commit adds three major documentation enhancements to make the system completely clear to new users setting up linkml-reference-validator:

1. **Setup Guide (docs/setup-guide.md)**
   - Complete installation instructions for pip, uv, and development setup
   - Initial configuration including NCBI API key setup
   - Quick start examples with real PMIDs
   - Real-world example: validating gene functions
   - Advanced configuration with YAML config files
   - Integration with pre-commit hooks, CI/CD, and Makefiles
   - Verification checklist and troubleshooting quick fixes

2. **Complete Workflow Tutorial (docs/tutorials/complete-workflow.md)**
   - Step-by-step 30-45 minute tutorial building a gene annotation system
   - Covers installation, schema design, data creation, validation, and repair
   - Includes real-world examples with TP53, BRCA1, EGFR, and JAK1
   - Shows integration with Git, GitHub Actions, and testing frameworks
   - Provides templates and boilerplate code for quick starts
   - Production-ready examples with Makefiles and test suites

3. **Troubleshooting Guide (docs/troubleshooting.md)**
   - Comprehensive solutions for installation issues
   - Reference fetching problems (PMIDs, network, rate limiting)
   - Validation errors with detailed explanations and fixes
   - Schema and data format issues
   - Performance optimization tips
   - Common error messages with causes and solutions
   - Quick diagnostic checklist

Also updated mkdocs.yml navigation to include the new guides in logical positions for discoverability.

These guides provide clear, illustrative examples for someone setting up the system from scratch, addressing issue #29.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

github-actions · 2026-01-14T03:29:49Z

Documentation Added 📚

I've created comprehensive documentation to make the setup process completely clear, with illustrative examples throughout. See PR #30.

What's Been Added

1. Setup Guide (`docs/setup-guide.md`)

A complete guide covering everything from installation to production deployment:

Multiple installation methods (pip, uv, development)
NCBI API configuration with step-by-step instructions
Quick start examples using real PMIDs
Complete real-world example: building a gene annotation validator
Advanced configuration (YAML configs, environment variables)
Integration examples (pre-commit hooks, GitHub Actions, Makefiles)
Verification checklist to ensure everything works

2. Complete Workflow Tutorial (`docs/tutorials/complete-workflow.md`)

A hands-on 30-45 minute tutorial that builds a production-ready gene annotation system:

Step-by-step from scratch to production
Real examples with TP53, BRCA1, EGFR, JAK1
Schema design, data creation, validation, repair
Error handling scenarios with solutions
CI/CD integration
Testing and documentation templates

3. Troubleshooting Guide (`docs/troubleshooting.md`)

Comprehensive troubleshooting covering common issues:

Installation problems
Reference fetching errors
Validation failures with detailed solutions
Performance optimization
Quick diagnostic checklist
All common error messages explained

Examples Are Clear and Illustrative

Every guide includes:
✅ Real, working examples that can be copy-pasted
✅ Actual PMIDs that work (e.g., PMID:16888623)
✅ Complete file examples (schema, data, configs)
✅ Expected output for each command
✅ Common error scenarios and how to fix them
✅ Step-by-step verification at each stage

Documentation Structure

The guides are organized for different learning styles:

Quick Start: Jump right in with examples
Setup Guide: Methodical installation and configuration
Tutorial: Hands-on complete workflow
Troubleshooting: Problem-solving reference

All integrated into the mkdocs navigation for easy discovery.

Please review PR #30 and let me know if you'd like any additions or changes!

Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-14T03:41:48Z

Claude finished @github-actions[bot]'s task

PR Review Complete ✅

Reviewed PR #29: Add configuration options for handling unsupported reference prefixes

Review Checklist

Read repository guidelines (CLAUDE.md)
Analyze all changed files
Review code quality and best practices
Check for potential bugs or issues
Assess performance considerations
Evaluate security concerns
Review test coverage adequacy
Provide comprehensive feedback

Overall Assessment: Excellent ⭐⭐⭐⭐⭐

This is a well-implemented feature that follows excellent software engineering practices. The implementation is backward compatible, thoroughly tested, and properly documented.

✅ Code Quality & Best Practices

Strengths:

Follows TDD approach - Tests were written first and are comprehensive
Excellent docstring coverage - All new fields have detailed doctests with examples
Type safety - Proper use of Pydantic validation and enum types
Backward compatible - Default behavior unchanged (ERROR for unfetchable references)
Clean separation of concerns - Configuration in models, logic in validator
Consistent naming - Clear, descriptive field names (skip_prefixes, unknown_prefix_severity)

Code Quality Examples:

Good validation: skip_prefixes validation in models.py:395-412
Clear precedence rules: Skip prefixes checked before fetching in supporting_text_validator.py:148-160
Proper error messages: Informative messages for both skipped and unfetchable references

✅ Test Coverage: Comprehensive

Test Quality Analysis:

11 new tests covering all scenarios - excellent coverage
Edge cases covered: Case-insensitive matching, precedence rules, combined configs
Real-world scenarios: Multiple prefixes, different severity levels
Proper mocking: Uses mocker.patch appropriately to isolate validator logic

Notable Test Cases:

# tests/test_supporting_text_validator.py:469-489
def test_skip_prefixes_takes_precedence_over_unknown_severity():
    # Ensures skip_prefixes overrides unknown_prefix_severity - critical behavior
    ...  # test body elided in this review

Test coverage is exceptional - covers all combinations and edge cases.
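
As an illustration of what such a precedence test checks, here is a minimal sketch. It does not reproduce the repository's test: `check_reference` is a hypothetical stand-in for the validator, and the standard library's `unittest.mock.Mock` is used in place of the `mocker` fixture so the example runs on its own.

```python
from unittest.mock import Mock


def check_reference(reference_id, skip_prefixes, unknown_severity, fetch):
    """Hypothetical stand-in for the validator: returns (is_valid, severity)."""
    prefix = reference_id.split(":")[0].upper() if ":" in reference_id else ""
    if prefix and prefix in [p.upper() for p in skip_prefixes]:
        return True, "INFO"
    if fetch(reference_id) is None:
        return False, unknown_severity
    return True, "INFO"


def test_skip_prefixes_takes_precedence_over_unknown_severity():
    fetch = Mock(return_value=None)  # every fetch would fail
    result = check_reference("SRA:PRJNA290729", ["SRA"], "WARNING", fetch)
    assert result == (True, "INFO")  # skipped, not reported as WARNING
    fetch.assert_not_called()        # the skip happens before any fetch


test_skip_prefixes_takes_precedence_over_unknown_severity()
```

Asserting that the fetcher was never called is what distinguishes "skipped" from "fetched and then downgraded".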

✅ Implementation Analysis

Models (`models.py`)

# Lines 395-412: Well-designed configuration fields
skip_prefixes: list[str] = Field(
    default_factory=list,
    description=(
        "List of reference prefixes to skip during validation. "
        "References with these prefixes will return is_valid=True with INFO severity. "
        # ... clear, comprehensive documentation
    ),
)

Strengths:

Clear field descriptions with use cases
Proper default values
Comprehensive doctests with realistic examples

Validator Logic (`supporting_text_validator.py`)

# Lines 148-160: Efficient prefix checking
prefix = reference_id.split(":")[0].upper() if ":" in reference_id else ""
skip_prefixes_upper = [p.upper() for p in self.config.skip_prefixes]

if prefix and prefix in skip_prefixes_upper:
    return ValidationResult(
        is_valid=True,  # Key: returns True for skipped prefixes
        # ... proper INFO severity and descriptive message
    )

Strengths:

Case-insensitive matching (good UX)
Checked before fetching (performance optimization)
Returns is_valid=True for skipped prefixes (correct design)
Configurable severity for unfetchable references (lines 165-172)

✅ Performance Considerations

Optimizations Implemented:

Early exit for skipped prefixes - No unnecessary fetch attempts
Case conversion done once per call - skip_prefixes is upper-cased once per validate() call (rebuilt on each call, not cached across calls)
Simple string operations - No regex overhead

Performance Impact:

Positive: Skipped prefixes avoid network calls entirely
Negligible overhead: Simple list lookup for prefix checking
Scales well: O(n) for prefix list, typically small (< 10 items)
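
If the prefix list ever grew large, the linear scan could be replaced by a set that is built once. The sketch below assumes a hypothetical `PrefixMatcher` helper that is not part of this PR; it only illustrates the idea.

```python
class PrefixMatcher:
    """Hypothetical helper: upper-cases skip prefixes once, then matches in O(1)."""

    def __init__(self, skip_prefixes):
        # Built once, so repeated lookups do no further case conversion of the list
        self._skip = frozenset(p.upper() for p in skip_prefixes)

    def should_skip(self, reference_id: str) -> bool:
        # IDs without a colon, or with an empty prefix, are never skipped
        prefix, sep, _ = reference_id.partition(":")
        return bool(sep and prefix) and prefix.upper() in self._skip
```

For the handful of prefixes expected in practice, the difference from the list-based check is unlikely to be measurable; the set mainly makes the "convert once" intent explicit.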

✅ Security Considerations

No Security Issues Identified:

Input validation - Prefix extraction uses safe string operations
No injection risks - Configuration values are validated by Pydantic
No sensitive data - Configuration is metadata only
Safe defaults - Empty skip list, ERROR severity (fail-safe)

🚀 Minor Enhancement Suggestions

1. Documentation Enhancement

Consider adding a brief example to the main docstring in supporting_text_validator.py:

class SupportingTextValidator:
    """Validate that supporting text quotes are found in references.
    
    Supports configuration for handling unsupported reference types:
    
    Examples:
        >>> # Skip SRA references entirely
        >>> config = ReferenceValidationConfig(skip_prefixes=["SRA"])
        >>> # Downgrade unknown refs to WARNING
        >>> config = ReferenceValidationConfig(unknown_prefix_severity=ValidationSeverity.WARNING)
    """

2. Logging Enhancement (Optional)

Consider adding debug logging for configuration decisions:

# In validate() method
if prefix and prefix in skip_prefixes_upper:
    logger.debug("Skipping validation for prefix '%s' (in skip_prefixes)", prefix)
    return ValidationResult(...)

3. Configuration Validation (Enhancement)

Consider validating that skip_prefixes don't contain duplicates or empty strings:

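from pydantic import validator  # Pydantic v1-style API; v2 uses field_validator with @classmethod

@validator('skip_prefixes')
def validate_skip_prefixes(cls, v):
    # Strip whitespace, drop empty entries, and remove exact duplicates (order preserved)
    return list(dict.fromkeys(p.strip() for p in v if p.strip()))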
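
Because prefix matching is case-insensitive, entries such as `SRA` and `sra` are effectively duplicates. A standalone sketch of the clean-up, written as a plain function so it runs without Pydantic; the name `normalize_skip_prefixes` is hypothetical:

```python
def normalize_skip_prefixes(prefixes):
    """Strip whitespace, drop empty entries, and de-duplicate case-insensitively."""
    seen = set()
    cleaned = []
    for raw in prefixes:
        prefix = raw.strip()
        key = prefix.upper()
        if prefix and key not in seen:
            seen.add(key)
            cleaned.append(prefix)  # keep the first spelling encountered
    return cleaned


print(normalize_skip_prefixes([" SRA ", "", "sra", "MGNIFY", "  "]))
# ['SRA', 'MGNIFY']
```

The same body could be called from a Pydantic validator on the `skip_prefixes` field if the project chose to adopt it.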

📋 Use Cases Validation

Real-World Scenarios Supported:

✅ Dismech knowledge base - Can skip unsupported prefixes while keeping references
✅ Multi-source validation - Different severity levels for different reference types
✅ CI/CD pipelines - Configurable failure modes (ERROR/WARNING/INFO)
✅ Development workflows - Skip problematic references during development
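
For the CI/CD case, one way a pipeline wrapper might turn result severities into a pass/fail decision is sketched below. This is a hypothetical policy for illustration only; it does not describe how the linkml-reference-validator CLI computes its exit code.

```python
def exit_code(severities, fail_on=("ERROR",)):
    """Return 1 if any result severity is listed in fail_on, else 0 (hypothetical policy)."""
    return 1 if any(s in fail_on for s in severities) else 0


# With unknown_prefix_severity set to WARNING, unfetchable references
# would no longer fail a build that only fails on ERROR.
print(exit_code(["INFO", "WARNING"]))  # 0
print(exit_code(["INFO", "ERROR"]))    # 1
```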

🔧 README Documentation Review

Strengths:

Comprehensive examples showing both YAML config and CLI usage
Clear use case explanations for when to use each option
Practical examples with realistic reference prefixes (SRA, MGNIFY, BIOPROJECT)
Precedence rules clearly explained (skip_prefixes > unknown_prefix_severity)

Documentation Quality: Excellent

The README additions provide clear, actionable guidance for users.

🎯 Alignment with Repository Guidelines

CLAUDE.md Compliance:

✅ Test-driven development - Tests written first
✅ No unnecessary try/except blocks - Clean error handling
✅ Comprehensive docstrings - Excellent documentation
✅ Deterministic logic - No complex exception handling

Perfect adherence to project standards.

📊 Final Assessment

| Aspect | Rating | Notes |
| --- | --- | --- |
| Code Quality | ⭐⭐⭐⭐⭐ | Excellent structure, typing, documentation |
| Test Coverage | ⭐⭐⭐⭐⭐ | Comprehensive, edge cases covered |
| Performance | ⭐⭐⭐⭐⭐ | Optimized, no performance regressions |
| Security | ⭐⭐⭐⭐⭐ | No security concerns identified |
| Documentation | ⭐⭐⭐⭐⭐ | Clear examples, comprehensive README |
| Backward Compatibility | ⭐⭐⭐⭐⭐ | Fully backward compatible |

✅ Recommendation: APPROVE

This PR is ready to merge. It's a textbook example of how to implement a new feature:

Well-designed API with clear configuration options
Thoroughly tested with comprehensive coverage
Properly documented with practical examples
Backward compatible with sensible defaults
Follows project conventions perfectly

The implementation addresses a real user need (handling unsupported reference types) with a clean, configurable solution that doesn't compromise existing functionality.

Great work! 🎉

github-actions bot mentioned this pull request Jan 14, 2026

Feature: Configuration to skip or downgrade severity for unsupported reference prefixes #28

Closed

github-actions bot mentioned this pull request Jan 14, 2026

Add comprehensive setup and troubleshooting documentation #30

Open

7 tasks

fix: Remove unnecessary f-string prefix in GEO validation notebook

3612641

Co-Authored-By: Claude Opus 4.5 <[email protected]>

dragon-ai-agent merged commit 91ab7b4 into main Jan 14, 2026
5 checks passed
