Add SOPClassUID and TransferSyntaxUID to the main index #125
Merged
Add two series-level DICOM attributes to the main index query:

- SOPClassUID: unambiguous object type identifier, more specific than Modality for distinguishing object types (e.g., Enhanced CT vs legacy CT, parametric maps, structured reports)
- TransferSyntaxUID: encoding/compression of stored instances (e.g., Explicit VR Little Endian, JPEG 2000, HTJ2K), useful for tool compatibility and performance planning

Both are mandatory DICOM attributes with very low cardinality, so the size impact on the parquet file is negligible (< 1MB combined).

Refs #124

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SM series almost always contain instances with mixed transfer syntaxes (93.7% of SM series), e.g., uncompressed for thumbnails/labels and JPEG/JPEG2000 for tiles. ANY_VALUE would arbitrarily pick one, which is misleading.

STRING_AGG(DISTINCT ...) captures all values as a comma-separated string while behaving identically to ANY_VALUE for non-SM series (single value, no comma).

Refs #124

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
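
The intent of the aggregation can be sketched in plain Python. This is an illustration only, not the index query itself: the function name `aggregate_transfer_syntaxes` and the sample rows are hypothetical, and the values are sorted here purely to keep the output stable, which the SQL aggregate does not necessarily do.

```python
from collections import defaultdict


def aggregate_transfer_syntaxes(rows):
    """Collapse (series_uid, transfer_syntax_uid) instance rows to one string per series.

    Mirrors the intent of STRING_AGG(DISTINCT ...): every distinct value is kept
    and joined with commas. Sorting is an assumption made for reproducibility.
    """
    per_series = defaultdict(set)
    for series_uid, transfer_syntax_uid in rows:
        per_series[series_uid].add(transfer_syntax_uid)
    return {uid: ",".join(sorted(values)) for uid, values in per_series.items()}


# Hypothetical instance-level rows: one SM series with mixed encodings, one CT series.
rows = [
    ("sm-series", "1.2.840.10008.1.2.1"),     # Explicit VR Little Endian (e.g. label image)
    ("sm-series", "1.2.840.10008.1.2.4.50"),  # JPEG Baseline (e.g. tiles)
    ("sm-series", "1.2.840.10008.1.2.4.50"),
    ("ct-series", "1.2.840.10008.1.2.1"),
    ("ct-series", "1.2.840.10008.1.2.1"),
]

result = aggregate_transfer_syntaxes(rows)
for series_uid, syntaxes in sorted(result.items()):
    print(series_uid, syntaxes)
```

A series with a single transfer syntax yields a string with no comma, so consumers that only handle single-syntax series can detect the mixed case by checking for a comma.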
Add sop_class_name and transfer_syntax_name columns derived from CASE mappings of SOPClassUID and TransferSyntaxUID respectively. Names are verified against pydicom's UID dictionary.

The ELSE clause uses ERROR() so the query fails loudly if an unmapped UID appears in a future IDC version, forcing the mapping to be updated rather than silently falling back to the raw UID.

Validated against BigQuery: 994,073 series, zero differences in all original columns, no series lost.

Refs #124

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
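
The fail-loudly behaviour of the ELSE ERROR() branch can be sketched in Python as a lookup that raises on an unknown UID. The dictionary below is a small illustrative subset rather than the mapping used in the query, and the function name `transfer_syntax_name` is hypothetical; the names follow the DICOM UID registry as far as I am aware and should be checked against pydicom before reuse.

```python
# Illustrative subset only; the mapping in the query is assumed to cover
# every UID that actually occurs in the index.
TRANSFER_SYNTAX_NAMES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
    "1.2.840.10008.1.2.4.90": "JPEG 2000 Image Compression (Lossless Only)",
}


def transfer_syntax_name(uid):
    """Return the human-readable name, or raise instead of echoing the raw UID."""
    try:
        return TRANSFER_SYNTAX_NAMES[uid]
    except KeyError:
        # Equivalent in spirit to ELSE ERROR(...): stop rather than degrade silently.
        raise ValueError(f"Unmapped TransferSyntaxUID: {uid}") from None


print(transfer_syntax_name("1.2.840.10008.1.2.1"))
```

Raising forces whoever updates the data to extend the mapping, at the cost of a failed run when a new UID first appears.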
The INNER JOIN with dicom_metadata_curated (used only for BodyPartExamined) would silently drop series if a future IDC version has incomplete curated coverage. LEFT JOIN preserves all series, with BodyPartExamined as NULL when curated data is unavailable.

No change in output for v23 (both tables have identical instance coverage: 46,870,903 instances, zero orphans in either direction).

Refs #124

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
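
The difference between the two joins can be reproduced on a toy dataset with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name `dicom_all`, the join key, and the sample rows are hypothetical stand-ins, and SQLite is used only because it runs locally, not because it matches BigQuery in every detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dicom_all (SeriesInstanceUID TEXT, SOPInstanceUID TEXT);
    CREATE TABLE dicom_metadata_curated (SOPInstanceUID TEXT, BodyPartExamined TEXT);
    INSERT INTO dicom_all VALUES ('series-a', 'inst-1'), ('series-b', 'inst-2');
    -- Curated coverage is deliberately incomplete: inst-2 has no curated row.
    INSERT INTO dicom_metadata_curated VALUES ('inst-1', 'CHEST');
""")

QUERY = """
    SELECT a.SeriesInstanceUID, c.BodyPartExamined
    FROM dicom_all AS a
    {join} dicom_metadata_curated AS c
      ON a.SOPInstanceUID = c.SOPInstanceUID
    ORDER BY a.SeriesInstanceUID
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print("INNER JOIN:", inner_rows)  # series-b is dropped
print("LEFT JOIN: ", left_rows)   # series-b is kept with a NULL body part
```

When both tables cover the same instances, as reported for v23, the two queries return identical rows, so the switch only matters once coverage diverges.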
The step summary was not visible on PR-triggered CD runs. Using tee -a ensures the report appears in both the step log output and the GitHub step summary, making it reliably visible regardless of event type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
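
The workflow itself uses the shell command tee -a; the same idea, writing each report line to the log and appending it to the summary file, can be sketched in Python. The helper name `tee_append` and the report text are hypothetical, and the temporary-file fallback exists only so the sketch also runs outside GitHub Actions, where the GITHUB_STEP_SUMMARY environment variable is not set.

```python
import os
import sys
import tempfile


def tee_append(text, path, stream=sys.stdout):
    """Write text to the stream (step log) and append it to the file at path."""
    stream.write(text)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(text)


# GitHub Actions exposes the summary file path through GITHUB_STEP_SUMMARY.
summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
if summary_path is None:
    # Assumption for local runs: fall back to a throwaway file.
    summary_path = os.path.join(tempfile.mkdtemp(), "step_summary.md")

tee_append("## Index report\n", summary_path)
tee_append("placeholder report line\n", summary_path)
```

Appending rather than overwriting matters because several commands in one step may contribute to the same summary.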
Update all actions to versions that use Node.js 24, ahead of the June 2, 2026 deprecation deadline for Node.js 20:

- actions/checkout: v4 → v6
- actions/setup-python: v5 → v6
- actions/upload-artifact: v4 → v7
- actions/download-artifact: v4 → v8
- google-github-actions/auth: v2 → v3
- google-github-actions/upload-cloud-storage: v2 → v3

Also add google-cloud-bigquery-storage to pip install in the CD workflow to silence the "BigQuery Storage module not found" warning and use the faster gRPC-based API instead of the REST fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>