Retrieve SPDX from a given license#249

Merged
soimkim merged 4 commits into main from spdx on Feb 26, 2026
@soimkim soimkim commented Feb 25, 2026

Description

  • License expressions are now normalized to SPDX format when parsing ScanCode files, with automatic fallback for unrecognized formats.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation update
  • Refactoring, Maintenance
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

coderabbitai bot commented Feb 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30bb318 and 2742426.

📒 Files selected for processing (1)
  • src/fosslight_source/_parsing_scancode_file_item.py

📝 Walkthrough

Added get_license_expression_spdx to normalize trimmed license expressions to SPDX via licensedcode.cache.build_spdx_license_expression; parsing_scancode_32_later now substitutes found license strings with the SPDX expression when available.

Changes

| Cohort / File(s) | Summary |
|---|---|
| License SPDX Expression Normalization (src/fosslight_source/_parsing_scancode_file_item.py) | Added get_license_expression_spdx(license_expression: str) -> str, which trims the input, calls licensedcode.cache.build_spdx_license_expression, and returns "" for empty input, for GPL/LicenseRef-like matches, or on exception. Integrated into parsing_scancode_32_later to replace found_lic with the normalized SPDX expression when it is non-empty. Minor whitespace cleanup. |
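
The behavior summarized above can be sketched roughly as follows. This is not the merged code: the `builder` parameter is a hypothetical hook added here so the sketch runs without scancode-toolkit installed, and `_SKIP_PATTERN` is an assumed stand-in for the PR's actual GPL/LicenseRef check, whose exact regex is not shown in this conversation.

```python
import re
from typing import Callable, Optional

# Assumption: illustrative pattern only; the PR's real regex is not shown.
_SKIP_PATTERN = re.compile(r"^(LicenseRef-|GPL)", re.IGNORECASE)


def get_license_expression_spdx(
    license_expression: str,
    builder: Optional[Callable[[str], object]] = None,  # hypothetical test hook
) -> str:
    """Return an SPDX expression for the input, or "" when none is usable."""
    trimmed = (license_expression or "").strip()
    if not trimmed:
        return ""
    try:
        if builder is None:
            # Imported lazily so the sketch loads without scancode-toolkit.
            from licensedcode.cache import build_spdx_license_expression
            builder = build_spdx_license_expression
        result_obj = builder(trimmed)
    except Exception:
        # Any failure means "no SPDX form"; the caller keeps its original value.
        return ""
    if not result_obj:
        return ""
    result = str(result_obj)
    # Results the summary describes as GPL/LicenseRef-like are discarded.
    return "" if _SKIP_PATTERN.match(result) else result
```

Returning "" rather than raising lets the parser treat every failure mode the same way: it simply keeps the license string it already had.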

Sequence Diagram(s)

sequenceDiagram
    participant Parser
    participant Helper as get_license_expression_spdx
    participant LicensedCode as licensedcode.cache

    Parser->>Helper: pass found_lic
    Helper->>LicensedCode: build_spdx_license_expression(trim(found_lic))
    LicensedCode-->>Helper: SPDX expression or error
    Helper-->>Parser: SPDX expression or empty string
    alt SPDX non-empty
        Parser->>Parser: replace found_lic with SPDX expression
    else SPDX empty
        Parser->>Parser: keep original found_lic
    end
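
The `alt` branch of the diagram amounts to a one-line fallback on the parser side. A minimal sketch, assuming a hypothetical helper `normalize_found_license` that takes the converter as an argument (in the PR the logic is inline in parsing_scancode_32_later):

```python
from typing import Callable


def normalize_found_license(found_lic: str, to_spdx: Callable[[str], str]) -> str:
    """Prefer the SPDX form of found_lic; keep the original if none is returned."""
    matched_lic_spdx = to_spdx(found_lic)
    # An empty string signals "no SPDX form available", so fall back.
    return matched_lic_spdx if matched_lic_spdx else found_lic
```

For example, `normalize_found_license("mit", lambda s: "MIT")` yields `"MIT"`, while a converter that returns `""` leaves the input unchanged.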

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 I nibble at strings both long and terse,
I trim the edges, ask licensedcode to converse,
If SPDX appears, I tuck it in neat,
If empty returns, I leave the old seat,
Hopping on — tidy licenses, tidy feet.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Retrieve SPDX from a given license' clearly and concisely describes the main change: adding SPDX normalization functionality for license expressions during ScanCode file parsing. |

@soimkim soimkim self-assigned this Feb 25, 2026
@soimkim soimkim added the chore [PR/Issue] Refactoring, maintenance the code label Feb 25, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fosslight_source/_parsing_scancode_file_item.py`:
- Around line 210-220: In get_license_expression_spdx, catch exceptions the same
way other handlers do and log the failure with logger.debug including the
exception info; also guard against build_spdx_license_expression returning None
by checking the result before calling str() and return "" when result is None or
starts with "LicenseRef-". Use the existing function name
get_license_expression_spdx and the called symbol build_spdx_license_expression
and follow the file's logger.debug pattern for consistency.
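
The logging half of this suggestion could look something like the sketch below. The helper name `build_or_none` and the logger name are hypothetical, and the file's actual logger.debug call style is not shown here, so the message format is an assumption.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("fosslight_sketch")  # hypothetical logger name


def build_or_none(builder: Callable[[str], object], expression: str) -> Optional[object]:
    """Call the SPDX builder; on failure, log at debug level and return None."""
    try:
        return builder(expression)
    except Exception as ex:
        # Record what failed and why instead of swallowing the error silently.
        logger.debug("SPDX conversion failed for %r: %s", expression, ex)
        return None
```

Logging at debug level keeps normal scan output quiet while still leaving a trace when a license expression cannot be converted.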

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 535c81e and ee24e55.

📒 Files selected for processing (1)
  • src/fosslight_source/_parsing_scancode_file_item.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/fosslight_source/_parsing_scancode_file_item.py (1)

283-284: license_expression_spdx is reused for a semantically different variable at line 310.

Line 283 uses license_expression_spdx for a per-match SPDX lookup; line 310 reuses the same name for the file-level detected_license_expression_spdx field. There is no functional conflict, because each use assigns the name before reading it, but Python has no block scope, so both refer to the same variable and the reuse reduces clarity.

♻️ Proposed rename at line 283
-                                    license_expression_spdx = get_license_expression_spdx(found_lic)
-                                    found_lic = license_expression_spdx if license_expression_spdx else found_lic
+                                    matched_lic_spdx = get_license_expression_spdx(found_lic)
+                                    found_lic = matched_lic_spdx if matched_lic_spdx else found_lic
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_source/_parsing_scancode_file_item.py` around lines 283 - 284,
The local variable license_expression_spdx used when processing each match
should be renamed to avoid shadowing the file-level
detected_license_expression_spdx; update the per-match variable (where
get_license_expression_spdx(found_lic) is called and assigned back into
found_lic) to a clearer name like match_license_expression_spdx or
license_expression_match_spdx, and use that renamed identifier in the subsequent
assignment to found_lic so the later file-level detected_license_expression_spdx
remains unambiguous.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fosslight_source/_parsing_scancode_file_item.py`:
- Around line 216-218: The current guard is ineffective because result is always
overwritten; change the logic so that when result_obj is falsy/None you
immediately return an empty string (or set result = "" and skip conversion)
instead of executing result = str(result_obj); specifically update the code
around result_obj (the block using build_spdx_license_expression / result_obj)
to: if not result_obj: return "" (or if you must keep result variable: if not
result_obj: result = "" else: result = str(result_obj)) so that "None" cannot be
produced as a license string.
- Around line 213-223: The function get_license_expression_spdx incorrectly
handles a falsy result_obj: it sets result = "" then immediately overwrites with
str(result_obj), which yields "None" when result_obj is None; update the logic
to check result_obj and return "" early (or use a try-else) before converting to
string, e.g. call build_spdx_license_expression(license_expression.strip()), if
result_obj is falsy return "" immediately, otherwise set result =
str(result_obj) and then apply the LicenseRef- check; ensure exception handling
remains to return "" on errors.
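
The pitfall described in these two comments is easy to reproduce in isolation. The two functions below are illustrative reductions with hypothetical names, not the PR's code:

```python
def to_license_string_buggy(result_obj):
    """Mirrors the flawed flow: the guard runs, then is overwritten."""
    result = ""
    if not result_obj:
        result = ""
    result = str(result_obj)  # always executes, so None becomes "None"
    return result


def to_license_string_fixed(result_obj):
    """Return early on a falsy result so str() never sees None."""
    if not result_obj:
        return ""
    return str(result_obj)
```

With `None` as input, the buggy version returns the four-character string `"None"`, which would then be recorded as a license name; the fixed version returns `""`.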

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee24e55 and 6b4ee6c.

📒 Files selected for processing (1)
  • src/fosslight_source/_parsing_scancode_file_item.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fosslight_source/_parsing_scancode_file_item.py`:
- Around line 210-220: The function get_license_expression_spdx must handle
build_spdx_license_expression returning None before calling regex.match; modify
get_license_expression_spdx to check if result is None (or falsy) and return ""
immediately prior to calling regex.match(result), and replace the bare except
with a narrower exception handling (or rethrow) so TypeError from regex.match
won't be silently swallowed.
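
To see why the order of the checks matters, the sketch below shows that matching a compiled pattern against `None` raises `TypeError`, and how an explicit `None` check avoids relying on a broad `except` to hide it. The pattern and function names are assumptions for illustration.

```python
import re

# Assumption: illustrative pattern; the PR's actual regex is not shown.
_LICENSE_REF = re.compile(r"^LicenseRef-")


def match_raises_on_none() -> bool:
    """Report whether matching against None raises TypeError."""
    try:
        _LICENSE_REF.match(None)
    except TypeError:
        return True
    return False


def is_license_ref(result) -> bool:
    """Check for None first so the regex only ever sees a string."""
    if result is None:
        return False
    return bool(_LICENSE_REF.match(result))
```

Under a bare `except`, that `TypeError` would be caught and turned into an empty return value, which happens to give the right answer but hides the real cause; the explicit check makes the intent visible and lets the handler be narrowed.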

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b4ee6c and 30bb318.

📒 Files selected for processing (1)
  • src/fosslight_source/_parsing_scancode_file_item.py

@soimkim soimkim merged commit 00d41d0 into main Feb 26, 2026
6 of 7 checks passed
@soimkim soimkim deleted the spdx branch February 26, 2026 01:01