NUTCH-3161 Address Sonarcloud High and Medium Security Hotspots #904

Open
lewismc wants to merge 3 commits into apache:master from lewismc:NUTCH-3161
lewismc (Member) commented Feb 26, 2026

PR to address NUTCH-3161. This patch addresses the following Security Hotspots:

High

(false positives): Exclude the conf, data, and sample directories under src/plugin/** from analysis in sonar-project.properties. These plugin resource directories contain no Java code.

Medium

  • ParseOutputFormat (src/java/.../ParseOutputFormat.java): Replace regex " *, *" for db.parsemeta.to.crawldb with comma-split + trim in getParseMetaToCrawlDBKeys(). Add TestParseOutputFormat with tests for empty, single, comma-separated, trim, and empty-segment handling.
  • HtmlParser (parse-html plugin): Remove metaPattern, charsetPattern, and charsetPatternHTML5. Use linear string parsing in extractCharsetFromMeta() / extractCharsetValue() for HTML4 and HTML5 meta charset detection. Add testExtractCharsetFromMeta in existing TestHtmlParser.
  • JSParseFilter (parse-js plugin): Remove STRING_PATTERN and URI_PATTERN. Use extractQuotedStrings() and looksLikeUri() (linear scan / simple checks). Add testExtractQuotedStrings and testLooksLikeUri in existing TestJSParseFilter.
  • UrlValidator (urlfilter-validator plugin): Remove URL_PATTERN and AUTHORITY_PATTERN. Use java.net.URI for URL structure and parseAuthority() for host/port. Remove unused isBlankOrNull. Add testParseAuthority in existing TestUrlValidator.
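
For reviewers, the comma-split + trim approach described for db.parsemeta.to.crawldb could look roughly like the sketch below. This is not the code from the patch: the class name CommaSplitSketch and the method splitAndTrim are hypothetical, and dropping empty segments is an assumption about how getParseMetaToCrawlDBKeys() treats them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of regex-free comma splitting; not the patch itself. */
class CommaSplitSketch {

  /**
   * Splits a comma-separated value with indexOf(), trims each segment,
   * and skips empty segments (assumed behaviour). Runs in a single
   * left-to-right pass, so there is no regex backtracking to worry about.
   */
  static List<String> splitAndTrim(String value) {
    List<String> keys = new ArrayList<>();
    if (value == null) {
      return keys;
    }
    int start = 0;
    while (start <= value.length()) {
      int comma = value.indexOf(',', start);
      int end = (comma < 0) ? value.length() : comma;
      String segment = value.substring(start, end).trim();
      if (!segment.isEmpty()) {
        keys.add(segment);
      }
      if (comma < 0) {
        break; // no further separators
      }
      start = comma + 1;
    }
    return keys;
  }

  public static void main(String[] args) {
    // Surrounding spaces are trimmed and the empty segment is dropped.
    System.out.println(splitAndTrim("a, b ,,c")); // prints [a, b, c]
  }
}
```

One behavioural difference worth checking in review: String.trim() strips all leading and trailing characters up to U+0020 (tabs and newlines included), whereas the old " *, *" pattern only consumed spaces around each comma.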

Thanks for any review.

@lewismc lewismc self-assigned this Feb 26, 2026
lewismc (Member, Author) commented Feb 26, 2026

Confirmed this PR addresses the security hotspots and passes tests.

@sonarqubecloud

Quality Gate failed

Failed conditions
62.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
