feat(hackerrank): IBM HackerRank Assessment 2026 #4

Merged
shortthirdman merged 2 commits into main from ibm-hackerrank-2026
Mar 13, 2026
@shortthirdman
Member

Overview

This pull request introduces a spam classification solution implemented in Java as part of the IBM HackerRank Assessment 2026. The implementation focuses on identifying spam messages based on the presence of predefined spam indicator words within a collection of input texts.

Two approaches are provided:

  1. Iterative implementation optimized for clarity and early exit performance.
  2. Parallel stream-based implementation leveraging Java Streams and parallel processing to improve throughput for large datasets.

Both implementations match words case-insensitively and label each text as spam or not_spam. A text is labeled spam when it contains at least two spam-indicator words.
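The sequential approach described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the class name `SequentialSpamSketch` and the exact method signature are assumptions, and it assumes that every occurrence of a spam word counts toward the threshold, which the description does not confirm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name; the PR only mentions a method called classifyTexts.
final class SequentialSpamSketch {

    // Assumed signature: one label ("spam" or "not_spam") per input text.
    static List<String> classifyTexts(List<String> texts, List<String> spamWords) {
        // Lower-case the indicator words once so each lookup is O(1) on average.
        Set<String> spam = new HashSet<>();
        for (String word : spamWords) {
            spam.add(word.toLowerCase(Locale.ROOT));
        }

        // Pre-size the result list: exactly one label per text.
        List<String> labels = new ArrayList<>(texts.size());
        for (String text : texts) {
            int hits = 0;
            for (String token : text.toLowerCase(Locale.ROOT).split(" ")) {
                // Early exit: stop scanning this text once two matches are seen.
                if (spam.contains(token) && ++hits >= 2) {
                    break;
                }
            }
            labels.add(hits >= 2 ? "spam" : "not_spam");
        }
        return labels;
    }
}
```

With this sketch, `classifyTexts(List.of("Free PRIZE inside", "free lunch today"), List.of("free", "prize"))` returns `[spam, not_spam]`.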


Key Features

  • Spam detection logic

    • Classifies text as spam when ≥ 2 spam indicator words are detected.
    • Case-insensitive word matching.
  • Two processing strategies

    • Sequential implementation

      • Optimized with early termination when spam threshold is reached.
    • Parallel stream implementation

      • Uses parallelStream() to classify texts concurrently, intended to improve throughput on large input sets.
  • Efficient lookups

    • Spam words are stored in a HashSet to achieve O(1) average lookup time.
  • Memory-conscious design

    • Result lists are pre-sized when possible to minimize resizing overhead.
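As an illustration of the parallel strategy listed above, the sketch below classifies texts with `parallelStream()` and stores the spam words in a `HashSet`. The class name `ParallelSpamSketch`, the helper `isSpam`, and the method signature are assumptions made for this example. The merged code may be organized differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class name; the PR only mentions a method called classifyTextStream.
final class ParallelSpamSketch {

    // Assumed signature: one label ("spam" or "not_spam") per input text.
    static List<String> classifyTextStream(List<String> texts, List<String> spamWords) {
        Set<String> spam = spamWords.stream()
                .map(word -> word.toLowerCase(Locale.ROOT))
                .collect(Collectors.toCollection(HashSet::new));

        // Texts are classified in parallel; collecting to a list keeps input order.
        return texts.parallelStream()
                .map(text -> isSpam(text, spam) ? "spam" : "not_spam")
                .collect(Collectors.toList());
    }

    // Hypothetical helper: true when at least two tokens are spam words.
    private static boolean isSpam(String text, Set<String> spam) {
        // The per-text stream stays sequential; limit(2) caps the matches counted.
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split(" "))
                .filter(spam::contains)
                .limit(2)
                .count() == 2;
    }
}
```

The set is only read inside the parallel pipeline, so sharing it across worker threads is safe in this sketch.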

Testing

A comprehensive JUnit 5 test suite has been added covering:

Functional Tests

  • Basic spam detection
  • Mixed spam and non-spam inputs
  • Multiple input texts

Edge Cases

  • Case-insensitive matching
  • Duplicate spam words
  • Empty input lists
  • Empty strings
  • Whitespace variations

Negative Tests

  • Null input handling
  • Null elements within collections
  • Tokenization limitations (split(" ") behavior)
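The tokenization limitation noted in the last item can be seen in plain Java. The sketch below contrasts `split(" ")` with a whitespace-run split; the class and method names are hypothetical and exist only for this comparison.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for comparing two tokenization strategies.
final class TokenizationSketch {

    // Splits on a single space character, as split(" ") does.
    static List<String> splitOnSpace(String text) {
        return Arrays.asList(text.split(" "));
    }

    // Alternative for comparison: trim, then split on runs of any whitespace.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }
}
```

With `split(" ")`, the input `"free  prize"` (two spaces) yields `[free, , prize]` with an empty middle token, and a tab-separated `"free\tprize"` stays a single token. Punctuation such as `"free!"` also remains attached to the word, so it would not match the spam word `free` under either strategy.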

Parallel Consistency

  • Verifies that both implementations produce identical results

Performance Considerations

| Implementation | Characteristics |
| --- | --- |
| classifyTexts | Efficient for small to medium datasets with early exit optimization |
| classifyTextStream | Parallel processing suited for large-scale datasets |

The stream implementation uses .limit(2) to avoid unnecessary processing once the spam threshold is reached.
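To illustrate why `.limit(2)` saves work, the sketch below counts how many tokens a sequential pipeline inspects before it stops. The class and method names are hypothetical, and `peek` is used here purely as instrumentation for the demonstration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of the PR.
final class LimitShortCircuitSketch {

    // Returns how many tokens the pipeline looked at before finishing.
    static int tokensInspected(String text, Set<String> spam) {
        AtomicInteger inspected = new AtomicInteger();
        Arrays.stream(text.split(" "))
                .peek(token -> inspected.incrementAndGet()) // instrumentation only
                .filter(spam::contains)
                .limit(2) // short-circuits once two matches have passed the filter
                .count();
        return inspected.get();
    }
}
```

For the text `"free prize a b c d"` with spam words `free` and `prize`, this sequential pipeline inspects 2 of the 6 tokens. When fewer than two tokens match, every token is still inspected.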


Complexity

Time Complexity

  • Average case: O(n × m)

    • n = number of texts
    • m = average number of words per text

Space Complexity

  • O(s + n)

    • s = number of spam words
    • n = number of texts

Future Improvements

Potential enhancements include:

  • Regex-based tokenization to handle punctuation and variable whitespace
  • Configurable spam threshold
  • Support for streaming inputs or large-scale message ingestion
  • Benchmark comparisons between sequential and parallel implementations
  • Property-based and fuzz testing for robustness
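One possible shape for the first two enhancements, regex-based tokenization and a configurable threshold, is sketched below. Everything here is an assumption for illustration: the class name `ConfigurableSpamSketch`, the constructor parameters, and the choice of splitting on runs of characters that are neither letters nor digits.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical design; the PR lists these features only as future work.
final class ConfigurableSpamSketch {

    // Split on runs of characters that are neither letters nor digits.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}]+");

    private final Set<String> spamWords; // expected to be lower-case already
    private final int threshold;

    ConfigurableSpamSketch(Set<String> spamWords, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.spamWords = Set.copyOf(spamWords);
        this.threshold = threshold;
    }

    // Labels a single text; returns as soon as the threshold is reached.
    String classify(String text) {
        int hits = 0;
        for (String token : NON_WORD.split(text.toLowerCase(Locale.ROOT))) {
            if (spamWords.contains(token) && ++hits >= threshold) {
                return "spam";
            }
        }
        return "not_spam";
    }
}
```

With spam words `free` and `prize` and a threshold of 2, `classify("FREE!!!  prize, now")` returns `spam`, because punctuation and repeated spaces no longer stick to the tokens.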

Summary

This PR provides a clean, performant, and well-tested spam classification implementation aligned with modern Java practices, offering both sequential and parallel processing strategies while ensuring correctness through a comprehensive test suite.

@shortthirdman shortthirdman self-assigned this Mar 13, 2026
@shortthirdman shortthirdman added the enhancement New feature or request label Mar 13, 2026
@github-actions
Contributor

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@github-actions
Contributor

Code Coverage

| Overall Project | 87.69% -0.02% | 🍏 |
| --- | --- | --- |
| Files changed | 98.68% | 🍏 |

| File | Coverage | |
| --- | --- | --- |
| SpamTextClassifier.java | 98.68% -1.32% | 🍏 |
| PatternAnalyzer.java | 77.39% | 🍏 |

@shortthirdman shortthirdman merged commit 8de448e into main Mar 13, 2026
4 checks passed
@shortthirdman shortthirdman deleted the ibm-hackerrank-2026 branch March 13, 2026 14:06
@shortthirdman-org shortthirdman-org locked as resolved and limited conversation to collaborators Mar 25, 2026