
feat: improve repository recommendations using GitHub topic matching#12

Open
tejeshvenkat wants to merge 6 commits into OWASP-BLT:main from tejeshvenkat:feature-topic-matching

@tejeshvenkat
Contributor

@tejeshvenkat tejeshvenkat commented Mar 5, 2026

This PR improves the repository recommendation system by incorporating GitHub repository topics into the ranking algorithm.

Key improvements:

• Fetch repository topics using the GitHub API preview header
• Extract contributor interests from repository topics
• Introduce topicScore to evaluate topic relevance
• Update ranking formula to include topicScore
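The first two bullets (fetching topics with the preview header, then extracting interests) can be sketched as below. This is a minimal illustration, not the PR's code: `buildReposRequest` and `extractTopicCounts` are hypothetical names. `application/vnd.github.mercy-preview+json` is the media type GitHub used while repository topics were in preview; current REST API versions should return `topics` by default, so sending it is assumed to be harmless. Repos without a `topics` array are treated as having none, matching the "safe handling of missing topic data" goal.

```javascript
// Hypothetical helpers; names and shapes are illustrative, not taken from js/app.js.
const TOPICS_ACCEPT = "application/vnd.github.mercy-preview+json";

/** Build the URL and fetch() options for a user's public repositories. */
function buildReposRequest(username, perPage = 100) {
  // per_page is capped at 100 by the GitHub REST API.
  const url = `https://api.github.com/users/${encodeURIComponent(username)}/repos?per_page=${perPage}`;
  return { url, init: { headers: { Accept: TOPICS_ACCEPT } } };
}

/** Count how often each topic appears; repos with missing or null topics contribute nothing. */
function extractTopicCounts(repos) {
  const counts = new Map();
  for (const repo of repos || []) {
    const topics = Array.isArray(repo.topics) ? repo.topics : [];
    for (const topic of topics) {
      const key = String(topic).toLowerCase(); // normalize case so "Security" == "security"
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}
```

A caller would pass `url` and `init` straight to `fetch(url, init)` and feed the parsed array into `extractTopicCounts`.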

New ranking formula:
(stars * 0.5) + (activityScore * 0.2) + (languageScore * 0.2) + (topicScore * 0.1)
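As a rough sketch, the formula maps onto a small scoring helper like the one below. The field names (`stars`, `activityScore`, `languageScore`, `topicScore`) and the `weights` parameter are assumptions for illustration; the default weights are the ones stated above.

```javascript
// Hypothetical helper mirroring the stated formula; candidate field names are assumptions.
const DEFAULT_WEIGHTS = { stars: 0.5, activity: 0.2, language: 0.2, topic: 0.1 };

function rankingScore(candidate, weights = DEFAULT_WEIGHTS) {
  // Missing or non-numeric inputs count as 0 so one absent signal cannot yield NaN.
  const num = (v) => (Number.isFinite(v) ? v : 0);
  return (
    num(candidate.stars) * weights.stars +
    num(candidate.activityScore) * weights.activity +
    num(candidate.languageScore) * weights.language +
    num(candidate.topicScore) * weights.topic
  );
}

/** Return a new array of candidates, each annotated with `score`, sorted best first. */
function rankCandidates(candidates, weights = DEFAULT_WEIGHTS) {
  return candidates
    .map((c) => ({ ...c, score: rankingScore(c, weights) }))
    .sort((a, b) => b.score - a.score);
}
```

Note that if `stars` is a raw count while the other scores are small bounded values, the stars term will usually dominate the sum; passing a different `weights` object (or normalizing stars first) is one way to rebalance.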

Additional improvements:
• Safe handling of missing topic data
• Display repository topics in recommendation cards
• Improved recommendation accuracy using contributor interests

Summary by CodeRabbit

  • New Features

    • Contributor Activity Score and Top Languages are displayed in results; recommendations incorporate contributor topics, activity and language relevance.
  • Performance

    • Short-term response caching (10-minute TTL) reduces network calls.
  • Documentation

    • Contributing guide replaced with a structured workflow, branching conventions, code style, testing and clearer PR steps.

@Jayant2908
Contributor

Hey man, really good changes. I have some similar changes, including a major one, that I'm pushing soon, and you can iterate on top of that. Thank you!

@tejeshvenkat
Contributor Author

Thanks for the feedback!

That sounds great. I’ll wait for the upcoming changes and then update this PR to align with the new implementation. Happy to iterate on it further.

@tejeshvenkat tejeshvenkat force-pushed the feature-topic-matching branch from 5d484a2 to 75ec1cd on March 19, 2026 05:07
@owasp-blt

owasp-blt Bot commented Mar 19, 2026

👋 Hi @tejeshvenkat!

This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:

  • The PR author
  • coderabbitai
  • copilot

Once a valid peer review is submitted, this check will pass automatically. Thank you!

⚠️ Peer review enforcement is active.

@owasp-blt owasp-blt Bot added the needs-peer-review PR needs peer review label Mar 19, 2026
@coderabbitai

coderabbitai Bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7c9d783b-3ac6-4d6c-abda-317f9547c149

📥 Commits

Reviewing files that changed from the base of the PR and between bcef24c and 2e9c152.

📒 Files selected for processing (2)
  • index.html
  • js/app.js
✅ Files skipped from review due to trivial changes (1)
  • index.html
🚧 Files skipped from review as they are similar to previous changes (1)
  • js/app.js

Walkthrough

Added client-side caching (localStorage, 10-minute TTL), extended recommendation logic to use public events and repo languages/topics (activity score, top languages), updated UI to show contributor activity score and top languages, and revised README contributing instructions and API endpoint documentation.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation Updates | `README.md` | Removed a documented GitHub REST API endpoint and replaced the previous CONTRIBUTING instructions with a structured guide: branching conventions, files to edit, `npm test`, code style, PR steps targeting `main`, and guidance on contributions. |
| UI Additions | `index.html` | Added DOM elements for contributor metrics: `#contributor-activity-score`, `#contributor-activity-label`, and `#top-languages-list`; inserted placeholders and adjusted markup ordering. |
| Caching & Recommendation Engine | `js/app.js` | Introduced `CACHE_TTL`, `getCachedData()`, and `setCachedData()` using `localStorage` (10-minute TTL) with safe parsing. Modified the submit flow to load repos/events cache-first, added error handling and fallbacks, changed `buildRecommendations(userData, repos)` to `buildRecommendations(userData, repos, eventsData = [])`, computed `activity_score` and `activity_breakdown`, derived `top_languages`, integrated contributor topics into scoring, augmented `github_stats` with the new fields, and updated `displayResults` and the generated Markdown to show activity and top languages. |
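A minimal sketch of the cache-first flow described in the walkthrough, assuming entries are stored as `{ timestamp, data }` JSON. The names `CACHE_TTL`, `getCachedData`, and `setCachedData` come from the summary, but these signatures are hypothetical: `storage` (anything with `getItem`/`setItem`, such as `localStorage`) and `fetchImpl` are injected so the logic can run outside a browser, and `fetchArrayWithCache` is an illustrative helper. One design choice is made explicit: only a successful response whose body is an array is written to the cache, so a transient API error is never persisted for ten minutes.

```javascript
// Sketch only: signatures and the fetchArrayWithCache helper are assumptions.
const CACHE_TTL = 10 * 60 * 1000; // 10 minutes in milliseconds

function getCachedData(storage, key, now = Date.now()) {
  try {
    const raw = storage.getItem(key);
    if (!raw) return null;
    const entry = JSON.parse(raw);
    if (!entry || typeof entry.timestamp !== "number") return null;
    if (now - entry.timestamp > CACHE_TTL) return null; // expired entry
    return entry.data;
  } catch {
    return null; // corrupt JSON is treated as a cache miss
  }
}

function setCachedData(storage, key, data, now = Date.now()) {
  try {
    storage.setItem(key, JSON.stringify({ timestamp: now, data }));
  } catch {
    // Quota or private-mode storage errors should not break the page.
  }
}

async function fetchArrayWithCache(storage, key, url, fetchImpl = fetch) {
  const cached = getCachedData(storage, key);
  if (Array.isArray(cached)) return cached;
  const response = await fetchImpl(url);
  if (!response.ok) {
    console.warn(`GitHub request failed (${response.status}) for ${url}`);
    return null; // nothing is cached; the caller chooses a fallback
  }
  const data = await response.json();
  if (!Array.isArray(data)) return null;
  setCachedData(storage, key, data);
  return data;
}
```

For events, a caller could write `const events = (await fetchArrayWithCache(localStorage, key, url)) ?? [];` to keep the empty-array fallback in memory without caching it.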

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant App as App (js/app.js)
    participant Cache as localStorage
    participant API as GitHub API
    participant Recommend as buildRecommendations()
    participant UI as DOM

    User->>App: Submit username form
    App->>Cache: Check `github_repos_${username}` cache
    alt Cache hit
        Cache-->>App: Return cached repos
    else Cache miss
        App->>API: GET /users/{username}/repos (Accept: topics)
        API-->>App: Repos data
        App->>Cache: Store repos with TTL
    end

    App->>Cache: Check `github_events_${username}` cache
    alt Cache hit
        Cache-->>App: Return cached events
    else Cache miss
        App->>API: GET /users/{username}/events
        alt API success
            API-->>App: Events data
            App->>Cache: Store events with TTL
        else API error
            App-->>App: Use default events = []
        end
    end

    App->>Recommend: buildRecommendations(userData, repos, events)
    Recommend->>Recommend: Compute activity_score & activity_breakdown
    Recommend->>Recommend: Extract top_languages from repos
    Recommend->>Recommend: Score topics & rank recommendations
    Recommend-->>App: Recommendations + github_stats (activity_score, top_languages)
    App->>UI: Display activity_score, top_languages, and recommendations
    UI-->>User: Show results
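The "Extract top_languages from repos" step could look roughly like this. It is a sketch that assumes each repo object carries the REST API's primary `language` field (a string or `null`) and ranks languages by how many repositories use them; the PR may instead weight by bytes or use a different limit. `topLanguages` is a hypothetical name.

```javascript
// Hypothetical sketch: rank primary languages by number of repositories.
function topLanguages(repos, limit = 3) {
  const counts = new Map();
  for (const repo of repos || []) {
    if (typeof repo.language === "string" && repo.language) {
      counts.set(repo.language, (counts.get(repo.language) || 0) + 1);
    }
  }
  return [...counts.entries()]
    // Most-used first; ties broken alphabetically for a stable display order.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([language, count]) => ({ language, count }));
}
```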

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels: quality: high

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: improve repository recommendations using GitHub topic matching' accurately reflects the main change: integrating GitHub topics into the recommendation algorithm to improve ranking. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
js/app.js (1)

174-208: ⚠️ Potential issue | 🟠 Major

topicScore is self-referential with the current candidate set.

contributorTopics is built from the same repos array that you later rank, so every repo with topics already matches its own topics. In practice this is mostly a bonus for “has more topics,” not a relevance signal for contributor interests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@js/app.js` around lines 174 - 208, The topic scoring is biased because
contributorTopics is built from the same repos being scored, so each repo
matches its own topics; change getTopicScore to compute contributor interest
topics excluding the candidate repo (or accept a precomputed interest set keyed
by repo id) and compare the candidate repo.topics against that exclusion set;
keep the fallback relevantTopics list logic for when the exclusion set is empty,
and update references to contributorTopics/relevantTopics in getTopicScore so it
uses the exclusion set (use repo.id or another unique repo identifier to exclude
the candidate).
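One way to implement the exclusion described above is a leave-one-out count: build topic counts once (deduplicated per repo), then subtract the candidate's own contribution when scoring it. The signature of this `getTopicScore` and the `buildTopicCounts` helper are hypothetical, not the PR's code; it assumes the candidate is one of the repos the counts were built from, and returns the fraction of the candidate's topics found in at least one other repo. The fallback list is used only when no other repo has any topics.

```javascript
// Hypothetical leave-one-out topic scoring; assumes `candidate` is one of `repos`.
function buildTopicCounts(repos) {
  const counts = new Map();
  for (const repo of repos) {
    // Deduplicate per repo so each repo contributes at most 1 to a topic.
    for (const t of new Set(repo.topics || [])) {
      counts.set(t, (counts.get(t) || 0) + 1);
    }
  }
  return counts;
}

function getTopicScore(candidate, totalCounts, fallbackTopics = []) {
  const own = new Set(candidate.topics || []);
  if (own.size === 0) return 0;
  // Interest topics: topics that still appear after removing the candidate's own contribution.
  const interests = new Set();
  for (const [topic, count] of totalCounts) {
    const ownContribution = own.has(topic) ? 1 : 0;
    if (count - ownContribution > 0) interests.add(topic);
  }
  const reference = interests.size > 0 ? interests : new Set(fallbackTopics);
  let matches = 0;
  for (const t of own) if (reference.has(t)) matches += 1;
  return matches / own.size; // 0..1
}
```

Computing the counts once and subtracting keeps scoring linear in the number of topics per candidate instead of rebuilding an interest set for every repo.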
🧹 Nitpick comments (1)
README.md (1)

306-312: Don’t present a placeholder npm test as required validation.

This asks contributors to run npm test and then immediately says the command is only a placeholder. Until there is a real automated check behind it, I’d rename this step to manual validation and point contributors at the smoke-test workflow above.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 306 - 312, Update the README section titled "### 3.
Run Tests Before Submitting" to stop presenting `npm test` as a required
validation step when it’s just a placeholder: rename the section to something
like "Manual validation before submitting", remove or de-emphasize the `npm
test` command as an automated check, and instead reference the existing
smoke-test workflow (mentioned earlier in the README) as the authoritative
pre-submit check and/or provide explicit manual steps to perform; ensure the
text around the "npm test" snippet clarifies it is a placeholder and not an
automated gate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@index.html`:
- Around line 282-300: The Contributor Activity section is using inline analysis
that is out-of-sync with the main app bundle, so update the page to use the
single source of truth: remove or disable the inline analyzer and instead import
and invoke the analysis/rendering from js/app.js (or if you prefer to keep
inline, call the same exported functions from js/app.js). Specifically, ensure
the js/app.js logic that computes activity_score and top_languages updates the
DOM elements with IDs contributor-activity-score and top-languages-list (or
expose and call renderActivityScore() and renderTopLanguages() from js/app.js),
so the values replace the placeholder "0" and "—" after a real analysis runs.
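A single render path in `js/app.js` might look like the sketch below. The element ids come from the comment above; `renderContributorMetrics`, the `doc` parameter (normally `document`, injected here for testability), and the shape of `stats` (`activity_score` as a number, `top_languages` as strings or `{ language }` objects) are assumptions.

```javascript
// Hypothetical renderer; only the element ids are taken from the review.
const PLACEHOLDER = "\u2014"; // same placeholder character as the static markup

function renderContributorMetrics(doc, stats) {
  const s = stats || {};
  const scoreEl = doc.getElementById("contributor-activity-score");
  if (scoreEl) {
    const score = Number.isFinite(s.activity_score) ? s.activity_score : 0;
    scoreEl.textContent = String(score);
  }
  const listEl = doc.getElementById("top-languages-list");
  if (listEl) {
    const names = (Array.isArray(s.top_languages) ? s.top_languages : [])
      .map((l) => (typeof l === "string" ? l : l && l.language))
      .filter(Boolean);
    // textContent (not innerHTML) keeps API-supplied strings from being parsed as HTML.
    listEl.textContent = names.length > 0 ? names.join(", ") : PLACEHOLDER;
  }
}
```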

In `@js/app.js`:
- Around line 153-159: In buildRecommendations, activityScore is computed once
for the whole user (eventsData → activityScore) and then added to every repo, so
it doesn't affect ordering; either compute a per-repo activity score by
filtering eventsData by event.repo.name for each repo (use event.repo.name to
attribute PushEvent/PullRequestEvent/IssuesEvent weights when calculating
repoActivityScore inside the loop that processes repos) and use that
repoActivityScore in the ranking formula, or remove activityScore from the
ranking and only include the global activityScore as display-only metadata in
github_stats; update any references to activityScore in buildRecommendations
(and the similar block around lines 210-216) accordingly.
- Around line 104-128: The code currently converts any non-OK GitHub fetch into
an empty array and persists it, which can poison the cache on transient errors;
update both repo and event fetch flows (symbols: getCachedData, setCachedData,
reposResponse, eventsResponse, reposData, eventsData) to only call setCachedData
when the HTTP response is ok and the parsed JSON is a valid array, and avoid
persisting or overwrite the cache when responses are not ok (instead return/keep
undefined or previous cache and optionally log the response status/error);
ensure the events try/catch also doesn’t setCachedData on fetch failures or
non-ok responses.
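For the `activityScore` comment above, the per-repo option could be sketched like this. `PushEvent`, `PullRequestEvent`, and `IssuesEvent` are real GitHub event types, and events identify their repository as `"owner/name"` in `event.repo.name`, which lines up with `full_name` on repo objects. The weights and both function names are illustrative assumptions, not values from the PR.

```javascript
// Hypothetical per-repo activity scoring; weights are illustrative only.
const EVENT_WEIGHTS = new Map([
  ["PushEvent", 3],
  ["PullRequestEvent", 5],
  ["IssuesEvent", 2],
]);

/** Sum weighted events per repository, keyed by lower-cased "owner/name". */
function repoActivityScores(eventsData) {
  const scores = new Map();
  for (const event of eventsData || []) {
    const weight = EVENT_WEIGHTS.get(event.type) || 0;
    const name = event.repo && event.repo.name;
    if (!weight || !name) continue; // ignore unweighted types and malformed events
    const key = name.toLowerCase();
    scores.set(key, (scores.get(key) || 0) + weight);
  }
  return scores;
}

/** Look up a repo's activity via its `full_name`; repos with no events score 0. */
function activityScoreFor(repo, scores) {
  return scores.get(String(repo.full_name || "").toLowerCase()) || 0;
}
```

The per-repo value would then replace the global `activityScore` in the ranking formula, so activity actually changes the ordering.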


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 1c9aacd2-0032-42ad-a272-758523934595

📥 Commits

Reviewing files that changed from the base of the PR and between ebec537 and 75ec1cd.

📒 Files selected for processing (3)
  • README.md
  • index.html
  • js/app.js


@karunarapolu karunarapolu left a comment


Suggestion: Ranking formula might be better if topics and languages are given more weight than stars.

@tejeshvenkat
Contributor Author

Good idea; I can tune the weights in a follow-up or explain the catalog scoring.

@owasp-blt owasp-blt Bot added has-peer-review PR has received peer review and removed needs-peer-review PR needs peer review labels Mar 22, 2026

Labels

has-peer-review PR has received peer review quality: high

4 participants