feat: improve repository recommendations using GitHub topic matching #12
tejeshvenkat wants to merge 6 commits into
Hey man, really good changes. I am having some similar changes with a major big one, pushing it soon and you can iterate on that. Thank you!
Thanks for the feedback! That sounds great. I’ll wait for the upcoming changes and then update this PR to align with the new implementation. Happy to iterate on it further.
…ntribution guidelines
5d484a2 to 75ec1cd
👋 Hi @tejeshvenkat! This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:
Once a valid peer review is submitted, this check will pass automatically. Thank you!
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough
Added client-side caching (localStorage, 10-minute TTL), extended recommendation logic to use public events and repo languages/topics (activity score, top languages), updated UI to show contributor activity score and top languages, and revised README contributing instructions and API endpoint documentation.
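A minimal sketch of the TTL cache described in the walkthrough, for readers who want to see the shape of it. The function names `getCachedData` and `setCachedData` appear in the review, but their signatures here (an injected `storage` argument and an optional `now` timestamp) are assumptions made so the example runs outside a browser; `createMemoryStorage` is a hypothetical stand-in for `localStorage`.

```javascript
// Sketch only: the real helpers in js/app.js may be shaped differently.
const CACHE_TTL_MS = 10 * 60 * 1000; // 10-minute TTL, as stated in the walkthrough

// `storage` is any object with getItem/setItem/removeItem (localStorage in a browser).
function setCachedData(storage, key, data, now = Date.now()) {
  storage.setItem(key, JSON.stringify({ savedAt: now, data }));
}

function getCachedData(storage, key, now = Date.now()) {
  const raw = storage.getItem(key);
  if (raw === null) return null; // cache miss
  try {
    const entry = JSON.parse(raw);
    if (now - entry.savedAt < CACHE_TTL_MS) return entry.data; // fresh hit
  } catch (err) {
    // Corrupt entries fall through and are treated as a miss.
  }
  storage.removeItem(key); // expired or unreadable: drop it
  return null;
}

// Hypothetical in-memory replacement for localStorage, used for illustration.
function createMemoryStorage() {
  const map = new Map();
  return {
    getItem: (k) => (map.has(k) ? map.get(k) : null),
    setItem: (k, v) => { map.set(k, String(v)); },
    removeItem: (k) => { map.delete(k); },
  };
}
```

In a browser, `window.localStorage` could be passed as `storage`, with keys such as `github_repos_${username}`.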
Sequence Diagram(s)
sequenceDiagram
participant User as User
participant App as App (js/app.js)
participant Cache as localStorage
participant API as GitHub API
participant Recommend as buildRecommendations()
participant UI as DOM
User->>App: Submit username form
App->>Cache: Check `github_repos_${username}` cache
alt Cache hit
Cache-->>App: Return cached repos
else Cache miss
App->>API: GET /users/{username}/repos (Accept: topics)
API-->>App: Repos data
App->>Cache: Store repos with TTL
end
App->>Cache: Check `github_events_${username}` cache
alt Cache hit
Cache-->>App: Return cached events
else Cache miss
App->>API: GET /users/{username}/events
alt API success
API-->>App: Events data
App->>Cache: Store events with TTL
else API error
App-->>App: Use default events = []
end
end
App->>Recommend: buildRecommendations(userData, repos, events)
Recommend->>Recommend: Compute activity_score & activity_breakdown
Recommend->>Recommend: Extract top_languages from repos
Recommend->>Recommend: Score topics & rank recommendations
Recommend-->>App: Recommendations + github_stats (activity_score, top_languages)
App->>UI: Display activity_score, top_languages, and recommendations
UI-->>User: Show results
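As a rough illustration of the "Compute activity_score" step in the diagram, the sketch below totals weighted public events and also groups them per repository via `event.repo.name`. The weight values and both function names are hypothetical; the thread does not show the numbers used in js/app.js.

```javascript
// Hypothetical weights; the actual values in js/app.js are not shown in this thread.
const EVENT_WEIGHTS = new Map([
  ["PushEvent", 3],
  ["PullRequestEvent", 2],
  ["IssuesEvent", 1],
]);

// Returns a Map of "owner/repo" -> weighted event score.
function computeRepoActivityScores(events) {
  const scores = new Map();
  for (const event of events || []) {
    const weight = EVENT_WEIGHTS.get(event.type) || 0;
    const repoName = event.repo && event.repo.name;
    if (!repoName || weight === 0) continue; // skip unweighted or malformed events
    scores.set(repoName, (scores.get(repoName) || 0) + weight);
  }
  return scores;
}

// Global score for the user: the sum over all repositories.
function computeActivityScore(events) {
  let total = 0;
  for (const score of computeRepoActivityScores(events).values()) total += score;
  return total;
}
```

A per-repository map like this can feed the ranking, while the global total suits display-only use, since adding the same number to every candidate does not change their order.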
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested labels:
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
js/app.js (1)
174-208: ⚠️ Potential issue | 🟠 Major
`topicScore` is self-referential with the current candidate set.
`contributorTopics` is built from the same `repos` array that you later rank, so every repo with topics already matches its own topics. In practice this is mostly a bonus for “has more topics,” not a relevance signal for contributor interests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@js/app.js` around lines 174 - 208, The topic scoring is biased because contributorTopics is built from the same repos being scored, so each repo matches its own topics; change getTopicScore to compute contributor interest topics excluding the candidate repo (or accept a precomputed interest set keyed by repo id) and compare the candidate repo.topics against that exclusion set; keep the fallback relevantTopics list logic for when the exclusion set is empty, and update references to contributorTopics/relevantTopics in getTopicScore so it uses the exclusion set (use repo.id or another unique repo identifier to exclude the candidate).
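One possible shape for the exclusion described above, as a sketch rather than the actual fix: count how many repos carry each topic, then score a candidate only on topics that at least one other repo also has. `buildTopicCounts` is a hypothetical helper, the two-argument `getTopicScore` signature is an assumption, and the `relevantTopics` fallback is left out for brevity.

```javascript
// Count, per topic, how many distinct repos carry it.
function buildTopicCounts(repos) {
  const counts = new Map();
  for (const repo of repos) {
    for (const topic of new Set(repo.topics || [])) {
      counts.set(topic, (counts.get(topic) || 0) + 1);
    }
  }
  return counts;
}

// Assumes `repo` is one of the repos that went into `topicCounts`,
// so its own contribution (1 per topic) is subtracted before matching.
function getTopicScore(repo, topicCounts) {
  let score = 0;
  for (const topic of new Set(repo.topics || [])) {
    const othersWithTopic = (topicCounts.get(topic) || 0) - 1;
    if (othersWithTopic > 0) score += 1;
  }
  return score;
}
```

With this approach a repo that has many topics no other repo shares scores zero, which removes the "has more topics" bonus the comment points out.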
🧹 Nitpick comments (1)
README.md (1)
306-312: Don’t present a placeholder `npm test` as required validation.
This asks contributors to run `npm test` and then immediately says the command is only a placeholder. Until there is a real automated check behind it, I’d rename this step to manual validation and point contributors at the smoke-test workflow above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 306 - 312, Update the README section titled "### 3. Run Tests Before Submitting" to stop presenting `npm test` as a required validation step when it’s just a placeholder: rename the section to something like "Manual validation before submitting", remove or de-emphasize the `npm test` command as an automated check, and instead reference the existing smoke-test workflow (mentioned earlier in the README) as the authoritative pre-submit check and/or provide explicit manual steps to perform; ensure the text around the "npm test" snippet clarifies it is a placeholder and not an automated gate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@index.html`:
- Around line 282-300: The Contributor Activity section is using inline analysis
that is out-of-sync with the main app bundle, so update the page to use the
single source of truth: remove or disable the inline analyzer and instead import
and invoke the analysis/rendering from js/app.js (or if you prefer to keep
inline, call the same exported functions from js/app.js). Specifically, ensure
the js/app.js logic that computes activity_score and top_languages updates the
DOM elements with IDs contributor-activity-score and top-languages-list (or
expose and call renderActivityScore() and renderTopLanguages() from js/app.js),
so the values replace the placeholder "0" and "—" after a real analysis runs.
In `@js/app.js`:
- Around line 153-159: In buildRecommendations, activityScore is computed once
for the whole user (eventsData → activityScore) and then added to every repo, so
it doesn't affect ordering; either compute a per-repo activity score by
filtering eventsData by event.repo.name for each repo (use event.repo.name to
attribute PushEvent/PullRequestEvent/IssuesEvent weights when calculating
repoActivityScore inside the loop that processes repos) and use that
repoActivityScore in the ranking formula, or remove activityScore from the
ranking and only include the global activityScore as display-only metadata in
github_stats; update any references to activityScore in buildRecommendations
(and the similar block around lines 210-216) accordingly.
- Around line 104-128: The code currently converts any non-OK GitHub fetch into
an empty array and persists it, which can poison the cache on transient errors;
update both repo and event fetch flows (symbols: getCachedData, setCachedData,
reposResponse, eventsResponse, reposData, eventsData) to only call setCachedData
when the HTTP response is ok and the parsed JSON is a valid array, and avoid
persisting or overwrite the cache when responses are not ok (instead return/keep
undefined or previous cache and optionally log the response status/error);
ensure the events try/catch also doesn’t setCachedData on fetch failures or
non-ok responses.
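The cache-poisoning point can be illustrated with a small decision helper. This is a sketch under assumptions: `resolveFetchResult` is a hypothetical name, `response` only needs the standard `ok` and `status` fields of a fetch `Response`, and falling back to an empty array when nothing was cached is one choice among several.

```javascript
// Decide what to hand to the caller and whether the cache may be written.
// `parsed` is the decoded JSON body (or undefined if parsing failed);
// `previousCached` is whatever the cache held before this request, if anything.
function resolveFetchResult(response, parsed, previousCached) {
  if (response.ok && Array.isArray(parsed)) {
    return { data: parsed, shouldCache: true };
  }
  // Non-ok response or unexpected payload: keep the old data, never persist.
  return {
    data: Array.isArray(previousCached) ? previousCached : [],
    shouldCache: false,
    status: response.status,
  };
}
```

The caller would invoke `setCachedData` only when `shouldCache` is true, so a transient 403 or 500 cannot overwrite a good cache entry with an empty list.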
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 1c9aacd2-0032-42ad-a272-758523934595
📒 Files selected for processing (3)
README.md
index.html
js/app.js
…, catalog, cache, events, topic matching
karunarapolu left a comment
Suggestion: Ranking formula might be better if topics and languages are given more weight than stars.
good idea — can tune weights in a follow-up or explain catalog scoring |
This PR improves the repository recommendation system by incorporating GitHub repository topics into the ranking algorithm.
Key improvements:
• Fetch repository topics using the GitHub API preview header
• Extract contributor interests from repository topics
• Introduce topicScore to evaluate topic relevance
• Update ranking formula to include topicScore
New ranking formula:
(stars * 0.5) + (activityScore * 0.2) + (languageScore * 0.2) + (topicScore * 0.1)
Additional improvements:
• Safe handling of missing topic data
• Display repository topics in recommendation cards
• Improved recommendation accuracy using contributor interests
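The ranking formula in this description can be sketched as a weighted sum. The weights are taken from the formula as written; the function names, the input object shape, and the default of 0 for missing fields are assumptions for illustration.

```javascript
// Weights copied from the formula in the PR description.
const RANK_WEIGHTS = {
  stars: 0.5,
  activityScore: 0.2,
  languageScore: 0.2,
  topicScore: 0.1,
};

// Missing fields default to 0 so incomplete repo data does not produce NaN.
function rankScore({ stars = 0, activityScore = 0, languageScore = 0, topicScore = 0 }) {
  return (
    stars * RANK_WEIGHTS.stars +
    activityScore * RANK_WEIGHTS.activityScore +
    languageScore * RANK_WEIGHTS.languageScore +
    topicScore * RANK_WEIGHTS.topicScore
  );
}

// Returns a new array sorted from highest to lowest score.
function rankRepos(candidates) {
  return [...candidates].sort((a, b) => rankScore(b) - rankScore(a));
}
```

Because `stars` is an unbounded raw count while the other inputs are likely small, stars will tend to dominate unless the inputs are normalized or the weights are retuned, which is in line with the reviewer's suggestion.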
Summary by CodeRabbit
New Features
Performance
Documentation