@Asespinel

Description

This PR implements retry logic with exponential backoff to handle course re-run operations for large courses. When a course is re-run, the backend needs time to process and create the new course structure, which can result in temporary 404 or 202 responses. This change ensures the frontend waits appropriately instead of immediately showing "Not Found" errors.

Problem

When performing a re-run operation on large courses, the following issues occurred:

  1. Immediate 404 errors: The frontend would immediately show "Not Found" after a re-run, even though the course was being processed
  2. Inconsistent state: fetchCourseDetail would succeed but fetchCourseOutlineIndex would fail, leading to undefined properties (e.g., isEnabled)
  3. Poor user experience: Users had to manually refresh the page multiple times or wait with the page open until the course was ready

Solution

Implemented a retryOnNotReady helper function with exponential backoff that:

  • Retries API calls when receiving 202 (Accepted) status codes (course still processing)
  • Retries when receiving course_does_not_exist errors during the first 5 attempts
  • Uses exponential backoff (2s → 3s → 4.5s → 6.75s → ...)
  • Maximum of 10 retry attempts over approximately 60 seconds
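
To make the schedule concrete, here is a minimal sketch that lists the wait before each retry using the defaults described in this PR. The helper name backoffDelays is hypothetical and is not part of the change. With no upper cap on the delay (an assumption, since the PR text does not say whether one exists), ten waits add up to roughly 227 seconds rather than 60, so the actual implementation may cap the delay or count attempts differently.

```typescript
// Hypothetical helper (not from the PR): the wait before each retry, in milliseconds.
function backoffDelays(
  maxRetries: number = 10,
  initialDelay: number = 2000,
  backoffMultiplier: number = 1.5,
): number[] {
  const delays: number[] = [];
  let delay = initialDelay;
  for (let i = 0; i < maxRetries; i += 1) {
    delays.push(delay);
    delay *= backoffMultiplier; // each wait is 1.5x the previous one
  }
  return delays;
}

const delays = backoffDelays();
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays.slice(0, 4)); // 2000, 3000, 4500, 6750
console.log(Math.round(totalMs / 1000)); // 227 (seconds, assuming no cap on the delay)
```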

Technical Details

Retry Logic

async function retryOnNotReady<T>(
  apiCall: () => Promise<T>,
  maxRetries: number = 10,
  initialDelay: number = 2000,
  backoffMultiplier: number = 1.5
): Promise<T>

Parameters:

  • maxRetries: Maximum number of retry attempts (default: 10)
  • initialDelay: Initial delay in milliseconds (default: 2000ms)
  • backoffMultiplier: Multiplier for exponential backoff (default: 1.5)

Retry Conditions:

  • HTTP 202 (Accepted) - Backend explicitly indicates processing
  • course_does_not_exist error during first 5 attempts
  • Any other error immediately throws without retry
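
The conditions above could be implemented roughly as follows. This is a sketch under stated assumptions, not the code in this PR: the ApiResponse shape, the isCourseMissing check, the extra wait parameter (added so the function can be tested without real delays), and the error thrown once retries are exhausted are all hypothetical.

```typescript
// Assumed response shape; the real client probably returns an Axios-style object.
interface ApiResponse {
  status: number;
  data?: unknown;
}

// Hypothetical check: the PR does not show how course_does_not_exist is detected.
function isCourseMissing(error: unknown): boolean {
  return error instanceof Error && error.message.includes('course_does_not_exist');
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => { setTimeout(resolve, ms); });

async function retryOnNotReady<T extends ApiResponse>(
  apiCall: () => Promise<T>,
  maxRetries: number = 10,
  initialDelay: number = 2000,
  backoffMultiplier: number = 1.5,
  wait: (ms: number) => Promise<void> = sleep, // assumption: injectable for tests
): Promise<T> {
  let delay = initialDelay;
  // attempt 0 is the initial call; attempts 1..maxRetries are retries.
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      const response = await apiCall();
      // Anything other than 202 means the backend is no longer processing.
      if (response.status !== 202) return response;
    } catch (error) {
      // Only a missing course during the first 5 attempts counts as "not ready yet".
      if (!(isCourseMissing(error) && attempt < 5)) throw error;
    }
    if (attempt < maxRetries) {
      await wait(delay);
      delay *= backoffMultiplier;
    }
  }
  // Assumed behavior once retries run out; the PR does not describe this case.
  throw new Error(`Course not ready after ${maxRetries} retries`);
}
```

A caller would wrap an existing request, for example retryOnNotReady(() => fetchOutline(courseId)), where fetchOutline stands in for whatever function performs the request.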

Why Exponential Backoff?

Large courses can take 30-60 seconds to process. Exponential backoff:

  • Reduces server load by spacing out requests
  • Adapts to different course sizes automatically
  • Provides better user experience than fixed intervals
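
As a rough illustration of the load difference, the sketch below counts how many retries fit into a 60-second window for a fixed 2-second interval versus the 1.5x backoff. The helper retriesWithin is hypothetical and ignores request latency.

```typescript
// Hypothetical helper: counts retries whose wait completes within windowMs.
// A multiplier of 1 models polling at a fixed interval.
function retriesWithin(windowMs: number, initialDelay: number, multiplier: number): number {
  let elapsed = 0;
  let delay = initialDelay;
  let count = 0;
  while (elapsed + delay <= windowMs) {
    elapsed += delay;
    delay *= multiplier;
    count += 1;
  }
  return count;
}

console.log(retriesWithin(60000, 2000, 1)); // 30 requests with a fixed 2s interval
console.log(retriesWithin(60000, 2000, 1.5)); // 6 requests with 1.5x backoff
```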

Testing

Manual Testing Steps

  1. Create or select a large course (100+ units)
  2. Perform a re-run operation
  3. Verify the course loads successfully without "Not Found" errors

@openedx-webhooks

Thanks for the pull request, @Asespinel!

This repository is currently maintained by @bradenmacdonald.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks added the open-source-contribution (PR author is not from Axim or 2U) label on Oct 28, 2025
@github-project-automation moved this to Needs Triage in Contributions on Oct 28, 2025
@codecov

codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.77%. Comparing base (5ce61fa) to head (5d64fb2).
⚠️ Report is 10 commits behind head on master.

Files with missing lines          | Patch % | Lines
src/course-outline/data/thunk.ts  | 78.57%  | 3 Missing ⚠️
src/data/thunks.ts                | 96.15%  | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2581      +/-   ##
==========================================
+ Coverage   94.74%   94.77%   +0.02%     
==========================================
  Files        1218     1225       +7     
  Lines       27299    27432     +133     
  Branches     5983     6184     +201     
==========================================
+ Hits        25865    25998     +133     
+ Misses       1376     1363      -13     
- Partials       58       71      +13     

@bradenmacdonald
Contributor

Thanks for this contribution. Let me know when it's ready for review.

I see you implemented custom retry logic; in the long term, we'd like to migrate all data loading to React Query, which provides retry logic already. Did/could you consider updating this to use React Query, so we can use its retry logic?
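
For reference, a minimal sketch of how the same policy might be expressed as React Query options. React Query accepts functions for its retry and retryDelay options; the two functions below follow those shapes but are plain TypeScript, so nothing here imports the library. The way a "not ready" error is recognized is an assumption, and since React Query only retries when the query function throws, a 202 response would presumably need to be turned into an error first.

```typescript
// Hypothetical convention: the query function throws an Error whose message
// contains 'course_does_not_exist' (or a similar marker) while the course is not ready.
function isNotReady(error: unknown): boolean {
  return error instanceof Error && error.message.includes('course_does_not_exist');
}

// Shape of the `retry` option: (failureCount, error) => boolean.
function shouldRetry(failureCount: number, error: unknown): boolean {
  return isNotReady(error) && failureCount < 10;
}

// Shape of the `retryDelay` option: (attemptIndex) => delay in milliseconds.
function retryDelay(attemptIndex: number): number {
  return 2000 * 1.5 ** attemptIndex;
}

// Intended usage (not executed here):
//   useQuery({ queryKey, queryFn, retry: shouldRetry, retryDelay })
```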

Does this code retry with backoff only when the course is new or re-run (that is, it seems to exist but has no outline yet), or does it always retry the outline no matter what?

Also, did you use any AI tools when creating this code? If so, please mention that in the PR description.

@mphilbrick211 moved this from Needs Triage to In Eng Review in Contributions on Oct 29, 2025