Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Oct 29, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Oct 29, 2025 1:53am
rivet-inspector Ignored Ignored Oct 29, 2025 1:53am
rivet-site Ignored Ignored Oct 29, 2025 1:53am

Copy link
Contributor Author

MasterPtato commented Oct 29, 2025


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Oct 29, 2025

Code Review: PR #3282

Summary

This PR disables the reschedule backoff mechanism in the actor runtime by commenting out the exponential backoff sleep logic. As indicated by the title "do not merge", this appears to be a temporary change for testing/debugging purposes.

Analysis

What This Changes:

  • Comments out lines 643-656 in engine/packages/pegboard/src/workflows/actor/runtime.rs:643-656
  • The commented code implements exponential backoff when rescheduling actors (starting at 2000ms base timeout)
  • The backoff logic prevents immediate retries and allows early destruction if a Destroy signal is received during the wait period

Impact:

⚠️ Performance & Resource Concerns:

  • Removes throttling protection: Without backoff, failed actors will be rescheduled immediately, potentially causing:
    • Resource exhaustion if actors repeatedly fail (e.g., due to missing dependencies, configuration errors)
    • Database/service pressure from rapid retry loops
    • Difficulty debugging issues due to log flooding

⚠️ Behavioral Changes:

  • Loss of graceful shutdown: The commented code allows actors to be destroyed during the backoff period via listen_with_timeout::<Destroy>. Without this, actors cannot be cleanly destroyed between reschedule attempts
  • Dead code: The backoff object (lines 621-627) is still initialized but never used, making the code misleading

Code Quality Issues:

  1. Dead Code: Lines 621-627 create a backoff object that is never stepped through:

    let mut backoff = util::backoff::Backoff::new_at(
        8,
        None,
        BASE_RETRY_TIMEOUT_MS,
        500,
        state.reschedule_state.retry_count,
    );
    // Never used after this point
  2. Commented Code: Large blocks of commented code should generally be avoided in favor of feature flags or conditional compilation if temporary behavior is needed

  3. Missing Safeguards: If this is intentional (even temporarily), there should be rate limiting or circuit breaker logic elsewhere to prevent runaway retries

Recommendations

For Testing (if that's the goal):

If you need to disable backoff temporarily for testing, consider instead:

// Make backoff configurable via feature flag or config
const DISABLE_RESCHEDULE_BACKOFF: bool = cfg!(feature = "disable-reschedule-backoff");

if state.reschedule_state.retry_count > 0 && !DISABLE_RESCHEDULE_BACKOFF {
    let next = backoff.step().expect("should not have max retry");
    // ... rest of logic
}

Or add a runtime flag in the workflow input to control this behavior.

Before Merging (if this were to be merged):

  1. Remove the unused backoff initialization
  2. Add metrics/alerts for rapid reschedule loops
  3. Implement alternative rate limiting mechanism
  4. Document why backoff was removed
  5. Add tests covering the new behavior

For "Do Not Merge" PRs:

Consider using draft PRs or WIP branches rather than opening a PR marked "do not merge". This keeps the PR list cleaner and prevents accidental merges.

Verdict

Given the "do not merge" label, this appears to be intentional for testing. However:

  • ✅ The code change itself is straightforward and reversible
  • ⚠️ Be cautious running this in any production-like environment
  • ❌ Should not be merged as-is due to the concerns above
  • 🧹 Clean up dead code (backoff initialization) even in test branches

Questions

  1. What specific behavior are you testing by disabling the backoff?
  2. Have you observed any negative impacts (resource usage, retry storms) with this change?
  3. Is there a specific condition you're investigating that the backoff was masking?

Review generated with attention to CLAUDE.md conventions and Rivet architecture patterns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants