Lock queue if updating toolchain fails #1307
Reduces the impact of failures like rust-lang#1305: rather than continuing to build crates with the failing toolchain, we will just stop building crates until an admin has time to investigate the cause.
```rust
if let Err(err) = builder.update_toolchain() {
    log::error!("Updating toolchain failed, locking queue: {}", err);
    self.lock()?;
    return Err(err);
}
```
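A minimal, self-contained sketch of the locking pattern in the snippet, assuming hypothetical `Builder`, `BuildQueue` and `ToolchainError` types in place of the real docs.rs ones. Here the lock is just an in-memory flag and `lock` cannot fail, unlike the real `self.lock()?`.

```rust
use std::fmt;

/// Hypothetical stand-in for the error returned by `update_toolchain`.
#[derive(Debug)]
pub struct ToolchainError(pub String);

impl fmt::Display for ToolchainError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "toolchain update failed: {}", self.0)
    }
}

/// Hypothetical builder; `fail_update` simulates a broken nightly.
pub struct Builder {
    pub fail_update: bool,
}

impl Builder {
    pub fn update_toolchain(&self) -> Result<(), ToolchainError> {
        if self.fail_update {
            Err(ToolchainError("could not install nightly".to_string()))
        } else {
            Ok(())
        }
    }
}

/// Hypothetical queue with an in-memory lock flag (an assumption: the
/// real queue keeps its lock state elsewhere).
pub struct BuildQueue {
    pub locked: bool,
    pub built: usize,
}

impl BuildQueue {
    pub fn lock(&mut self) {
        self.locked = true;
    }

    /// Builds one crate. While locked, returns Ok(false) and does nothing.
    pub fn process_next(&mut self, builder: &Builder) -> Result<bool, ToolchainError> {
        if self.locked {
            return Ok(false);
        }
        if let Err(err) = builder.update_toolchain() {
            eprintln!("Updating toolchain failed, locking queue: {}", err);
            self.lock();
            return Err(err);
        }
        // A real implementation would build the next queued crate here.
        self.built += 1;
        Ok(true)
    }
}
```

Once a failed update has locked the queue, later calls return `Ok(false)` even with a working toolchain, which matches the intent of waiting for an admin to investigate.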
Rather than calling update_toolchain in more places, do you think it makes sense to change build_package to return an error enum so we can tell where the error came from? The current change seems racy: a new nightly could be published between calling update_toolchain here and calling it in build_package.
I'm also slightly concerned that this doesn't handle errors when an admin runs `cratesfyi build crate`. I don't know a simple way to handle that, though, and it seems rare for that to be the first build with a new toolchain anyway.
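A rough sketch of the error-enum suggestion, assuming a hypothetical `BuildError` enum and a simplified `build_package` that takes booleans to simulate each stage failing. Because the toolchain update happens inside `build_package`, the queue reacts to the outcome of the update that actually ran rather than to a separate, earlier call.

```rust
use std::fmt;

/// Hypothetical error enum recording which stage of a build failed.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    /// The toolchain could not be updated; no crate should be built.
    ToolchainUpdate(String),
    /// This crate failed to build; other crates are unaffected.
    Crate(String),
}

impl fmt::Display for BuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BuildError::ToolchainUpdate(msg) => write!(f, "updating toolchain failed: {}", msg),
            BuildError::Crate(msg) => write!(f, "building crate failed: {}", msg),
        }
    }
}

impl std::error::Error for BuildError {}

/// Hypothetical, simplified `build_package`: the flags simulate each stage.
pub fn build_package(toolchain_ok: bool, crate_ok: bool) -> Result<(), BuildError> {
    if !toolchain_ok {
        return Err(BuildError::ToolchainUpdate("nightly unavailable".to_string()));
    }
    if !crate_ok {
        return Err(BuildError::Crate("compilation error".to_string()));
    }
    Ok(())
}

/// The queue only needs to match on the variant to decide whether to lock.
pub fn should_lock_queue(result: &Result<(), BuildError>) -> bool {
    matches!(result, Err(BuildError::ToolchainUpdate(_)))
}
```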
That was the approach I tried first, but I got stuck working out how to make it work with `failure`. I could make another attempt.
Hmm, could you maybe add a new error type and try downcasting to that? Don't spend too much time on it :) This should work fine 99.9% of the time.
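A sketch of the downcasting alternative. It swaps the `failure` crate for the standard library's `Box<dyn Error>` so that it compiles without dependencies (`failure::Error` offers comparable downcasting); `FailedToUpdateToolchain`, `build_package` and `is_toolchain_failure` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical marker error wrapped around toolchain-update failures.
#[derive(Debug)]
pub struct FailedToUpdateToolchain(pub String);

impl fmt::Display for FailedToUpdateToolchain {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to update toolchain: {}", self.0)
    }
}

impl Error for FailedToUpdateToolchain {}

/// Hypothetical build step returning a type-erased error; the flags
/// simulate each stage failing.
pub fn build_package(toolchain_ok: bool, crate_ok: bool) -> Result<(), Box<dyn Error>> {
    if !toolchain_ok {
        return Err(Box::new(FailedToUpdateToolchain("nightly unavailable".to_string())));
    }
    if !crate_ok {
        // Any other failure stays an ordinary, untyped error.
        return Err("crate failed to compile".into());
    }
    Ok(())
}

/// Only the queue inspects the concrete type, by downcasting.
pub fn is_toolchain_failure(err: &(dyn Error + 'static)) -> bool {
    err.downcast_ref::<FailedToUpdateToolchain>().is_some()
}
```

This keeps `build_package`'s signature unchanged: only the queue, which has to decide whether to lock itself, needs to know about the marker type.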
I think this is fine to merge: only the queue needs to know what the error type is, and the window in which the race can occur is less than a few seconds a day (see docs.rs/src/docbuilder/rustwide_builder.rs Lines 294 to 298 in 79ecb67).
Example log when this happens: