Description
This is a description of an already-solved problem that I think indicates a (minor) design flaw.
Earlier today I had internet problems in the house. We were able to solve the problem, but several hours later I noticed something strange about my network activity (ironically, thanks to the `net` block on my bar): a consistent download speed of about 9 Mbps despite nothing running that should have been downloading. I also noticed that the `packages` block was displaying "`pacman -Sy` exited with non zero exit status", even though `pacman` worked just fine when I ran it from the terminal.
Long story short, I eventually found that the cause of the mysterious leech on my download speed was `i3status-rust` repeatedly spawning `pacman` instances, each of which downloaded 11 MB of package databases before exiting with an error. The error was somewhat opaque (something about `error: GPGME error: No data`), but it was easily resolved by deleting the contents of `/tmp/checkup-db-i3statusrs-{user}`. I suppose the earlier internet outage must have coincidentally hit while the block was running and corrupted the checkup database by interrupting `pacman` at the wrong time.
That the database was corrupted is obviously no fault of `i3status-rust`, but I can't help thinking there's something better it could have done than repeatedly rerun the block with seemingly no retry delay or maximum number of tries. Indeed, I have the `packages` block configured to run once every 10 minutes precisely because it's relatively resource-intensive, so it seems wrong that an error in its output can cause it to run more like once every 10 seconds. This was probably close to the worst-case scenario for this oversight (had a CPU-intensive rather than network-intensive block been stuck in a loop like this, I'd at least have noticed the fans getting loud instead of going hours without noticing the resource drain), but I suspect other blocks could be similarly affected.
I'd think the simplest way to mitigate this issue would be to respect the block interval when retrying from an error state. For instance, when a block errors, the protocol could be to
- immediately retry a handful of times, up to some arbitrary limit;
- if it still fails, "give up" and wait until the next time the block would ordinarily run (i.e. until the block interval has elapsed) before trying again.
Something like this would significantly lessen the impact of the issue, considering that resource-intensive blocks are likely to be set to a long interval in the first place (though I may well be missing some other reason this wouldn't work). A rough sketch of the behaviour I have in mind is below.
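To make the idea concrete, here is a minimal sketch in Rust of the scheduling I'm imagining. All names, numbers, and the error text are illustrative stand-ins; this is not how `i3status-rust`'s scheduler is actually structured.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Illustrative limit on how many times a failed update is retried immediately.
const MAX_IMMEDIATE_RETRIES: u32 = 3;

/// Stand-in for one run of a block's update routine (e.g. the `packages`
/// block invoking `pacman -Sy` against its checkup database).
fn run_block_update() -> Result<(), String> {
    Err("pacman -Sy exited with non zero exit status".into())
}

fn main() {
    // e.g. a packages block configured with a 10-minute interval
    let interval = Duration::from_secs(600);

    loop {
        let started = Instant::now();

        // 1) On failure, immediately retry a handful of times, up to the limit.
        let mut outcome = run_block_update();
        for _ in 0..MAX_IMMEDIATE_RETRIES {
            if outcome.is_ok() {
                break;
            }
            outcome = run_block_update();
        }

        // 2) If it still fails, give up and surface the error; either way,
        //    the block does not run again until its interval has elapsed.
        if let Err(e) = &outcome {
            eprintln!("block error, will retry at the next interval: {e}");
        }

        let elapsed = started.elapsed();
        if elapsed < interval {
            sleep(interval - elapsed);
        }
    }
}
```

The handful of immediate retries still covers transient hiccups, while the cap plus the wait for the interval keeps a persistent failure from effectively turning a 10-minute block into a 10-second one.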