Blocks probably shouldn't retry infinitely in fail state #2152

Open
@sungodmoth

Description

This is a description of an already solved problem which I think indicates a (minor) design flaw.

Earlier today I had internet problems in the house. We were able to solve the problem, but several hours later I noticed something strange about my network activity (ironically, thanks to the net block on my bar): a consistent down speed of about 9Mbps despite having nothing running that should be downloading. I also noticed that the packages block was displaying "pacman -Sy exited with non zero exit status", even though pacman was working just fine when I ran it from the terminal.

Long story short I eventually found out that the cause of the mysterious leech on my down speed was i3status-rust repeatedly spawning pacman instances, each of which downloaded 11MB of package databases before exiting with an error. The error was somewhat opaque (something about error: GPGME error: No data) but easily resolved by deleting the contents of /tmp/checkup-db-i3statusrs-{user}. I suppose an internet outage earlier must have coincidentally hit while the block was running, and corrupted the checkup database by interrupting pacman at the wrong time.

That the database was corrupted is obviously no fault of i3status-rust, but I can't help but think there's something better it could have done than repeatedly re-run the block with seemingly no retry delay or maximum number of tries. Indeed, I have the packages block configured to run once every 10 minutes precisely because it's relatively intensive on resources, so it seems wrong that an error in its output can cause it to run more like once every 10 seconds. This was probably close to the worst-case scenario for this oversight -- had a CPU-intensive rather than network-intensive block been stuck in a loop like this, I'd at least have noticed the fans getting loud instead of going hours without noticing the resource drain -- but I suspect other blocks could be similarly affected.

I'd think that the simplest way to mitigate this issue would be to respect the block interval when retrying from an error state. For instance, the protocol when a block errors could be to

  • immediately retry a handful of times, up to some arbitrary limit;
  • if this still fails, "give up" and wait until the next time the block would ordinarily run (i.e. until the block interval has elapsed) before trying again.

Something like this would significantly lessen the impact of this issue, considering that resource-intensive blocks are likely to be set to a high interval in the first place -- though I may well be missing some other reason this wouldn't work.
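To make the proposal concrete, here is a minimal sketch of what such a retry policy could look like. This is not i3status-rust's actual scheduler; the function name, the `max_immediate_retries` parameter, and the overall shape are hypothetical, purely to illustrate the two-step protocol above:

```rust
use std::time::Duration;

/// Hypothetical retry policy: retry immediately up to
/// `max_immediate_retries` consecutive failures, then fall back to
/// waiting out the block's ordinary `interval` before trying again.
fn retry_delay(
    consecutive_failures: u32,
    max_immediate_retries: u32,
    interval: Duration,
) -> Duration {
    if consecutive_failures <= max_immediate_retries {
        // A transient hiccup: retry right away.
        Duration::ZERO
    } else {
        // Persistent failure: give up until the next scheduled run,
        // so a 10-minute block never degrades into a 10-second loop.
        interval
    }
}

fn main() {
    let interval = Duration::from_secs(600); // e.g. a packages block on a 10-minute interval
    assert_eq!(retry_delay(1, 3, interval), Duration::ZERO);
    assert_eq!(retry_delay(4, 3, interval), interval);
    println!("ok");
}
```

A variant could also back off exponentially between the immediate retries, but even this simple cutoff would have capped the pacman loop described above at a handful of extra runs per interval.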
