build+log: use the Criticalf logging function more liberally where it makes sense #6693
Labels
beginner
Issues suitable for new developers
good first issue
Issues suitable for first time contributors to LND
logging
Related to the logging / debug output functionality
safety
General label for issues/PRs related to the safety of using the software
A few weeks ago, one of my routing nodes had its disk fill up. I didn't detect the problem until I attempted to poke it, realized something was up, then saw all the errors in the log. I think the node was running for a day or so in this state. Thankfully, I was able to shut it down, increase the partition size and restart w/o any weird force closes or breaches or w/e.
Our logging infra has a `Criticalf` formatting function that'll log the error, then immediately shut down. In my case, if we were using this logging mode when we see errors like this, my node would've cleanly shut down the first time, potentially never restarting, which would've triggered some monitoring I have set up.

A disk filling up is just one case I can think of. Others may include: no free file descriptors. Alternatively, if I had the `healthcheck` on I would've caught this, but imo we still want this sort of log-then-crash behavior for situations we know we can't recover from.