[Reporting] Change interim retryable report failures so they aren't flagged as errors #216180
Labels: Feature:Reporting:Framework (reporting issues pertaining to the overall framework), Team:ResponseOps (label for the ResponseOps team, formerly the Cases and Alerting teams)
When report generation fails because of timeouts, it often succeeds on the next attempt. The first attempt "warms up" ES (fills caches, etc.), so the ES requests in the retry may return faster.
Unfortunately, each failed attempt that gets retried is counted as a failure, and these failures feed into what our health rules measure.
We should look into whether we can "ignore" these errored runs when they will be retried. A failure would only be generated when the final attempt fails.
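A minimal sketch of the proposed rule, assuming each attempt knows its attempt number, the maximum number of attempts, and whether its error is transient. The names `AttemptResult`, `classifyAttempt`, `record`, and `counters` are hypothetical and do not reflect the existing Reporting or Task Manager code. An errored attempt is classified as "retrying" when another attempt will follow, and as "failed" only when it is the last one.

```typescript
// Hypothetical outcome of a single report-generation attempt.
type AttemptOutcome = "success" | "retrying" | "failed";

interface AttemptResult {
  attempt: number; // 1-based attempt counter
  maxAttempts: number; // total attempts allowed for this job
  error?: Error; // set when the attempt threw
  retryable: boolean; // whether the error is considered transient (e.g. a timeout)
}

// An error only counts as a failure when no further attempt will be made.
function classifyAttempt(r: AttemptResult): AttemptOutcome {
  if (r.error === undefined) return "success";
  const willRetry = r.retryable && r.attempt < r.maxAttempts;
  return willRetry ? "retrying" : "failed";
}

// Hypothetical health counters: interim "retrying" outcomes are tracked
// separately so they do not inflate the failure count used by health rules.
const counters: Record<AttemptOutcome, number> = { success: 0, retrying: 0, failed: 0 };

function record(r: AttemptResult): AttemptOutcome {
  const outcome = classifyAttempt(r);
  counters[outcome] += 1;
  return outcome;
}
```

For example, a report that times out on attempt 1 of 3 and succeeds on attempt 2 would leave `counters` at one "retrying", one "success", and zero "failed".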
We'll need to figure out which measurement is flagging these runs as failures. It's not clear whether logging is affected, but the retries are generally easy to spot in the logs.
We may also want this support for retryable tasks in general. For instance, when Slack returns a 429 response (retry later), we retry the task, but the 429 attempt is still flagged as a failure.
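A minimal sketch of how a connector might decide that an HTTP response is a transient, retryable condition rather than a terminal failure. The function `retryDecisionForStatus` and the `RetryDecision` shape are hypothetical; the only behavior taken from HTTP itself is that a 429 may carry a `Retry-After` header given either in seconds or as an HTTP date. Treating only 429 as retryable is a simplifying assumption for illustration.

```typescript
interface RetryDecision {
  retry: boolean;
  delayMs?: number; // suggested wait before retrying, when the server provided one
}

// Hypothetical helper: classify an HTTP status (e.g. from Slack) as retryable.
function retryDecisionForStatus(
  status: number,
  retryAfter?: string,
  now: number = Date.now()
): RetryDecision {
  if (status !== 429) return { retry: false };
  if (retryAfter === undefined || retryAfter.trim() === "") return { retry: true };

  // Retry-After as delta-seconds, e.g. "30".
  const seconds = Number(retryAfter);
  if (Number.isFinite(seconds)) {
    return { retry: true, delayMs: Math.max(0, seconds) * 1000 };
  }

  // Retry-After as an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const date = Date.parse(retryAfter);
  if (!Number.isNaN(date)) {
    return { retry: true, delayMs: Math.max(0, date - now) };
  }

  // Unparseable header: still retryable, but without a suggested delay.
  return { retry: true };
}
```

An attempt for which this returns `retry: true` would then be recorded as an interim retry rather than a failure, with a failure reported only if the final attempt also fails.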