Skip to content

watchdog: Support ro.hw_timeout_multiplier on Android#6123

Open
tutan-chromium wants to merge 4 commits into
mainfrom
dev/tutankhamen/watchdog-hw-timeout-multiplier
Open

watchdog: Support ro.hw_timeout_multiplier on Android#6123
tutan-chromium wants to merge 4 commits into
mainfrom
dev/tutankhamen/watchdog-hw-timeout-multiplier

Conversation

@tutan-chromium
Copy link
Copy Markdown

Multiply fatal timer durations by the integer multiplier read from the system property ro.hw_timeout_multiplier when building for Android.

This keeps Perfetto thread and task-runner timeouts safe from triggering false-positive crashes (wdog_reason: 3) during prolonged suspend/resume cycles or on slow emulated environments.

I use direct __system_property_get to scale the interval ms inside CreateFatalTimer, ensuring that we avoid pulling heavy dependencies like libbase into Perfetto base. Non-Android platforms are unaffected.

Bug: 6122

Multiply fatal timer durations by the integer multiplier read from the
system property `ro.hw_timeout_multiplier` when building for Android.

This keeps Perfetto thread and task-runner timeouts safe from triggering
false-positive crashes (wdog_reason: 3) during prolonged suspend/resume
cycles or on slow emulated environments.

I use direct `__system_property_get` to scale the interval `ms` inside
`CreateFatalTimer`, ensuring that we avoid pulling heavy dependencies
like libbase into Perfetto base. Non-Android platforms are unaffected.

Bug: 504382878
@tutan-chromium tutan-chromium requested a review from a team as a code owner June 3, 2026 21:50
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

…tiplier

Add comments explaining that this system property is already used for all
sorts of Android watchdogs and that bringing Perfetto watchdog behavior closer
to Android's standard behavior reduces surprises.

Bug: 6122
Copy link
Copy Markdown
Member

@sashwinbalaji sashwinbalaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

FYI, we are also investigating watchdog issue on android (http://b/496506220). And to summarise the current status:

  • Added a trigger that captures a trace when traced_probes disconnects (#5299).
  • @primiano made the watchdog log the actual mono/boot/cpu clocks at crash time (#5749), and gave the process ~60s to flush the trace before crashing instead of dying immediately (#5755), so the captured traces are more useful.
  • The traces we got back were all kTraceDidntStop (wdog reason 4).
  • And on looking into them, the system was just busy immediately after boot (heavy ftrace load) and the timeout we compute was just too short. mono == boot in every case, so most likely suspend wasn't involved.
  • So we raised a PR to floor that timeout at 5 min (#6045).
  • This was done last week, waiting for more traces to investigate the watchdog:3 issue.

Happy to know if you any other insight.

Comment thread src/base/watchdog_posix.cc Outdated
uint32_t GetHwTimeoutMultiplier() {
static uint32_t multiplier = []() {
char prop_val[PROP_VALUE_MAX] = {0};
if (__system_property_get("ro.hw_timeout_multiplier", prop_val) > 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use the GetAndroidProp

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/base/watchdog_posix.cc Outdated
static uint32_t multiplier = []() {
char prop_val[PROP_VALUE_MAX] = {0};
if (__system_property_get("ro.hw_timeout_multiplier", prop_val) > 0) {
int val = atoi(prop_val);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringToUInt32 here pls

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/base/watchdog_posix.cc Outdated
char prop_val[PROP_VALUE_MAX] = {0};
if (__system_property_get("ro.hw_timeout_multiplier", prop_val) > 0) {
int val = atoi(prop_val);
return val > 1 ? static_cast<uint32_t>(val) : 1u;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/slight worry: AFAICT by code searching, the value of multiplier is usually small. But just to be defensive shall we clamp it ? Because if some erroneous value (should not happen), the below multiplication can overflow as timeout is also high (5min = 300000).

Also just curious on what are the possible values for your device ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's set to 500, for somewhat related reasons.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Filip said, we use 500 for our relatively slow HW. And I added uint32_t overflow check.

Use GetAndroidProp() from android_utils.cc instead of
__system_property_get(), and StringToUInt32() from string_utils.h
instead of atoi(). Additionally, clamp the final timeout multiplication
to prevent overflow of uint32.

Bug: 504382878
@tutan-chromium
Copy link
Copy Markdown
Author

Thanks for the PR!

FYI, we are also investigating watchdog issue on android (http://b/496506220). And to summarise the current status:
Happy to know if you any other insight.

Thank you so much for the information! In our case, we are likely dealing with an edge case on a relatively slow SoC with potential instabilities in suspend/resume state transitions and monotonic clock drifts (see b/460841705). I reviewed the changes you mentioned, and they will definitely be very useful for us as well. Thanks again!

#include <sys/timerfd.h>
#include <unistd.h>

#if PERFETTO_BUILDFLAG(PERFETTO_OS_ANDROID)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: pls place it below along with other perfetto imports and after the stdlib ones.

Comment on lines +165 to +169
if (scaled_ms > std::numeric_limits<uint32_t>::max()) {
ms = std::numeric_limits<uint32_t>::max();
} else {
ms = static_cast<uint32_t>(scaled_ms);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit optional: to redue the number of lines

    ms = static_cast<uint32_t>(
        std::min<uint64_t>(scaled_ms, std::numeric_limits<uint32_t>::max()));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants