fix: Set default kExchangeMaxErrorDuration to be > maxMemoryArbitrationTime #26566
base: master
Reviewer's Guide
This PR adjusts the default exchange error timeout to exceed the maximum memory arbitration duration, preventing misleading exchange failures under high memory pressure.
Entity relationship diagram for updated SystemConfig property
erDiagram
SYSTEM_CONFIG {
string kExchangeMaxErrorDuration
}
SYSTEM_CONFIG ||--|| EXCHANGE : configures
EXCHANGE {
}
Class diagram for updated SystemConfig default value
classDiagram
class SystemConfig {
+string kExchangeMaxErrorDuration = "6m"
+constructor SystemConfig()
-- other properties and methods --
}
Thanks for the release note! I have suggested a rephrasing so that the entry begins with a keyword from the Order of changes in the Release Notes Guidelines. If this rephrasing doesn't accurately describe your work, please revise.
@jaystarshot: Thanks for this change. I agree that it is worth making this greater than maxArbitrationTime, but I feel 6 minutes is a long duration as well. Do you think it's worth reducing maxArbitrationTime to 4 minutes and increasing exchangeMaxErrorDuration to 5 minutes?
We have seen issues in production due to errors such as:
Failed to fetch data from xxx /v1/task/20251016_061652_01883_is2sk.9.0.9.0/results/17/74 - Exhausted after 1 retries, duration 206279ms: proxygen::HTTPException: Shutdown transport: EOF, 10.154.225.18:8080 Operator: Exchange[3877]
206279ms is roughly 3 min. The arbitrator timeout is 5m.
Detailed investigation revealed that these errors occurred during high memory usage scenarios. Some stages were blocked waiting for memory arbitration, and while arbitration was pending, downstream tasks timed out when fetching data from upstream, resulting in the above exchange errors.
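As a quick check of the numbers above, the following sketch converts the reported duration to minutes and compares it with the 5m arbitrator timeout. The constant and function names are illustrative only and are not taken from the Presto codebase.

```cpp
#include <chrono>

// Duration reported in the exchange error quoted above.
constexpr std::chrono::milliseconds kReportedFailureDuration{206279};
// Arbitrator timeout quoted in this description.
constexpr std::chrono::minutes kArbitratorTimeout{5};

// Hypothetical helper: converts a millisecond duration to fractional minutes.
constexpr double toMinutes(std::chrono::milliseconds d) {
  return std::chrono::duration<double, std::ratio<60>>(d).count();
}

// Hypothetical helper: true when the fetch gave up while memory arbitration
// could still have been in progress.
constexpr bool failedBeforeArbitrationTimeout(
    std::chrono::milliseconds failure,
    std::chrono::milliseconds arbitrationTimeout) {
  return failure < arbitrationTimeout;
}
```

Here `toMinutes(kReportedFailureDuration)` comes out at about 3.44, so the exchange gave up well before a 5m arbitration could have timed out on its own.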
Fix
We increased the exchange error timeout to exceed the memory arbitrator’s activation duration. This prevents premature exchange failures caused by arbitration delays.
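The relationship this fix relies on can be written as a simple invariant. The sketch below is not code from this PR: `TimeoutSettings` and `exchangeOutlastsArbitration` are hypothetical names, and the values are the 6m default from this change, the 5m arbitrator timeout quoted in the description, and the 5m/4m alternative raised in review.

```cpp
#include <chrono>

// Hypothetical container for the two timeouts discussed in this PR.
struct TimeoutSettings {
  std::chrono::seconds exchangeMaxErrorDuration;
  std::chrono::seconds maxMemoryArbitrationTime;
};

// Hypothetical check: the exchange should keep retrying for strictly longer
// than memory arbitration can take, so that a stalled query surfaces as a
// memory arbitration failure rather than an exchange error.
constexpr bool exchangeOutlastsArbitration(const TimeoutSettings& s) {
  return s.exchangeMaxErrorDuration > s.maxMemoryArbitrationTime;
}

// Default from this PR (6m) against the 5m arbitrator timeout.
constexpr TimeoutSettings kThisPr{
    std::chrono::minutes{6}, std::chrono::minutes{5}};
// Alternative suggested in review: 5m exchange, 4m arbitration.
constexpr TimeoutSettings kReviewAlternative{
    std::chrono::minutes{5}, std::chrono::minutes{4}};

static_assert(exchangeOutlastsArbitration(kThisPr), "6m > 5m");
static_assert(exchangeOutlastsArbitration(kReviewAlternative), "5m > 4m");
```

Both combinations satisfy the invariant; they differ only in how long a query may wait before failing.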
Note
This change does not address the underlying memory usage or query inefficiencies. It simply ensures that failures now correctly reflect memory pressure issues (e.g., lack of spilling or arbitration tuning) rather than misleading exchange errors.
There are also kExchangeRequestTimeout and kExchangeConnectTimeout, but I don't think they are respected in this case, because the error clearly shows that only 1 retry was done in 3 minutes, whereas we had both values set to 20 sec.
== RELEASE NOTES ==
Hive Connector Changes