Skip to content

Improve setInstanceOperation performance #3017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

zpinto
Copy link
Contributor

@zpinto zpinto commented Apr 2, 2025

Issues

  • Currently setInstanceOperation takes a long time because we are unecessarily finding instances with matching logicalIds as well as individually getting all instance configs in the cluster.

(#200 - Link your issue number here: You can write "Fixes #XXX". Please use the proper keyword so that the issue gets closed automatically. See https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue
Any of the following keywords can be used: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved)

Description

This change will switch to parallel get on all instance configs and avoid calling findInstancesWithMatchingLogicalId for instance operation transitions where this check is not required.

Tests

Current tests cover this change, as the result/response of these APIs will not change.

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

zpinto added 3 commits April 2, 2025 14:17
…ssarily finding instances with matching logicalIds as well as individually getting all instance configs in the cluster.

This change will switch to parallel get on all instance configs and avoid calling findInstancesWithMatchingLogicalId for instance operation transitions where this check is not required.
@zpinto zpinto changed the title Improve setInstanceOperation preformance Improve setInstanceOperation performance Apr 2, 2025
@GrantPSpencer
Copy link
Contributor

This will close: #2928

Comment on lines 234 to 241
if (baseDataAccessor != null) {
return findInstancesWithMatchingLogicalId(baseDataAccessor, clusterName, instanceConfig);
} else if (configAccessor != null) {
return findInstancesWithMatchingLogicalId(configAccessor, clusterName, instanceConfig);
} else {
throw new HelixException(
"Both BaseDataAccessor and ConfigAccessor cannot be null at the same time");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better way to make the code more clear is:

if (baseDataAccessor == null && configAccessor == null) {
throw new HelixException(
"Both BaseDataAccessor and ConfigAccessor cannot be null at the same time");
}

return findInstancesWithMatchingLogicalId(baseDataAccessor != null ? baseDataAccessor : configAccessor, clusterName, instanceConfig);

This simplify the branch of coding and also return the fail over logic at first beginning with out further checks. Especially when you have function with a lot of arguments repeatedly shows up, readability can be reduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I have incorporated the suggestion.

@xyuanlu
Copy link
Contributor

xyuanlu commented Apr 11, 2025

Had an offline discussion about wether we want to use multistream with configAccessor or using dataAccessor.

if we use multistream with configAccessor, we need to follow the pattern for all current usage of configAccessor - call ZKUtil.isInstanceSetup check before calling configAccessor. It is not efficient as inside configAccessor, it is ZKUtil.isInstanceSetup called again, resulting in many duplicate ZK read.

If we change to using dataAccessor, we need to deprecate the prev API and create a new one.

Considering we need to get all instances config in the cluster, I think we could favor performance to code style. And also do code clean in follow up changes.

@zpinto
Copy link
Contributor Author

zpinto commented Apr 14, 2025

This PR is ready to be merged.

Commit Message:

This change will switch to parallel/async get on all instance configs, using HelixDataAccessor, and avoid calling findInstancesWithMatchingLogicalId for instance operation transitions where this check is not required.

@xyuanlu xyuanlu merged commit f87d948 into apache:master Apr 15, 2025
2 checks passed
GrantPSpencer pushed a commit to linkedin/helix that referenced this pull request Apr 22, 2025
This change will switch to parallel/async get on all instance configs, using HelixDataAccessor, and avoid calling findInstancesWithMatchingLogicalId for instance operation transitions where this check is not required.
GrantPSpencer pushed a commit to linkedin/helix that referenced this pull request Apr 22, 2025
This change will switch to parallel/async get on all instance configs, using HelixDataAccessor, and avoid calling findInstancesWithMatchingLogicalId for instance operation transitions where this check is not required.
GrantPSpencer pushed a commit to GrantPSpencer/helix that referenced this pull request Apr 22, 2025
This change will switch to parallel/async get on all instance configs, using HelixDataAccessor, and avoid calling findInstancesWithMatchingLogicalId for instance operation transitions where this check is not required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants