CASSANDRA-20581 Improved observability in AutoRepair to report both expected vs. actual repair bytes and expected vs. actual keyspaces #4126
base: trunk
@@ -84,13 +86,22 @@ public abstract class AutoRepairState
@VisibleForTesting
protected int skippedTablesCount = 0;
@VisibleForTesting
protected long totalBytesToRepair = 0;
In the future, we can extend this by adding expected/actual partitions, token ranges, etc.
List<PrioritizedRepairPlan> repairPlans = PrioritizedRepairPlan.build(keyspacesAndTablesToRepair, repairType, shuffleFunc);
List<PrioritizedRepairPlan> repairPlans = PrioritizedRepairPlan.build(keyspacesAndTablesToRepair, repairType, shuffleFunc, primaryRangeOnly);

int keyspaceRepairPlansSofar = 0;
It is cumbersome to identify whether a keyspace has been completely repaired, because of our priority-based allotment. In most cases, when priority is not set, one keyspace maps to exactly one keyspaceRepairPlans entry. I agree that it is hard for end users to differentiate between a keyspace and its keyspaceRepairPlans when priority is set, but to begin with we can target accuracy for use cases/operators that have no priority set.
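A minimal sketch of why the two counts can diverge, assuming one keyspace repair plan is created per distinct (priority, keyspace) pair. The `Table` class and both counting methods are hypothetical illustrations, not the PR's code or the actual `PrioritizedRepairPlan` API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not the actual Cassandra classes.
public class PlanCountSketch
{
    // A table together with its keyspace and an assumed integer repair priority.
    public static final class Table
    {
        final String keyspace;
        final String name;
        final int priority;

        public Table(String keyspace, String name, int priority)
        {
            this.keyspace = keyspace;
            this.name = name;
            this.priority = priority;
        }
    }

    // Assumption: tables are grouped by priority first, then by keyspace,
    // so each distinct (priority, keyspace) pair yields one keyspace repair plan.
    public static int countKeyspaceRepairPlans(List<Table> tables)
    {
        Set<String> plans = new HashSet<>();
        for (Table t : tables)
            plans.add(t.priority + "/" + t.keyspace);
        return plans.size();
    }

    // Number of distinct keyspaces, which is what an operator usually thinks in.
    public static int countKeyspaces(List<Table> tables)
    {
        Set<String> keyspaces = new HashSet<>();
        for (Table t : tables)
            keyspaces.add(t.keyspace);
        return keyspaces.size();
    }

    public static void main(String[] args)
    {
        List<Table> noPriority = Arrays.asList(new Table("ks1", "t1", 0),
                                               new Table("ks1", "t2", 0),
                                               new Table("ks2", "t3", 0));
        List<Table> withPriority = Arrays.asList(new Table("ks1", "t1", 2),
                                                 new Table("ks1", "t2", 0),
                                                 new Table("ks2", "t3", 0));
        System.out.println(countKeyspaces(noPriority) + " keyspaces, "
                           + countKeyspaceRepairPlans(noPriority) + " plans");
        System.out.println(countKeyspaces(withPriority) + " keyspaces, "
                           + countKeyspaceRepairPlans(withPriority) + " plans");
    }
}
```

Under this assumption, the counts match when every table shares the same priority, and the plan count exceeds the keyspace count as soon as one keyspace has tables at different priorities.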
@VisibleForTesting
protected int totalKeyspaceRepairPlansToRepair = 0;
@VisibleForTesting
protected int keyspacesRepairPlansAlreaydRepaired = 0;
sp: keyspacesRepairPlansAlreadyRepaired
@@ -101,6 +112,19 @@ protected RepairCoordinator getRepairRunnable(String keyspace, RepairOption opti
options, keyspace);
}

public void calcExpectedScheduleStats(List<PrioritizedRepairPlan> repairPlans)
nitpick: rename to something like calculateRepairPlanStatistics or updateScheduleStatistics to be more descriptive (feel free to ignore)
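For context, a rough sketch of what an expected-vs-actual statistics holder could look like. The field names `totalBytesToRepair` and `totalKeyspaceRepairPlansToRepair` come from the diff above; `PlanEstimate`, `bytesAlreadyRepaired`, `markRepaired`, and `progress` are hypothetical names introduced here for illustration and are not taken from the PR.

```java
import java.util.List;

// Hypothetical stand-in for the counters added to AutoRepairState; the real class differs.
public class ScheduleStatsSketch
{
    // Hypothetical per-plan size estimate; the PR derives expected bytes from the repair plans.
    public static final class PlanEstimate
    {
        final String keyspace;
        final long estimatedBytes;

        public PlanEstimate(String keyspace, long estimatedBytes)
        {
            this.keyspace = keyspace;
            this.estimatedBytes = estimatedBytes;
        }
    }

    private long totalBytesToRepair = 0;
    private long bytesAlreadyRepaired = 0;
    private int totalKeyspaceRepairPlansToRepair = 0;
    private int keyspaceRepairPlansAlreadyRepaired = 0;

    // Compute the expected totals once, before the schedule starts, and reset the actual counters.
    public void calculateExpected(List<PlanEstimate> plans)
    {
        totalBytesToRepair = 0;
        bytesAlreadyRepaired = 0;
        keyspaceRepairPlansAlreadyRepaired = 0;
        for (PlanEstimate plan : plans)
            totalBytesToRepair += plan.estimatedBytes;
        totalKeyspaceRepairPlansToRepair = plans.size();
    }

    // Record one finished plan so that actual progress can be compared with the expectation.
    public void markRepaired(PlanEstimate plan)
    {
        bytesAlreadyRepaired += plan.estimatedBytes;
        keyspaceRepairPlansAlreadyRepaired++;
    }

    public long getTotalBytesToRepair() { return totalBytesToRepair; }
    public long getBytesAlreadyRepaired() { return bytesAlreadyRepaired; }

    // Human-readable "actual/expected" summary.
    public String progress()
    {
        return String.format("%d/%d keyspace repair plans, %d/%d bytes",
                             keyspaceRepairPlansAlreadyRepaired, totalKeyspaceRepairPlansToRepair,
                             bytesAlreadyRepaired, totalBytesToRepair);
    }
}
```

Keeping the expected totals separate from the running counters means the same pair can be exposed as metrics without recomputing the plan.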
@@ -1186,4 +1196,197 @@ public static Collection<Range<Token>> split(Range<Token> tokenRange, int number
}
return ranges;
}
Moved the following APIs from RepairTokenRangeSplitter.java to here
* @param tableNames tables to repair for the given keyspace.
* @return Single repair plan.
*/
static List<PrioritizedRepairPlan> buildSingleKeyspacePlan(AutoRepairConfig.RepairType repairType, String keyspaceName, String ... tableNames)
Removed this API because it was only being used in testing. Created a similar API under unit/*
NOTE: This is a draft PR intended to gather early feedback. Test cases, code structure, and other refinements are not yet finalized.
Overall, we want to improve the AutoRepair scheduler's observability. The work will be split across multiple PRs; this PR adds the following two capabilities: reporting expected vs. actual repair bytes, and reporting expected vs. actual keyspaces.
With this PR, we can visualize something like


The Cassandra Jira