Skip to content

MDEV-18983 Port rpl_semi_sync_master_wait_for_slave_count from MySQL #4037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ParadoxV5
Copy link
Contributor

@ParadoxV5 ParadoxV5 commented May 7, 2025

  • The Jira issue number for this PR is: MDEV-18983

Description

rpl_semi_sync_master_wait_for_slave_count is a long-requested variable specifying the minimum number of acknowledging replicas before a semi-synchronous transaction can complete on the primary.
The default count is 1 to match the default behavior of “complete when any one replica acknowledges”.
When this matches the number of replicas, it requires all replicas to acknowledge, and we have full-synchronous replication at home.

Consequently, its purpose also leads it to indirectly control the number of replicas required to keep semi-synchronous replication active when rpl_semi_sync_master_wait_no_slave is set to disable immediately (i.e. auto-revert to asynchronous).

Release Notes

TODO
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.

How can this PR be tested?

TODO: Basic, Grant and Functional tests for the new variable

PR quality check

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@ParadoxV5 ParadoxV5 added MariaDB Corporation Replication Patches involved in replication labels May 7, 2025
ParadoxV5 added 3 commits May 12, 2025 18:52
* Create reüsable functions
  `is_no_slave()` & `Active_tranx::get_tranx_node()`
* Replace `Active_tranx::is_thd_waiter()`
  with equivalent method `is_tranx_end_pos()`
fix_rpl_semi_sync_master_wait_for_slave_count
@ParadoxV5 ParadoxV5 requested a review from bnestere May 13, 2025 19:15
Copy link
Contributor Author

@ParadoxV5 ParadoxV5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature should be implemented. While it’s ready for review besides testing, I would prefer we first refactor our code to solve the hindrances I encountered.

Comment on lines 549 to +551
DBUG_ASSERT(rpl_semi_sync_master_clients > 0);
if (!(--rpl_semi_sync_master_clients) && !rpl_semi_sync_master_wait_no_slave)
--rpl_semi_sync_master_clients;
if (is_no_slave())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -68,6 +69,14 @@ static ulonglong timespec_to_usec(const struct timespec *ts)
return (ulonglong) ts->tv_sec * TIME_MILLION + ts->tv_nsec / TIME_THOUSAND;
}

/** @return Should we revert to async because there not enough slaves? */
static bool is_no_slave()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note from #3931:

The code for “automatic revert to async” is not an active function, but rather passive behaviors of various functions when the conditions are met.

Comment on lines +696 to +704
entry= m_active_tranxs->get_tranx_node(log_file_name, log_file_pos);
if (entry && ++(entry->acks) >= rpl_semi_sync_master_wait_for_slave_count)
{
/* Remove all active transaction nodes before this point. */
m_active_tranxs->clear_active_tranx_nodes(log_file_name, log_file_pos,
signal_waiting_transaction);
if (m_active_tranxs->is_empty())
m_wait_file_name_inited= false;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

report_reply_binlog() is somehow responsible for three distinct subfeatures:

  1. turn Semi-Sync back on if it is not, probably in response to a late ACK post-timeout
    • Is this even the correct place for this sub-“feature”?
      rpl_semi_sync_master_wait_for_slave_count doesn’t even care about it.
  2. update Semi-Sync’s ‘current’ position
    • dump_start() also calls report_reply_binlog(), yet I believe it’s only needs this subfeature.
      The other two subfeatures require this entry && condition to defend against this call.
  3. clear_active_tranx_nodes(): flush and clear transactions (plural?) before and including the specified position to allow them to complete

Tranx_node *Active_tranx::find_acked_tranx_node()
{
Tranx_node *new_front;
for (Tranx_node *entry= m_trx_front; entry; entry= entry->next)
Copy link
Contributor Author

@ParadoxV5 ParadoxV5 May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repl_semi_sync_master::refresh_wait_for_slave_count(uint32 server_id) calls this to iteratively find the transaction(s) that can now pass with a lowered rpl_semi_sync_master_wait_for_slave_count.
Then it calls
report_reply_binlog() which includes clear_active_tranx_nodes() which iteratively finds those transactions again.
Yes, both iterations are linear.

While the infrequently-used clear_active_tranx_nodes() can use lambdas to merge with this iteration, its call is mixed within report_reply_binlog().
This, along with is_no_slave() & Active_tranx::get_tranx_node(), and how implementing fix_rpl_semi_sync_master_wait_for_slave_count requires new methods in two layers, are signs that Semi-Sync is due for refactoring.

Copy link
Contributor Author

@ParadoxV5 ParadoxV5 May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Active_tranx is a naturally-ordered, block-allocating, hashtable with linked list, specialized for managing … its elements.

P.S. It uses neither C++ std::set nor our in-house HASH.

Comment on lines +3829 to +3833
static bool fix_rpl_semi_sync_master_wait_for_slave_count
(sys_var *self, THD *thd, enum_var_type type)
{
mysql_mutex_unlock(&LOCK_global_system_variables);
repl_semisync_master.refresh_wait_for_slave_count(thd->variables.server_id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the refresh is only required when the requirement decreases.
However, it seems that the Sys.Var. system has a limitation that the variable must have an ‘old’ duplicate that an ON_UPDATE callback can compare to tell whether it was decreased.

Intriguingly, other Semi-Sync Primary variables are duplicated and updated through ON_UPDATE even though they don’t have special handling.
This is likely a remnant of Semi-Sync’s past as a plugin.

Comment on lines +3899 to +3902
"The number of slaves that need to acknowledge that they have received "
"a transaction before the transaction can complete on the master",
GLOBAL_VAR(rpl_semi_sync_master_wait_for_slave_count),
CMD_LINE(REQUIRED_ARG), VALID_RANGE(1, 0xFFFF),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches MySQL

@bnestere
Copy link
Contributor

Hi @ParadoxV5 !

Thanks for submitting the PR! A couple quick questions:

  1. Why did you not include any tests? Regardless of whether or not we refactor, we'd need MTR tests for the feature (which wouldn't change whether or not a refactor is included).

  2. Do you have a specific refactor design in mind? I agree the semi-sync code-base is spaghetti-adjacent, and I'm open to improvements.

@ParadoxV5
Copy link
Contributor Author

Hi @bnestere,

  1. Why did you not include any tests? Regardless of whether or not we refactor, we'd need MTR tests for the feature (which wouldn't change whether or not a refactor is included).

It is simply because I have not begun writing them. I wanted to share those thoughts with the team early.

  1. Do you have a specific refactor design in mind? I agree the semi-sync code-base is spaghetti-adjacent, and I'm open to improvements.

Besides improving the inconveniences I had, I don’t have exact designs in mind, especially when also considering adjacent topics such as proofing for #3931 and coherence with async.

@bnestere
Copy link
Contributor

It is simply because I have not begun writing them. I wanted to share those thoughts with the team early.

Ah ok. Agreed that early feedback is ideal :) Overall, I'd say your patch aligns with what I had in mind for the feature, though I didn't do a detailed review. To the general early-feedback process, if you're just looking for design-level feedback (which it seems you are doing here), it should be ok to just write-up/sketch out the idea of the implementation (preferably before starting implementation, as to save time in-case the design is changed). Then later on when you share code, I'd say to at least provide a minimal test to show the code works (even for early feedback).

Besides improving the inconveniences I had, I don’t have exact designs in mind, especially when also considering adjacent topics such as proofing for #3931 and coherence with async.

Ok. My thoughts on the current semi-sync implementation aren't that the implementation is necessarily wrong, just that it is hard to understand the details of what is happening, and when. I wonder if this doesn't really even need that large of a refactoring, but perhaps just some small encapsulations, similar to what you did in this patch, to improve readability. Or at least, it would be a start (as you've already started doing :)). Perhaps if there is time left-over at the end of the sprint, we can address it.

@ParadoxV5
Copy link
Contributor Author

… just that it is hard to understand the details of what is happening, and when.

The “what” is straightforward to encapsulate, but the “when”… 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MariaDB Corporation Replication Patches involved in replication
Development

Successfully merging this pull request may close these issues.

2 participants