Skip to content

MDEV-36760, MDEV-36781, MDEV-36759: Fix performance regressions after MDEV-29445#4042

Merged
sanja-byelkin merged 6 commits into10.11from
10.11-MDEV-36759
May 21, 2025
Merged

MDEV-36760, MDEV-36781, MDEV-36759: Fix performance regressions after MDEV-29445#4042
sanja-byelkin merged 6 commits into10.11from
10.11-MDEV-36759

Conversation

@dr-m
Copy link
Copy Markdown
Contributor

@dr-m dr-m commented May 13, 2025

  • The Jira issue number for this PR is: MDEV-36760, MDEV-36781, MDEV-36759

Description

Fix several performance regressions due to #3107. See the individual commits.

Release Notes

Several performance regressions were fixed.

How can this PR be tested?

For an example, see MDEV-36759.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m dr-m self-assigned this May 13, 2025
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 13, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ bnestere
❌ dr-m
You have signed the CLA already but the status is still pending? Let us recheck it.

@dr-m dr-m force-pushed the 10.11-MDEV-36759 branch from fcda8b5 to 94838cb Compare May 13, 2025 09:25
dr-m and others added 6 commits May 13, 2025 12:27
There were two issues with the test:

 1. A race between a race_condition.inc and status variable, where the
    status variable Rpl_semi_sync_master_status could be ON before the
    semi-sync connection finished establishing, resulting in
    Rpl_semi_sync_master_clients showing 0 (instead of 1). To fix this,
    we simply instead wait for Rpl_semi_sync_master_clients to be 1
    before proceeding.

 2. Another race between a race_condition.inc and status variable,
    where the wait_condition waited on a process_list command of
    'BINLOG DUMP' to disappear to infer the binlog dump thread was
    killed, to where we then verified semi-sync state was correct
    using status variables. However, the 'BINLOG DUMP' command is
    overridden with a killed status before the semi-sync tear-down
    happens, and thereby we could see invalid values. The fix for
    this is to change the wait_condition to instead wait for the
    connection with the replication user is gone, because that stays
    through the binlog dump thread tear-down life-cycle
log_t::append_prepare_wait(): Do not attempt to read log_sys.write_lsn
because it is not protected by log_sys.latch but by write_lock, which
we cannot hold here. The assertion could fail if log_t::write_buf()
is executing concurrently, and it has not yet executed log_write_buf()
or updated log_sys.write_lsn.

Fixes up commit acd071f (MDEV-21923)
In commit b692342 (MDEV-29445)
we started to specify the MAP_POPULATE flag for allocating the
InnoDB buffer pool. This would cause a lot of time to be spent
on __mm_populate() inside the Linux kernel, such as 16 seconds
to pre-fault or commit innodb_buffer_pool_size=64G.

Let us revert to the previous way of allocating the buffer pool
at startup. Note: An attempt to increase the buffer pool size by
SET GLOBAL innodb_buffer_pool_size (up to innodb_buffer_pool_size_max)
will invoke my_virtual_mem_commit(), which will use MAP_POPULATE
to zero-fill and prefault the requested additional memory area, blocking
buf_pool.mutex.

Before MDEV-29445 we allocated the InnoDB buffer pool by invoking
mmap(2) once (via my_large_malloc()). After the change, we would
invoke mmap(2) twice, first via my_virtual_mem_reserve() and then
via my_virtual_mem_commit(). Outside Microsoft Windows, we are
reverting back to my_large_malloc() like allocation.

my_virtual_mem_reserve(): Define only for Microsoft Windows.
Other platforms should invoke my_large_virtual_alloc() and
update_malloc_size() instead of my_virtual_mem_reserve() and
my_virtual_mem_commit().

my_large_virtual_alloc(): Define only outside Microsoft Windows.
Do not specify MAP_NORESERVE nor MAP_POPULATE, to preserve compatibility
with my_large_malloc(). Were MAP_POPULATE specified, the mmap()
system call would be significantly slower, for example 18 seconds
to reserve 64 GiB upfront.
buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE
corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED
page frame is directly allocated from the buffer pool, not via the
binary buddy allocator.

buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the
buffer pool is being shrunk.

buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool
would be occupied by the adaptive hash index or record locks.
In commit b692342 (MDEV-29445)
some hash tables were accidentally created with the minimum size
(101 entries) instead of correctly deriving the size from the
initial innodb_buffer_pool_size. This led to very long hash bucket
chains, which are very slow to traverse.

ut_find_prime(): Assert that the size is nonzero in order to catch
this type of regression in the future.

innodb_init_params(): Do not bother reading buf_pool.curr_size()
when it is known to be 0,

srv_start(): Correctly initialize srv_lock_table_size to 5 times
buf_pool.curr_size(), that is, the buffer pool size in pages,
between invoking buf_pool.create() and lock_sys.create().

btr_search_enable(), dict_sys_t::create(), dict_sys_t::resize():
Correctly refer to buf_pool.curr_pool_size(), that is,
innodb_buffer_pool_size in bytes, when calculating the hash table size.
In MDEV-29445 the expressions buf_pool_get_curr_size() were
accidentally replaced with buf_pool.curr_size().
@dr-m dr-m force-pushed the 10.11-MDEV-36759 branch from 94838cb to 8fb0942 Compare May 13, 2025 09:30
Copy link
Copy Markdown
Contributor

@mariadb-DebarunBanerjee mariadb-DebarunBanerjee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am done with the review for following 6 commits included.

  1. MDEV-36663: Testcase Fixup : Ok, trivial patch.
  2. MDEV-36760 log_t::append_prepare_wait(): Ok, I reviewed in pull/4038
  3. MDEV-36780: InnoDB buffer pool reserves all assigned memory:Ok but this could impact and need testing to ensure that we are back to the previous level.
  4. MDEV-36781: Assertion i < BUF_BUDDY_SIZES : Ok, Trivial debug specific
  5. MDEV-36759: Huge performance drop : This seems to be fixing the real regression. Great catch and the fix looks correct.

grooverdan referenced this pull request in MariaDB/mariadb-docker May 14, 2025
@sanja-byelkin sanja-byelkin merged commit b8d3257 into 10.11 May 21, 2025
13 of 14 checks passed
@dr-m dr-m deleted the 10.11-MDEV-36759 branch May 23, 2025 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants