MDEV-36760, MDEV-36781, MDEV-36759: Fix performance regressions after MDEV-29445 by dr-m · Pull Request #4042 · MariaDB/server

dr-m · 2025-05-13T09:12:25Z

The Jira issue number for this PR is: MDEV-36760, MDEV-36781, MDEV-36759

Description

Fix several performance regressions due to #3107. See the individual commits.

Release Notes

Several performance regressions were fixed.

How can this PR be tested?

For an example, see MDEV-36759.

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

CLAassistant · 2025-05-13T09:12:32Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ bnestere
❌ dr-m
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

There were two issues with the test: 1. A race between a race_condition.inc and status variable, where the status variable Rpl_semi_sync_master_status could be ON before the semi-sync connection finished establishing, resulting in Rpl_semi_sync_master_clients showing 0 (instead of 1). To fix this, we simply instead wait for Rpl_semi_sync_master_clients to be 1 before proceeding. 2. Another race between a race_condition.inc and status variable, where the wait_condition waited on a process_list command of 'BINLOG DUMP' to disappear to infer the binlog dump thread was killed, to where we then verified semi-sync state was correct using status variables. However, the 'BINLOG DUMP' command is overridden with a killed status before the semi-sync tear-down happens, and thereby we could see invalid values. The fix for this is to change the wait_condition to instead wait for the connection with the replication user is gone, because that stays through the binlog dump thread tear-down life-cycle

log_t::append_prepare_wait(): Do not attempt to read log_sys.write_lsn because it is not protected by log_sys.latch but by write_lock, which we cannot hold here. The assertion could fail if log_t::write_buf() is executing concurrently, and it has not yet executed log_write_buf() or updated log_sys.write_lsn. Fixes up commit acd071f (MDEV-21923)

In commit b692342 (MDEV-29445) we started to specify the MAP_POPULATE flag for allocating the InnoDB buffer pool. This would cause a lot of time to be spent on __mm_populate() inside the Linux kernel, such as 16 seconds to pre-fault or commit innodb_buffer_pool_size=64G. Let us revert to the previous way of allocating the buffer pool at startup. Note: An attempt to increase the buffer pool size by SET GLOBAL innodb_buffer_pool_size (up to innodb_buffer_pool_size_max) will invoke my_virtual_mem_commit(), which will use MAP_POPULATE to zero-fill and prefault the requested additional memory area, blocking buf_pool.mutex. Before MDEV-29445 we allocated the InnoDB buffer pool by invoking mmap(2) once (via my_large_malloc()). After the change, we would invoke mmap(2) twice, first via my_virtual_mem_reserve() and then via my_virtual_mem_commit(). Outside Microsoft Windows, we are reverting back to my_large_malloc() like allocation. my_virtual_mem_reserve(): Define only for Microsoft Windows. Other platforms should invoke my_large_virtual_alloc() and update_malloc_size() instead of my_virtual_mem_reserve() and my_virtual_mem_commit(). my_large_virtual_alloc(): Define only outside Microsoft Windows. Do not specify MAP_NORESERVE nor MAP_POPULATE, to preserve compatibility with my_large_malloc(). Were MAP_POPULATE specified, the mmap() system call would be significantly slower, for example 18 seconds to reserve 64 GiB upfront.

buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED page frame is directly allocated from the buffer pool, not via the binary buddy allocator. buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the buffer pool is being shrunk. buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool would be occupied by the adaptive hash index or record locks.

In commit b692342 (MDEV-29445) some hash tables were accidentally created with the minimum size (101 entries) instead of correctly deriving the size from the initial innodb_buffer_pool_size. This led to very long hash bucket chains, which are very slow to traverse. ut_find_prime(): Assert that the size is nonzero in order to catch this type of regression in the future. innodb_init_params(): Do not bother reading buf_pool.curr_size() when it is known to be 0, srv_start(): Correctly initialize srv_lock_table_size to 5 times buf_pool.curr_size(), that is, the buffer pool size in pages, between invoking buf_pool.create() and lock_sys.create(). btr_search_enable(), dict_sys_t::create(), dict_sys_t::resize(): Correctly refer to buf_pool.curr_pool_size(), that is, innodb_buffer_pool_size in bytes, when calculating the hash table size. In MDEV-29445 the expressions buf_pool_get_curr_size() were accidentally replaced with buf_pool.curr_size().

mariadb-DebarunBanerjee

I am done with the review for following 6 commits included.

MDEV-36663: Testcase Fixup : Ok, trivial patch.
MDEV-36760 log_t::append_prepare_wait(): Ok, I reviewed in pull/4038
MDEV-36780: InnoDB buffer pool reserves all assigned memory:Ok but this could impact and need testing to ensure that we are back to the previous level.
MDEV-36781: Assertion i < BUF_BUDDY_SIZES : Ok, Trivial debug specific
MDEV-36759: Huge performance drop : This seems to be fixing the real regression. Great catch and the fix looks correct.

dr-m requested a review from mariadb-DebarunBanerjee May 13, 2025 09:12

dr-m self-assigned this May 13, 2025

dr-m force-pushed the 10.11-MDEV-36759 branch from fcda8b5 to 94838cb Compare May 13, 2025 09:25

dr-m and others added 6 commits May 13, 2025 12:27

bump the VERSION

791fcea

dr-m force-pushed the 10.11-MDEV-36759 branch from 94838cb to 8fb0942 Compare May 13, 2025 09:30

mariadb-DebarunBanerjee approved these changes May 13, 2025

View reviewed changes

svoj added the MariaDB Corporation label May 13, 2025

grooverdan referenced this pull request in MariaDB/mariadb-docker May 14, 2025

Stable 2025 Q1 updates

c566990

sanja-byelkin merged commit b8d3257 into 10.11 May 21, 2025
13 of 14 checks passed

dr-m deleted the 10.11-MDEV-36759 branch May 23, 2025 04:54

dr-m mentioned this pull request Mar 3, 2026

MDEV-38947 Reimplement SET GLOBAL innodb_buffer_pool_size #4729

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MDEV-36760, MDEV-36781, MDEV-36759: Fix performance regressions after MDEV-29445#4042

MDEV-36760, MDEV-36781, MDEV-36759: Fix performance regressions after MDEV-29445#4042
sanja-byelkin merged 6 commits into10.11from
10.11-MDEV-36759

dr-m commented May 13, 2025

Uh oh!

CLAassistant commented May 13, 2025 •

edited

Loading

Uh oh!

mariadb-DebarunBanerjee left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

6 participants

Uh oh!

Conversation

dr-m commented May 13, 2025

Description

Release Notes

How can this PR be tested?

Basing the PR against the correct MariaDB version

PR quality check

Uh oh!

CLAassistant commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mariadb-DebarunBanerjee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

6 participants

CLAassistant commented May 13, 2025 •

edited

Loading