
tests: parametrize benchmark tests #4974

Open
cm-iwata wants to merge 12 commits into base: main

Conversation

cm-iwata
Contributor

@cm-iwata cm-iwata commented Dec 30, 2024

In the previous implementation, it was necessary to adjust the timeout value every time a benchmark test was added.
By parametrizing the benchmark tests, the time required for each test becomes predictable, eliminating the need to adjust the timeout value.

Changes

Parametrize the test by the list of criterion benchmarks.

By parametrizing the tests, git clone will be executed for each parameter here:

with TemporaryDirectory() as tmp_dir:
    dir_a = git_clone(Path(tmp_dir) / a_revision, a_revision)
    result_a = test_runner(dir_a, True)
    if b_revision:
        dir_b = git_clone(Path(tmp_dir) / b_revision, b_revision)
    else:
        # By default, pytest execution happens inside the `tests` subdirectory. Pass the repository root, as
        # documented.
        dir_b = Path.cwd().parent
    result_b = test_runner(dir_b, False)
    comparison = comparator(result_a, result_b)
    return result_a, result_b, comparison

Running all parametrized tests with a single git clone would require major revisions to git_ab_test, so this PR does not address that issue.
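
To make the change concrete, here is a minimal sketch of the parametrized shape described above; the benchmark names and helper names (CRITERION_BENCHMARKS, _run_criterion, _compare_results) are illustrative stand-ins, not the exact code from this PR, and git_ab_test is used as in the snippet above.

import pytest

# Hypothetical list; in practice it would be derived from the criterion benchmarks.
CRITERION_BENCHMARKS = ["page_fault", "queue_pop_16", "serialize_cpu_template"]

@pytest.mark.parametrize("bench_name", CRITERION_BENCHMARKS)
def test_no_regression_relative_to_target_branch(bench_name):
    # git_ab_test clones/builds both revisions and runs the same callable against
    # each, so every parametrized case has a similar, predictable duration.
    result_a, result_b, comparison = git_ab_test(
        lambda checkout, is_a: _run_criterion(checkout, bench_name),
        _compare_results,
    )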

Reason

Closes #4832

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

In the previous implementation, it was necessary to adjust the timeout
value every time a benchmark test was added.
By parametrizing the benchmark tests, the time required for each test
becomes predictable, eliminating the need to adjust the timeout value.

Signed-off-by: Tomoya Iwata <[email protected]>
@pb8o pb8o added the python (Pull requests that update Python code) label Jan 8, 2025

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.19%. Comparing base (ae078ee) to head (cfbd0e9).

Current head cfbd0e9 differs from pull request most recent head d1226ab

Please upload reports for the commit d1226ab to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4974      +/-   ##
==========================================
+ Coverage   83.01%   83.19%   +0.17%     
==========================================
  Files         250      247       -3     
  Lines       26897    26641     -256     
==========================================
- Hits        22328    22163     -165     
+ Misses       4569     4478      -91     
Flag Coverage Δ
5.10-c5n.metal 83.67% <ø> (+0.10%) ⬆️
5.10-m5n.metal 83.66% <ø> (+0.09%) ⬆️
5.10-m6a.metal 82.86% <ø> (+0.07%) ⬆️
5.10-m6g.metal 79.66% <ø> (+0.32%) ⬆️
5.10-m6i.metal 83.64% <ø> (+0.09%) ⬆️
5.10-m7g.metal 79.66% <ø> (+0.32%) ⬆️
6.1-c5n.metal 83.67% <ø> (+0.06%) ⬆️
6.1-m5n.metal 83.66% <ø> (+0.05%) ⬆️
6.1-m6a.metal 82.86% <ø> (+0.02%) ⬆️
6.1-m6g.metal ?
6.1-m6i.metal 83.64% <ø> (+0.04%) ⬆️
6.1-m7g.metal 79.66% <ø> (+0.32%) ⬆️

Flags with carried forward coverage won't be shown.


pb8o
pb8o previously approved these changes Jan 8, 2025
)

executables = []
for line in stdout.split("\n"):
Contributor

nit: could be stdout.splitlines()
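
For reference, the two only differ in the trailing element when the output ends with a newline; a quick illustration:

out = "bench_a\nbench_b\n"
out.split("\n")   # ['bench_a', 'bench_b', ''] - trailing empty string
out.splitlines()  # ['bench_a', 'bench_b']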

@roypat roypat left a comment

I think like this we're no longer doing an A/B test; we're just benchmarking the same binary compiled from the PR branch twice (i.e. comparing the PR results to themselves).

@pytest.mark.no_block_pr
@pytest.mark.timeout(900)
def test_no_regression_relative_to_target_branch():
@pytest.mark.timeout(600)
Contributor

from the buildkite run, it seems like the longest duration of one of these is 150s for the queue benchmarks, so I think we can actually drop this timeout marker altogether and just rely on the default timeout specified in pytest.ini (which is 300s)
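
For context, a hedged sketch of how the default and the per-test marker interact (the 300s default is the value quoted in this thread; everything else is illustrative):

# pytest.ini (pytest-timeout plugin):
#   [pytest]
#   timeout = 300      # default per-test timeout, in seconds
#
# A marker overrides the default only where a single case needs more time:
import pytest

@pytest.mark.timeout(600)
def test_slow_case():
    ...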

Contributor Author

fix in 215af23

Contributor Author

When I verified this, I found that the first execution incurs more than 300 seconds of overhead, so the 300-second default is too short.

root@94c9107981fb:/firecracker/tests# pytest "integration_tests/performance/test_benchmarks.py"  -m no_block_pr --durations=0 -v
================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.12.3, pytest-8.3.5, pluggy-1.5.0 -- /opt/venv/bin/python
cachedir: ../build/pytest_cache
metadata: {'Python': '3.12.3', 'Platform': 'Linux-6.8.0-51-generic-x86_64-with-glibc2.39', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'metadata': '3.1.1', 'rerunfailures': '14.0', 'timeout': '2.3.1', 'xdist': '3.6.1', 'json-report': '1.5.0'}}
EC2 AMI: NA
rootdir: /firecracker/tests
configfile: pytest.ini
plugins: metadata-3.1.1, rerunfailures-14.0, timeout-2.3.1, xdist-3.6.1, json-report-1.5.0
timeout: 300.0s
timeout method: signal
timeout func_only: False
collected 8 items

integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[serialize_cpu_template] PASSED                                                                      [ 12%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[page_fault] PASSED                                                                                  [ 25%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_16] PASSED                                                                           [ 37%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_pop_16] PASSED                                                                                [ 50%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_256] PASSED                                                                          [ 62%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[next_descriptor_16] PASSED                                                                          [ 75%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[request_parse] PASSED                                                                               [ 87%]
integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[deserialize_cpu_template] PASSED                                                                    [100%]

------------------------------------------------------------------------------------------------------ JSON report -------------------------------------------------------------------------------------------------------
report saved to: ../test_results/test-report.json
=================================================================================================== slowest durations ====================================================================================================
315.52s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[serialize_cpu_template]
40.62s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[page_fault]
39.61s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_256]
39.57s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[request_parse]
37.05s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_pop_16]
36.06s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[next_descriptor_16]
34.07s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_16]
22.73s call     integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[deserialize_cpu_template]
1.63s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[serialize_cpu_template]
0.15s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[deserialize_cpu_template]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[next_descriptor_16]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_pop_16]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[deserialize_cpu_template]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_256]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[page_fault]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_256]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[request_parse]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_16]
0.00s setup    integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_add_used_16]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[serialize_cpu_template]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[queue_pop_16]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[next_descriptor_16]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[request_parse]
0.00s teardown integration_tests/performance/test_benchmarks.py::TestBenchMarks::test_no_regression_relative_to_target_branch[page_fault]

Comment on lines 29 to 32
_, stdout, _ = cargo(
    "bench",
    f"--all --quiet --target {platform.machine()}-unknown-linux-musl --message-format json --no-run",
)
Contributor

Mhh, I don't think this does what we want. We precompile the executables once (from the PR branch), and then we use this precompiled executable for both the A and B runs. What we need to do though is compile each benchmark twice, once from the main branch and once from the PR branch, so that this test does a meaningful comparison :/ That's why in #4832 I suggested using --list-only or something: determine the names of the benchmarks here, and then compile them twice in _run_criterion.
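
A rough sketch of the discovery step being suggested here, parsing the `-- --list` output for lines ending in ": benchmark"; the helper name and the exact cargo() arguments are assumptions based on the snippet above, not the PR's final code:

def get_benchmark_names():
    # Ask the criterion harnesses to list their benchmarks without running them;
    # each benchmark shows up as a line like "queue_pop_16: benchmark".
    _, stdout, _ = cargo("bench", "--all --quiet -- --list")
    return sorted(
        line.split(":")[0].strip()
        for line in stdout.splitlines()
        if line.strip().endswith(": benchmark")
    )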

Contributor Author

Sorry, I think I misunderstood a bit how it works.

Let me confirm the modifications.
First, run cargo bench --all -- --list to generate parameters to pass to pytest.parametrize.
I will get the following output:

root@90de30508db0:/firecracker# cargo bench --all -- --list
    Finished `bench` profile [optimized] target(s) in 0.10s
     Running benches/block_request.rs (build/cargo_target/release/deps/block_request-2e4b90407b22a8d0)
request_parse: benchmark

     Running benches/cpu_templates.rs (build/cargo_target/release/deps/cpu_templates-cd18fd51dbad16f4)
Deserialization test - Template size (JSON string): [2380] bytes.
Serialization test - Template size: [72] bytes.
deserialize_cpu_template: benchmark
serialize_cpu_template: benchmark

     Running benches/memory_access.rs (build/cargo_target/release/deps/memory_access-741f97a7c9c33391)
page_fault: benchmark
page_fault #2: benchmark

     Running benches/queue.rs (build/cargo_target/release/deps/queue-b2dfffbab00c4157)
next_descriptor_16: benchmark
queue_pop_16: benchmark
queue_add_used_16: benchmark
queue_add_used_256: benchmark

From this I will get the benchmark names, for example queue_pop_16.

Finally, run a command like cargo bench --all -- queue_pop_16 in _run_criterion.
Is this correct?

Contributor

Yes, that's pretty much it! The main point is that the compilation of the benchmarks needs to happen in _run_criterion, because we actually have to compile them twice, once for the pull request target, and once for the pull request head.
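
A minimal sketch of what _run_criterion could then look like, using plain subprocess rather than the test framework's cargo helper; the function and parameter names are illustrative, and the --exact flag matches the follow-up commit further down:

import subprocess

def _run_criterion(firecracker_checkout, bench_name):
    # Compile and run a single named benchmark from this checkout's sources, so
    # the A and B revisions are each built and measured from their own code.
    # --exact keeps e.g. serialize_cpu_template from also matching
    # deserialize_cpu_template.
    return subprocess.run(
        ["cargo", "bench", "--all", "--", bench_name, "--exact"],
        cwd=firecracker_checkout,
        check=True,
        capture_output=True,
        text=True,
    ).stdout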

Contributor Author

fix in d42d39f

Since it would be very slow to git clone and build the executables for each parameter, I adjusted the fixture so that git clone is executed only once.
033ca8f
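
A hedged sketch of the once-per-class shape; the fixture name and internals are illustrative and reuse the hypothetical helpers from the sketches above, not the exact code in 033ca8f:

from pathlib import Path
import pytest

class TestBenchMarks:
    @pytest.fixture(scope="class")
    def ab_dirs(self, tmp_path_factory):
        # Prepare the A and B working trees once per class instead of once per
        # parametrized benchmark.
        tmp = tmp_path_factory.mktemp("ab_test")
        dir_a = git_clone(tmp / "main", "main")  # baseline revision (assumed main)
        dir_b = Path.cwd().parent                # PR checkout, as in the snippet above
        return dir_a, dir_b

    @pytest.mark.parametrize("bench_name", get_benchmark_names())
    def test_no_regression_relative_to_target_branch(self, ab_dirs, bench_name):
        dir_a, dir_b = ab_dirs
        result_a = _run_criterion(dir_a, bench_name)
        result_b = _run_criterion(dir_b, bench_name)
        _compare_results(result_a, result_b)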

use `splitlines()` instead of `split("\n")`.

Signed-off-by: Tomoya Iwata <[email protected]>
No longer need to set individual timeout values,
because the performance tests are parameterized.

Signed-off-by: Tomoya Iwata <[email protected]>
cm-iwata and others added 5 commits January 17, 2025 09:59
In the previous implementation, the same binary built in the PR branch
was executed twice,
which was not a correct A/B test. This has been fixed.

Signed-off-by: Tomoya Iwata <[email protected]>
In the previous implementation, git clone was executed
for each parameter of the parametrized test.
This has a large overhead, so I adjusted it so that
the fixtures are only called once per class.

Signed-off-by: Tomoya Iwata <[email protected]>
Added the `exact` option to cargo bench to avoid running
deserialize_cpu_template when specifying serialize_cpu_template.

Signed-off-by: Tomoya Iwata <[email protected]>
I parametrized the benchmark test, but it still took more than 300
seconds to run the first time, so I adjusted the timeout value.

Signed-off-by: Tomoya Iwata <[email protected]>
@cm-iwata cm-iwata requested a review from roypat April 27, 2025 09:53
Labels
python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parametrize test_benchmarks.py test by criterion benchmarks
3 participants