Update "Running large tests" article #992

Merged: @imiric merged 36 commits into main from update/208-running-large-tests on Feb 13, 2023

Conversation

@imiric (Contributor) commented Jan 19, 2023:

This is an update to the "Running large tests" article, and closes #208. I'm creating the PR to get early feedback on the changes, but let's not merge it until the benchmarking is finished.

See the commits for details, but some of the major changes include:

  • Removal of the benchmark results section.
    We plan to move all benchmark results to the k6-hardware-benchmark repository, so that we can automate running the tests and generating the report there. This article will contain only general suggestions, which won't need to be updated as often; it will also be shorter and more readable.

  • Removal of the suggestion to use --compatibility-mode=base. See 2ffca0a for an explanation.

Ivan Mirić added 3 commits January 19, 2023 16:08
The default compatibility-mode (extended) in recent k6 versions only has
minor overhead from Babel, since core.js was removed several versions
ago. And as goja keeps improving, we're planning to drop Babel
altogether, which will likely mean deprecating compatibility-mode.

The end result is that, as of v0.42.0, compatibility-mode=base has very
little benefit over extended, to the point where it's not worth the
hassle of maintaining a build pipeline for most users.

For these reasons it's best to just stop recommending its usage here.
Ivan Mirić added 11 commits January 19, 2023 16:54
The number of terminals is not relevant if using e.g. nmon, which can
monitor CPU, RAM and network.
First explain the default k6 behavior, then introduce the option.
The "cloud" (the web buzzword) is not the same as "k6 Cloud" (the
product).
In order to update this information easily, we will keep benchmark
results in the k6-hardware-benchmark repository, and leave this article
for suggestions only. This way we can avoid having to continuously
update this article, and it will be shorter and more readable.

In the future we might also remove specifically mentioning EC2, as
benchmarks will be run from GitHub Actions (Azure). Or we might want to
expand this article and give broad suggestions for the largest cloud
providers.
@imiric force-pushed the update/208-running-large-tests branch from 999a8a0 to dbb023d on January 19, 2023 15:54
@immavalls (Contributor) left a comment:

Great updates @imiric, just added some minor suggestions. @MattDodsonEnglish should probably review as he is the k6 tech writing expert 😄


### Testing for RPS
We maintain a [repository](https://github.com/grafana/k6-hardware-benchmark) with some scripts used to benchmark k6 and create reports. These tests are run for every new k6 version, and you can see the results in the [`results/` directory](https://github.com/grafana/k6-hardware-benchmark/tree/master/results).
@imiric (Contributor, Author):

Nice catch, but like I mentioned in the PR description, we shouldn't merge this before the benchmarking is finished. That will happen after the results are added to the k6-hardware-benchmark repo, so this link will eventually work.

@MattDodsonEnglish (Contributor) commented:

A rendered version in staging is here:

https://mdr-ci.staging.k6.io/docs/refs/pull/992/merge/testing-guides/running-large-tests/

FYI, I'm going to only read it and see if I have any structural changes. I'll wait to do copy edits till the final version, since that's the most obvious and mechanical work.

- Tests that are using file uploads can consume tens of megabytes per VU.
+ Tests that are using file uploads, or load large JS modules, can consume tens of megabytes per VU.
Keep in mind that each VU has a copy of all JS modules your test uses.
If you need to share memory between VUs, consider using [SharedArray](/javascript-api/k6-data/sharedarray/), or an external data store, such as [Redis](/javascript-api/k6-experimental/redis/).
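For illustration, here is a minimal sketch of the SharedArray approach, assuming a hypothetical users.json data file; the data is parsed once and shared read-only across VUs, instead of each VU holding its own copy:

import { SharedArray } from 'k6/data';

// Hypothetical data file; parsed once in the init context and shared
// read-only between all VUs instead of being copied per VU.
const users = new SharedArray('users', function () {
  return JSON.parse(open('./users.json'));
});

export default function () {
  // Each iteration reads from the shared data without duplicating it.
  const user = users[Math.floor(Math.random() * users.length)];
}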

## General advice

### Make your test code resilient

When running large stress tests, your script can't assume anything about the HTTP response.
Often performance tests are written with a "happy path" in mind.
Contributor:

I've seen this "happy path" term twice now, and I only just realized that it's an industry term. I think we should define it. Now, I'm not the brightest, but even if it's a common term in CS, not all of our readers will come from the same language and educational backgrounds.


Often performance tests are written with a "happy path" in mind.
For example, a "happy path" check like the one below is something that we see in k6 often.
For example, a "happy path" check like the one below is something that we often see in k6 scripts.
Contributor:

Now that I understand what "Happy path" is, is this really a happy path check? It seems like it does not assume the happy path, since it accepts the possibility of an incorrect body size.

Instead, it seems like it's more a case of error handling that's not general enough.

Proposed change:

Even if a script accepts that the SUT won't always respond according to the happy path, it still might not handle a sufficiently wide range of errors.
For example, the following check is something we often see in k6 scripts:

@imiric (Contributor, Author):

It is a happy path check, since it doesn't take into account failure conditions under load, when the response body wouldn't exist. The body size check is a standard assertion, and the user might want the test to fail if it's incorrect.

Your suggestion sounds more confusing, sorry, so I'd rather leave this as is.

Contributor:

It is a happy path check, since it doesn't take into account failure conditions under load, when the response body wouldn't exist. The body size check is a standard assertion, and the user might want the test to fail if it's incorrect.

I don't understand. Isn't an incorrect body size another indication of system failure? Or are we defining failure as "no body response at all"? From the Wikipedia page, I thought a happy path is where there are no error conditions.

@imiric (Contributor, Author):

An incorrect body size would in this case fail the check in an expected way. I.e. the test would just fail. This is not an exceptional condition.

The response body not existing is an exceptional condition the script is not handling, therefore it's only testing the "happy path".

Trust me. This is fine :)
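
For concreteness, a minimal sketch of the kind of unguarded check being discussed, assuming an illustrative homepage test against test.k6.io; if the server fails and res.body is null, the .length access throws a TypeError instead of the check simply failing:

import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get('https://test.k6.io/');
  // Happy-path check: assumes a body is always present. Under heavy load
  // res.body can be null, and .length then throws a TypeError.
  check(res, {
    'Homepage body size is 11026 bytes': (r) => r.body.length === 11026,
  });
}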

- The issue here is that the check assumes that there's always a body in a response. The `r.body` may not exist if server is failing.
- In such case, the check itself won't work as expected and error similar to the one below will be returned:
+ The issue here is that the check assumes that there's always a body in a response. The `r.body` may not exist if the server is failing.
+ In such case, the check itself won't work as expected and an error similar to the one below will be returned:
Contributor:

Shouldn't we add an example of how to handle this exception? Right now it seems we diagnose a problem, but don't give any way to find a solution.

Contributor:

Oh, I realize now that this is in the text, just not in the diffs. Still, it looks like the example only fails a check if there is a body to begin with.

const checkRes = check(res, {
  'Homepage body size is 11026 bytes': (r) => r.body && r.body.length === 11026,
});

Wouldn't it make more sense to have two checks? One that checks that a body is returned, and another that checks that, when returned, it's the correct length?

@imiric (Contributor, Author):

Wouldn't it make more sense to have two checks?

You could do that, but you'd then have to add a dependency between the body length check and the body existence check, which would complicate the check functions. It's much simpler to use && here to avoid the TypeError.
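
For reference, a minimal sketch of the two-check alternative discussed above (the URL and check names are illustrative); the length check still has to guard against a missing body itself, so it gains little over the single check with &&:

import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get('https://test.k6.io/');
  check(res, {
    // Separate existence check.
    'Homepage returned a body': (r) => !!r.body,
    // The length check must still guard against a missing body to avoid a TypeError.
    'Homepage body size is 11026 bytes': (r) => r.body && r.body.length === 11026,
  });
}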

Contributor:

Sorry, I think I confused myself about the program behavior yesterday.

But, shouldn't the check have a different name?

- Homepage body size is 11026 bytes
+ Body exists and is 11026 bytes

@imiric (Contributor, Author):

Matt, I think you're nitpicking at this point. It's fine as is :)

@MattDodsonEnglish (Contributor) commented Jan 25, 2023:

Nice, I made some edits for structure and sentences to merge into this branch: #1003. @imiric, I'd appreciate a review. I don't think you need to analyze it too deeply; just flag anything that feels off to you. I'll proofread the text after the benchmarks are added.

It's a hard article to organize because of its length and breadth, but I'm happy with how it shaped up. After my merges, I'll be totally fine with the article (apart from my necessary proofreading).

As general guidelines, look for the following:
- CPU utilization doesn't exceed 90%. If all CPU cores are 100% utilized during the test run, you might notice performance degradation in your test results.
- Network utilization is at an acceptable level. Depending on your test, you might expect to fully saturate the network bandwidth, or this might be a signal that your test is bound by the available bandwidth. In other scenarios, you might want to minimize network usage to keep costs down.
- Memory utilization doesn't exceed 90%. If you're close to exhausting available physical RAM, the system might start swapping memory to disk, which will affect performance and system stability. In extreme cases, running out of memory on Linux will cause the system to end the k6 process.
@imiric (Contributor, Author):

Should we recommend running without swap? Our loadgens don't use it, AFAIK. cc: @mstoykov

Contributor:

I would recommend people run without swap in general ;) The only uses at this point IMO are:

  1. systems which don't have enough RAM at all
  2. hibernation (which is why I still have swap on my laptop)

In (my) practice, the difference between a rogue process running out of memory and dying, versus going into swap, is just that in the second case it dies a lot slower and later, usually thrashing the whole system in the process ;).

I guess with NVMe SSDs this is a bit better, but even on my laptop, whenever I hit swap I notice it, because the whole system gets really slow.

Given that k6 is a performance tool, its performance degrading significantly because swap is used seems like a terrible idea, even if the process itself doesn't die.

@imiric (Contributor, Author):

Fully agree, so I think this deserves a separate section that explains it. Will add, thanks.

@imiric (Contributor, Author):

Resolved by 1ed89f2.

@na-- removed their request for review on January 27, 2023 14:00
@na-- (Contributor) commented Jan 27, 2023:

Removed myself from the list of reviewers since this doesn't need my review; I likely can't contribute anything that @mstoykov and @imiric haven't already.

Ivan Mirić added 3 commits January 31, 2023 12:38
@ppcano (Collaborator) commented Feb 8, 2023:

@imiric any blockers, or can we merge it?

@imiric (Contributor, Author) commented Feb 9, 2023:

@ppcano Not yet. It's waiting for PRs in k6-hardware-benchmark to be approved and merged.

@mstoykov (Contributor) left a comment:

LGTM!

This reads as marketing material, with a bunch of hyperbole and statements bordering on "k6 is the only load testing tool that uses your resources fully". Which arguably is not true. Apart from locust, I think all the other fairly well-known load testing tools use the resources fully - some of them just have a lot of other things that eat into their actual performance (jMeter, for example).

And a lot of those statements would've been a lot better if we were comparing with other tools and had something like "see graph: k6 does 10k RPS, jMeter (for example) does 2k RPS, both at 80% CPU". But we show only k6 and then say "it's better than most", which IMO is pointless - better than most is still worse than some, which might be what people use. And with no concrete comparison, there is no way for someone to know whether, with this same setup, their load testing tool would give them twice the performance.

I personally do not think the docs are a place for marketing, so I would prefer if we toned it down a bunch on the comparisons to others where we just say stuff instead of showing it.

I still think there is a lot of valuable information and showing that you can do X RPS for some situation is still beneficial to users, and we should have it. But a bunch of the article reads as "k6 exceptionalism" to me.

@imiric (Contributor, Author) commented Feb 10, 2023:

@mstoykov You're right that the article has a marketing tone biased towards k6. I wouldn't say it reads like k6 is the only tool that fully utilizes system resources, but we're pointing out a clear advantage over other tools that struggle with that. This is not a comparison article like the "Open source load testing tool review" (which we should also eventually update, and possibly automate), so mentioning the performance of other tools would be out of context. This article is about how to configure the system and k6 to fully utilize the available hardware resources.

I personally do not think the docs are a place for marketing

Agreed, so you'll be happy to know that this article will be moved to the blog. 😄

a bunch of the article reads as "k6 exceptionalism" to me

Can you point out all those places?

This shouldn't read as a marketing article, but then again, it should point out k6's strengths, so that's a thin line to tread. I can try toning down some of these parts, with @MattDodsonEnglish's help, of course. Ah, sorry, Matt has limited availability right now, so we can change it ourselves.

@MattDodsonEnglish (Contributor) commented:

, with @MattDodsonEnglish's help, of course. Ah, sorry, Matt has limited availability right now, so we can change it ourselves.

Thanks, @imiric ! 🙏 I can make a quick comment though. Besides that first line, I also don't really see the "k6 exceptionalism" that @mstoykov pointed out. The only other overt line that I found was "not as greedy as other tools" but that does link to a data-driven comparison.

I believe that for technical articles like this, the best marketing is honesty and usefulness. So if there are other places where the doc seems over the top, I'd prefer to delete those lines. I just don't know what lines those are exactly.

However, the first line makes such a strong impression that it might frame the whole article as marketing material, when really the material is quite practical and technical. That line is also from the original version, and there's been much revision since then, so it isn't very harmonious with the rest of the article. I proposed cooling it down here:

#992 (comment).

@imiric requested a review from mstoykov on February 10, 2023 16:26
@imiric (Contributor, Author) commented Feb 10, 2023:

I know that you already approved this @mstoykov, but it sounded like you had major objections to the marketing tone of the article, so please take another look and see whether the recent changes are an improvement.

So with that approval, this will be ready to merge, since the k6-benchmarks changes are done.

@mstoykov (Contributor) left a comment:

I guess I over-reacted, as the start was particularly marketing-like IMO.

Although that in general seems to be mostly from the previous version rather than from anything done to this one.

With the latest changes, IMO there are no unfounded comparisons.

But maybe we should also try to get https://github.com/grafana/loadgentest updated? Maybe with just fewer tools?

@imiric (Contributor, Author) commented Feb 13, 2023:

@mstoykov Yeah, we should definitely update the comparison article and benchmark results at some point as well. This year would be great.

Can we get Ragnar to do it again? 😄

Anyway, I'm merging this as is, with messy history and all. It would be a lot of work to clean it up, and we want to preserve the history, so I wouldn't squash it.

Thanks for the reviews, everyone! 🙇

@imiric merged commit d976e1c into main on Feb 13, 2023
@imiric deleted the update/208-running-large-tests branch on February 13, 2023 10:28