Update "Running large tests" article #992
Conversation
The default compatibility-mode (extended) in recent k6 versions only has minor overhead from Babel, since core-js was removed several versions ago. And as goja keeps improving, we're planning to drop Babel altogether, which will likely mean deprecating compatibility-mode. The end result is that, as of v0.42.0, compatibility-mode=base has very little benefit over extended, to the point where it's not worth the hassle of maintaining a build pipeline for most users. For these reasons it's best to simply stop recommending its usage here.
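(For reference, this is roughly what the de-emphasized option looks like in use: a minimal sketch of a CommonJS-style script run with `k6 run --compatibility-mode=base script.js`; the target URL is only a placeholder.)

```javascript
// Minimal sketch of a base-compatibility-mode script (CommonJS-style, no Babel).
// Run with: k6 run --compatibility-mode=base script.js
var http = require('k6/http');
var k6 = require('k6');

module.exports.options = {
  vus: 10,
  duration: '30s',
};

module.exports.default = function () {
  http.get('http://test.k6.io'); // placeholder URL
  k6.sleep(1);
};
```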
The number of terminals is not relevant if using e.g. nmon, which can monitor CPU, RAM and network.
First explain the default k6 behavior, then introduce the option.
The "cloud" (the web buzzword) is not the same as "k6 Cloud" (the product).
In order to update this information easily, we will keep benchmark results in the k6-hardware-benchmark repository, and leave this article for suggestions only. This way we can avoid having to continuously update this article, and it will be shorter and more readable. In the future we might also remove specifically mentioning EC2, as benchmarks will be run from GitHub Actions (Azure). Or we might want to expand this article and give broad suggestions for the largest cloud providers.
Great updates @imiric, just added some minor suggestions. @MattDodsonEnglish should probably review as he is the k6 tech writing expert 😄
### Testing for RPS

We maintain a [repository](https://github.com/grafana/k6-hardware-benchmark) with some scripts used to benchmark k6 and create reports. These tests are run for every new k6 version, and you can see the results in the [`results/` directory](https://github.com/grafana/k6-hardware-benchmark/tree/master/results).
The URL https://github.com/grafana/k6-hardware-benchmark/tree/master/results does not work. I find that we have something for v0.42.0 in https://github.com/grafana/k6-hardware-benchmark/tree/update/k6-v0.42.0/results/v0.42.0, but not in the master branch?
Nice catch, but like I mentioned in the PR description, we shouldn't merge this before the benchmarking is finished. That will happen after the results are added to the k6-hardware-benchmark repo, so this link will eventually work.
A rendered version in staging is here: https://mdr-ci.staging.k6.io/docs/refs/pull/992/merge/testing-guides/running-large-tests/ FYI, I'm only going to read it and see if I have any structural changes. I'll wait to do copy edits till the final version, since that's the most obvious and mechanical work.
Tests that are using file uploads can consume tens of megabytes per VU.
Tests that are using file uploads, or load large JS modules, can consume tens of megabytes per VU.
Keep in mind that each VU has a copy of all JS modules your test uses.
If you need to share memory between VUs, consider using [SharedArray](/javascript-api/k6-data/sharedarray/), or an external data store, such as [Redis](/javascript-api/k6-experimental/redis/).

## General advice

### Make your test code resilient

When running large stress tests, your script can't assume anything about the HTTP response.
Often performance tests are written with a "happy path" in mind.
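(As a side note to the excerpt above, a minimal sketch of the SharedArray approach it points to; the data file name and its fields are hypothetical.)

```javascript
import { SharedArray } from 'k6/data';
import http from 'k6/http';

// The construction callback runs only once; every VU then reads from a single
// shared copy of the data instead of holding its own. The file name is hypothetical.
const users = new SharedArray('users', function () {
  return JSON.parse(open('./users.json'));
});

export default function () {
  const user = users[Math.floor(Math.random() * users.length)];
  http.get(`https://test.k6.io/?user=${user.username}`); // hypothetical request
}
```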
I've seen this "happy path" term twice now, and I only just realized that it's an industry term. I think we should define it. Now, I'm not the brightest, but even if it's a common term in CS, not all of our readers will come from the same language and educational backgrounds.
For example, a "happy path" check like the one below is something that we see in k6 often.
For example, a "happy path" check like the one below is something that we often see in k6 scripts.
Now that I understand what "Happy path" is, is this really a happy path check? It seems like it does not assume the happy path, since it accepts the possibility of an incorrect body size.
Instead, it seems like it's more a case of error handling that's not general enough.
Proposed change:
Even if a script accepts that the SUT won't always respond according to the happy path, it still might not handle a sufficiently wide range of errors.
For example, the following check is something we often see in k6 scripts:
It is a happy path check, since it doesn't take into account failure conditions under load, when the response body wouldn't exist. The body size check is a standard assertion, and the user might want the test to fail if it's incorrect.
Your suggestion sounds more confusing, sorry, so I'd rather leave this as is.
> It is a happy path check, since it doesn't take into account failure conditions under load, when the response body wouldn't exist. The body size check is a standard assertion, and the user might want the test to fail if it's incorrect.
I don't understand. Isn't an incorrect body size another indication of system failure? Or are we defining failure as "no body response at all"? From the Wikipedia page, I thought a happy path is where there are no error conditions.
An incorrect body size would, in this case, fail the check in an expected way, i.e. the test would just fail. This is not an exceptional condition.
The response body not existing is an exceptional condition the script is not handling, therefore it's only testing the "happy path".
Trust me. This is fine :)
The issue here is that the check assumes that there's always a body in a response. The `r.body` may not exist if server is failing.
In such case, the check itself won't work as expected and error similar to the one below will be returned:
The issue here is that the check assumes that there's always a body in a response. The `r.body` may not exist if the server is failing.
In such case, the check itself won't work as expected and an error similar to the one below will be returned:
Shouldn't we add an example of how to handle this exception? Right now it seems we diagnose a problem, but don't give any way to find a solution.
Oh, I realize now that this is in the text, just not in the diffs. Still, it looks like the example only fails a check if there is a body to begin with.
    const checkRes = check(res, {
      'Homepage body size is 11026 bytes': (r) => r.body && r.body.length === 11026,
    });
Wouldn't it make more sense to have two checks? One that checks that a body is returned, and another that checks that, when returned, it's the correct length?
> Wouldn't it make more sense to have two checks?

You could do that, but you'd then have to add a dependency between the body length check and the body existence check, which would complicate the check functions. It's much simpler to use `&&` here to avoid the `TypeError`.
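(For illustration only, a rough sketch of the two-check variant being discussed; the check names are illustrative, not a suggestion for the article. Note that the size check still needs the `r.body &&` guard to avoid the `TypeError`, so the dependency between the checks doesn't really go away.)

```javascript
import { check } from 'k6';
import http from 'k6/http';

export default function () {
  const res = http.get('https://test.k6.io/');

  // Hypothetical two-check variant: the second check must still guard
  // against a missing body to avoid a TypeError under load.
  check(res, {
    'Response has a body': (r) => !!r.body,
    'Homepage body size is 11026 bytes': (r) => r.body && r.body.length === 11026,
  });
}
```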
Sorry, I think I confused myself about the program behavior yesterday.
But, shouldn't the check have a different name?
- Homepage body size is 11026 bytes
+ Body exists and is 11026 bytes
Matt, I think you're nitpicking at this point. It's fine as is :)
Nice, I made some edits for structure and sentences to merge into this branch: #1003. @imiric, I'd appreciate a review. I don't think you need to analyze it too deeply; just see if anything feels off to you. I'll proofread the text after the benchmarks are added. It's a hard article to organize because of its length and breadth, but I'm happy with how it shaped up. After my merges, I'll be totally fine with the article (except my necessary proofreading).
As general guidelines, look for the following:
- CPU utilization doesn't exceed 90%. If all CPU cores are 100% utilized during the test run, you might notice performance degradation in your test results.
- Network utilization is at an acceptable level. Depending on your test, you might expect to fully saturate the network bandwidth, or this might be a signal that your test is bound by the available bandwidth. In other scenarios, you might want to minimize network usage to keep costs down.
- Memory utilization doesn't exceed 90%. If you're close to exhausting available physical RAM, the system might start swapping memory to disk, which will affect performance and system stability. In extreme cases, running out of memory on Linux will cause the system to end the k6 process.
Should we recommend running without swap? Our loadgens don't use it, AFAIK. cc: @mstoykov
I would recommend people run without swap in general ;) The only uses at this point IMO are:
- systems which don't have enough ram at all
- hibernation (which is why I still have swap on my laptop)
In (my) practice, the difference between a rogue process running out of memory and dying, and it going into swap, is just that in the second case it dies a lot slower and later, usually thrashing the whole system in the process ;).
I guess with NVMe SSDs this is a bit better, but even on my laptop, whenever I go into swap I find out because the whole system gets really slow.
Given that k6 is a performance tool, its performance degrading significantly because swap is used seems like a terrible idea, even if the process itself doesn't die.
Fully agree, so I think this deserves a separate section that explains it. Will add, thanks.
Resolved by 1ed89f2.
@imiric any blockers? Can we merge it?
@ppcano Not yet. It's waiting for PRs in k6-hardware-benchmark to be approved and merged.
LGTM!
This reads as marketing material, with a bunch of hyperbole and statements bordering on "k6 is the only load testing tool that uses your resources fully", which arguably is not true. Apart from locust, I think all the other fairly well-known load testing tools use the resources fully; some of them just have a lot of other things that eat into their actual performance (jMeter, for example).
A lot of those statements would've been better if we were comparing with other tools and had "see graph: k6 does 10k RPS, jMeter (for example) does 2k RPS, both at 80% CPU". But we show only k6 and then say "it's better than most", which IMO is pointless: better than most is still worse than some, which might be what people use. And with no concrete comparison, there is no way for someone to know whether, with this same setup, their load testing tool won't give them twice the performance.
I personally do not think the docs are a place for marketing, so I would prefer if we toned it down a bunch on the amount of comparison to others where we ... just say stuff instead of showing it.
I still think there is a lot of valuable information and showing that you can do X RPS for some situation is still beneficial to users, and we should have it. But a bunch of the article reads as "k6 exceptionalism" to me.
@mstoykov You're right that the article has a marketing tone biased towards k6. I wouldn't say it reads like k6 is the only tool that fully utilizes system resources, but we're pointing out a clear advantage over other tools that struggle with that. This is not a comparison article like the "Open source load testing tool review" (which we should also eventually update, and possibly automate), so mentioning the performance of other tools would be out of context. This article is about how to configure the system and k6 to fully utilize the available hardware resources.
Agreed, so you'll be happy to know that this article will be moved to the blog. 😄
Can you point out all those places? This shouldn't read as a marketing article, but then again, it should point out k6's strengths, so that's a thin line to tread. I can try toning down some of these parts.
Thanks, @imiric! 🙏 I can make a quick comment though. Besides that first line, I also don't really see the "k6 exceptionalism" that @mstoykov pointed out. The only other overt line that I found was "not as greedy as other tools", but that does link to a data-driven comparison. I believe that for technical articles like this, the best marketing is honesty and usefulness. So if there are other places where the doc seems over the top, I'd prefer to delete those lines; I just don't know which lines those are exactly. However, the first line makes such a strong impression that it might frame the whole article as marketing material, when really the material is quite practical and technical. That line is also from the original version, and there's been much revision since then, so it isn't very harmonious with the rest of the article. I proposed cooling it down here:
I know that you already approved this @mstoykov, but it sounded like you had major objections to the marketing tone of the article, so please take another look and see whether the recent changes are an improvement. With that approval, this will be ready to merge, since the k6-benchmarks changes are done.
I guess I over-reacted, as the start was particularly marketing-like IMO.
Although that, in general, seems to be mostly from the previous version rather than anything done in this one.
With the latest changes, IMO there are no unfounded comparisons.
But maybe we should also try to get https://github.com/grafana/loadgentest updated? Maybe with fewer tools?
@mstoykov Yeah, we should definitely update the comparison article and benchmark results at some point as well. This year would be great. Can we get Ragnar to do it again? 😄 Anyway, I'm merging this as is, with messy history and all. It would be a lot of work to clean it up, and we want to preserve the history, so I wouldn't squash it. Thanks for the reviews, everyone! 🙇
This is an update to the "Running large tests" article, and closes #208. I'm creating the PR to get early feedback on the changes, but let's not merge it until the benchmarking is finished.
See the commits for details, but some of the major changes include:
- Removal of the benchmark results section. We plan to move all benchmark results to the k6-hardware-benchmark repository, so that we can automate running the tests and generating the report there. This article will contain only general suggestions, which won't need to be updated as often, and it will also be shorter and more readable.
- Removal of the suggestion of `--compatibility-mode=base`. See 2ffca0a for an explanation.