Enable probes. #3268

Conversation
PR Health

Breaking changes ✔️
Changelog Entry ✔️ (changes to files need to be accounted for in their respective changelogs)
Coverage ✔️ (informational; issues shown here will not fail the PR)
API leaks ✔️ (symbols visible in the public API but not exported by the library should be exported or removed from the publicly visible API)
License Headers ✔️ (all source files should start with a license header)
Adding testing to the dartpad front-end is fantastic!
I'm curious - how stable are these golden images? They seem relatively semantic (likely pretty stable?).
The images are a little different across environments, and they may change as Flutter itself changes. So, yes, there is some level of flakiness. It is worth it because of the number of errors the goldens catch.
The flakiness can be tuned via the golden diff tolerance set in flutter_test_config.dart. This is how most apps implement it.
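For reference, such a tolerance is usually installed from flutter_test_config.dart by swapping in a custom golden comparator. A minimal sketch, assuming a LocalFileComparator subclass and an illustrative 1% threshold (neither the class name nor the value is this PR's actual config):

```dart
// flutter_test_config.dart (sketch)
import 'dart:async';
import 'dart:typed_data';

import 'package:flutter_test/flutter_test.dart';

/// Passes a golden comparison when the pixel diff is within [threshold].
class ToleranceComparator extends LocalFileComparator {
  ToleranceComparator(super.testFile, this.threshold);

  /// Allowed diff as a fraction of pixels (0.01 == 1%).
  final double threshold;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    // Accept exact matches, or diffs under the tolerance.
    return result.passed || result.diffPercent <= threshold;
  }
}

Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  final baseDir = (goldenFileComparator as LocalFileComparator).basedir;
  goldenFileComparator = ToleranceComparator(Uri.parse('$baseDir/test'), 0.01);
  await testMain();
}
```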
```dart
import 'model.dart';

export 'model.dart';

@visibleForTesting
int activeHttpRequests = 0;
```
What is the purpose of this variable? Is there a way to test the app without it? I'm worried that this will cause some maintenance overhead in the future. For example, if we forget to increment / decrement this value in other places in the code where we are making HTTP requests.
Added a comment explaining why it is needed.
I do not see another simple way to test without it.
Another option is to create a DartPadHttpClient that encapsulates this variable. But I do not see a large difference from what we have now: ServicesClient encapsulates it pretty well and updates it for all types of HTTP requests: get, post, stream, etc.
If things change and we forget to maintain this variable, tests will fail, we will debug them, and we will figure out how to make it better.
For now it seems to be OK.
I am open to other suggestions.
Thoughts?
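For illustration, the counter plus wrapper could look like the sketch below; DartPadHttpClient and waitForRequestsToComplete are names taken from this thread, but the bodies are assumptions, not the PR's actual code:

```dart
import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:http/http.dart' as http;

/// Number of HTTP requests currently in flight.
@visibleForTesting
int activeHttpRequests = 0;

/// Counts in-flight requests. BaseClient routes get, post, and streaming
/// calls through send(), so every request type updates the counter.
class DartPadHttpClient extends http.BaseClient {
  DartPadHttpClient(this._inner);

  final http.Client _inner;

  @override
  Future<http.StreamedResponse> send(http.BaseRequest request) async {
    activeHttpRequests++;
    try {
      return await _inner.send(request);
    } finally {
      // Note: this decrements once the response headers arrive; a fuller
      // version might also track the response body stream.
      activeHttpRequests--;
    }
  }
}

/// Pumps the widget tree until no requests remain in flight.
Future<void> waitForRequestsToComplete(WidgetTester tester) async {
  while (activeHttpRequests > 0) {
    await tester.pump(const Duration(milliseconds: 100));
  }
}
```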
I don't see a comment explaining why it's needed - could you help me understand why we need this? Does pumpAndSettle not run the app long enough or something?
On my machine, I'm able to comment out the await waitForRequestsToComplete(tester); line, and the test still passes, so I'm wondering if we can take this out for now?
It failed for me a number of times. Removing it will introduce flakiness. You do not want that. :)
Another idea: maybe we could run the tests against a local server instead? That might reduce the flakiness here.
Reduced flakiness is still flakiness. I wrapped the standard HTTP client with DartPadHttpClient, so now there is no way to forget something.
About what is 'right': there is no 'right' and 'wrong' here. Every app and product chooses its own way to test things. Some test environments do not allow HTTP requests, some allow them. Flutter tests allow them, and some Flutter developers make them.
Yes, I see how server_test.dart ramps up the backend locally (server = await EndpointsServer.serve(0, sdk, null, 'nnbd_artifacts');), but:
- It will fail for AI requests.
- If I test the UI against it, it will not ensure backward compatibility with what is hosted in prod. Yes, we try to release the backend and the front end at roughly the same time, but there is a race condition, and one of the releases may fail.
For v0 of testing I would stick with real requests because:
- It double-checks that our backend is working as expected. If we switch to a local backend, we stop testing that prod works.
- I can take care of cloud logging (is it _logger.info or something else?). However, while DartPad's internal and external developers work with DartPad, they generate more fake traffic than unit tests would.
- To ensure UI/backend parity we just need changes to be backward compatible. That is a good thing, because it naturally enforces that PRs are roll-back safe.
- I am not sure how to ramp up a local backend in unit tests without exposing API keys. Or we can ramp it up knowing we will not test AI, which may start failing at any time. It seems good that we would be alerted about AI failures by a daily test run (do we want to make it hourly?).
It is possible to test against both a local and the prod backend, with the local one skipping AI testing, but assuming we need testing against prod anyway, I do not see why we would want to add testing against a fake backend.
Thoughts?
In general, my point is:
- This solution is the simplest and cheapest possible, covers things end-to-end, and is ready to merge.
- It is better than no UI testing at all.
- We can turn testing off or improve it any time we see something wrong with it.
- But before improving it, I want to know where the issue is.
First of all, thank you for offering to improve our testing in DartPad. This PR was not quite what I was expecting after our discussion last week, so I want to make sure we are aligned on the overall testing strategy first before we jump to solutions that don't work in the long-term.
Overall, I don't think this is a very valuable test for a few reasons:
- It's flaky, and requires extra code to track HTTP requests to reduce the flakiness
- It's a golden test, which is primarily used for testing UI fidelity. It's not the right tool for testing the backend or other UI functionality.
- It uses the actual DartPad backend, which messes up our user metrics (which we desperately need right now, since we don't have a proper Dart analytics package to use).
> About what is 'right': there is no 'right' and 'wrong' here. Every app and product chooses its own way to test things. Some test environments do not allow HTTP requests, some allow them. Flutter tests allow them, and some Flutter developers make them.

I worry that it's not maintainable in the long term, and it's important that we make sure the changes we are making align with our overall testing strategy, since we are interested in adding tests so that we can confidently make changes to the codebase over the long term (starting with #3235). There's a lot of functionality that this PR doesn't include that I think is important (application state, compilation + running, hot reload, user interaction, having a testable code structure, etc.).

> It double-checks that our backend is working as expected. If we switch to a local backend, we stop testing that prod works.

> I am not sure how to ramp up a local backend in unit tests without exposing API keys. Or we can ramp it up knowing we will not test AI, which may start failing at any time. It seems good that we would be alerted about AI failures by a daily test run (do we want to make it hourly?).

This is why we need to align on our testing strategy; I don't think this is the right approach for testing the genUI backend. We should have a separate tool for a backend health check. I don't think a golden test is the right tool for the job here.

> We can turn testing off or improve it any time we see something wrong with it.

Sure, but I see some problems that I'd like to resolve before we jump to that step.

> But before improving it, I want to know where the issue is.

I think we should first specify how we want our UI tests to work before we start fixing issues with them. Otherwise, we will be moving in the wrong direction.
This version took a couple of hours to put together. Now I know what is possible, and we kicked off the discussion.
Thank you for your comments, and, yes, let's align on the test strategy.
Aligned on tech details here: #3270
```diff
@@ -653,7 +665,7 @@ class DartPadAppBar extends StatelessWidget implements PreferredSizeWidget {
         bottom: bottom,
         actions: [
           // Hide the Install SDK button when the screen width is too small.
-          if (constraints.maxWidth > smallScreenWidth)
+          if (constraints.maxWidth >= minLargeScreenWidth)
```
Are these changes intentional? What's the reason behind changing the breakpoint behavior?
I want to set the width to this value and have the screen still count as wide, to test for overload. This was my idea; it did not work out, because test rendering sizes are a little different than on mac and web, maybe because the tests run on ubuntu.
But it is still good, because eventually we may want to test for overload on the web platform.
How does this affect the user experience?
Sorry, not 'overload', but 'overflow'.
Overflow means there may be visual clipping the developer did not expect. In debug mode it is an error; in prod it is not.
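As a concrete illustration (a hypothetical test, not code from this PR), a widget test in debug mode surfaces such an overflow as a FlutterError, while a release build would only clip the pixels:

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('a too-narrow Row overflows in debug mode', (tester) async {
    await tester.pumpWidget(
      MaterialApp(
        home: Center(
          child: SizedBox(
            width: 100, // narrower than the 160 px its children need
            child: Row(
              children: const [SizedBox(width: 80), SizedBox(width: 80)],
            ),
          ),
        ),
      ),
    );
    // In debug mode the "RenderFlex overflowed" report surfaces as a
    // FlutterError; a release build would silently clip instead.
    expect(tester.takeException(), isFlutterError);
  });
}
```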
Changing > to >= will not change the user experience.
LGTM!
Contributes to #3254