-
Notifications
You must be signed in to change notification settings - Fork 767
driver(external): wait for Start() to complete on the server side
#4345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
norio-nomura
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does not resolve the real problem (See #4343 (comment)).
This change should not be accepted.
b4b0dc4 to
dbb1bf6
Compare
|
Now it waits for |
|
With following patch: @@ -0,0 +1,21 @@
diff --git forkSrcPrefix/pkg/driver/vz/vz_driver_darwin.go forkDstPrefix/pkg/driver/vz/vz_driver_darwin.go
index c87995a62c8b53b7724ba4020981c3f5851cd86f..83b074f95bd39c5b7b371cc7861d5b7c3b298037 100644
--- forkSrcPrefix/pkg/driver/vz/vz_driver_darwin.go
+++ forkDstPrefix/pkg/driver/vz/vz_driver_darwin.go
@@ -299,6 +299,7 @@ func (l *LimaVzDriver) CreateDisk(ctx context.Context) error {
func (l *LimaVzDriver) Start(ctx context.Context) (chan error, error) {
logrus.Infof("Starting VZ (hint: to watch the boot progress, see %q)", filepath.Join(l.Instance.Dir, "serial*.log"))
+ defer logrus.Infof("Started VZ")
vm, waitSSHLocalPortAccessible, errCh, err := startVM(ctx, l.Instance, l.SSHLocalPort)
if err != nil {
if errors.Is(err, vz.ErrUnsupportedOSVersion) {
@@ -441,6 +442,8 @@ func (l *LimaVzDriver) ForwardGuestAgent() bool {
}
func (l *LimaVzDriver) AdditionalSetupForSSH(ctx context.Context) error {
+ logrus.Infof("AdditionalSetupForSSH: waiting for SSH port %d to be accessible", l.SSHLocalPort)
+ defer logrus.Infof("AdditionalSetupForSSH: SSH port %d is accessible", l.SSHLocalPort)
ticker := time.NewTicker(100 * time.Millisecond)
defer ticker.Stop()
following logs are recorded in It indicates that I think:
additionally:
Thoughts about driver's implementation that not related to this PR:
|
|
Added link to gRPC API document on above comment. |
dbb1bf6 to
4d6a8cc
Compare
|
Thanks for the detailed explanation! |
4d6a8cc to
0003cec
Compare
unsuman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check now?
pkg/driver/vz/vz_driver_darwin.go
Outdated
| ticker := time.NewTicker(100 * time.Millisecond) | ||
| defer ticker.Stop() | ||
|
|
||
| timeout := time.After(60 * time.Second) | ||
|
|
||
| for { | ||
| if l.waitSSHLocalPortAccessible != nil { | ||
| <-l.waitSSHLocalPortAccessible | ||
| return nil | ||
| } | ||
|
|
||
| select { | ||
| case <-ctx.Done(): | ||
| return ctx.Err() | ||
| case <-timeout: | ||
| return errors.New("timeout waiting for Start() to initialize") | ||
| case <-ticker.C: | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverting these changes also works but I think it can remain so to avoid listening on an empty or uninitialised channel? @norio-nomura
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is "we don't know why it's happening, but we know what's going on", I think it's an effective code to deal with, but now that we know why it's happening, it's an unnecessary code.
If there might be something else to deal with, try not to create a situation where the channel has not been created.
When I decided to continue using waitSSHLocalPortAccessible in #4251, I didn't expect AdditionalSetupForSSH() to be called before Start() was called, so I left it created in Start().
However, if I had created a channel at the same time as creating the LimaVzDriver instance, the problem with AdditionalSetupForSSH() would not have occurred this time, and it might not have caught my attention, and the behavior of DriverClient.Start() might not have been known.
And it was causing another problem in another place (already occurring in wsl2?) It may be, so I think it's good to be as it is now.
If it had been specified as a document that "DriverClient.Start() does not wait for the actual Driver.Start()", it might have been different.
I think the current Driver API lacks documentation (doc comment).
Previously, in #4248, I tried to use SSHAddress() as the current AdditionalSetupForSSH().
I stopped for the time being because the explanation was given, but to be honest, I didn't understand it well even after reading the explanation.
I want it to be clearly stated in the document.
- When is it used: Before the instance starts? After activation?
- What is it used for?
There may be people who want to judge from the usage situation.
But for example:
lima/pkg/hostagent/hostagent.go
Lines 164 to 168 in 8b72891
| vSockPort := limaDriver.Info().VsockPort | |
| virtioPort := limaDriver.Info().VirtioPort | |
| noCloudInit := limaDriver.Info().Features.NoCloudInit | |
| rosettaEnabled := limaDriver.Info().Features.RosettaEnabled | |
| rosettaBinFmt := limaDriver.Info().Features.RosettaBinFmt |
Questions such as come to mind:
- Why does this call the same API 5 times in a row?
- Does this assume that something will change on the driver side by calling it continuously?
- Is this method of use correct?
To be honest, I'm not very interested in external drivers, so it doesn't make sense to answer the question here.
However, new contributors who may implement external drivers have no choice but to see how they are used in the current situation where there is no documentation, and I think they will have similar questions.
I don't know if there is anyone who can answer when someone asks that question in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and it might not have caught my attention, and the behavior of DriverClient.Start() might not have been known.
Many thanks!🙏🏻
To be honest, I'm not very interested in external drivers, so it doesn't make sense to answer the question here.
However, new contributors who may implement external drivers have no choice but to see how they are used in the current situation where there is no documentation, and I think they will have similar questions.
I don't know if there is anyone who can answer when someone asks that question in the future.
Thanks for pointing this out. I implemented all these things as a sprint behaviour during GSOC and also I was new to the codebase. As you’ve pointed out, many of them are inefficient. Also when implementing registry package and driver-server I assumed that one limactl process may invoke multiple external drivers but I was wrong and have to work on this too!
Since the driver API hasn’t been finalised yet, a descriptive doc is yet to be created.
0003cec to
2fe01b9
Compare
AdditionalSetupForSSH()Start() to complete on the server side
Start() to complete on the server side Start() to complete on the server side
|
|
||
| // Blocking to receive an initial response to ensure Start() is initiated | ||
| // at the server-side. | ||
| for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to loop here.
initialResp.Error should be returned as the return value of Start().
The subsequent StartResponse.Error should be sent to errCh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If initialResp.Success is false, there is no need to return errCh.
|
|
||
| errCh := make(chan error, 1) | ||
| go func() { | ||
| defer func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be defer func(), but go func() outside and wait for ctx.Done() to call stream.CloseSend().
In the current situation, the only way to get out of the go routine is to shut down on the server side.
When stream.CloseSend() is called, stream.Recv() should return an error and exit the go routine.
| }() | ||
|
|
||
| for { | ||
| errorStream, err := stream.Recv() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name errStream is strange. The content is *pb.StartResponse.
| for { | ||
| errorStream, err := stream.Recv() | ||
| if err != nil { | ||
| d.logger.Errorf("Error receiving response from driver: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the stream is closed from the server side, it is better to leave it in the log with Infof instead of Errorf().
| if !errorStream.Success { | ||
| errCh <- errors.New(errorStream.Error) | ||
| } else { | ||
| errCh <- nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the driver closes errChan on the server side, it is decided to send StartResponse{Success:true} to the client, and it may be good to close errCh on the client side here.
Even if the driver closes or does not close errChan at any time, even if it is not defined.
| func (s *DriverServer) Start(_ *emptypb.Empty, stream pb.Driver_StartServer) error { | ||
| s.logger.Debug("Received Start request") | ||
| errChan, err := s.driver.Start(stream.Context()) | ||
| if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err here should be sent to the client as initialResponse.
If initialResponse.Success is true, the subsequent error received by errChan is sent to the client.
When errChan is closed, it would be a good idea to send pb.StartResponse{Success: true} to express channel closure.
d40d24a to
826b868
Compare
norio-nomura
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it has become the desired operation. 👍🏻
After that, since it's a good opportunity, let's add the action decided in this PR as a doc comment to the definition below.
pkg/driver/external/driver.proto:StartResponsepkg/driver/external/client/methods.go:Start()pkg/driver/external/server/methods.go:Start()
Actually, it would be better to test the behavior with a mock driver, but I think it's okay to do another PR.
| return nil, err | ||
| } | ||
| if !initialResp.Success { | ||
| return nil, errors.New(initialResp.Error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to stream.CloseSend() here?
The server side should also be closed, so there may be no problem, but I'm not familiar enough with gRPC to determine whether it's a problem or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server side should also be closed, so there may be no problem, but I'm not familiar enough with gRPC to determine whether it's a problem or not.
Yes, no need to close here!
| // at the server-side. | ||
| initialResp, err := stream.Recv() | ||
| if err != nil { | ||
| d.logger.Errorf("Error receiving initial response from driver start: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have previously been asked to use logger.WithError(err) when recording Error with logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case then all the error logging of pkg/driver/external/*/methods.go needs to be changed which can be a new issue. I am only changing for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #4380
826b868 to
d784575
Compare
Signed-off-by: Ansuman Sahoo <[email protected]>
d784575 to
12bf4bc
Compare
Do you mean to add a test in the CI which builds the current drivers as external and tests the gRPC transport? |
No. Isn't it possible to test the behavior of However, there is no doubt that #4343 could have been found earlier if it had been tested in CI. |
norio-nomura
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍🏻
I don't have permission to merge PR, so I will wait for it to be checked by @lima-vm/committers.
Fixes #4343
[Edit]: Also fixes #4348