-
Notifications
You must be signed in to change notification settings - Fork 38
DockerIM should use operation mechanism properly #498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return nil, fmt.Errorf("operation not found for %q", name) | ||
| } | ||
| entry := val.(*operationEntry) | ||
| ctx, cancel := context.WithTimeout(context.TODO(), 3*time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can context.TODO() be context.Background() ?
| return entry.(*operationEntry).op | ||
| } | ||
| } | ||
| panic("Reached newOperationRetryLimit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoud this be panic? looks like it stops service if newOperation is failed, how about just logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to be panic, as retry count increments when uuid conflict happens. I believe it's very close to 0% with large retry couny, even less than sha256 hash conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave a comment explaining that the chances of hitting this are practically zero and not worth it of changing the API to return an error and that in the unlikely scenario this triggers a panic is better than running forever or returning an invalid/nil operation.
| case "POST /operations/deletingbar/:wait": | ||
| writeOK(w, apiv1.HostInstance{Name: "bar"}) | ||
| case "POST /operations/deletingbaz/:wait": | ||
| writeOK(w, apiv1.HostInstance{Name: "baz"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting for a delete host operation should return an empty response: https://github.com/google/cloud-android-orchestration/blob/main/pkg/app/instances/gce.go#L344. Please fix the DockerIM to return an empty response first.
| return | ||
| } | ||
| ins := &apiv1.HostInstance{} | ||
| if err := c.waitForOperation(&op, ins); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting for a delete host operation should return an empty response: https://github.com/google/cloud-android-orchestration/blob/main/pkg/app/instances/gce.go#L344.
create a new helper: waitForOperation() where you don't have to pass a response argument, after fixing the DockerIM implementation.
| mutexes sync.Map | ||
| Config Config | ||
| Client *client.Client | ||
| mutexes sync.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CO IM is designed to be stateless.
Why do you need to add this new state to the Docker IM implementation? What are the actual operations you need this for? You should be relying on Docker Engine to query whether a container was created, deleted and so on, to track such operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was motivated from taking long(> 1min) times to reply REST API, such as POST /hosts and DELETE /hosts.
The CO IM is designed to be stateless.
Then this PR should be definitely my fault.. How should I get such information? I think these circumstances are repeated, and I wish to retrieve how we manage those before working on it.
Context: b/453876231