Skip to content

Conversation

@0405ysj
Copy link
Collaborator

@0405ysj 0405ysj commented Nov 10, 2025

Context: b/453876231

@0405ysj 0405ysj marked this pull request as ready for review November 10, 2025 03:53
@0405ysj 0405ysj requested review from ikicha and k311093 and removed request for Databean, adelva1984, jmacnak and rmuthiah November 10, 2025 03:53
return nil, fmt.Errorf("operation not found for %q", name)
}
entry := val.(*operationEntry)
ctx, cancel := context.WithTimeout(context.TODO(), 3*time.Minute)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can context.TODO() be context.Background() ?

return entry.(*operationEntry).op
}
}
panic("Reached newOperationRetryLimit")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoud this be panic? looks like it stops service if newOperation is failed, how about just logging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to be panic, as retry count increments when uuid conflict happens. I believe it's very close to 0% with large retry couny, even less than sha256 hash conflict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a comment explaining that the chances of hitting this are practically zero and not worth it of changing the API to return an error and that in the unlikely scenario this triggers a panic is better than running forever or returning an invalid/nil operation.

case "POST /operations/deletingbar/:wait":
writeOK(w, apiv1.HostInstance{Name: "bar"})
case "POST /operations/deletingbaz/:wait":
writeOK(w, apiv1.HostInstance{Name: "baz"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for a delete host operation should return an empty response: https://github.com/google/cloud-android-orchestration/blob/main/pkg/app/instances/gce.go#L344. Please fix the DockerIM to return an empty response first.

return
}
ins := &apiv1.HostInstance{}
if err := c.waitForOperation(&op, ins); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for a delete host operation should return an empty response: https://github.com/google/cloud-android-orchestration/blob/main/pkg/app/instances/gce.go#L344.

create a new helper: waitForOperation() where you don't have to pass a response argument, after fixing the DockerIM implementation.

mutexes sync.Map
Config Config
Client *client.Client
mutexes sync.Map
Copy link
Member

@ser-io ser-io Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CO IM is designed to be stateless.

Why do you need to add this new state to the Docker IM implementation? What are the actual operations you need this for? You should be relying on Docker Engine to query whether a container was created, deleted and so on, to track such operations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was motivated from taking long(> 1min) times to reply REST API, such as POST /hosts and DELETE /hosts.

The CO IM is designed to be stateless.

Then this PR should be definitely my fault.. How should I get such information? I think these circumstances are repeated, and I wish to retrieve how we manage those before working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants