
[feature] Allow force-deletion of machines managed by a (static) infra provider #1272

@utkuozdemir

Description


Problem Description

  • Your Omni instance has some machines managed by the bare-metal infra provider.
  • The bare-metal infra provider instance is shut down or removed completely, while its machines are still present in Omni.
  • You try to delete these machines (from the UI, or with omnictl delete link).
  • The resources get stuck, because the bare-metal infra provider's finalizers are still set on them.
  • Currently, the only way out of this scenario is to bring the provider back. Its provider ID has to match; it can be a dummy provider, as long as it is able to attempt to finalize the machines (i.e., wipe their disks). The provider's finalization is already best-effort: even if it cannot wipe a machine, it still removes its finalizer from the Omni resources (Link, InfraMachine, etc.).
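To make the failure mode concrete, here is a minimal sketch of how finalizers gate deletion. It assumes a simplified, hypothetical resource model (the Resource type, TryDestroy, RemoveFinalizer and the "bare-metal" finalizer name are illustrative only) and is not the COSI runtime or Omni API. A resource that is tearing down can only be destroyed once its finalizer list is empty, so a finalizer owned by a provider that no longer runs blocks deletion indefinitely.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified model of an Omni resource.
// It is NOT the COSI runtime API; it only illustrates how finalizers gate deletion.
type Resource struct {
	ID          string
	TearingDown bool     // deletion has been requested
	Finalizers  []string // owners that must clean up before destruction
}

// TryDestroy reports whether the resource can be removed: deletion must have
// been requested and every finalizer must already be gone.
func TryDestroy(r *Resource) bool {
	return r.TearingDown && len(r.Finalizers) == 0
}

// RemoveFinalizer removes the named finalizer in place, if present.
func RemoveFinalizer(r *Resource, name string) {
	kept := r.Finalizers[:0]
	for _, f := range r.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
}

func main() {
	// A Link whose deletion was requested, still carrying the (hypothetical)
	// finalizer of a provider that is no longer running.
	link := &Resource{ID: "machine-1", TearingDown: true, Finalizers: []string{"bare-metal"}}
	fmt.Println(TryDestroy(link)) // false: nobody is left to remove the finalizer

	// Only the finalizer's owner (or a stand-in for it) can unblock deletion.
	RemoveFinalizer(link, "bare-metal")
	fmt.Println(TryDestroy(link)) // true
}
```

This is why bringing back a provider with a matching ID works today: the stand-in only has to remove its finalizers for the stuck resources to be destroyed.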

When the user knows that the bare-metal provider or its machines are gone for good (formatted, permanently disconnected, etc.), they should be able to bring Omni back to a clean, fresh state.

Solution

Outcome of the in-team discussion:

  • It must not be possible to remove an InfraProvider as long as it has any machines that are not in the tearingDown phase. In other words, it must either have no machines, or only machines whose removal has already been requested. Enforce this via validation.
  • The user needs to explicitly request removal of all machines of this provider. These machines might get stuck in the tearingDown phase.
  • Then, the user requests removal of the InfraProvider in Omni.
  • The InfraProvider goes into the tearingDown phase. In this phase, no new machines can register under its ID.
  • A new controller takes InfraProviders as input and sets its own finalizer on them. When an input goes into tearingDown, the controller removes this InfraProvider's finalizers from the resources it manages (e.g., Links, InfraMachines and so on; check for others). Effectively, it acts as a "stub" of the infra provider being deleted, freeing these machines for further deletion.
    • Idea: we should also remove the provider's own namespace, to not leave any traces of it.
  • Once this new controller is done, it removes its finalizer from the InfraProvider, so the InfraProvider gets deleted as well.
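The agreed flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Omni implementation: it uses a hypothetical in-memory resource model instead of the COSI runtime, and the names ValidateProviderDeletion, CanRegisterMachine, ReconcileProvider and the cleanupFinalizer value are made up for this example. It also assumes the provider's finalizer on each managed resource is named after the provider ID. The sketch covers the validation rule, the registration block during teardown, and the "stub" controller that strips the provider's finalizers before releasing its own.

```go
package main

import "fmt"

// Phase mirrors the two lifecycle phases discussed in the issue.
type Phase int

const (
	Running Phase = iota
	TearingDown
)

// Resource is a hypothetical, simplified model (not the COSI API).
type Resource struct {
	ID         string
	Phase      Phase
	ProviderID string // hypothetical: which infra provider manages this resource
	Finalizers []string
}

// cleanupFinalizer is a hypothetical name for the new controller's finalizer.
const cleanupFinalizer = "InfraProviderCleanupController"

// ValidateProviderDeletion rejects tearing down a provider while any of its
// machines has not yet been requested for removal.
func ValidateProviderDeletion(providerID string, machines []*Resource) error {
	for _, m := range machines {
		if m.ProviderID == providerID && m.Phase != TearingDown {
			return fmt.Errorf("machine %q of provider %q is not being torn down", m.ID, providerID)
		}
	}
	return nil
}

// CanRegisterMachine reports whether a new machine may register under the provider's ID.
func CanRegisterMachine(provider *Resource) bool {
	return provider.Phase != TearingDown
}

func hasFinalizer(r *Resource, name string) bool {
	for _, f := range r.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

func removeFinalizer(r *Resource, name string) {
	kept := r.Finalizers[:0]
	for _, f := range r.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
}

// ReconcileProvider acts as a stub for a provider being deleted. While the
// provider is running, it only ensures its own finalizer is set. Once the
// provider is tearing down, it strips the provider's finalizer (assumed to be
// the provider ID) from every managed resource, then releases the provider.
func ReconcileProvider(provider *Resource, managed []*Resource) {
	if provider.Phase != TearingDown {
		if !hasFinalizer(provider, cleanupFinalizer) {
			provider.Finalizers = append(provider.Finalizers, cleanupFinalizer)
		}
		return
	}
	for _, r := range managed {
		if r.ProviderID == provider.ID {
			removeFinalizer(r, provider.ID)
		}
	}
	removeFinalizer(provider, cleanupFinalizer)
}

func main() {
	provider := &Resource{ID: "bare-metal", Phase: Running}
	link := &Resource{ID: "machine-1", Phase: Running, ProviderID: "bare-metal", Finalizers: []string{"bare-metal"}}
	machines := []*Resource{link}

	ReconcileProvider(provider, machines) // sets the cleanup finalizer on the provider
	fmt.Println(ValidateProviderDeletion(provider.ID, machines) != nil) // true: machine still running

	link.Phase = TearingDown // user requested removal; the Link is stuck on the finalizer
	fmt.Println(ValidateProviderDeletion(provider.ID, machines) == nil) // true

	provider.Phase = TearingDown
	fmt.Println(CanRegisterMachine(provider)) // false
	ReconcileProvider(provider, machines)
	fmt.Println(len(link.Finalizers), len(provider.Finalizers)) // 0 0
}
```

Keeping the controller's own finalizer on the InfraProvider until cleanup finishes is what guarantees the provider cannot disappear while some of its machines still carry its finalizers.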

Alternative Solutions

No response

Notes

No response

Metadata


Assignees

Labels

No labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
