Migration to new vSphere environment #6877

Open
14 tasks
sbueringer opened this issue Jun 7, 2024 · 3 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.



sbueringer commented Jun 7, 2024

Context

  • cluster-api-provider-vsphere, cloud-provider-vsphere & image-builder are currently using a VMware owned and managed VMC project for CI
  • Goal of this issue is to track the migration to a new community-owned environment (Google Cloud VMware Engine (GVE))

Prerequisites

  • [WIP] Finalize funding & billing (responsible people are already engaged)
    • Currently waiting for final approval
    • Clarify how billing works exactly

Critical path

  • Setup new GCP project + GVE instance (+ related objects)
  • Setup vCenter configuration
  • Setup Boskos configuration (see separate section below for details)
  • Change existing ProwJobs & presets (+ corresponding secrets in GCP Secrets Manager)
    • Update ProwJobs to run on a community-owned cluster (including setting resources)
    • Update presets to point to secrets with the new vSphere config (URL, credentials, thumbprint, ...) and VPN credentials

Open points

Networking

Requirements:

  • ProwJobs (tests & janitor) need access to the vCenter API and to VMs running inside of vCenter
  • VMs running inside of vCenter need access to the vCenter API

Current implementation in VMC: (VPN tunnel)

  • VPN VM with public IP running within vCenter
  • vCenter and VM IPs are not public
  • ProwJobs get VPN certificates & config via presets
  • Advantages:
    • We already have a working implementation; we just have to replicate it
    • No restrictions on how many IPs we can use for VMs within vCenter because they are private (we need at least ~1024; more would be better)

Alternatives to be explored: (sorry, I didn't understand the entire discussion in the meeting; please chime in below)

  • Expose vCenter API & VM IPs publicly
    • I really would like to avoid this for security reasons
  • Peering between existing Prow cluster and GVE instance
  • Additional Prow cluster for vSphere jobs in the same private network as the GVE instance

Authentication / Authorization (Okta?)

Requirements:

  • vCenter access for the following users:
    • for tests: (technical users)
      • cluster-api-provider-vsphere
      • cloud-provider-vsphere
      • image-builder
    • for cleanup: (technical user)
      • janitor (currently implemented as periodic ProwJob, cleans up resources from Boskos)
    • administrative access:
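
The janitor flow described above (a periodic ProwJob that picks up dirty resources from Boskos, deletes leftover vCenter objects, and returns the resources to the free pool) can be sketched roughly as follows. This is a simplified, in-memory stand-in for the real Boskos client and vCenter API; all names and the state-machine details are illustrative:

```python
# Minimal sketch of the janitor's cleanup cycle (hypothetical in-memory
# stand-ins for the real Boskos client and the vCenter API).
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str                                 # e.g. a (resource pool, folder) pair
    state: str                                # "free", "busy", or "dirty"
    vms: list = field(default_factory=list)   # leftover VMs to clean up

def janitor_cycle(resources):
    """Acquire each dirty resource, delete its leftover VMs,
    then release it back to Boskos as free."""
    cleaned = []
    for res in resources:
        if res.state != "dirty":
            continue
        res.state = "cleaning"   # acquire: dirty -> cleaning
        res.vms.clear()          # delete leftover vCenter objects
        res.state = "free"       # release: cleaning -> free
        cleaned.append(res.name)
    return cleaned
```

For example, running `janitor_cycle` over a pool where only one resource is dirty cleans exactly that resource and leaves busy resources untouched.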

Boskos configuration & presets

The following describes our current setup in VMC. We would like to use the same setup in the new GVE environment; reusing it will also make the migration simpler and faster.

Notes:

  • vCenter:
    • Resource pools and folders have the following structure, e.g. /prow/cluster-api-provider-vsphere/{001, 002, ...}
      • This allows us to track resource usage per repository/project
    • One user per project (which only has permissions on the corresponding project resource pool & folder)
      • This ensures we have isolation between projects
    • One user for janitor which has access to all project resource pools / folders to cleanup
  • Presets:
    • VPN credentials and the respective user credentials are injected into the ProwJobs via presets
  • Boskos:
    • Contains one resource for each (resource pool, folder) pair (user data also contains the corresponding IP pool configuration)
    • We use different resource types for the different repositories/projects
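
The Boskos layout described in the notes above (one resource type per project, one resource per numbered resource pool/folder pair) can be sketched with a small generator. The project names come from this issue; the per-project count and the exact type naming are illustrative assumptions:

```python
# Sketch: enumerate Boskos-style resource entries for the layout
# described above (/prow/<project>/{001, 002, ...}).
PROJECTS = [
    "cluster-api-provider-vsphere",
    "cloud-provider-vsphere",
    "image-builder",
]

def boskos_resources(projects, count_per_project=4):
    """Return one entry per project: a distinct resource type plus the
    numbered (resource pool, folder) names that belong to it."""
    return [
        {
            "type": f"vsphere-project-{project}",  # per-project resource type
            "state": "free",
            "names": [
                f"/prow/{project}/{i:03d}"
                for i in range(1, count_per_project + 1)
            ],
        }
        for project in projects
    ]
```

For instance, `boskos_resources(PROJECTS, 2)[0]["names"]` yields `["/prow/cluster-api-provider-vsphere/001", "/prow/cluster-api-provider-vsphere/002"]`, matching the per-repository tracking structure described above.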

(Boskos setup diagram: picture source on sbueringer#1, can be opened with drawio. The current Boskos setup in the old VMC environment can also be seen on sbueringer#1.)

Jobs that still have to be migrated

I checked all jobs that are still using the current vSphere environment, as well as the ones still using credentials from a VMware-owned GCP project to push images, for: cluster-api-provider-vsphere, cloud-provider-vsphere, vsphere-csi-driver and image-builder. No surprises there.

The following jobs can be migrated once the new env is functional:

  • cluster-api-provider-vsphere:
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-{{ ReplaceAll $.branch "." "-" }}
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-{{ ReplaceAll $.branch "." "-" }}
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-ci-latest-{{ ReplaceAll $.branch "." "-" }}
    • periodic-cluster-api-provider-vsphere-janitor
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk-alpha-features
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk-serial
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk-slow
    • periodic-cluster-api-provider-vsphere-e2e-exp-kk
    • periodic-cluster-api-provider-vsphere-e2e-{{ $mode }}-upgrade
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-blocking-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-upgrade
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-e2e-{{ $mode }}-conformance-ci-latest-{{ ReplaceAll $.branch "." "-" }}
    • pull-cluster-api-provider-vsphere-janitor-main
  • cloud-provider-vsphere:
    • pull-cloud-provider-vsphere-e2e-test
    • pull-cloud-provider-vsphere-e2e-test-on-latest-k8s-version
    • pull-cloud-provider-vsphere-e2e-test-1-26-minus
  • image-builder:
    • pull-ova-all

The following jobs can be migrated today: (I talked to the maintainers of vsphere-csi-driver about it)

  • vsphere-csi-driver:
    • post-vsphere-csi-driver-deploy
    • post-vsphere-csi-driver-release
@sbueringer
/assign @chrischdi @fabriziopandini @sbueringer

@sbueringer
/cc @BenTheElder @upodroid @ameukam @dims

ameukam commented Jun 25, 2024

/sig k8s-infra
/priority backlog
Let's freeze this until the requirements are met
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Jun 25, 2024
@ameukam ameukam moved this to Backlog in SIG K8S Infra Jun 25, 2024