Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal-cert-controller disable flag #2426

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 39 additions & 11 deletions cmd/training-operator.v2alpha1/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,18 +17,24 @@ limitations under the License.
package main

import (
"context"
"crypto/tls"
"errors"
"flag"
"fmt"
"net/http"
"os"

cmapi "github.com/cert-manager/cert-manager/pkg/apis/certmanager/v1"
cmmeta "github.com/cert-manager/cert-manager/pkg/apis/meta/v1"
zaplog "go.uber.org/zap"
"go.uber.org/zap/zapcore"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
apiruntime "k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/log/zap"
Expand All @@ -38,17 +44,12 @@ import (
schedulerpluginsv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"

kubeflowv2 "github.com/kubeflow/training-operator/pkg/apis/kubeflow.org/v2alpha1"
"github.com/kubeflow/training-operator/pkg/cert"
controllerv2 "github.com/kubeflow/training-operator/pkg/controller.v2"
runtime "github.com/kubeflow/training-operator/pkg/runtime.v2"
runtimecore "github.com/kubeflow/training-operator/pkg/runtime.v2/core"
webhookv2 "github.com/kubeflow/training-operator/pkg/webhook.v2"
)

const (
webhookConfigurationName = "validator.training-operator-v2.kubeflow.org"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small confusion should we use webhookConfigurationName instead of namespace(as of V1) ?

)

var (
scheme = apiruntime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
Expand All @@ -67,10 +68,12 @@ func main() {
var probeAddr string
var secureMetrics bool
var enableHTTP2 bool
var namespace string
var webhookServerPort int
var webhookServiceName string
var webhookSecretName string
var tlsOpts []func(*tls.Config)
var enableCertController bool

flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
Expand All @@ -82,6 +85,7 @@ func main() {
"If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.")
flag.BoolVar(&enableHTTP2, "enable-http2", false,
"If set, HTTP/2 will be enabled for the metrics and webhook servers")
flag.BoolVar(&enableCertController, "enable-cert-controller", false, "Enable the internal certificate controller.")

// Cert generation flags
flag.IntVar(&webhookServerPort, "webhook-server-port", 9443, "Endpoint port for the webhook server.")
Expand Down Expand Up @@ -128,12 +132,8 @@ func main() {
}

certsReady := make(chan struct{})
if err = cert.ManageCerts(mgr, cert.Config{
WebhookSecretName: webhookSecretName,
WebhookServiceName: webhookServiceName,
WebhookConfigurationName: webhookConfigurationName,
}, certsReady); err != nil {
setupLog.Error(err, "unable to set up cert rotation")
if err := createCertificate(mgr.GetClient(), namespace, webhookSecretName, webhookServiceName); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be configurable, so cert-manager is an option, not a requirement?

My understanding is we still want it possible to install Kubeflow trainer without cert-manager.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cert-manager is optional. We should not create resources automatically in manager.
We just add Certifications by manifests, not by automatic mechanism like current impl.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenzen-y can you please brief if any changes are required in current PR ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Prepare the flag whether or not the controller manager launch internal cert controller.
  2. Add CertManager installation manifests like https://github.com/kubeflow/katib/tree/master/manifests/v1beta1/installs/katib-cert-manager

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned the previous comment, we should not remove and replace the existing CertManagement mechanism. We just provide the command flag whether or not manager uses internal-certmanagement mechanism.

  1. Add a flag --enable-internal-cert (default true)
  2. If the flag is false, manager does not perform cert.ManageCerts.
  3. Add installation manifests with CertManager

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenzen-y yes i will revert the code back to cert generation by controller

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := createCertificate(mgr.GetClient(), namespace, webhookSecretName, webhookServiceName); err != nil {

setupLog.Error(err, "Unable to create CertManager certificate")
os.Exit(1)
}

Expand All @@ -155,6 +155,34 @@ func main() {
}
}

// Cert generation Using cert-manager
func createCertificate(client client.Client, namespace, secretName, serviceName string) error {
cert := &cmapi.Certificate{
ObjectMeta: metav1.ObjectMeta{
Name: secretName,
Namespace: namespace,
},
Spec: cmapi.CertificateSpec{
SecretName: secretName,
IssuerRef: cmmeta.ObjectReference{
Name: "letsencrypt-prod",
Kind: "ClusterIssuer",
},
CommonName: serviceName,
DNSNames: []string{serviceName},
Usages: []cmapi.KeyUsage{
cmapi.UsageServerAuth,
cmapi.UsageClientAuth,
},
},
}

if err := client.Create(context.Background(), cert); err != nil {
return fmt.Errorf("failed to create certificate: %v", err)
}
return nil
}

Comment on lines +158 to +185
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Cert generation Using cert-manager
func createCertificate(client client.Client, namespace, secretName, serviceName string) error {
cert := &cmapi.Certificate{
ObjectMeta: metav1.ObjectMeta{
Name: secretName,
Namespace: namespace,
},
Spec: cmapi.CertificateSpec{
SecretName: secretName,
IssuerRef: cmmeta.ObjectReference{
Name: "letsencrypt-prod",
Kind: "ClusterIssuer",
},
CommonName: serviceName,
DNSNames: []string{serviceName},
Usages: []cmapi.KeyUsage{
cmapi.UsageServerAuth,
cmapi.UsageClientAuth,
},
},
}
if err := client.Create(context.Background(), cert); err != nil {
return fmt.Errorf("failed to create certificate: %v", err)
}
return nil
}

func setupControllers(mgr ctrl.Manager, runtimes map[string]runtime.Runtime, certsReady <-chan struct{}) {
setupLog.Info("Waiting for certificate generation to complete")
<-certsReady
Expand Down
13 changes: 7 additions & 6 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,13 @@ module github.com/kubeflow/training-operator
go 1.23

require (
github.com/cert-manager/cert-manager v1.16.3
github.com/go-logr/logr v1.4.2
github.com/google/go-cmp v0.6.0
github.com/onsi/ginkgo/v2 v2.20.1
github.com/onsi/gomega v1.35.1
github.com/open-policy-agent/cert-controller v0.12.0
github.com/prometheus/client_golang v1.20.2
github.com/prometheus/client_golang v1.20.4
github.com/sirupsen/logrus v1.9.3
github.com/stretchr/testify v1.9.0
go.uber.org/zap v1.27.0
Expand All @@ -17,8 +18,8 @@ require (
k8s.io/client-go v0.31.3
k8s.io/code-generator v0.31.3
k8s.io/klog/v2 v2.130.1
k8s.io/kube-openapi v0.0.0-20240430033511-f0e62f92d13f
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8
k8s.io/kube-openapi v0.0.0-20240903163716-9e1beecbcb38
k8s.io/utils v0.0.0-20240921022957-49e7df575cb6
sigs.k8s.io/controller-runtime v0.19.1
sigs.k8s.io/jobset v0.5.2
sigs.k8s.io/kueue v0.6.3
Expand All @@ -33,7 +34,6 @@ require (
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/emicklei/go-restful/v3 v3.12.1 // indirect
github.com/evanphx/json-patch v5.9.0+incompatible // indirect
github.com/evanphx/json-patch/v5 v5.9.0 // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
Expand Down Expand Up @@ -69,7 +69,7 @@ require (
golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56 // indirect
golang.org/x/mod v0.20.0 // indirect
golang.org/x/net v0.33.0 // indirect
golang.org/x/oauth2 v0.21.0 // indirect
golang.org/x/oauth2 v0.23.0 // indirect
golang.org/x/sync v0.10.0 // indirect
golang.org/x/sys v0.28.0 // indirect
golang.org/x/term v0.27.0 // indirect
Expand All @@ -83,6 +83,7 @@ require (
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/apiextensions-apiserver v0.31.2 // indirect
k8s.io/gengo/v2 v2.0.0-20240228010128-51d4e06bde70 // indirect
k8s.io/gengo/v2 v2.0.0-20240826214909-a7b603a56eb7 // indirect
sigs.k8s.io/gateway-api v1.1.0 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
)
24 changes: 14 additions & 10 deletions go.sum
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/cert-manager/cert-manager v1.16.3 h1:seEF5eidFaeduaCuM85PFEuzH/1X/HOV5Y8zDQrHgpc=
github.com/cert-manager/cert-manager v1.16.3/go.mod h1:6JQ/GAZ6dH+erqS1BbaqorPy8idJzCtWFUmJQBTjo6Q=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
Expand Down Expand Up @@ -84,8 +86,8 @@ github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINE
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/prometheus/client_golang v1.20.2 h1:5ctymQzZlyOON1666svgwn3s6IKWgfbjsejTMiXIyjg=
github.com/prometheus/client_golang v1.20.2/go.mod h1:PIEt8X02hGcP8JWbeHyeZ53Y/jReSnHgO035n//V5WE=
github.com/prometheus/client_golang v1.20.4 h1:Tgh3Yr67PaOv/uTqloMsCEdeuFTatm5zIq5+qNN23vI=
github.com/prometheus/client_golang v1.20.4/go.mod h1:PIEt8X02hGcP8JWbeHyeZ53Y/jReSnHgO035n//V5WE=
github.com/prometheus/client_model v0.6.1 h1:ZKSh/rekM+n3CeS952MLRAdFwIKqeY8b62p8ais2e9E=
github.com/prometheus/client_model v0.6.1/go.mod h1:OrxVMOVHjw3lKMa8+x6HeMGkHMQyHDk9E3jmP2AmGiY=
github.com/prometheus/common v0.55.0 h1:KEi6DK7lXW/m7Ig5i47x0vRzuBsHuvJdi5ee6Y3G1dc=
Expand Down Expand Up @@ -130,8 +132,8 @@ golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLL
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I=
golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/oauth2 v0.21.0 h1:tsimM75w1tF/uws5rbeHzIWxEqElMehnc+iW793zsZs=
golang.org/x/oauth2 v0.21.0/go.mod h1:XYTD2NtWslqkgxebSiOHnXEap4TF09sJSc7H1sXbhtI=
golang.org/x/oauth2 v0.23.0 h1:PbgcYx2W7i4LvjJWEbf0ngHV6qJYr86PkAV3bXdLEbs=
golang.org/x/oauth2 v0.23.0/go.mod h1:XYTD2NtWslqkgxebSiOHnXEap4TF09sJSc7H1sXbhtI=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
Expand Down Expand Up @@ -188,18 +190,20 @@ k8s.io/client-go v0.31.3 h1:CAlZuM+PH2cm+86LOBemaJI/lQ5linJ6UFxKX/SoG+4=
k8s.io/client-go v0.31.3/go.mod h1:2CgjPUTpv3fE5dNygAr2NcM8nhHzXvxB8KL5gYc3kJs=
k8s.io/code-generator v0.31.3 h1:Pj0fYOBms+ZrsulLi4DMsCEx1jG8fWKRLy44onHsLBI=
k8s.io/code-generator v0.31.3/go.mod h1:/umCIlT84g1+Yu5ZXtP1KGSRTnGiIzzX5AzUAxsNlts=
k8s.io/gengo/v2 v2.0.0-20240228010128-51d4e06bde70 h1:NGrVE502P0s0/1hudf8zjgwki1X/TByhmAoILTarmzo=
k8s.io/gengo/v2 v2.0.0-20240228010128-51d4e06bde70/go.mod h1:VH3AT8AaQOqiGjMF9p0/IM1Dj+82ZwjfxUP1IxaHE+8=
k8s.io/gengo/v2 v2.0.0-20240826214909-a7b603a56eb7 h1:cErOOTkQ3JW19o4lo91fFurouhP8NcoBvb7CkvhZZpk=
k8s.io/gengo/v2 v2.0.0-20240826214909-a7b603a56eb7/go.mod h1:EJykeLsmFC60UQbYJezXkEsG2FLrt0GPNkU5iK5GWxU=
k8s.io/klog/v2 v2.130.1 h1:n9Xl7H1Xvksem4KFG4PYbdQCQxqc/tTUyrgXaOhHSzk=
k8s.io/klog/v2 v2.130.1/go.mod h1:3Jpz1GvMt720eyJH1ckRHK1EDfpxISzJ7I9OYgaDtPE=
k8s.io/kube-aggregator v0.31.2 h1:Uw1zUP2D/4wiSjKWVVzSOcCGLuW/+IdRwjjC0FJooYU=
k8s.io/kube-aggregator v0.31.2/go.mod h1:41/VIXH+/Qcg9ERNAY6bRF/WQR6xL1wFgYagdHac1X4=
k8s.io/kube-openapi v0.0.0-20240430033511-f0e62f92d13f h1:0LQagt0gDpKqvIkAMPaRGcXawNMouPECM1+F9BVxEaM=
k8s.io/kube-openapi v0.0.0-20240430033511-f0e62f92d13f/go.mod h1:S9tOR0FxgyusSNR+MboCuiDpVWkAifZvaYI1Q2ubgro=
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 h1:pUdcCO1Lk/tbT5ztQWOBi5HBgbBP1J8+AsQnQCKsi8A=
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0=
k8s.io/kube-openapi v0.0.0-20240903163716-9e1beecbcb38 h1:1dWzkmJrrprYvjGwh9kEUxmcUV/CtNU8QM7h1FLWQOo=
k8s.io/kube-openapi v0.0.0-20240903163716-9e1beecbcb38/go.mod h1:coRQXBK9NxO98XUv3ZD6AK3xzHCxV6+b7lrquKwaKzA=
k8s.io/utils v0.0.0-20240921022957-49e7df575cb6 h1:MDF6h2H/h4tbzmtIKTuctcwZmY0tY9mD9fNT47QO6HI=
k8s.io/utils v0.0.0-20240921022957-49e7df575cb6/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0=
sigs.k8s.io/controller-runtime v0.19.1 h1:Son+Q40+Be3QWb+niBXAg2vFiYWolDjjRfO8hn/cxOk=
sigs.k8s.io/controller-runtime v0.19.1/go.mod h1:iRmWllt8IlaLjvTTDLhRBXIEtkCK6hwVBJJsYS9Ajf4=
sigs.k8s.io/gateway-api v1.1.0 h1:DsLDXCi6jR+Xz8/xd0Z1PYl2Pn0TyaFMOPPZIj4inDM=
sigs.k8s.io/gateway-api v1.1.0/go.mod h1:ZH4lHrL2sDi0FHZ9jjneb8kKnGzFWyrTya35sWUTrRs=
sigs.k8s.io/jobset v0.5.2 h1:276q5Pi/ErLYj+GQ0ydEXR6tx3LwBhEzHLQv+k8bYF4=
sigs.k8s.io/jobset v0.5.2/go.mod h1:Vg99rj/6OoGvy1uvywGEHOcVLCWWJYkJtisKqdWzcFw=
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd h1:EDPBXCAspyGV4jQlpZSudPeMmr1bNJefnuqLsRAsHZo=
Expand Down
22 changes: 22 additions & 0 deletions manifests/v2/overlays/trainer-cert-manager/certificate.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: training-operator-webhook-cert
namespace: kubeflow
spec:
isCA: true
secretName: training-operator-webhook-cert
issuerRef:
kind: Issuer
name: trainer-cert-issuer
commonName: ${TRAINER_SERVICE_NAME}.${TRAINER_NAMESPACE}.svc
dnsNames:
- ${TRAINER_SERVICE_NAME}.${TRAINER_NAMESPACE}.svc
- ${TRAINER_SERVICE_NAME}.${TRAINER_NAMESPACE}.svc.cluster.local
---
apiVersion: cert-manager.io/v1
kind: issuer
metadata:
name: trainer-cert-issuer
spec:
selfSigned: {}
17 changes: 17 additions & 0 deletions manifests/v2/overlays/trainer-cert-manager/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
---
varReference:
- path: spec/commanName
kind: Certificate
- path: spec/dnsNames
kind: Certificate
- path: spec/issuerRef/name
kind: Certificate
- path: metadata/annotations
kind: validatingwebhookconfiguration
nameReference:
- kind: issuer
group: cert-manager.io
fieldSpecs:
- kind: Certificate
group: cert-manager.io
path: spec/issuerRef/name
50 changes: 50 additions & 0 deletions manifests/v2/overlays/trainer-cert-manager/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow-system
resources:
- ../../base/crds/
- ../../base/manager/
- ../../base/webhook/
- ../../base/rbac/
- ../../base/runtimes/
- certificate.yaml

images:
- name: kubeflow/training-operator-v2
newName: kubeflow/training-operator-v2
newTag: latest

patchesStrategicMerge:
- patch.yaml

vars:
- name: TRAINER_NAMESPACE
objref:
name: trainer-controller-manager
kind: Service
apiVersion: v2alpha1
fieldref:
fieldpath: metadata.namespace
- name: TRAINER_SERVICE_NAME
objref:
name: trainer-controller-manager
kind: Service
apiVersion: v2alpha1
fieldref:
fieldpath: metadata.name
- name: TRAINER_CERT_NAME
objref:
name: trainer-webhook-cert
kind: Certificate
apiVersion: cert-manager.io/v1
fieldref:
fieldpath: metadata.name

configurations:
- config.yaml

secretGenerator:
- name: training-operator-v2-webhook-cert
namespace: kubeflow-system
options:
disableNameSuffixHash: true
7 changes: 7 additions & 0 deletions manifests/v2/overlays/trainer-cert-manager/patch.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
apiVersion: admissionregistration.k8s.io/v1
kind: validatingwebhookconfiguration
metadata:
name: trainer.kubeflow.org
annotations:
cert-manager.io/inject-ca-from: $(TRAINER_NAMESPACE)/$(TRAINER_CERT_NAME)