kubeflow install not working #1015
Another issue is the ordering here: the current setup relied on brute-force applies instead of installing the needed CRDs before creating the CRs that depend on them. Additionally, the Knative dependency relies on a deprecated API version that is not served on 1.30.5. The error is: resource mapping not found for name: "webhook" namespace: "knative-serving" from "kubeflow.yaml": no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta2". So this should move to autoscaling/v2.
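To illustrate the API version change, here is a minimal sketch that rewrites HorizontalPodAutoscaler documents from autoscaling/v2beta2 to autoscaling/v2 in a multi-document manifest. The helper names are hypothetical, and the sketch assumes unquoted, unindented `apiVersion:` and `kind:` lines; it is plain text processing rather than a real YAML parser, so treat it as a starting point, not the fix that was actually applied.

```python
import re
from typing import List, Optional

DEPRECATED = "autoscaling/v2beta2"
REPLACEMENT = "autoscaling/v2"


def split_documents(manifest: str) -> List[str]:
    """Split a multi-document YAML string on '---' separator lines."""
    docs = re.split(r"(?m)^---[ \t]*$", manifest)
    return [d for d in docs if d.strip()]


def top_level_field(doc: str, key: str) -> Optional[str]:
    """Return the value of an unindented 'key: value' line, if present."""
    match = re.search(rf"(?m)^{re.escape(key)}:[ \t]*(\S+)[ \t]*$", doc)
    return match.group(1) if match else None


def upgrade_hpa_api_version(manifest: str) -> str:
    """Rewrite only HPA documents that still use the deprecated version."""
    out = []
    for doc in split_documents(manifest):
        is_old_hpa = (
            top_level_field(doc, "kind") == "HorizontalPodAutoscaler"
            and top_level_field(doc, "apiVersion") == DEPRECATED
        )
        if is_old_hpa:
            doc = re.sub(
                rf"(?m)^apiVersion:[ \t]*{re.escape(DEPRECATED)}[ \t]*$",
                f"apiVersion: {REPLACEMENT}",
                doc,
                count=1,
            )
        out.append(doc.strip("\n"))
    return "\n---\n".join(out) + "\n"
```

Documents of any other kind pass through unchanged, so the function can be run over the whole manifest before it is applied.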
Proposed fix: ensure CRDs are installed first
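As a rough illustration of "CRDs first", the sketch below partitions a multi-document manifest into CustomResourceDefinition documents and everything else, so the two groups can be applied in that order. The function names are hypothetical, and it assumes each document has an unindented, unquoted `kind:` line.

```python
import re
from typing import List, Optional, Tuple


def split_documents(manifest: str) -> List[str]:
    """Split a multi-document YAML string on '---' separator lines."""
    docs = re.split(r"(?m)^---[ \t]*$", manifest)
    return [d.strip("\n") for d in docs if d.strip()]


def kind_of(doc: str) -> Optional[str]:
    """Return the top-level 'kind' of a document, if it has one."""
    match = re.search(r"(?m)^kind:[ \t]*(\S+)[ \t]*$", doc)
    return match.group(1) if match else None


def partition_crds(manifest: str) -> Tuple[List[str], List[str]]:
    """Return (crds, others), preserving the original order within each group."""
    crds: List[str] = []
    others: List[str] = []
    for doc in split_documents(manifest):
        if kind_of(doc) == "CustomResourceDefinition":
            crds.append(doc)
        else:
            others.append(doc)
    return crds, others
```

After applying the first group, something like `kubectl wait --for=condition=Established` on the CRDs can be used before applying the second group, so custom resources are not created before their definitions are being served.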
Hey Shawn!

The issue with using the Kubeflow spec as released is that it is distributed as a single YAML file that is 600k lines long, and the official install method is to loop over the install until it stops failing. With that method, any object that is applied out of order on the first pass (and there are numerous examples in the spec) causes a re-application of the full 600k lines of YAML. The second run and every run after it also carry the overhead of a full compare-and-reconcile of each k8s object in the spec. This repeats until there is no error.

The changes we have made are purely organizational: the spec is broken into stages, the stages are ordered so that they satisfy dependencies, and we apply them one by one with a script that can retry each spec individually.

What has happened here is that the script is a bit naive: it assumes the file listing always comes back in order, and in the install job pod it is now coming back out of order. This is easy enough to fix and I am working on it at the moment; essentially, some more checking and sorting of the list before application will fix the issue.

To be more exact, the file names are all prefixed with a two-digit integer, and the two Civo-specific additions are numbered 98 and 99 so that they apply after the kf specs. Some time in the last week the OS started returning them before file 00, and that is what causes the error when installing the kf spec.
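The "check and sort before applying" idea can be sketched roughly as follows. This is not the actual install script: the file names, the retry count, and the delay are assumptions made for illustration, and the only external command used is `kubectl apply -f`.

```python
import os
import re
import subprocess
import time
from typing import Callable, Iterable, List

# Stage files are assumed to start with a two-digit prefix, e.g. "00-...".
PREFIX = re.compile(r"^(\d{2})")


def ordered_stages(paths: Iterable[str]) -> List[str]:
    """Sort stage files by numeric prefix instead of trusting listing order."""
    staged = []
    for path in paths:
        match = PREFIX.match(os.path.basename(path))
        if match is None:
            raise ValueError(f"stage file without a two-digit prefix: {path}")
        staged.append((int(match.group(1)), path))
    return [path for _, path in sorted(staged)]


def kubectl_apply(path: str) -> bool:
    """Apply one stage file; True when kubectl exits with status 0."""
    return subprocess.run(["kubectl", "apply", "-f", path]).returncode == 0


def apply_in_order(
    paths: Iterable[str],
    apply: Callable[[str], bool] = kubectl_apply,
    retries: int = 5,    # hypothetical default
    delay: float = 10.0,  # hypothetical default, in seconds
) -> None:
    """Apply each stage in prefix order, retrying only the stage that failed."""
    for path in ordered_stages(paths):
        for _attempt in range(retries):
            if apply(path):
                break
            time.sleep(delay)
        else:
            raise RuntimeError(f"{path} failed after {retries} attempts")
```

Because a failure only retries the current stage, an out-of-order object costs one stage re-apply rather than a pass over the whole spec, and the explicit sort keeps files 98 and 99 after the kf specs regardless of the order the OS returns them in.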
Hi Dave! 👋🏼

The missing closing brace from marketplace-installer was a new one for me, so I had just assumed that it had to be reworked to avoid the known errors when CRDs aren't available yet. I did find the documented install loop you talked about, albeit it looks like that

One additional critique of the install was the extra call back out to the git repo, even though the repo should already be pulled down and available to the marketplace-installer binary at that point, so the install should be able to use kubeflow.yaml locally. I don't think that was the issue; it just seemed unnecessary when the file should already be local.
It was reported that the Kubeflow install was not working, and I was able to reproduce this on k3s 1.30.5 as well.
I believe it has to do with the way the install is currently handled: it breaks from the desired pattern by performing an additional git retrieval operation to pull the manifest down.
kubernetes-marketplace/kubeflow/install.sh
Line 3 in c49ad2c
The desired fix would be to implement a supported install method.