feat: Make readiness check of pods respect snapshot loading #401
Conversation
@romange Tagging you here just in case this slips under the radar 😄 Not sure who else should be the contact point for this.

@NegatioN does this issue affect your deployment as well? It is possible to inject the readiness script by mounting an external read-only volume that contains only that script. I think that extending the Helm charts to mount that script, instead of changing the container itself, is a viable option.
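For illustration, here is a minimal sketch, not code from this PR, of what the volume-mount approach described above could look like if expressed in the operator's Go code. The ConfigMap name, volume name, mount path, and script path are assumptions.

```go
// Sketch only: mount a read-only, ConfigMap-backed volume containing a readiness
// script into the Dragonfly container and point an exec readiness probe at it.
package resources

import (
    corev1 "k8s.io/api/core/v1"
)

func addReadinessScript(pod *corev1.PodSpec, c *corev1.Container) {
    mode := int32(0o555) // mount the script as executable

    pod.Volumes = append(pod.Volumes, corev1.Volume{
        Name: "readiness-script", // assumed volume name
        VolumeSource: corev1.VolumeSource{
            ConfigMap: &corev1.ConfigMapVolumeSource{
                LocalObjectReference: corev1.LocalObjectReference{Name: "dragonfly-readiness"}, // assumed ConfigMap name
                DefaultMode:          &mode,
            },
        },
    })
    c.VolumeMounts = append(c.VolumeMounts, corev1.VolumeMount{
        Name:      "readiness-script",
        MountPath: "/etc/dragonfly/scripts", // assumed mount path
        ReadOnly:  true,
    })
    c.ReadinessProbe = &corev1.Probe{
        ProbeHandler: corev1.ProbeHandler{
            Exec: &corev1.ExecAction{
                Command: []string{"/bin/sh", "/etc/dragonfly/scripts/readiness.sh"},
            },
        },
    }
}
```

The point of the suggestion is that only a read-only script volume and an exec probe would change, not the container image itself; the same wiring could equally be expressed in Helm chart templates.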
@romange I'm not sure I understood everything you said, so feel free to point out any misunderstandings.
It doesn't affect the
I was under the impression that, to interact with the actual pods of Dragonfly that the operator manages, I would have to modify the Go code here. Are you saying there's another YAML definition that would be better for me to modify?
No, I am not saying that. Some folks use Helm charts to deploy, and for them my suggestion could be more relevant, I guess. I am not familiar with the inner workings of the DF operator; I will check with the team and reply.
I am sorry, I was totally confused. I entirely missed that this PR is in the dragonfly-operator repo :) Ignore my comment. I will ask someone to review.
It's backward incompatible, I believe. Have you tried updating an existing datastore with a newer version of the operator? Will it result in an update of the existing StatefulSet?

I have not tried that. (And when you say datastore here, I'm unsure what you mean. This is the first time I've written anything for a Helm app in years and years.) I'm pretty sure it's not backwards compatible, though. The pods spawned by this operator depend on there existing a
Edit: Also, it seems like the tests are failing because the
Decided on adding a configmap just for the tests in code. The

What would be the next steps on this, @moredure @vyavdoshenko?
LGTM |
Code looks fine, but I am concerned that updating the operator version over an existing CRD will result in a StatefulSet update being issued, which some users might not expect. @Abhra303 did we have any conventions on backward compatibility in these terms? Probably worth flagging this change as optional, or skipping the comparison inside
@NegatioN sorry for replying late, the script itself looks good. But it's not gonna work in the df operator: your code creates a configmap only when you're using the helm chart. This code has some critical downsides -
You either create a configmap when creating the dragonfly resources (by the operator), or you can add a new optional field in the CRD, "customHealthCheckConfigMap" (or something similar), to let users configure the configmap. E.g. the steps to have a custom health check script:
If the field is nil, go with the default health script.
So the existing code looks good. You just need to add an optional struct field to the operator CRD (see other merged PRs for examples), which is nil by default.
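To make that concrete, here is a minimal sketch, not code from this repository, of an optional, nil-by-default spec field and the fallback it implies. The field name follows the "customHealthCheckConfigMap" suggestion above; the package name and helper are assumptions.

```go
// Sketch only, assuming the CRD types live in an api/v1alpha1-style package.
package v1alpha1

type DragonflySpec struct {
    // ... existing fields ...

    // CustomHealthCheckConfigMap names a ConfigMap holding a user-provided
    // readiness script. When nil (the default), the operator keeps its
    // built-in health check.
    // +optional
    CustomHealthCheckConfigMap *string `json:"customHealthCheckConfigMap,omitempty"`
}

// healthCheckConfigMap is a hypothetical helper the pod-spec builder could use.
func healthCheckConfigMap(spec DragonflySpec) (name string, custom bool) {
    if spec.CustomHealthCheckConfigMap == nil {
        return "", false // fall back to the default health script
    }
    return *spec.CustomHealthCheckConfigMap, true
}
```

Because the field is optional and absent by default, applying a newer CRD over existing resources should leave their generated StatefulSets untouched, which addresses the backward-compatibility concern raised above.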
```go
It("Should create configmap successfully", func() {
    err := k8sClient.Create(ctx, &corev1.ConfigMap{
        TypeMeta: metav1.TypeMeta{
```
You don't need to mention TypeMeta explicitly, as it is already a ConfigMap struct.
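Applied to the excerpt above, the creation could look something like the following sketch, assuming the same test-suite context (ctx, k8sClient, Ginkgo/Gomega); the ConfigMap name, namespace, and script body are placeholders.

```go
It("Should create configmap successfully", func() {
    // TypeMeta omitted: passing &corev1.ConfigMap{} already tells the client the resource kind.
    err := k8sClient.Create(ctx, &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "dragonfly-readiness", // placeholder name
            Namespace: "default",             // placeholder namespace
        },
        Data: map[string]string{
            "readiness.sh": "#!/bin/sh\nexit 0\n", // placeholder script body
        },
    })
    Expect(err).To(BeNil())
})
```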
Hi @Abhra303, thanks for the great feedback. I can definitely try to modify the PR to fit one of your suggestions. If that were desirable, we wouldn't need to manage any new resources, and there wouldn't be any breaking changes. We would, however, lose the configurability of the "provide your own readiness check" solution.
Actually, upon further investigation, the codepath I alluded to earlier, which my coworker made me aware of, already waits for snapshots. I think this entire PR and the related GitHub issues are based on a misunderstanding that there is downtime when there is not (or there might have been in older versions of the operator?). As a consequence, I feel like closing this PR is the right move. If the end goal is to make a configurable readiness check, maybe that should be done in another PR. Possible use cases I see for that are saving large snapshots, which might make a node unresponsive, and redirecting traffic based on that 🤷 Sorry for wasting the time of the people who reviewed this PR.
Added a modified healthcheck.sh as readiness.sh that the operator can use to determine if snapshotting has completed.
Confirmed that helm package $repo -> helm install dragonfly-operator -> kubectl apply -f config/samples/v1alpha1_dragonfly.yaml (with an updated Docker image) leads us to a state where a ConfigMap hosts our readiness.sh script, accessible in the pod.
Not done: bumping charts and preparing for a release of this repo. Maybe it's better if a core dev does that (?) 🤷
Related: dragonflydb/dragonfly#5921 and #397
Did not run the pre-commit hook for this, because I would have to re-clone the repo and use Git. Will do it when/if the PR gets approved.