feat(controllers): optionally do not cache resources created without CommonLabels #1818
Conversation
I marked this ready, but I forgot that it is blocked by #1833.
Rebased on master and worked it into the additions from #1832.
I've added an experimental feature toggle. This ended up being a bit more complex to validate than expected, but I think I managed to cover most of it:
- Ensure that the PR works and can be enabled
- Continue with testing that it does not break the watch label selector (sharding)
main.go (Outdated)
setupLog.Error(err, fmt.Sprintf("unable to parse %s", watchLabelSelectorsEnvVar))
os.Exit(1) //nolint
// Allow users to enable the above cache limits before a full rollout
if enableCacheLabelLimits == "" {
- The name and idea behind this variable hint that it should be a boolean-like value, so we should not do empty string comparisons.
- The code here states that it would enable caching limits, whereas it actually lifts those limits, meaning everything will be cached.
- I think a simpler way to implement all of this would be to do something like this:
cacheOptions := cache.Options{
	ByObject: map[client.Object]cache.ByObject{
		&v1.Deployment{}:                cacheByObject,
		&corev1.Service{}:               cacheByObject,
		&corev1.ServiceAccount{}:        cacheByObject,
		&networkingv1.Ingress{}:         cacheByObject,
		&corev1.PersistentVolumeClaim{}: cacheByObject,
		&corev1.ConfigMap{}:             cacheByObject,
		&corev1.Secret{}:                cacheByObject,
	},
}

// TODO: Curious what would happen in vanilla k8s if we don't have this check for OpenShift
if isOpenShift {
	cacheOptions.ByObject[&routev1.Route{}] = cacheByObject
}

// I like this name more, it's self-explanatory. By the way, I don't think we have to
// prefix environment variables with EXPERIMENTAL_, we just need to clarify that in
// docs / helm / code comments.
if cacheOnlyLabeledResources == "true" {
	controllerOptions.Cache = cacheOptions
}
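For reference, `cacheByObject` in the snippet above is assumed to be a `cache.ByObject` restricted to the operator's `CommonLabels`. A minimal sketch of what that could look like (illustrative only; the exact definition in the PR may differ):

```go
import (
	"k8s.io/apimachinery/pkg/labels"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

// Illustrative only: cache just the objects carrying the CommonLabels selector from #1661.
var cacheByObject = cache.ByObject{
	Label: labels.SelectorFromSet(labels.Set{
		"app.kubernetes.io/managed-by": "grafana-operator",
	}),
}
```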
One more point would be: should Secret / ConfigMap caching be enabled, do we want it to be only for labeled resources? Should we have a separate configuration option that tweaks that behaviour, or is it better to not use `cacheByObject` for them at all? Or should we even go as far as prometheus-operator, which filters by secret type instead? I'm not sure what's the best way to go, just highlighting a concern.
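To illustrate that last point, here is a rough sketch of the prometheus-operator-style alternative: filtering the Secret informer by secret type via a field selector rather than by labels. The chosen type and the wiring are assumptions for illustration, not what this PR does:

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Hypothetical: only cache Opaque Secrets instead of label-selected ones.
var secretCacheOptions = cache.Options{
	ByObject: map[client.Object]cache.ByObject{
		&corev1.Secret{}: {
			Field: fields.OneTermEqualSelector("type", string(corev1.SecretTypeOpaque)),
		},
	},
}
```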
@Baarsgaard I've just modified some of the comments that were added a few minutes ago, so please refer to the latest versions. Thx!
feat: Toggle caching of ConfigMaps and Secrets with CommonLabels …on in Helm values
Refactored this to use an env var with different levels. Also moved the code around a bit to make the opt-in nature more apparent and easier to review. @weisdd @Baarsgaard if you have a minute, I'd appreciate a re-review, as I'm now obviously biased towards thinking this is good to merge 😅
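As a rough illustration of the "env var with different levels" idea, a self-contained sketch is shown below. The variable name, the level values, and the set of Kinds per level are hypothetical, not necessarily what the PR ended up with:

```go
package main

import (
	"os"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Hypothetical cache restriction: only objects carrying the operator's CommonLabels.
	byLabel := cache.ByObject{
		Label: labels.SelectorFromSet(labels.Set{"app.kubernetes.io/managed-by": "grafana-operator"}),
	}

	opts := ctrl.Options{}

	// Hypothetical leveled env var; names and values are illustrative only.
	switch os.Getenv("CACHE_ONLY_LABELED_RESOURCES") {
	case "", "off":
		// level 0: no restrictions, everything the controller watches is cached
	case "resources":
		// level 1: restrict operator-managed workload resources
		opts.Cache = cache.Options{ByObject: map[client.Object]cache.ByObject{
			&appsv1.Deployment{}: byLabel,
			&corev1.Service{}:    byLabel,
		}}
	case "all":
		// level 2: additionally restrict ConfigMaps and Secrets
		opts.Cache = cache.Options{ByObject: map[client.Object]cache.ByObject{
			&appsv1.Deployment{}: byLabel,
			&corev1.Service{}:    byLabel,
			&corev1.ConfigMap{}:  byLabel,
			&corev1.Secret{}:     byLabel,
		}}
	}
	_ = opts // would be passed to ctrl.NewManager(...) in real code
}
```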
Naming and code structure are clear now, everything looks good to me :)
I read a blog post on operator memory pitfalls mentioning `Owns()` being a footgun, which is used in the grafana_reconciler `SetupWithManager`.
TLDR: By declaring `Owns()` or using `Get`/`List`, you tell controller-runtime to watch and cache all instances of that `client.Object`, which on large clusters could mean a lot of `ConfigMaps`, `Secrets` and `Deployments` in the Grafana-Operator's case. I expected this to be a problem due to the pprof profiles uploaded in #1622, which was verified by following the steps outlined below.
The post linked to an Operator SDK trick for configuring the `client.Object` cache with labels, and I remembered that #1661 added common labels to resources created by the operator, which can be used to reduce memory consumption.
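A minimal sketch of that trick in controller-runtime terms, assuming a recent controller-runtime where `cache.Options` exposes `DefaultLabelSelector` (the selector value matches the CommonLabels mentioned below; everything else is illustrative):

```go
package main

import (
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
	// Illustrative: restrict every informer started by the manager's cache to
	// objects carrying the operator's CommonLabels.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			DefaultLabelSelector: labels.SelectorFromSet(labels.Set{
				"app.kubernetes.io/managed-by": "grafana-operator",
			}),
		},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // controllers would be registered via SetupWithManager before mgr.Start(...)
}
```

Note that a blanket `DefaultLabelSelector` would also filter out the Grafana CRs themselves, which is presumably why the change restricts only selected Kinds via `ByObject` instead.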
Verifying cache issues:
1. `kubectl port-forward -n grafana-operator-system deploy/grafana-operator-controller-manager-v5 8888 &`
2. `go tool pprof -top -nodecount 20 http://localhost:8888/debug/pprof/heap`
3. `fallocate -l 393216 large_file`
4. Create a number of `ConfigMaps` from `large_file` and compare heap profiles before and after.
Current progress

Watching and caching has been limited to resources controlled by the operator, of the Kinds listed above (with `Route` only `if IsOpenShift`).
This is done with the existing `CommonLabels` selector introduced in #1661: `app.kubernetes.io/managed-by: "grafana-operator"`
Memory consumption in an empty kind cluster after ~1 minute¹: (before/after figures not reproduced here)

ConfigMaps and Secrets Cache

An option to cache `ConfigMaps` and `Secrets` has been added.

Footnotes

1. Heap will increase over time as the operator stabilizes. ↩
2. The reduction is by no means representative of real deployments.
For clusters mixing the Grafana-Operator and other workloads in cluster-scoped mode, the reduction is likely significantly higher.
Even if the Grafana-Operator was the only Deployment in a cluster, this should still reduce memory as it won't cache itself 😉 ↩