-
Notifications
You must be signed in to change notification settings - Fork 427
Make kubelet restarting as a windows service #1752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d820d52
to
ec1e27d
Compare
@laozc Can you please take at view of this PR? |
Do we have a issue related to this PR? Also can you please elaborate the description. Looking at trying to understand whats the core issue thats being addressed here. |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the service need to deleted and restarted in the script?
# limitations under the License. | ||
|
||
# From https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/kubeadm/scripts/PrepareNode.ps1 | ||
# Does not support kubeadm KUBELET_KUBEADM_ARGS which is used by Cluster API to pass extra user args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove the comments which do not apply anymore
for ($i = 0; $i -lt 10; $i++) { | ||
$service = Get-Service -Name kubelet -ErrorAction SilentlyContinue | ||
if ($null -eq $service) { | ||
Write-Host "kbuelet service deleted successfully, restarting" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write-Host "kbuelet service deleted successfully, restarting" | |
Write-Host "kubelet service deleted successfully, restarting" |
} | ||
|
||
|
||
Write-Host "kbuelet service failed to restart." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write-Host "kbuelet service failed to restart." | |
Write-Host "kubelet service failed to restart." |
# Used by sc.exe to create the service | ||
$KubeletCommandLine = "`"" + "\`"" + "$env:SYSTEMDRIVE\k\kube-log-runner.exe" + "\`" " + "--log-file=/var/log/kubelet/kubelet.log " + "$env:SYSTEMDRIVE\k\kubelet.exe " + $KubeletArgListStr + "`"" | ||
|
||
# Write-Output $kubeletCommandLine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Write-Output $kubeletCommandLine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer to keep it here as the parameters (kubeletcommdnline) send to sc.exe tend to be tricky, and we can use this to check that kind of commandline is used to start the kubelet service, when we need to do some debugging
I remembered that we had some discussions at that time, but maybe it is landed in being removed from the KEP. The idea is to utilize the kube-log-runner (from the k8s repo) to redirect the log to stdout/stderr |
This is the solution I have after a few try of other options. let me try to explain a bit here so that more people may have a better idea on what I'm trying to achieve here. To support features such as windows graceful shutdown, we will need to make kubelet (itself) running as a windows service. this means we will have to remove the dependency on nssm which will hijack the call SCM made to the running executable. However, the benefit of using nssm is that it can load the envs/config changes dynamically so that without updating the service itself, through the control of config files, the service will run differently. This was used in the CAPZ set up for kublet to dynamically adopt the changes passed from kubeadm. With this PR, we still use nssm to start kubelet in the beginning to apply dynamic changes made through kubeadm. but we will stop/delete the service and re-install/restart the kubelet with the same parameters through SCM, in this way, the kubelet itself will be running as windows service. that's the reason we will need to stop and delete the original service and re-install a new one because the service will be different (except the same service name of "kubelet") Ideally, we should totally remove the dependency of nssm, maybe this is still the future direction, but I have tried quite a few times and had difficulties to make kubelet (started from SC) fit well with the cloud-init and join the CAPZ cluster. I believe it is achievable, but it probably will make things trickier so maybe this could be a future direction that we can target with. |
- name: Create file to restart kubelet as a windows service | ||
ansible.windows.win_template: | ||
src: templates/RestartKubelet.ps1 | ||
dest: "{{ kubernetes_install_path }}\\RestartKubelet.ps1" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you modify the default install to get sc?
"windows_service_manager": "nssm", |
We should also update the document to say that Service manager now supports kubeadm.
https://image-builder.sigs.k8s.io/capi/windows/windows#service-manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this PR, we still use nssm, I might need to revert the change in sc.yaml file as it has caught a bit confusion. as replied in the above comments. totally independent on nssm could be a long term target but not achieved in this PR yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this section in restartkubelet
do the same thing when it is installed as a service via Service manager? Otherwise I don't think it would work?
$FileContent = Get-Content -Path "$env:SYSTEMDRIVE/var/lib/kubelet/kubeadm-flags.env"
$kubeAdmArgs = $FileContent.TrimStart('KUBELET_KUBEADM_ARGS=').Trim('"')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, startkubelet.ps1 and restartkubelet.ps1are essentially the same, expect some special escape required by sc.exe.
In these two ps1 files, both of them will read parameters from env file and use them to construct commandline parameters for kbuelet executable. The different is that. nssm will register the powershell startkubelet.ps1 itself as windows service, which mean during boot time, it will read the env file and reload the parameters. This unfortunately is not the case for sc.exe, which mean after we register the kubelet.exe as a windows service during image building, we do not have the env yet and will miss the correpsonding parameters for kubelet, but this defect kubelet will be registered as windos service and failed the node to join the cluster when booting with this image.
I had also tried to manipulate kublet to read from config files such as (kubelet.json etc) to make it somewhat "dynamic", but looks that not all kubelet commandline parameter will have equivalent support in the json/yaml config file.
So the idea here is: use nssm to start kubelet and join the cluster, as what we have done before. and after this, use the same parameters to restart the kubelet.exe as a windows service through sc.exe, this looks to be able to work stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately is not the case for sc.exe, which mean after we register the kubelet.exe as a windows service during image building, we do not have the env yet and will miss the correpsonding parameters for kubelet, but this defect kubelet will be registered as windos service and failed the node to join the cluster when booting with this image.
I believe this is why I ended up using NSSM in the end. This is definitely not ideal but I see why and how this is working now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can definitely be improved, although I would expect a good amount of efforts probably needed.
In this PR, basically, we just added one more script, which will be pretty safe. people can still choose to use it or stick with the old way if they want.
--enable-debugging-handlers --cgroups-per-qos=false --enforce-node-allocatable=`"`" | ||
--resolv-conf=`"`" --log-dir={{ systemdrive.stdout | trim }}/var/log/kubelet | ||
--logtostderr=false {{ additional_kubelet_args if additional_kubelet_args is defined }} | ||
- name: Create file to restart kubelet as a windows service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add this file in both sc.yml and nssm.yml? Only one is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert the change in sc.yaml as this is not the one we will use
New changes are detected. LGTM label has been removed. |
Added more explanation/clarification |
return | ||
} | ||
else { | ||
Write-Host "Waiting for service to be fully deleted... (attempt $($i + 1)/$Retries)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable $Retries
is undefined here.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
# Need to keey sync with StartKubelet.ps1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep sync with
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -10,7 +10,7 @@ | |||
"prepull": "true", | |||
"runtime": "containerd", | |||
"ssh_source_url": "", | |||
"windows_service_manager": "nssm", | |||
"windows_service_manager": "windows_service", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe I should make it applied to azure only in the begining?
Change description
To support features such as windows graceful shutdown, we will need to make kubelet (itself) running as a windows service. this means we will have to remove the dependency on nssm which will hijack the call SCM made to the running executable. However, the benefit of using nssm is that it can load the envs/config changes dynamically so that without updating the service itself, through the control of config files, the service will run differently. This was used in the CAPZ set up for kublet to dynamically adopt the changes passed from kubeadm. With this PR, we still use nssm to start kubelet in the beginning to apply dynamic changes made through kubeadm. but we will stop/delete the service and re-install/restart the kubelet with the same parameters through SCM, in this way, the kubelet itself will be running as windows service. that's the reason we will need to stop and delete the original service and re-install a new one because the service will be different (except the same service name of "kubelet")
Ideally, we should totally remove the dependency of nssm, maybe this is still the future direction, but I have tried quite a few times and had difficulties to make kubelet (started from SC) fit well with the cloud-init and join the CAPZ cluster. I believe it is achievable, but it probably will make things trickier so maybe this could be a future direction that we can target with.
Related issues
Additional context