authz/loader: restart file watcher on transient symlink errors#146

Merged
birdayz merged 11 commits into main from jb/fix-policy-watcher-symlink-swap on Mar 20, 2026

@birdayz birdayz commented Mar 17, 2026

What

Restart the policy file watcher when koanf's fsnotify watcher exits due to transient symlink errors during Kubernetes ConfigMap updates.

Why

Koanf's file.Provider.Watch calls filepath.EvalSymlinks on every filesystem event. When Kubernetes updates a ConfigMap-mounted volume, kubelet briefly removes the ..data symlink before creating the new one. During this window, EvalSymlinks fails, and koanf's watcher goroutine breaks out of its event loop and exits permanently.
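
To illustrate the failure window, here is a minimal standalone sketch that uses only the Go standard library, not koanf. The directory layout (policy.yaml linking through ..data to a ..gen1 generation directory) and the helper name demoEvalFailure are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// demoEvalFailure builds a ConfigMap-like layout in a temp directory and
// resolves policy.yaml before and after the ..data symlink is removed.
// Hypothetical helper, for illustration only.
func demoEvalFailure() (before, during error) {
	dir, err := os.MkdirTemp("", "configmap-")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)

	// Assumed layout: policy.yaml -> ..data/policy.yaml, ..data -> ..gen1
	gen := filepath.Join(dir, "..gen1")
	if err := os.Mkdir(gen, 0o755); err != nil {
		return err, err
	}
	if err := os.WriteFile(filepath.Join(gen, "policy.yaml"), []byte("v1"), 0o644); err != nil {
		return err, err
	}
	data := filepath.Join(dir, "..data")
	if err := os.Symlink("..gen1", data); err != nil {
		return err, err
	}
	policy := filepath.Join(dir, "policy.yaml")
	if err := os.Symlink(filepath.Join("..data", "policy.yaml"), policy); err != nil {
		return err, err
	}

	// Resolves while ..data exists.
	_, before = filepath.EvalSymlinks(policy)

	// Simulate the window in which ..data is gone.
	if err := os.Remove(data); err != nil {
		return err, err
	}
	_, during = filepath.EvalSymlinks(policy)
	return before, during
}

func main() {
	before, during := demoEvalFailure()
	fmt.Println("before swap:", before)
	fmt.Println("during swap:", during)
}
```

A watcher that treats the second error as fatal stops for good, even though the symlink reappears a moment later.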

This caused the dataplane authz interceptor to permanently stop picking up policy changes after the first ConfigMap update. The interceptor loaded the initial policy on startup but never saw subsequent updates, requiring a pod restart to recover. We discovered this while debugging why newly created service accounts were not recognized by the authz layer despite being added to the policy ConfigMap.

Implementation details

When the watch callback receives an error (watcher died), sleep 1s and call startWatch() again with a fresh file.Provider. After restarting, immediately reload the policy file to pick up any updates missed while the watcher was dead.
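
A rough sketch of this restart-on-error flow, not the code from this PR. The watchFunc type stands in for a provider's Watch method, and loader, newWatch, reload, and simulate are hypothetical names. The fake watcher in simulate invokes callbacks synchronously so the example runs without koanf.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// watchFunc is a hypothetical stand-in for a provider's Watch method: it
// registers cb, which later receives change events or a terminal error.
type watchFunc func(cb func(event any, err error)) error

type loader struct {
	newWatch func() watchFunc // builds a fresh watcher (assumed factory)
	reload   func()           // re-reads the policy file
	backoff  time.Duration    // 1s in the PR description
}

// startWatch arms a fresh watcher. If the callback reports an error, the
// watcher is considered dead: wait, start a new one, then reload.
func (l *loader) startWatch() error {
	watch := l.newWatch()
	return watch(func(_ any, err error) {
		if err != nil {
			time.Sleep(l.backoff)
			if rerr := l.startWatch(); rerr != nil {
				fmt.Println("restart failed:", rerr)
				return
			}
		}
		// Reload on normal events, and right after a restart to pick up
		// updates missed while the watcher was dead.
		l.reload()
	})
}

// simulate drives the loader with a fake watcher and returns how many
// watchers were created and how many reloads ran.
func simulate() (watchers, reloads int) {
	var callbacks []func(any, error)
	l := &loader{
		backoff: time.Millisecond,
		newWatch: func() watchFunc {
			watchers++
			return func(cb func(any, error)) error {
				callbacks = append(callbacks, cb)
				return nil
			}
		},
		reload: func() { reloads++ },
	}
	if err := l.startWatch(); err != nil {
		panic(err)
	}
	callbacks[0](nil, errors.New("symlink missing")) // first watcher dies
	callbacks[1]("write", nil)                       // replacement sees an event
	return watchers, reloads
}

func main() {
	w, r := simulate()
	fmt.Printf("watchers=%d reloads=%d\n", w, r) // prints "watchers=2 reloads=2"
}
```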

The test simulates the exact Kubernetes ConfigMap update sequence: create new generation directory, remove ..data symlink, brief pause, create new ..data symlink pointing to new generation. It verifies the watcher survives two consecutive swaps.
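
The swap sequence described above could be simulated roughly as follows. This is a sketch under assumptions, not the test from this PR: the helper names, generation directory names, and the policy.yaml file name are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// swapConfigMap follows the sequence from the description: create the new
// generation directory, remove ..data, pause briefly, then recreate ..data
// pointing at the new generation. Hypothetical helper.
func swapConfigMap(dir, gen, content string, pause time.Duration) error {
	genDir := filepath.Join(dir, gen)
	if err := os.Mkdir(genDir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(genDir, "policy.yaml"), []byte(content), 0o644); err != nil {
		return err
	}
	data := filepath.Join(dir, "..data")
	// On the very first call there is no ..data yet, which is fine.
	if err := os.Remove(data); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	time.Sleep(pause) // the window in which ..data does not exist
	return os.Symlink(gen, data)
}

// readAfterSwaps performs an initial mount plus two consecutive swaps and
// returns the policy content visible through ..data afterwards.
func readAfterSwaps() (string, error) {
	dir, err := os.MkdirTemp("", "configmap-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	for i, content := range []string{"v1", "v2", "v3"} {
		gen := fmt.Sprintf("..gen%d", i)
		if err := swapConfigMap(dir, gen, content, 5*time.Millisecond); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(filepath.Join(dir, "..data", "policy.yaml"))
	return string(b), err
}

func main() {
	got, err := readAfterSwaps()
	fmt.Println(got, err)
}
```

A watcher test would run the loader against such a directory and check that it observes the content of each generation in turn.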

@birdayz birdayz requested a review from rockwotj as a code owner March 17, 2026 11:21
secpanda commented Mar 17, 2026

Snyk checks have passed. No issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |

@rockwotj rockwotj left a comment

sorry missed this one

birdayz added 10 commits March 20, 2026 14:10
Koanf's file.Provider.Watch exits permanently when
filepath.EvalSymlinks fails -- which happens every time Kubernetes
updates a ConfigMap-mounted volume, since kubelet briefly removes the
..data symlink before creating the new one.

The fix: when the watch callback receives an error, sleep 1s and
restart the watch. Also reload the policy immediately after restart
to pick up any updates missed while dead.

This was causing the dataplane authz interceptor to permanently stop
picking up policy changes after the first ConfigMap update, requiring
a pod restart to recover.

Koanf's file.Provider.Watch exits when filepath.EvalSymlinks fails,
which happens during Kubernetes ConfigMap updates (kubelet briefly
removes the ..data symlink). Restart the watch in a loop with 1s
backoff. Reload the policy after each restart to catch missed updates.

Single file.Provider instance, initial watch is synchronous, restart
loop runs in a background goroutine. Unwatch closes the stop channel.
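
One way the structure in this commit message could look, as a sketch only. policyWatcher, arm, restartLoop, and the died channel are hypothetical names, and the watch field stands in for the single provider's Watch method. simulateRestart uses a fake watch function so the example runs on the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type policyWatcher struct {
	watch   func(cb func(event any, err error)) error // assumed provider Watch
	reload  func()
	backoff time.Duration
	died    chan error    // signals that the current watch has exited
	stop    chan struct{} // closed by Unwatch
	once    sync.Once
}

// Start arms the initial watch synchronously, so the caller sees a startup
// failure directly, then runs the restart loop in the background.
func (w *policyWatcher) Start() error {
	if err := w.arm(); err != nil {
		return err
	}
	go w.restartLoop()
	return nil
}

func (w *policyWatcher) arm() error {
	return w.watch(func(_ any, err error) {
		if err != nil {
			select {
			case w.died <- err:
			case <-w.stop:
			}
			return
		}
		w.reload()
	})
}

func (w *policyWatcher) restartLoop() {
	for {
		select {
		case <-w.stop:
			return
		case <-w.died:
		}
		// Retry with a fixed backoff until the watch is armed again.
		for {
			select {
			case <-w.stop:
				return
			case <-time.After(w.backoff):
			}
			if err := w.arm(); err == nil {
				break
			}
		}
		w.reload() // catch updates missed while the watch was down
	}
}

// Unwatch stops the restart loop; calling it more than once is safe.
func (w *policyWatcher) Unwatch() { w.once.Do(func() { close(w.stop) }) }

// simulateRestart kills the first watch and returns how often watch was armed.
func simulateRestart() int {
	armed := 0
	cbs := make(chan func(any, error), 4)
	reloads := make(chan struct{}, 4)
	w := &policyWatcher{
		watch:   func(cb func(any, error)) error { armed++; cbs <- cb; return nil },
		reload:  func() { reloads <- struct{}{} },
		backoff: time.Millisecond,
		died:    make(chan error, 1),
		stop:    make(chan struct{}),
	}
	if err := w.Start(); err != nil {
		panic(err)
	}
	(<-cbs)(nil, errors.New("symlink missing")) // first watch dies
	<-reloads                                   // loop re-armed and reloaded
	w.Unwatch()
	return armed
}

func main() {
	fmt.Println("armed:", simulateRestart()) // prints "armed: 2"
}
```

Closing a channel in Unwatch, rather than sending on it, lets both the callback and the restart loop observe the stop signal without coordinating who receives it.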
@birdayz birdayz force-pushed the jb/fix-policy-watcher-symlink-swap branch from 7944dc2 to 4302671 Compare March 20, 2026 13:11
@birdayz birdayz force-pushed the jb/fix-policy-watcher-symlink-swap branch 2 times, most recently from f53b4e1 to 2d29eb7 Compare March 20, 2026 13:17
@birdayz birdayz force-pushed the jb/fix-policy-watcher-symlink-swap branch from 2d29eb7 to 10985cc Compare March 20, 2026 13:19
@birdayz birdayz merged commit 1feb286 into main Mar 20, 2026
26 checks passed
@birdayz birdayz deleted the jb/fix-policy-watcher-symlink-swap branch March 20, 2026 13:23
4 participants