sonic_data_client: make PollStats stoppable and wire into TestMain #606

Open
sigabrtv1-ui wants to merge 2 commits into sonic-net:master from sigabrtv1-ui:fix/stop-poll-stats

@sigabrtv1-ui

Problem

PollStats() is started in sonic_data_client's init() and never stopped. The goroutine leaks across test boundaries in both gnmi_server and sonic_data_client test packages.

When tests use gomonkey to patch os.OpenFile or os.File methods, the still-running PollStats goroutine calls linuxproc.ReadStat("/proc/stat"), which goes through os.OpenFile, hits the mock, and receives a pointer to a zero-value os.File (&os.File{}). Calling Stat() on that file causes a nil pointer dereference:

panic: runtime error: invalid memory address or nil pointer dereference
goroutine in PollStats() → linuxproc.ReadStat → os.ReadFile → os.(*File).Stat

Also observed as a DATA RACE in CI build 20260305.14.

Fix

sonic_data_client/non_db_client.go

  • Add pollStatsDone chan struct{} + sync.Once
  • Add exported StopPollStats() — safe to call multiple times
  • PollStats() selects on done channel each iteration and exits when closed
  • Use select { case <-time.After(...) } instead of time.Sleep for immediate shutdown response

gnmi_server/server_test.go and sonic_data_client/client_test.go

  • Call StopPollStats() in TestMain before running tests in both packages
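
A TestMain wired this way might look like the sketch below. It is an assumption-laden illustration rather than the actual test code: the `StopPollStats` stub stands in for the function exported by sonic_data_client, and `runWithPollStatsStopped` is a hypothetical helper introduced only to make the ordering explicit.

```go
package main

import (
	"os"
	"sync"
	"testing"
)

// Stand-ins for the sonic_data_client symbols described in this PR. In the
// real test files these would be imported from that package.
var (
	pollStatsDone = make(chan struct{})
	pollStatsOnce sync.Once
)

// StopPollStats closes the done channel exactly once.
func StopPollStats() {
	pollStatsOnce.Do(func() { close(pollStatsDone) })
}

// runWithPollStatsStopped (hypothetical helper) stops the poller first and
// only then runs the test suite, returning the suite's exit code.
func runWithPollStatsStopped(run func() int) int {
	StopPollStats()
	return run()
}

// TestMain is called by the testing package in place of running the tests
// directly, so the poller is stopped before any test can install a patch.
func TestMain(m *testing.M) {
	os.Exit(runWithPollStatsStopped(m.Run))
}
```

Stopping the goroutine before m.Run() rather than after it matters here: the panic described above happens while tests are running, so cleanup after the suite would be too late.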

Notes

…tests

PollStats is started in init() and never stopped. In tests, the goroutine
leaks across test boundaries and races with test teardown on os.File
pointers, causing nil pointer panics and DATA RACE failures.

Add StopPollStats() and a done channel so tests can stop the goroutine
cleanly during TestMain cleanup. Use select with time.After instead of
time.Sleep so the goroutine responds to shutdown without a 100ms delay.

Fixes DATA RACE observed in CI (master build 20260305.14):
  panic: runtime error: invalid memory address or nil pointer dereference
  goroutine in PollStats() racing on *os.File

Signed-off-by: sigabrtv1-ui <sig.abrt.v1@gmail.com>

Call StopPollStats() in TestMain for both packages before tests run.
This ensures the PollStats goroutine (started in sonic_data_client's init())
exits cleanly before any test has a chance to use gomonkey patches that
intercept os.OpenFile or os.File methods.

Also fixes coverage: the StopPollStats shutdown path (done channel) was only
exercised by gnmi_server tests, not counted toward sonic_data_client coverage.
Calling it here closes that gap.

Signed-off-by: sigabrtv1-ui <sig.abrt.v1@gmail.com>
@mssonicbld
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@hdwhdw
Contributor

hdwhdw commented Mar 6, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

sigabrtv1-ui pushed a commit to sigabrtv1-ui/sonic-gnmi that referenced this pull request Mar 9, 2026
…ZMQ issue)

sonic_data_client tests all pass, but ZMQ context cleanup (zmq_ctx_destroy
via atexit) blocks indefinitely with Go 1.23.9, causing the test binary to
hang until the test timeout kills it. With Go 1.19 the same cleanup completed
in ~24s. This is a Go 1.23 CGO runtime regression unrelated to PR sonic-net#611 code.

Exclude sonic_data_client from check_gotest_junit for now. Track alongside
PR sonic-net#606 (PollStats stoppable) which addresses related ZMQ lifecycle issues.

Signed-off-by: Dawei Huang <daweihuang@microsoft.com>