Summary:
During a mixed-version upgrade roachtest (#169310), a SELECT value FROM system.settings WHERE name = 'version' query returned zero rows despite the row existing. A debug.zip captured 22 seconds later confirmed the row was present with value "25.4". The same query had succeeded ~4 minutes earlier on the same node.
This was observed on a resource-constrained n1cpu4 single-node cluster running 114+ auto-stats jobs during post-upgrade stabilization (v25.3→v25.4 upgrade had just completed through 14 migration steps).
Findings:
- The version row was written at 06:35:13 and successfully queried at 06:35:43.
- At 06:39:35, the same query returned zero rows (no error, just empty result set).
- debug.zip at 06:39:57 confirmed the row existed.
- The ClusterVersionFromKV function in the test harness did not retry on empty results, only on query errors.
- A workaround retry has been added to ClusterVersionFromKV (see linked PR).
Code References:
- ClusterVersionFromKV
Hypotheses:
- Descriptor lease staleness for system.settings after migration steps caused the scan to use stale schema metadata, returning empty results.
- Connection pool contention or Go runtime scheduling delay on the resource-constrained VM.
- A descriptor refresh race during query execution.
Next Steps:
- Investigate whether system.settings descriptor changes during migrations could trigger this.
Epic: none
Jira issue: CRDB-63717