Commit 55c877f

gpu: add info metric about devices (#2070)
1 parent dcf8503 commit 55c877f

7 files changed: +299 -15 lines changed

.idea/dictionaries/project.xml

Lines changed: 1 addition & 0 deletions
Generated file; diff not rendered by default.

docs/collector.gpu.md

Lines changed: 22 additions & 15 deletions
@@ -20,24 +20,25 @@ These metrics are available on supported versions of Windows with compatible GPU
 
 ### Adapter-level Metrics
 
-| Name | Description | Type | Labels |
-|----------------------------------------------|----------------------------------------------------------|-------|--------|
-| `windows_gpu_adapter_memory_committed_bytes` | Total committed GPU memory in bytes per physical GPU | gauge | `phys` |
-| `windows_gpu_adapter_memory_dedicated_bytes` | Dedicated GPU memory usage in bytes per physical GPU | gauge | `phys` |
-| `windows_gpu_adapter_memory_shared_bytes` | Shared GPU memory usage in bytes per physical GPU | gauge | `phys` |
-| `windows_gpu_local_adapter_memory_bytes` | Local adapter memory usage in bytes per physical GPU | gauge | `phys` |
-| `windows_gpu_non_local_adapter_memory_bytes` | Non-local adapter memory usage in bytes per physical GPU | gauge | `phys` |
+| Name | Description | Type | Labels |
+|----------------------------------------------|-------------------------------------------------------------------------|-------|--------------------------------------------------------------------------------------|
+| `windows_gpu_adapter_memory_committed_bytes` | Total committed GPU memory in bytes per physical GPU | gauge | `phys` |
+| `windows_gpu_adapter_memory_dedicated_bytes` | Dedicated GPU memory usage in bytes per physical GPU | gauge | `phys` |
+| `windows_gpu_adapter_memory_shared_bytes` | Shared GPU memory usage in bytes per physical GPU | gauge | `phys` |
+| `windows_gpu_info` | A metric with a constant '1' value labeled with gpu device information. | gauge | `phys`, `physical_device_object_name`, `hardware_id`, `friendly_name`, `description` |
+| `windows_gpu_local_adapter_memory_bytes` | Local adapter memory usage in bytes per physical GPU | gauge | `phys` |
+| `windows_gpu_non_local_adapter_memory_bytes` | Non-local adapter memory usage in bytes per physical GPU | gauge | `phys` |
 
 ### Per-process Metrics
 
-| Name | Description | Type | Labels |
-|----------------------------------------------|-------------------------------------------------|---------|----------------------------------------|
-| `windows_gpu_engine_time_seconds` | Total running time of the GPU engine in seconds | counter | `phys`, `eng`, `engtype`, `process_id` |
-| `windows_gpu_process_memory_committed_bytes` | Total committed GPU memory in bytes per process | gauge | `phys`,`process_id` |
-| `windows_gpu_process_memory_dedicated_bytes` | Dedicated GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
-| `windows_gpu_process_memory_local_bytes` | Local GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
-| `windows_gpu_process_memory_non_local_bytes` | Non-local GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
-| `windows_gpu_process_memory_shared_bytes` | Shared GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
+| Name | Description | Type | Labels |
+|----------------------------------------------|-------------------------------------------------------------------------|---------|--------------------------------------------------------------------------------------|
+| `windows_gpu_engine_time_seconds` | Total running time of the GPU engine in seconds | counter | `phys`, `eng`, `engtype`, `process_id` |
+| `windows_gpu_process_memory_committed_bytes` | Total committed GPU memory in bytes per process | gauge | `phys`,`process_id` |
+| `windows_gpu_process_memory_dedicated_bytes` | Dedicated GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
+| `windows_gpu_process_memory_local_bytes` | Local GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
+| `windows_gpu_process_memory_non_local_bytes` | Non-local GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
+| `windows_gpu_process_memory_shared_bytes` | Shared GPU memory usage in bytes per process | gauge | `phys`,`process_id` |
 
 ## Metric Labels
 
@@ -50,6 +51,12 @@ These metrics are available on supported versions of Windows with compatible GPU
 
 These are basic queries to help you get started with GPU monitoring on Windows using Prometheus.
 
+**Show GPU information for a specific physical GPU (0):**
+
+```promql
+windows_gpu_info{description="NVIDIA GeForce GTX 1070",friendly_name="",hardware_id="PCI\\VEN_10DE&DEV_1B81&SUBSYS_61733842&REV_A1",phys="0",physical_device_object_name="\\Device\\NTPNP_PCI0027"} 1
+```
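Note on the sample above: the backslashes in `hardware_id` and `physical_device_object_name` appear doubled because the Prometheus text exposition format escapes backslashes, double quotes, and line feeds inside label values. The following is a minimal sketch of that escaping rule, not code from this commit; `escapeLabelValue` is a hypothetical helper name introduced only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeLabelValue is a hypothetical helper (not part of windows_exporter
// or client_golang). It applies the label-value escaping used by the
// Prometheus text exposition format: backslash, double quote, line feed.
func escapeLabelValue(v string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(v)
}

func main() {
	// Raw value as Windows reports it, taken from the sample in the docs.
	raw := `\Device\NTPNP_PCI0027`
	fmt.Printf("physical_device_object_name=\"%s\"\n", escapeLabelValue(raw))
	// prints: physical_device_object_name="\\Device\\NTPNP_PCI0027"
}
```

When matching on these labels in a double-quoted PromQL string, the same doubling applies, so a selector would also be written with `\\Device\\...`.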
+
 **Show total dedicated GPU memory (in bytes) usage on GPU 0:**
 
 ```promql

internal/collector/gpu/gpu.go

Lines changed: 36 additions & 0 deletions
@@ -21,8 +21,10 @@ import (
 	"errors"
 	"fmt"
 	"log/slog"
+	"strconv"
 
 	"github.com/alecthomas/kingpin/v2"
+	"github.com/prometheus-community/windows_exporter/internal/headers/setupapi"
 	"github.com/prometheus-community/windows_exporter/internal/mi"
 	"github.com/prometheus-community/windows_exporter/internal/pdh"
 	"github.com/prometheus-community/windows_exporter/internal/types"
@@ -43,6 +45,7 @@ type Collector struct {
 	gpuEnginePerfDataCollector *pdh.Collector
 	gpuEnginePerfDataObject []gpuEnginePerfDataCounterValues
 
+	gpuInfo *prometheus.Desc
 	gpuEngineRunningTime *prometheus.Desc
 
 	// GPU Adapter Memory
@@ -109,6 +112,13 @@ func (c *Collector) Close() error {
 func (c *Collector) Build(_ *slog.Logger, _ *mi.Session) error {
 	var err error
 
+	c.gpuInfo = prometheus.NewDesc(
+		prometheus.BuildFQName(types.Namespace, Name, "info"),
+		"A metric with a constant '1' value labeled with gpu device information.",
+		[]string{"phys", "physical_device_object_name", "hardware_id", "friendly_name", "description"},
+		nil,
+	)
+
 	c.gpuEngineRunningTime = prometheus.NewDesc(
 		prometheus.BuildFQName(types.Namespace, Name, "engine_time_seconds"),
 		"Total running time of the GPU in seconds.",
@@ -213,6 +223,10 @@ func (c *Collector) Build(_ *slog.Logger, _ *mi.Session) error {
 func (c *Collector) Collect(ch chan<- prometheus.Metric) error {
 	errs := make([]error, 0)
 
+	if err := c.collectGpuInfo(ch); err != nil {
+		errs = append(errs, err)
+	}
+
 	if err := c.collectGpuEngineMetrics(ch); err != nil {
 		errs = append(errs, err)
 	}
@@ -236,6 +250,28 @@ func (c *Collector) Collect(ch chan<- prometheus.Metric) error {
 	return errors.Join(errs...)
 }
 
+func (c *Collector) collectGpuInfo(ch chan<- prometheus.Metric) error {
+	gpus, err := setupapi.GetGPUDevices()
+	if err != nil {
+		return fmt.Errorf("failed to get GPU devices: %w", err)
+	}
+
+	for i, gpu := range gpus {
+		ch <- prometheus.MustNewConstMetric(
+			c.gpuInfo,
+			prometheus.GaugeValue,
+			1.0,
+			strconv.Itoa(i),
+			gpu.PhysicalDeviceObjectName,
+			gpu.HardwareID,
+			gpu.FriendlyName,
+			gpu.DeviceDesc,
+		)
+	}
+
+	return nil
+}
+
 func (c *Collector) collectGpuEngineMetrics(ch chan<- prometheus.Metric) error {
 	// Collect the GPU Engine perf data.
 	if err := c.gpuEnginePerfDataCollector.Collect(&c.gpuEnginePerfDataObject); err != nil {

internal/headers/setupapi/gpu.go

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: Apache-2.0
+//
+// Copyright The Prometheus Authors
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build windows
+
+package setupapi
+
+import (
+	"sync"
+	"unsafe"
+
+	"golang.org/x/sys/windows"
+)
+
+//nolint:gochecknoglobals
+var GUID_DISPLAY_ADAPTER = sync.OnceValue(func() *windows.GUID {
+	return &windows.GUID{
+		Data1: 0x4d36e968,
+		Data2: 0xe325,
+		Data3: 0x11ce,
+		Data4: [8]byte{0xbf, 0xc1, 0x08, 0x00, 0x2b, 0xe1, 0x03, 0x18},
+	}
+})
+
+func GetGPUDevices() ([]GPUDevice, error) {
+	hDevInfo, _, err := procSetupDiGetClassDevsW.Call(
+		uintptr(unsafe.Pointer(GUID_DISPLAY_ADAPTER())),
+		0,
+		0,
+		DIGCF_PRESENT,
+	)
+
+	if windows.Handle(hDevInfo) == windows.InvalidHandle {
+		return nil, err
+	}
+
+	var (
+		devices []GPUDevice
+		deviceData SP_DEVINFO_DATA
+		propertyBuffer [256]uint16
+	)
+
+	deviceData.CbSize = uint32(unsafe.Sizeof(deviceData))
+
+	for i := 0; ; i++ {
+		ret, _, _ := procSetupDiEnumDeviceInfo.Call(hDevInfo, uintptr(i), uintptr(unsafe.Pointer(&deviceData)))
+		if ret == 0 {
+			break // No more devices
+		}
+
+		ret, _, _ = procSetupDiGetDeviceRegistryPropertyW.Call(
+			hDevInfo,
+			uintptr(unsafe.Pointer(&deviceData)),
+			uintptr(SPDRP_DEVICEDESC),
+			0,
+			uintptr(unsafe.Pointer(&propertyBuffer[0])),
+			uintptr(len(propertyBuffer)*2),
+			0,
+		)
+
+		gpuDevice := GPUDevice{}
+
+		if ret == 0 {
+			gpuDevice.DeviceDesc = ""
+		} else {
+			gpuDevice.DeviceDesc = windows.UTF16ToString(propertyBuffer[:])
+		}
+
+		ret, _, _ = procSetupDiGetDeviceRegistryPropertyW.Call(
+			hDevInfo,
+			uintptr(unsafe.Pointer(&deviceData)),
+			uintptr(SPDRP_FRIENDLYNAME),
+			0,
+			uintptr(unsafe.Pointer(&propertyBuffer[0])),
+			uintptr(len(propertyBuffer)*2),
+			0,
+		)
+
+		if ret == 0 {
+			gpuDevice.FriendlyName = ""
+		} else {
+			gpuDevice.FriendlyName = windows.UTF16ToString(propertyBuffer[:])
+		}
+
+		ret, _, _ = procSetupDiGetDeviceRegistryPropertyW.Call(
+			hDevInfo,
+			uintptr(unsafe.Pointer(&deviceData)),
+			uintptr(SPDRP_HARDWAREID),
+			0,
+			uintptr(unsafe.Pointer(&propertyBuffer[0])),
+			uintptr(len(propertyBuffer)*2),
+			0,
+		)
+
+		if ret == 0 {
+			gpuDevice.HardwareID = "unknown"
+		} else {
+			gpuDevice.HardwareID = windows.UTF16ToString(propertyBuffer[:])
+		}
+
+		ret, _, _ = procSetupDiGetDeviceRegistryPropertyW.Call(
+			hDevInfo,
+			uintptr(unsafe.Pointer(&deviceData)),
+			uintptr(SPDRP_PHYSICAL_DEVICE_OBJECT_NAME),
+			0,
+			uintptr(unsafe.Pointer(&propertyBuffer[0])),
+			uintptr(len(propertyBuffer)*2),
+			0,
+		)
+
+		if ret == 0 {
+			gpuDevice.PhysicalDeviceObjectName = "unknown"
+		} else {
+			gpuDevice.PhysicalDeviceObjectName = windows.UTF16ToString(propertyBuffer[:])
+		}
+
+		devices = append(devices, gpuDevice)
+	}
+
+	_, _, _ = procSetupDiDestroyDeviceInfoList.Call(hDevInfo)
+
+	return devices, nil
+}
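
Note on `SPDRP_HARDWAREID`: this property is a `REG_MULTI_SZ` value, meaning a sequence of NUL-terminated strings closed by an empty string. `windows.UTF16ToString` stops at the first NUL, so `GetGPUDevices` keeps only the first hardware ID in the list. If every ID were wanted, the buffer could be split roughly as sketched below. `splitMultiSz` is a hypothetical helper that is not part of this commit, the second ID in the sample data is made up for illustration, and the sketch uses only the standard library so it runs on any platform.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// splitMultiSz is a hypothetical helper: it splits a REG_MULTI_SZ style
// UTF-16 buffer into its strings. An empty string (two NULs in a row, or
// a NUL at the very start) ends the list; anything after it, or any
// trailing run without a terminating NUL, is ignored.
func splitMultiSz(buf []uint16) []string {
	var out []string

	start := 0

	for i, c := range buf {
		if c != 0 {
			continue
		}

		if i == start {
			break // empty string: end of list
		}

		out = append(out, string(utf16.Decode(buf[start:i])))
		start = i + 1
	}

	return out
}

func main() {
	// Illustrative buffer: two IDs followed by the list terminator.
	raw := "PCI\\VEN_10DE&DEV_1B81&SUBSYS_61733842&REV_A1\x00PCI\\VEN_10DE&DEV_1B81\x00\x00"
	for _, id := range splitMultiSz(utf16.Encode([]rune(raw))) {
		fmt.Println(id)
	}
}
```

Exposing only the first ID keeps the `hardware_id` label to a single value per device, which suits an info metric.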
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: Apache-2.0
+//
+// Copyright The Prometheus Authors
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build windows
+
+package setupapi_test
+
+import (
+	"testing"
+
+	"github.com/prometheus-community/windows_exporter/internal/headers/setupapi"
+	"github.com/stretchr/testify/require"
+)
+
+func TestGetGPUDevices(t *testing.T) {
+	devices, err := setupapi.GetGPUDevices()
+	require.NoError(t, err, "Failed to get GPU devices")
+
+	require.NotNil(t, devices)
+}
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: Apache-2.0
+//
+// Copyright The Prometheus Authors
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build windows
+
+package setupapi
+
+import (
+	"golang.org/x/sys/windows"
+)
+
+//nolint:gochecknoglobals
+var (
+	modSetupAPI = windows.NewLazySystemDLL("setupapi.dll")
+	procSetupDiGetClassDevsW = modSetupAPI.NewProc("SetupDiGetClassDevsW")
+	procSetupDiEnumDeviceInfo = modSetupAPI.NewProc("SetupDiEnumDeviceInfo")
+	procSetupDiGetDeviceRegistryPropertyW = modSetupAPI.NewProc("SetupDiGetDeviceRegistryPropertyW")
+	procSetupDiDestroyDeviceInfoList = modSetupAPI.NewProc("SetupDiDestroyDeviceInfoList")
+)

internal/headers/setupapi/types.go

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: Apache-2.0
+//
+// Copyright The Prometheus Authors
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build windows
+
+package setupapi
+
+import "golang.org/x/sys/windows"
+
+const (
+	DIGCF_PRESENT = 0x00000002
+	SPDRP_DEVICEDESC = 0x00000000
+	SPDRP_FRIENDLYNAME = 0x0000000C
+	SPDRP_HARDWAREID = 0x00000001
+	SPDRP_PHYSICAL_DEVICE_OBJECT_NAME = 0x0000000E
+)
+
+type SP_DEVINFO_DATA struct {
+	CbSize uint32
+	ClassGuid windows.GUID
+	DevInst uint32
+	_ uintptr // Reserved
+}
+
+type GPUDevice struct {
+	DeviceDesc string
+	FriendlyName string
+	HardwareID string
+	PhysicalDeviceObjectName string
+}
