-
Notifications
You must be signed in to change notification settings - Fork 215
Add new awsebsnvmereceiver #1603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
66fb288
to
546fb3e
Compare
546fb3e
to
be8e93b
Compare
receiver/awsebsnvmereceiver/internal/nvme/device_file_attributes.go
Outdated
Show resolved
Hide resolved
receiver/awsebsnvmereceiver/internal/nvme/device_file_attributes.go
Outdated
Show resolved
Hide resolved
receiver/awsebsnvmereceiver/internal/nvme/device_file_attributes.go
Outdated
Show resolved
Hide resolved
"This PR should be merged into the ebs branch" Is this still true? We are just adding the receiver not enabling it so putting it to main seems fine |
Oh, got it. Then I'll leave the destination branch as is |
Attaching the agent config and translated configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also provide a sample yaml config of the receiver?
receiver/awsebsnvmereceiver/internal/nvme/device_file_attributes.go
Outdated
Show resolved
Hide resolved
IsEbsDevice(device *DeviceFileAttributes) (bool, error) | ||
} | ||
|
||
type Util struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I know we use Util
as a name all over the place, but it doesn't help me understand what this is. Would rather it be called like DeviceInfoProvider
or something more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why is this a struct if it doesn't have a state? Why not just have the functions exported as package level functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I know we use Util as a name all over the place, but it doesn't help me understand what this is. Would rather it be called like DeviceInfoProvider or something more descriptive.
Will do.
nit: Why is this a struct if it doesn't have a state? Why not just have the functions exported as package level functions?
I made it so it would be a bit easier to mock for unit testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common pattern is to have a constructor function like
func NewDeviceInfoProvider() DeviceInfoProvider {
return &util{}
}
so that only the interface is exposed.
var ( | ||
ErrInvalidEBSMagic = errors.New("invalid EBS magic number") | ||
ErrParseLogPage = errors.New("failed to parse log page") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If they aren't used outside of the package, don't export them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the comment below about keeping the code as similar as possible to what is found in the EBS CSI driver code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could put CSI driver code into a separate package (e.g. internal/nvme/csidriver
). Are we planning on trying to keep this in sync with the CSI driver code or are we willing to diverge?
func nvmeReadLogPage(fd uintptr, logID uint8) ([]byte, error) { | ||
data := make([]byte, 4096) // 4096 bytes is the length of the log page. | ||
bufferLen := len(data) | ||
|
||
if bufferLen > math.MaxUint32 { | ||
return nil, errors.New("nvmeReadLogPage: bufferLen exceeds MaxUint32") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this check. If we define the length of the slice, how would this ever return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes directly from the EBS CSI driver code. I did see some things I could change, but I do have a preference to keep it the same as much as possible.
volumeID string | ||
type ebsDevices struct { | ||
volumeID string | ||
deviceNames []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one volume can have multiple devices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's all based on the controller ID. For example: nvme0, nvme0n1, nvme0n2, nvme0n1p1 will all have the same serial/volume ID, and metrics. Whereas nvme1 would be different
return devices, nil | ||
} | ||
|
||
func (u *Util) GetDeviceSerial(device *DeviceFileAttributes) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validated in what way? device.BaseDeviceName()
does the same check.
t.Run(tt.name, func(t *testing.T) { | ||
got, err := ParseNvmeDeviceFileName(tt.device) | ||
if (err != nil) != tt.wantErr { | ||
t.Errorf("ParseNvmeDeviceFileName() error = %v, wantErr %v", err, tt.wantErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: For consistency with other tests, consider using github.com/stretchr/testify/assert
or github.com/stretchr/testify/require
var ( | ||
ErrInvalidEBSMagic = errors.New("invalid EBS magic number") | ||
ErrParseLogPage = errors.New("failed to parse log page") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could put CSI driver code into a separate package (e.g. internal/nvme/csidriver
). Are we planning on trying to keep this in sync with the CSI driver code or are we willing to diverge?
IsEbsDevice(device *DeviceFileAttributes) (bool, error) | ||
} | ||
|
||
type Util struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common pattern is to have a constructor function like
func NewDeviceInfoProvider() DeviceInfoProvider {
return &util{}
}
so that only the interface is exposed.
// retrieving the volume ID and validating if it's an EBS device | ||
if entry, seenController := devices[device.Controller()]; seenController { | ||
entry.deviceNames = append(entry.deviceNames, deviceName) | ||
s.logger.Debug("skipping unnecessary device validation steps", zap.String("device", deviceName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Some of these logs seem excessive especially if this is going to get logged multiple times during each scrape.
Note to Reviewers
This PR does not enable this new receiver. This is strictly just adding a new receiver that will be enabled later.
Changes 2:
Resources
toDevices
in the receiver configChanges 3:
Description of the issue
Elastic Block Storage (EBS) exposes performance statistics for EBS volumes attached to EC2 instances as NVMe devices in a vendor unique log page. The log page can be retrieved by making a system call to the NVMe device. CloudWatch Agent (CWA) is going to collect the retrieved metrics and emit them to CloudWatch.
Description of changes
Main Scraper Implementation (scraper.go):
nvmeScraper
structmetadata.yaml
NVMe Metrics Collection (internal/nvme):
EBSMetrics
structGenerated components (internal/metadata);
metadata.yaml
The receiver collects the following metrics from EBS NVMe devices:
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
The EC2 instance that the manual tests were ran on have two EBS volumes attached (nvme0 and nvme1)
Resource is explicitly empty
One device (nvme0) is in resources
*
for resourcesSample Config
Requirements
Before commit the code, please do the following steps.
make fmt
andmake fmt-sh
make lint