Add new awsebsnvmereceiver #1603
base: main
"This PR should be merged into the ebs branch": is this still true? We are only adding the receiver, not enabling it, so targeting main seems fine.
Oh, got it. Then I'll leave the destination branch as is.
Attaching the agent config and translated configs.
Can you also provide a sample YAML config for the receiver?
namespace := -1
partition := -1

fmt.Sscanf(device, "nvme%dn%dp%d", &controller, &namespace, &partition)
Can we check whether this returns an error? If it does, we were unable to parse the device.
I opted to ignore the error and instead return an error only if we couldn't at least get the controller ID. The reason is that this function parses three different device file name patterns (nvme{id}, nvme{id}n{namespace}, nvme{id}n{namespace}p{partition}). If the input isn't the third pattern, Sscanf returns an "EOF" error.
I could instead check for nvme%dn%dp%d, then nvme%dn%d, and lastly nvme%d. That would give a better error message, but it seems like a lot of excess work, in which case I'd rather keep the implementation I had before.
Or we could use a regular expression like ^nvme(\d+)(?:n(\d+))?(?:p(\d+))?$ (https://regex101.com/r/5TjfEE/1).
nit: Make it clear that it can return an error and that we're explicitly ignoring it.
- fmt.Sscanf(device, "nvme%dn%dp%d", &controller, &namespace, &partition)
+ _, _ = fmt.Sscanf(device, "nvme%dn%dp%d", &controller, &namespace, &partition)
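As a rough illustration of the regular-expression option raised in this thread, the sketch below parses all three device name patterns in one pass and returns an explicit error for anything else. The names parsedDevice and parseDeviceName are hypothetical and are not taken from the PR. Note that the expression as written also accepts a partition without a namespace (for example nvme0p1), which may or may not be desirable.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// deviceNamePattern is the expression proposed in the review thread.
var deviceNamePattern = regexp.MustCompile(`^nvme(\d+)(?:n(\d+))?(?:p(\d+))?$`)

// parsedDevice is a hypothetical result type; -1 marks a missing component.
type parsedDevice struct {
	Controller int
	Namespace  int
	Partition  int
}

// parseDeviceName extracts the numeric components of an NVMe device name.
func parseDeviceName(name string) (parsedDevice, error) {
	match := deviceNamePattern.FindStringSubmatch(name)
	if match == nil {
		return parsedDevice{}, fmt.Errorf("unable to parse device name %q", name)
	}
	parsed := parsedDevice{Controller: -1, Namespace: -1, Partition: -1}
	targets := []*int{&parsed.Controller, &parsed.Namespace, &parsed.Partition}
	for i, target := range targets {
		group := match[i+1]
		if group == "" {
			continue // optional group did not participate in the match
		}
		value, err := strconv.Atoi(group)
		if err != nil {
			return parsedDevice{}, fmt.Errorf("invalid number %q in %q: %w", group, name, err)
		}
		*target = value
	}
	return parsed, nil
}

func main() {
	for _, name := range []string{"nvme0", "nvme1n1", "nvme2n1p3", "sda1"} {
		parsed, err := parseDeviceName(name)
		fmt.Println(name, parsed, err)
	}
}
```

Compared with ignoring the Sscanf result, this makes the failure case explicit at the cost of compiling a pattern once at package initialization.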
}

// Check if all devices should be collected. Otherwise check if defined by user
_, hasAsterisk := s.allowedDevices["*"]
Do we have plans to filter by device?
yep, it's possible to filter by devices already. it's just below the line you highlighted 😄
amazon-cloudwatch-agent/receiver/awsebsnvmereceiver/scraper.go
Lines 131 to 134 in 9d4a053
if _, isAllowed := s.allowedDevices[deviceName]; !isAllowed {
	s.logger.Debug("skipping un-allowed device", zap.String("device", deviceName))
	continue
}
nit: Typically, by Go convention, the file would be named something like util_notunix.go, as in https://github.com/aws/amazon-cloudwatch-agent/blob/main/plugins/inputs/windows_event_log/windows_event_log_notwindows.go
Oh. I was following what I found in the Go src. Example: https://github.com/golang/go/blob/master/src/net/cgo_stub.go. I can change it to what we do for the agent to keep it consistent though.
	IsEbsDevice(device *DeviceFileAttributes) (bool, error)
}

type Util struct {
nit: I know we use Util as a name all over the place, but it doesn't help me understand what this is. I'd rather it be called something more descriptive, like DeviceInfoProvider.
nit: Why is this a struct if it doesn't have a state? Why not just have the functions exported as package level functions?
package nvme

type UtilInterface interface {
nit: In Go, Interface typically isn't included in interface names. The main purpose of an interface is to define a set of functions and use the interface to expose it. One common pattern is for the interface to be exported and the struct to not be.
type Util interface {
}
type util struct {
}
import "errors"

func (u *Util) GetAllDevices() ([]DeviceFileAttributes, error) {
	return nil, errors.New("nvme stub: nvme not supported")
nit: Don't include the "file name" in the error.
- return nil, errors.New("nvme stub: nvme not supported")
+ return nil, errors.New("nvme not supported")
	)
}

func arrayToSet(arr []string) map[string]struct{} {
nit: We technically already have this function as part of https://github.com/aws/amazon-cloudwatch-agent/blob/main/internal/util/collections/collections.go#L103
if foundWorkingDevice {
	s.logger.Debug("emitted metrics for nvme device with controller id", zap.Int("controllerID", id))
} else {
	s.logger.Info("unable to get metrics for nvme device with controller id", zap.Int("controllerID", id))
nit: Why is this an info log?
NvmeDevicePrefix = "nvme"
DevDirectoryPath = "/dev"
NvmeSysDirectoryPath = "/sys/class/nvme"

EbsNvmeModelName = "Amazon Elastic Block Store"
nit: Most of these are only used within the package and should not be exported.
- NvmeDevicePrefix = "nvme"
- DevDirectoryPath = "/dev"
- NvmeSysDirectoryPath = "/sys/class/nvme"
- EbsNvmeModelName = "Amazon Elastic Block Store"
+ devicePrefix = "nvme"
+ devDirectoryPath = "/dev"
+ sysDirectoryPath = "/sys/class/nvme"
+ ebsModelName = "Amazon Elastic Block Store"
Even DevDirectoryPath could avoid being exported if you had a simple function like
func DevPath(device string) string {
return filepath.Join(devDirectoryPath, device)
}
devices := []DeviceFileAttributes{}
for _, entry := range entries {
	if strings.HasPrefix(entry.Name(), NvmeDevicePrefix) {
nit: Likely not an issue in this directory, but you could check entry.IsDir() first and ignore those.
var (
	ErrInvalidEBSMagic = errors.New("invalid EBS magic number")
	ErrParseLogPage = errors.New("failed to parse log page")
)
nit: If they aren't used outside of the package, don't export them.
func nvmeReadLogPage(fd uintptr, logID uint8) ([]byte, error) {
	data := make([]byte, 4096) // 4096 bytes is the length of the log page.
	bufferLen := len(data)

	if bufferLen > math.MaxUint32 {
		return nil, errors.New("nvmeReadLogPage: bufferLen exceeds MaxUint32")
	}
I don't get this check. If we define the length of the slice, how would this ever return an error?
Note to Reviewers
This PR does not enable the new receiver. It only adds the receiver, which will be enabled later.
Changes 2:
Resources to Devices in the receiver config
Description of the issue
Amazon Elastic Block Store (EBS) exposes performance statistics for EBS volumes attached to EC2 instances as NVMe devices in a vendor-unique log page. The log page can be retrieved by making a system call to the NVMe device. CloudWatch Agent (CWA) will collect the retrieved metrics and emit them to CloudWatch.
Description of changes
Main Scraper Implementation (scraper.go):
nvmeScraper struct
metadata.yaml
NVMe Metrics Collection (internal/nvme):
EBSMetrics struct
Generated components (internal/metadata):
metadata.yaml
The receiver collects the following metrics from EBS NVMe devices:
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
The EC2 instance that the manual tests were run on has two EBS volumes attached (nvme0 and nvme1).
Resources is explicitly empty
One device (nvme0) is in resources
* for resources
Sample Config
Requirements
Before committing the code, please complete the following steps.
make fmt and make fmt-sh
make lint