feat(nodeadm): PCIe detection for nvidia GPU instances #2146

Status: Open (1 commit, base: main)
ndbaker1 (Member) commented Feb 8, 2025

Issue #, if available:

Description of changes:

To recognize when the running instance has GPU devices, we currently use a hard-coded list of instance type families. To keep this list updated, or to remove it entirely, we can do one of the following:

  1. Use workflows to generate PRs for the changes. This isn't so bad, but it introduces a build-time dependency.
  2. Make API calls to ec2.DescribeInstanceTypes to check whether the current instance has a GPU device.
  3. Read local data on the instance to determine whether an NVIDIA device is attached.

This PR implements (3) from above by looking for the NVIDIA vendor name in /proc/bus/pci/devices. The original logic is left in, and PCIe device detection is used as a fallback.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.
