
NvmExpressDxe: Request Number of Queues from Controller #1260

Open · wants to merge 3 commits into base dev/202405

Conversation

@VivianNK (Contributor) commented Jan 24, 2025

Description

Request the number of queues from the controller. If the number of queues is less than 2, then we only have a synchronous queue and do not support asynchronous BlockIo2 functionality.

  • Impacts functionality?
  • Impacts security?
  • Breaking change?
  • Includes tests?
  • Includes documentation?
  • Backport to release branch?

How This Was Tested

CI and QEMU with NVMe virtualization with max_ioqpairs=1, 2, 64

Used the reconnect command in the UEFI Shell on the virtualized NVMe device handle to trigger the NvmeControllerReset function (confirmed in the run log).
Confirmed the device was reconnected using devtree.

Integration Instructions

N/A

@github-actions bot added the impact:non-functional (Does not have a functional impact) label on Jan 24, 2025
@codecov-commenter commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 1.60%. Comparing base (5c0d73f) to head (689fa60).

Additional details and impacted files
@@             Coverage Diff              @@
##           dev/202405    #1260    +/-   ##
============================================
  Coverage        1.59%    1.60%            
============================================
  Files            1383     1383            
  Lines          359819   359844    +25     
  Branches         5341     5524   +183     
============================================
+ Hits             5731     5760    +29     
+ Misses         353991   353977    -14     
- Partials           97      107    +10     
Flag               Coverage     Δ
MdePkg             5.59% <ø>    (ø)
NetworkPkg         0.55% <ø>    (ø)
PolicyServicePkg   30.41% <ø>   (+1.65%) ⬆️
UefiCpuPkg         4.68% <ø>    (ø)


@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch 2 times, most recently from b273e57 to bc816e4 on February 3, 2025 21:39
@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch 2 times, most recently from 8215f70 to 4c7d035 on February 4, 2025 23:33
@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch 3 times, most recently from bd4cc2a to 844e813 on February 19, 2025 00:35
@spbrogan (Member) left a comment:

Didn't finish the review, but here are a few comments.

@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch from 844e813 to 673e89e on February 25, 2025 19:25
@VivianNK marked this pull request as ready for review on February 25, 2025 21:48
@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch 4 times, most recently from 1f55f2e to 271123f on March 5, 2025 21:57
@VivianNK requested a review from kuqin12 on March 5, 2025 22:13
//
// Number of Queues Allocated by the controller
//
UINT32 Nsqa; // Number of Submission Queues Allocated
A Contributor commented:

The above checks are only checking Nsqa.

Uefi always uses a 1:1 submission/completion queue allocation. Is there any benefit to having both Ncqa and Nsqa? Would it be better to just track a single SubmissionCompletionPairCount?

@VivianNK (Contributor, Author) replied:

Because there can be multiple submission queues per completion queue, and that scenario doesn't seem far off, it made sense to me to check only Nsqa: in that case we would have 1 completion queue, 1 synchronous submission queue, and 1 asynchronous submission queue.
The goal here was to be more flexible for future scenarios. It may be unnecessary for this PR, though. Will start a private thread for discussion.

@VivianNK (Contributor, Author) replied:

@apop5 Changed to NumberOfDataQueuePairs, since the consensus was that we only support 1:1 submission:completion queues.

@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch 2 times, most recently from 8fb7b4d to ef5976c on March 6, 2025 00:31
@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch 4 times, most recently from f301c1f to 6bc6aef on March 12, 2025 00:48
@VivianNK requested a review from apop5 on March 12, 2025 00:49
@VivianNK requested a review from spbrogan on March 12, 2025 00:50
Added unions to support using the Set Feature
Command for Number of Queues.

Signed-off-by: Vivian Nowka-Keane <[email protected]>
the controller.

Instead of assuming we always want to create two sets of data queues,
we request two and the controller will determine how many are allocated.

Signed-off-by: Vivian Nowka-Keane <[email protected]>

to squash

Remove check for blockIo2 installation

Check for both NSQA and NCQA
@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch from 6bc6aef to 1c08614 on March 13, 2025 21:21
@@ -66,4 +86,25 @@ NvmeIdentifyNamespace (
IN VOID *Buffer
);

// MU_CHANGE [BEGIN] - Allocate IO Queue Buffer
// TODO should this be used on reset or do we want to trust our private data structure?
@VivianNK (Contributor, Author) commented:

@apop5 thoughts?

the number of queues with the controller.
@os-d (Contributor) left a comment:

This is a large PR that includes several different pieces. To prepare for upstreaming and make review easier, I would recommend splitting it into at least a few commits, including splitting commits per package (new NVMe industry-standard definitions should be their own commit, refactoring of existing MU_CHANGE may be its own commit, get/set features another commit).

@@ -271,6 +276,22 @@ EnumerateNvmeDevNamespace (
goto Exit;
}

if (Private->NumberOfDataQueuePairs > 1) {
// We have multiple data queues, so we can support the BlockIo2 protocol
Status = gBS->InstallProtocolInterface (
A Contributor commented:

The UEFI spec recommends using InstallMultipleProtocolInterfaces() always, even for a single protocol: https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.html?highlight=installprotocolinterface#efi-boot-services-installprotocolinterface

"InstallMultipleProtocolInterfaces() performs more error checking than InstallProtocolInterface(), so it is recommended that InstallMultipleProtocolInterfaces() be used in place of InstallProtocolInterface()."

//
// If BlockIo2 is installed, uninstall it.
//
if (Device->Controller->NumberOfDataQueuePairs > 1) {
A Contributor commented:

Because BlockIo2 is not widely used, I would recommend trying to uninstall the other protocols first, since that's where real usage would be, then uninstalling BlockIo2 if required. Don't duplicate the OpenProtocol logic; just track whether either uninstall failed.

// MU_CHANGE [BEGIN] - Request Number of Queues from Controller
if (Private->NumberOfDataQueuePairs > 1) {
// We have multiple data queues, so we need to uninstall the BlockIo2 protocol
gBS->UninstallProtocolInterface (

@@ -957,9 +1011,11 @@ NvmExpressDriverBindingStart (
EFI_PHYSICAL_ADDRESS MappedAddr;
UINTN Bytes;
EFI_NVM_EXPRESS_PASS_THRU_PROTOCOL *Passthru;
// MU_CHANGE - Support alternative hardware queue sizes in NVME driver
UINTN QueuePageCount = PcdGetBool (PcdSupportAlternativeQueueSize) ?
A Contributor commented:

You are changing a previous MU_CHANGE. Should this PR either be marked [SQUASH ON REBASE], or, if completely superseding the previous commit(s), revert that commit and apply this one as a [REBASE & FF]?

That might look like splitting this PR into multiple commits, one of which is squashed with the previous commit, or reverting it.

// Set the sizes of the admin submission & completion queues in number of entries
Aqa->Asqs = PcdGetBool (PcdSupportAlternativeQueueSize) ? MIN (NVME_ALTERNATIVE_MAX_QUEUE_SIZE, Private->Cap.Mqes) : NVME_ASQ_SIZE;
Aqa->Rsvd1 = 0;
Aqa->Acqs = PcdGetBool (PcdSupportAlternativeQueueSize) ? MIN (NVME_ALTERNATIVE_MAX_QUEUE_SIZE, Private->Cap.Mqes) : NVME_ACQ_SIZE;
A Contributor commented:

I might be misremembering this change, but if we are now querying the NVMe controller for the supported queue size, shouldn't we just use that, regardless of a PCD? Or is this for the case where a controller reports its queue size incorrectly?

Either way, it seems to me that we either don't need the PCD (and just take what the controller supports, or the max we support), or the HW needs to be fixed; failing that, we would need a PCD that tells us what queue size to use (but again, I am dubious of not doing what the HW tells us).

//
// Program admin submission queue address.
//
// TODO how can we program the data queue base address? Or do we need to?
A Contributor commented:

Can we address or remove?

CopyMem (Sn, Private->ControllerData->Sn, sizeof (Private->ControllerData->Sn));
Sn[20] = 0;
CopyMem (Mn, Private->ControllerData->Mn, sizeof (Private->ControllerData->Mn));
Mn[40] = 0;
A Contributor commented:

Could you add some comments on what these seemingly arbitrary values being set are?

}

// Use the first data queue's size to compute the number of pages required for the data queues
QueuePairPageCount = EFI_SIZE_TO_PAGES (Private->SqData[1].NumberOfEntries * (1 << Private->SqData[1].EntrySize));
A Contributor commented:

Same comments

if (EFI_ERROR (Status)) {
return Status;
}

DEBUG ((DEBUG_INFO, "Private->DataQueueBuffer = [%016X]\n", (UINT64)(UINTN)Private->DataQueueBuffer));
A Contributor commented:

Probably don't need this print checked in

UINT8 Sn[21];
UINT8 Mn[41];

DEBUG ((DEBUG_INFO, "%a: Begin Controller Reset\n", __FUNCTION__));
A Contributor commented:

Suggested change:
- DEBUG ((DEBUG_INFO, "%a: Begin Controller Reset\n", __FUNCTION__));
+ DEBUG ((DEBUG_INFO, "%a: Begin Controller Reset\n", __func__));

@VivianNK force-pushed the personal/vnowkakeane/nvme_num_queues branch from 1c08614 to 689fa60 on March 17, 2025 21:20
5 participants