Commit d85f69d5 authored by Ankit Agrawal's avatar Ankit Agrawal Committed by Alex Williamson
Browse files

vfio/nvgrace-gpu: Check the HBM training and C2C link status



In contrast to Grace Hopper systems, the HBM training has been moved
out of the UEFI on the Grace Blackwell systems. This reduces the system
bootup time significantly.

The onus of checking whether the HBM training has completed thus falls
on the module.

The HBM training status can be determined from a BAR0 register.
Similarly, another BAR0 register exposes the status of the CPU-GPU
chip-to-chip (C2C) cache coherent interconnect.

Based on testing, 30s is determined to be sufficient to ensure
initialization completion on all the Grace based systems. Thus poll
these register and check for 30s. If the HBM training is not complete
or if the C2C link is not ready, fail the probe.

While the time is not required on Grace Hopper systems, it is
beneficial to make the check to ensure the device is in an
expected state. Hence keeping it generalized to both the generations.

Ensure that the BAR0 is enabled before accessing the registers.

CC: Alex Williamson <alex.williamson@redhat.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: default avatarAnkit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20250124183102.3976-4-ankita@nvidia.com


Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
parent 6a9eb2d1
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment