Commit 6e4aa08f authored by Yunxiang Li's avatar Yunxiang Li Committed by Alex Deucher
Browse files

drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic



The retry loop for SRIOV reset have refcount and memory leak issue.
Depending on which function call fails it can potentially call
amdgpu_amdkfd_pre/post_reset different number of times and causes
kfd_locked count to be wrong. This will block all future attempts at
opening /dev/kfd. The retry loop also leakes resources by calling
amdgpu_virt_init_data_exchange multiple times without calling the
corresponding fini function.

Align with the bare-metal reset path which doesn't have these issues.
This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
reset loop and calling amdgpu_device_pre_asic_reset each retry which
properly free the resources from previous try by calling
amdgpu_virt_fini_data_exchange.

Signed-off-by: default avatarYunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: default avatarEmily Deng <Emily.Deng@amd.com>
Reviewed-by: default avatarZhigang Luo <zhigang.luo@amd.com>
Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
parent a5b84326
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment