Commit 2160dcc3 authored by Jacob Moroni, committed by Greg Kroah-Hartman

IB/cm: use rwlock for MAD agent lock

[ Upstream commit 4dab26be ]

In workloads where there are many processes establishing connections using
RDMA CM in parallel (large scale MPI), there can be heavy contention for
mad_agent_lock in cm_alloc_msg.

This contention can occur while inside of a spin_lock_irq region, leading
to interrupts being disabled for extended durations on many
cores. Furthermore, it leads to the serialization of rdma_create_ah calls,
which has negative performance impacts for NICs which are capable of
processing multiple address handle creations in parallel.

The end result is the machine becoming unresponsive, hung task warnings,
netdev TX timeouts, etc.

Since the lock appears to be only for protection from cm_remove_one, it
can be changed to a rwlock to resolve these issues.

Reproducer:

Server:
  for i in $(seq 1 512); do
    ucmatose -c 32 -p $((i + 5000)) &
  done

Client:
  for i in $(seq 1 512); do
    ucmatose -c 32 -p $((i + 5000)) -s 10.2.0.52 &
  done

Fixes: 76039ac9 ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock")
Link: https://patch.msgid.link/r/20250220175612.2763122-1-jmoroni@google.com


Signed-off-by: Jacob Moroni <jmoroni@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
parent f9507cf2