Implement async TensorFromTransportOptions for GDR
Instead of blocking on completion of an RDMA op, the RecvTensor client
now posts a work request to the NIC send queue and returns immediately.
The GDR background polling thread invokes the callback once the
corresponding RDMA op has completed, i.e. once its completion has been
polled from the completion queue on the NIC. The old epoll-based
mechanism is removed, trading higher CPU usage for improved throughput
and lower latency for RDMA ops.
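
The post-and-return pattern described above can be sketched roughly as
follows. This is a minimal illustration and not the actual GDR code: the
names FakeCompletionQueue, AsyncRdmaClient, PostRead and RunDemo are
hypothetical, and the completion queue is simulated in memory, where a
real implementation would presumably call ibv_post_send() and
ibv_poll_cq() against the NIC.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the NIC completion queue (assumption: a real
// implementation would poll the hardware CQ via ibv_poll_cq()).
class FakeCompletionQueue {
 public:
  // Simulates the NIC reporting that work request `wr_id` has finished.
  void Complete(uint64_t wr_id) {
    std::lock_guard<std::mutex> lock(mu_);
    done_.push_back(wr_id);
  }
  // Non-blocking poll: returns true and sets *wr_id if a completion is ready.
  bool Poll(uint64_t* wr_id) {
    std::lock_guard<std::mutex> lock(mu_);
    if (done_.empty()) return false;
    *wr_id = done_.front();
    done_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::deque<uint64_t> done_;
};

// Hypothetical client: PostRead() returns immediately, and the callback
// runs later on the background polling thread.
class AsyncRdmaClient {
 public:
  using Callback = std::function<void()>;

  AsyncRdmaClient() : poller_([this] { PollLoop(); }) {}
  ~AsyncRdmaClient() {
    stop_.store(true);
    poller_.join();
  }

  // Registers the callback under a fresh work request id, "posts" the
  // request, and returns without waiting for completion.
  void PostRead(Callback done) {
    uint64_t id;
    {
      std::lock_guard<std::mutex> lock(mu_);
      id = next_id_++;
      pending_[id] = std::move(done);
    }
    cq_.Complete(id);  // Simulation only: the op "completes" right away.
  }

 private:
  // Busy-polls the completion queue, spending CPU to avoid wake-up latency.
  void PollLoop() {
    while (!stop_.load()) {
      uint64_t id;
      if (!cq_.Poll(&id)) {
        std::this_thread::yield();
        continue;
      }
      Callback cb;
      {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = pending_.find(id);
        assert(it != pending_.end());
        cb = std::move(it->second);
        pending_.erase(it);
      }
      cb();  // Run the callback outside the lock.
    }
  }

  FakeCompletionQueue cq_;
  std::mutex mu_;
  std::unordered_map<uint64_t, Callback> pending_;
  uint64_t next_id_ = 0;
  std::atomic<bool> stop_{false};
  std::thread poller_;  // Declared last so it starts after the other members.
};

// Posts `n` simulated reads and returns how many callbacks ran.
int RunDemo(int n) {
  std::atomic<int> completed{0};
  AsyncRdmaClient client;
  for (int i = 0; i < n; ++i) {
    client.PostRead([&completed] { completed.fetch_add(1); });
  }
  // The caller is free to do other work here; the demo just waits.
  while (completed.load() < n) std::this_thread::yield();
  return completed.load();
}
```

In this sketch, calling RunDemo(8) posts eight requests without blocking
on any of them, and every callback is driven by the polling thread. The
busy loop in PollLoop is the illustration's counterpart to the CPU
versus latency trade-off mentioned above.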
The maximum number of work requests (WRs) in the send/recv queues on
the NIC is increased to accommodate the larger number of concurrent
RDMA ops. The tensor size threshold below which we pass the tensor
content in metadata is also increased to reduce the pressure on the
send/recv queues on the NIC.
This fixes #23933.
Signed-off-by: Bairen Yi <byronyi@clustar.ai>