Commit 462a79b7 authored by A. Unique TensorFlower, committed by TensorFlower Gardener

Change RecvBufRespExtra.tensor_content to a repeated string and fill
it with many small strings instead of one large one when using gRPC.

Typing tensor_content as a Cord instead of a single string leads to
roughly a 20% speedup in a 2-worker (8 V100 GPUs each) benchmark that
trains ResNet-50 using collective all-reduce for gradient reduction and
gRPC for all inter-worker transport.  It is hypothesized that without
the Cord type, gRPC stalls incoming RecvBuf RPCs as it repeatedly
reallocates and copies the strings.  Using a Cord to receive the value
leads to much better flow control.

Unfortunately, proto3 does not yet support [ctype=CORD], so we can't
use that simple and effective optimization.  This CL changes
tensor_content to a sequence of strings and sets a maximum
single-string size of 4KB, the likely page size.  (This default can be
changed via ConfigProto.experimental.recv_buf_max_chunk.)  It achieves
roughly a 12% speedup on the same benchmark.  The speedups are highly
dependent on topology and network weather, since the major effect is
believed to be on flow control.
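A minimal Python sketch of the chunking idea described above, assuming only what this message states: the sender splits a tensor's raw bytes into pieces of at most max_chunk bytes (4KB by default), and the receiver concatenates the repeated tensor_content entries back into one buffer. The names split_into_chunks, join_chunks and DEFAULT_MAX_CHUNK are hypothetical and for illustration only; TensorFlow's actual implementation is in C++ and is not reproduced here.

```python
from typing import List

# Hypothetical constant mirroring the 4KB default described above; in the
# commit, the real limit is set via ConfigProto.experimental.recv_buf_max_chunk.
DEFAULT_MAX_CHUNK = 4096


def split_into_chunks(data: bytes, max_chunk: int = DEFAULT_MAX_CHUNK) -> List[bytes]:
    """Split a tensor's raw bytes into pieces of at most max_chunk bytes.

    Each piece would become one entry of a repeated string field such as
    tensor_content; only the last piece may be shorter than max_chunk.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original contiguous buffer on the receiving side."""
    return b"".join(chunks)


if __name__ == "__main__":
    payload = bytes(10_000)  # 10,000 zero bytes standing in for tensor data
    chunks = split_into_chunks(payload)
    print([len(c) for c in chunks])  # [4096, 4096, 1808]
    assert join_chunks(chunks) == payload
```

Keeping each string small means no single message field requires one large contiguous allocation that may need to be grown and copied while data arrives, which is the behavior this message hypothesizes is hurting gRPC flow control.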

PiperOrigin-RevId: 219322231
parent b4b3c410