Begin defaulting GPUOptions.experimental.pending_cap to 8. This
has the effect of limiting the maximum number of pending GPU kernels (when using the default single compute stream configuration) to no more than 8. Testing with benchmark programs in single-GPU and distributed configurations suggests this always yields better performance than unlimited queuing, probabaly because of effects on GPU/CPU i/o which needs to wait on pending kernels for memory allocation safety. PiperOrigin-RevId: 233456126
Loading
Please sign in to comment