Throw an exception when the user's batch size isn't divisible by the number of GPUs.
The alternative would be an adaptive approach that splits the input unevenly into per-tower batches. The concern with that approach is that every tower would be as slow as the one with the most input, which would reduce performance. Batch size also seems to be commonly tailored to the available hardware.

PiperOrigin-RevId: 184192793
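
For illustration only, the divisibility check described above could look roughly like the sketch below. The function name `per_tower_batch_size`, its parameters, and the error messages are hypothetical and not taken from the actual change; the sketch only shows the idea of rejecting a batch size that cannot be split evenly across towers.

```python
def per_tower_batch_size(global_batch_size: int, num_gpus: int) -> int:
    """Return the batch size each tower (GPU) would receive.

    Hypothetical helper: raises ValueError instead of splitting unevenly,
    so that no single tower gets more input than the others.
    """
    if num_gpus <= 0:
        raise ValueError(f"num_gpus must be positive, got {num_gpus}.")
    if global_batch_size % num_gpus != 0:
        raise ValueError(
            f"Batch size {global_batch_size} is not divisible by "
            f"the number of GPUs ({num_gpus})."
        )
    # Even split: every tower processes the same number of examples.
    return global_batch_size // num_gpus
```

With this sketch, a batch size of 128 on 4 GPUs yields 32 examples per tower, while a batch size of 10 on 4 GPUs raises a `ValueError` rather than producing towers of unequal size.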