Fix potential issue with number of blocks launched for depthwise kernels.

The number of work_elements was too small, which could return a block_count that is too small to cover all elements.

We have also been ignoring the suggested thread_per_block, so we were potentially launching more blocks than necessary to fill the GPU (which is inefficient, but functionally correct).

Change 'assert(false && ...' to LOG(FATAL) because the check shouldn't be debug-only.

PiperOrigin-RevId: 186037306
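The two launch-size effects described above can be illustrated with a small host-side sketch. It is not the code touched by this change: `LaunchConfig`, `DivUp`, `MakeLaunchConfig`, and `Covered` are hypothetical names, and the sketch assumes one thread per work element, whereas the real kernels may use a different scheme such as a grid-stride loop with an occupancy-based cap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a launch configuration (not the real library type).
struct LaunchConfig {
  int64_t thread_per_block;
  int64_t block_count;
};

// Ceiling division for positive operands.
constexpr int64_t DivUp(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Assumption: one thread handles exactly one work element, so the grid
// needs ceil(work_elements / thread_per_block) blocks.
constexpr LaunchConfig MakeLaunchConfig(int64_t work_elements,
                                        int64_t thread_per_block) {
  return LaunchConfig{thread_per_block, DivUp(work_elements, thread_per_block)};
}

// Total number of threads (and therefore elements) the launch can cover.
constexpr int64_t Covered(const LaunchConfig& config) {
  return config.thread_per_block * config.block_count;
}
```

Under these assumptions, 1000 real elements with 256 threads per block need 4 blocks. If work_elements is under-reported as 500, only 2 blocks (512 threads) are launched and some elements are never processed. If the suggested 256 threads per block is ignored in favor of 128, the launch uses 8 blocks instead of 4, which still covers every element but uses more blocks than needed.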