A faster BatchSelectFunctor for tf.where on CPU.
The op 'tf.where(c, t, e)' supports the case where 't' and 'e' are N-D tensors and 'c' is a 1-D tensor; this case calls BatchSelectFunctor to compute the result. The basic implementation broadcasts 'c' to the same shape as 't' and 'e', which is inefficient on CPU for large tensors. This change adopts a loop-based implementation to make the operation faster on CPU.
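As a rough illustration of the loop-based idea, here is a minimal standalone sketch. It is not the actual BatchSelectFunctor code: the function name `BatchSelectLoop`, the use of `std::vector<float>`, and the row-major `[batch, inner]` layout are all assumptions made for the example. It assumes 'c' holds one flag per slice along the first dimension, so each flag picks a whole row from either 't' or 'e' without first materializing a broadcast copy of 'c'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not TensorFlow's functor.
// t and e are flattened row-major buffers of shape [batch, inner];
// c has one flag per row (batch entries).
std::vector<float> BatchSelectLoop(const std::vector<bool>& c,
                                   const std::vector<float>& t,
                                   const std::vector<float>& e,
                                   std::size_t inner) {
  const std::size_t batch = c.size();
  assert(t.size() == batch * inner && e.size() == batch * inner);

  std::vector<float> out(batch * inner);
  for (std::size_t i = 0; i < batch; ++i) {
    // One branch per row, then a contiguous copy of that row.
    const std::vector<float>& src = c[i] ? t : e;
    std::copy_n(src.begin() + i * inner, inner, out.begin() + i * inner);
  }
  return out;
}
```

With `c = {true, false, true}` and `inner = 2`, rows 0 and 2 of the result come from 't' and row 1 comes from 'e'. The point of the design is that the condition is read once per row rather than once per element of a broadcast tensor.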