Improvements to the existing custom NEON path for 8-bit quantized GEMV:
1. Drop the requirement that the output depth be a multiple of 4. We still process groups of 4 rows at a time, but if a few rows remain at the end, we process the last 4 rows, possibly re-processing some already-processed rows.
2. Also use this fast GEMV path in Conv, not just in FullyConnected. Indeed, in some newer models, we see GEMVs being encoded as Conv instead of FullyConnected nodes (seen in MobileNet v2 tflite files).

PiperOrigin-RevId: 234148907
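
A rough scalar sketch of the row-handling idea in item 1, for illustration only. This is not the NEON kernel from this change: the names `GemvBlockOf4Rows` and `GemvAnyDepth` are hypothetical, zero-point offsets and output requantization are omitted, and the 4-row block is a plain loop standing in for the vectorized code. It assumes each block overwrites its outputs instead of accumulating into them, which is what makes re-processing rows harmless, and that there are at least 4 rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar stand-in for the 4-row kernel: computes 4 dot products
// and overwrites out[0..3] (it does not accumulate into existing values).
void GemvBlockOf4Rows(const std::uint8_t* weights, const std::uint8_t* input,
                      int cols, std::int32_t* out) {
  for (int r = 0; r < 4; ++r) {
    std::int32_t acc = 0;
    for (int c = 0; c < cols; ++c) {
      acc += static_cast<std::int32_t>(weights[r * cols + c]) *
             static_cast<std::int32_t>(input[c]);
    }
    out[r] = acc;
  }
}

// Row-major weights of shape rows x cols. Returns false when rows < 4, in
// which case the caller is assumed to fall back to a generic path.
bool GemvAnyDepth(const std::vector<std::uint8_t>& weights,
                  const std::vector<std::uint8_t>& input, int rows, int cols,
                  std::vector<std::int32_t>* output) {
  if (rows < 4) return false;
  assert(weights.size() == static_cast<std::size_t>(rows) * cols);
  assert(input.size() == static_cast<std::size_t>(cols));
  output->assign(rows, 0);
  for (int row = 0; row < rows; row += 4) {
    // When fewer than 4 rows remain, back up so the block covers exactly the
    // last 4 rows. Some of them were already computed; since the block
    // overwrites its outputs, they are simply recomputed to the same values.
    const int start = std::min(row, rows - 4);
    GemvBlockOf4Rows(weights.data() + static_cast<std::size_t>(start) * cols,
                     input.data(), cols, output->data() + start);
  }
  return true;
}
```

With `rows = 6`, for example, the loop runs the block on rows 0..3 and then on rows 2..5, so rows 2 and 3 are computed twice.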