[XLA:CPU] Implement Ax+b dot output fusion for matrix-vector products
I also had to roll in the change that generalizes CPU layout assignment: without it we lose the make-rhs-column-major optimization, which causes a performance regression.

PiperOrigin-RevId: 178970986
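
For readers unfamiliar with the term, here is a rough sketch of what an Ax+b output fusion buys for a matrix-vector product. It is not the XLA:CPU implementation (which emits LLVM IR); the function names are hypothetical and plain Python lists stand in for buffers. The unfused version materializes A*x and then adds b in a second pass; the fused version, under the assumption that the addend can seed the accumulator, produces the same result in one pass with no temporary.

```python
def matvec_then_add(a, x, b):
    """Unfused: compute A*x into a temporary, then add b in a second pass."""
    tmp = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    return [t + b_i for t, b_i in zip(tmp, b)]


def matvec_fused_add(a, x, b):
    """Fused sketch: start each accumulator at b[i], so the add costs no extra pass."""
    out = list(b)  # copy so the caller's b is left untouched
    for i, row in enumerate(a):
        acc = out[i]
        for a_ij, x_j in zip(row, x):
            acc += a_ij * x_j
        out[i] = acc
    return out
```

Both functions return the same vector for the same inputs; the difference is only in the number of passes and the intermediate storage.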