Adding a rolled version of the triangular solver code for the right-multiply case, which is used in Cholesky decomposition. Replacing the unrolled version with a While loop drastically reduces XLA compilation times, which allows much larger models to be run on TPU.

PiperOrigin-RevId: 195163298
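To illustrate what "rolled" means here, the sketch below solves the right-multiply system X @ L.T = B for a lower-triangular L. This is the form that appears in blocked Cholesky, where the panel update computes L21 = A21 @ inv(L11.T). It is written in JAX rather than the XLA client code this commit changes, and the function name `solve_right_lower_transpose` is hypothetical. Column j of X depends only on columns 0..j-1, so forward substitution runs over columns. `lax.fori_loop` traces the body once and emits a single loop in the compiled program, instead of n copies of it.

```python
import jax
import jax.numpy as jnp
from jax import lax


def solve_right_lower_transpose(l, b):
    """Hypothetical sketch: solve X @ l.T = b for X.

    l: (n, n) lower-triangular matrix with a nonzero diagonal.
    b: (m, n) right-hand side.
    Column j of X satisfies
        X[:, j] = (b[:, j] - sum_{k<j} X[:, k] * l[j, k]) / l[j, j].
    """
    n = l.shape[0]  # static under jit, so the trip count is known at trace time

    def body(j, x):
        row = l[j]  # l[j, :], dynamically indexed by the loop counter
        # Keep only entries k < j, i.e. the columns of X already solved.
        masked_row = jnp.where(jnp.arange(n) < j, row, 0.0)
        partial = x @ masked_row  # sum_{k<j} X[:, k] * l[j, k], shape (m,)
        col = (b[:, j] - partial) / row[j]
        return x.at[:, j].set(col)

    # One loop construct in the compiled program; the body is traced once.
    return lax.fori_loop(0, n, body, jnp.zeros_like(b))


solve_jit = jax.jit(solve_right_lower_transpose)
```

Writing the same recurrence as a plain Python `for` loop under `jax.jit` would unroll it, so the traced program grows linearly with n. Keeping the loop rolled keeps the program size independent of n, which is the same trade-off this commit makes for XLA compilation time.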