SVD-operation on the GPU (#11878)
* Added GPU implementation of SVD, but without complex number support and it requires m>=n
* Removed debug logs
* Fixed SVD: V-matrix was not transposed
* Fixed SVD implementation, there was a memory corruption error. Also renamed it to the suffix .cu.cc for consistency.
* Fixed stupid left-over from debug statements
* Fixed formatting with clang-format
* Small improvements based on the comments
* Further tiny improvements
* Refactoring, now it should also support m<n, but tests still fail sometimes
* Added explicit GPU tests
* Formatting
* Added checks if DoTranspose was successful
* It really is a bug in cuSolver that it always returns zeros for Vt if n==1, but my workaround does not always work yet
* Fixed cuSolver issue if one dimension of the input matrix is one
* Fixed buildifier error
* Improved readability
* Fixed a stupid error. This happens if you make changes without testing them
* Fixed a bug if n==1 and full_matrices=false
* Implemented changes requested in the GitHub comments. Moved the n==1-fix kernels outside of the batch loop to compute them once over all batches.
* Cleanup tasks
* Added TODO describing what has to be done to support complex matrices
* Fixed issue that the order of } (end namespace) and #endif (GOOGLE_CUDA) was mixed up. This breaks the code if CUDA is disabled.
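
For readers wondering what Vt should contain in the n==1 case mentioned above, here is a minimal sketch in plain Python. It is not the kernel from this commit; the function names `svd_single_column` and `v_from_u` are hypothetical and only illustrate the underlying math. For an m x 1 matrix, the single singular value is the Euclidean norm of the column, U is the normalized column, and V is a 1x1 matrix equal to +1 or -1, depending on the sign chosen for U.

```python
import math


def svd_single_column(a):
    """Thin SVD of an m x 1 matrix given as a list of m floats.

    Returns (u, s, v) such that a[i] == u[i] * s * v for every i.
    """
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        # Zero column: any unit vector is a valid U; pick the first basis vector.
        u = [1.0] + [0.0] * (len(a) - 1)
        return u, 0.0, 1.0
    u = [x / norm for x in a]
    return u, norm, 1.0


def v_from_u(a, u):
    """Recover the 1x1 V when U is already known (for example, when a
    solver filled U and S correctly but left V zeroed).

    V must carry the sign that makes u * s * v reproduce a, which is
    the sign of the inner product <a, u>.
    """
    dot = sum(x * y for x, y in zip(a, u))
    return -1.0 if dot < 0 else 1.0
```

For example, `svd_single_column([3.0, 4.0])` yields a singular value of 5 with V equal to 1, while `v_from_u([3.0, 4.0], [-0.6, -0.8])` returns -1 because U was given with the opposite sign.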