[XLA:GPU] Cleanups to fused 021 transpose implementation.

- Fix typos.
- Clarify comments.
- Reduce nesting in a few places.
- Add asserts that this code is dealing with specifically a loop fusion.
- Rename some functions. In particular, it's confusing to have a function
  with a generic name like EmitCodeWithBoundCheck that actually is
  specialized to a tiled implementation.
- Remove statement expression (GCC language extension), replacing it with
  an IIFE.
- Don't refer to shared-memory tile space as "buffer" without other
  qualifying words, since that's ambiguous with what XLA refers to as a
  "buffer".
- Use llvm::cast instead of static_cast.
- Comply with style guide naming rules for compile-time constants (kFoo).
- Use c_accumulate instead of std::accumulate.
- Put std::function parameter at the end of the param list. This lets us
  cleanly embed the lambda into the call because of how clang-format
  formats such calls. (I think this one is possibly the most helpful
  change in this patch, as it suddenly makes clear to me the way that we
  use two calls to emit_tiled_elemental_code_with_bounds_check to emit the
  code.)

PiperOrigin-RevId: 204134102
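
For illustration only, here is a minimal sketch of two of the items above: replacing a GCC statement expression with an IIFE, and putting the std::function parameter last so the lambda can be written inline at the call site. The names ForEachInBoundsIndex and SumOfInBoundsIndices are hypothetical and are not taken from the XLA sources; the loop body merely stands in for the real emitter.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (not the XLA function): calls emit_elem for every index
// in [0, tile_size) that is also below `bound`. The std::function parameter
// comes last so that a lambda can be embedded directly in the call.
void ForEachInBoundsIndex(int64_t tile_size, int64_t bound,
                          const std::function<void(int64_t)>& emit_elem) {
  assert(tile_size > 0);
  for (int64_t i = 0; i < tile_size; ++i) {
    if (i < bound) emit_elem(i);
  }
}

// Hypothetical caller: sums the indices that pass the bounds check.
int64_t SumOfInBoundsIndices(int64_t tile_size, int64_t bound) {
  // IIFE: an immediately invoked lambda, the standard C++ stand-in for the
  // GCC-only statement expression `({ ...; result; })`.
  const std::vector<int64_t> visited = [&] {
    std::vector<int64_t> result;
    ForEachInBoundsIndex(tile_size, bound, [&](int64_t i) {
      result.push_back(i);
    });
    return result;
  }();

  int64_t sum = 0;
  for (int64_t v : visited) sum += v;
  return sum;
}
```

With the callback as the trailing argument, clang-format keeps the lambda body indented under the call, which is the readability benefit the last bullet describes.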