Make `TensorBuffer::data()` non-virtual and move the pointer into the base class.

All existing `TensorBuffer` subclasses already store a pointer to their buffer. Accessing that pointer by calling a virtual method is inefficient. We currently generate the following instruction sequence at the callsite (when compiling for x86_64 with a recent version of Clang):

    1dd002: mov    (%rdi),%rax        tensor.h:655
    1dd005: callq  *0x10(%rax)        tensor.h:655

...and the following implementation for `Buffer::data()`:

    236520: mov    0x10(%rdi),%rax    tensor.h:888
    236524: retq                      tensor.h:888

With this change, we generate a single `mov` instruction inline at the call site, and avoid any branching.

PiperOrigin-RevId: 223985477
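
For readers without the diff at hand, the shape of the change described above can be sketched roughly as follows. This is a minimal illustration, not the actual TensorFlow code: the class names `VirtualBuffer`, `VirtualMallocBuffer`, `FlatBuffer`, and `FlatMallocBuffer`, and the helper `RoundTrips`, are hypothetical and exist only to contrast a virtual `data()` accessor with a non-virtual one backed by a pointer stored in the base class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Before (hypothetical names): every subclass stores its own pointer and
// exposes it through a virtual call, so each access goes through the vtable.
class VirtualBuffer {
 public:
  virtual ~VirtualBuffer() = default;
  virtual void* data() const = 0;
};

class VirtualMallocBuffer : public VirtualBuffer {
 public:
  explicit VirtualMallocBuffer(std::size_t n) : data_(std::malloc(n)) {}
  ~VirtualMallocBuffer() override { std::free(data_); }
  void* data() const override { return data_; }

 private:
  void* data_;
};

// After (hypothetical names): the base class owns the pointer field, and
// data() is a non-virtual inline accessor that compiles to a plain load.
class FlatBuffer {
 public:
  explicit FlatBuffer(void* data_ptr) : data_(data_ptr) {}
  virtual ~FlatBuffer() = default;  // Destruction stays virtual.
  void* data() const { return data_; }

 private:
  void* const data_;
};

class FlatMallocBuffer : public FlatBuffer {
 public:
  explicit FlatMallocBuffer(std::size_t n) : FlatBuffer(std::malloc(n)) {}
  ~FlatMallocBuffer() override { std::free(data()); }
};

// Writes one byte through data() and reads it back. Works with either
// layout, since callers see the same data() signature in both.
template <typename Buffer>
bool RoundTrips(const Buffer& buf) {
  auto* bytes = static_cast<unsigned char*>(buf.data());
  assert(bytes != nullptr);
  bytes[0] = 42;
  return static_cast<unsigned char*>(buf.data())[0] == 42;
}
```

In this sketch, callers do not change: `buf.data()` reads the same in both versions. The only difference is that the second form lets the compiler inline the accessor, on the assumption that every subclass can hand its pointer to the base-class constructor.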