Commit dcddfdf0 authored Jan 11, 2018 by A. Unique TensorFlower Committed by TensorFlower Gardener Jan 11, 2018

Improve performance of several utility functions in TensorFlow

framework/types.h defines a variety of functions on DataType enums. Some of these functions are implemented by allocating arrays on the heap. Even though DataTypeVector is a typedef for InlinedVector, it only stores 4 elements inline, and many of the vectors used in types.h/types.cc contain more than 4 elements.

To make matters worse, some of these functions are called quite frequently under load, so we're wasting time allocating and copying arrays.

The set of distinct DataType values is so small, however, that we can represent a set of DataType values as a bitmask, and use bit-shifts and tests instead of sequential scans of arrays.
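The bitmask idea can be sketched as follows. This is a minimal illustration, not the code from this commit: the trimmed-down DataType enum, the ToSet() helper, and the member names are assumptions made for the example, and it assumes every enum value fits in a 32-bit mask.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down stand-in for the DataType enum (illustration only).
enum DataType : int {
  DT_INVALID = 0,
  DT_FLOAT = 1,
  DT_DOUBLE = 2,
  DT_INT32 = 3,
  DT_UINT8 = 4,
  DT_STRING = 7,
};

// A set of DataType values packed into one 32-bit word:
// bit i is set when the enum with numeric value i is a member.
class DataTypeSet {
 public:
  constexpr DataTypeSet() : mask_(0) {}
  explicit constexpr DataTypeSet(uint32_t mask) : mask_(mask) {}

  // Membership test: one range check, one shift, one AND. No array scan.
  constexpr bool Contains(DataType dt) const {
    return static_cast<uint32_t>(dt) < 32u &&
           ((mask_ >> static_cast<uint32_t>(dt)) & 1u) != 0;
  }

  // Set union is a single bitwise OR.
  constexpr DataTypeSet operator|(DataTypeSet other) const {
    return DataTypeSet(mask_ | other.mask_);
  }

 private:
  uint32_t mask_;
};

// Hypothetical helper: builds a one-element set from a single DataType.
constexpr DataTypeSet ToSet(DataType dt) {
  return DataTypeSet(1u << static_cast<uint32_t>(dt));
}
```

Because everything is constexpr, a set such as ToSet(DT_FLOAT) | ToSet(DT_DOUBLE) can be built at compile time, so a lookup at run time needs no allocation and no copy.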

Even the functions that do not allocate, such as DataTypeCanUseMemcpy(), are needlessly inefficient: they use control flow and indirect jumps where a simple table-based load would do, and they are not inlined. These costs were significant enough that they consumed about 1.2% of CPU cycles under heavy load.
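As a rough sketch of what a table-based load could look like, the predicate below replaces a switch statement with an indexed read from a constant array. The enum subset, the table contents, and the name CanUseMemcpySketch are assumptions for illustration; the actual set of memcpy-safe types is defined in framework/types.h and is not reproduced here.

```cpp
#include <cstddef>

// Hypothetical, trimmed-down stand-in for the DataType enum (illustration only).
enum DataType : int {
  DT_INVALID = 0,
  DT_FLOAT = 1,
  DT_DOUBLE = 2,
  DT_INT32 = 3,
  DT_UINT8 = 4,
  DT_INT16 = 5,
  DT_INT8 = 6,
  DT_STRING = 7,
};

// One entry per enum value, indexed by the enum's numeric value.
// Plain numeric types are marked memcpy-safe; DT_STRING is not.
constexpr bool kCanUseMemcpyTable[] = {
    false,  // DT_INVALID
    true,   // DT_FLOAT
    true,   // DT_DOUBLE
    true,   // DT_INT32
    true,   // DT_UINT8
    true,   // DT_INT16
    true,   // DT_INT8
    false,  // DT_STRING
};

constexpr std::size_t kTableSize =
    sizeof(kCanUseMemcpyTable) / sizeof(kCanUseMemcpyTable[0]);

// Defined inline so callers can avoid a function call: the body is a
// bounds check followed by a single load from the table.
inline bool CanUseMemcpySketch(DataType dt) {
  const std::size_t index = static_cast<std::size_t>(dt);
  return index < kTableSize && kCanUseMemcpyTable[index];
}
```

Out-of-range values, including negative ones (which wrap to a large unsigned index), fall through the bounds check and return false.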

The surprising cost of DataTypeCanUseMemcpy() inspired this change. I made the change fully general by adding a DataTypeSet type and changing all of the utility functions in framework/types.h to use it, with the exception of DataTypeAlwaysOnHost, which uses a _REF type.

PiperOrigin-RevId: 181695458

parent f7f51589
