ART: Improve JitProfile perf in arm/arm64 mterp
ART currently requires two profiling-related things from the interpreters: hotness updates and OSR switch checks. The hotness updates previously used the existing instrumentation framework - which is flexible, but quite heavyweight. For most things, the instrumentation framework overhead is acceptable, but because we do a hotness update on every backwards branch the overhead is unacceptable. Prior to this CL, branch profiling dominates interpreter cost. Here, we bypass the instrumentation framework for hotness updates and deliver a significant performance improvement. Running interpreter-only (dalvikvm -Xint) on a Nexus 6, we see the logic subtest of Caffeinemark improving from 2600 to 9200, and the overall score going from 1979 to over 3000. Compared to the C++ switch interpreter, we see a 6x improvement on the branchy logic subtest and a 2.6x improvement overall. Compared with the previous mterp which did not have support for jit profiling, we see a few (1% to 5%) performance loss on the standard command-line benchmarks. I consider this acceptable (we could create an alternate non-profiling mterp which would have no penalty, but I don't consider this overhead big enough to justify that). Change-Id: I50b5b8c5ed8ebda3c8b4e65d27ba7393c3feae04
Loading
Please sign in to comment