Last August I asked the question “How fast can CFI/EXIDX-based stack unwinding be?” At the time I was experimenting with native unwinding using our in-tree Breakpad copy, and getting dismal performance results. That post observed that Breakpad’s CFI unwinder is around 30 times slower than Valgrind’s CFI unwinder, and looked in detail at the reasons for this slowness.
LUL has been integrated into the SPS profiler, and landed a couple of weeks back.
It currently provides unwinding on x86_64-linux, x86_32-linux and arm-android, using the DWARF CFI and ARM EXIDX unwind formats. Unwinding by stack scanning is also supported, although that should rarely be needed. Compared to the Breakpad unwinder, there is a very substantial performance increase: 1000 unwinds/second, each running from the leaf frame all the way back to XRE_Main(), costs about 40% of a 1.2 GHz Cortex A9.
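To put that figure in per-unwind terms, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes the 40% refers to a single core running at the full 1.2 GHz clock, which the measurement above does not spell out.

```shell
#!/bin/sh
# Figures taken from the text: 1.2 GHz clock, about 40% load, 1000 unwinds/second.
clock_hz=1200000000
load_percent=40
unwinds_per_sec=1000

# Assumption: the 40% is of one core running at the full clock rate.
# Divide by 100 first so the intermediate values stay small.
cycles_per_unwind=$(( clock_hz / 100 * load_percent / unwinds_per_sec ))
micros_per_unwind=$(( 1000000 * load_percent / 100 / unwinds_per_sec ))

echo "cycles per unwind: $cycles_per_unwind"
echo "microseconds per unwind: $micros_per_unwind"
```

Under that assumption, each full unwind costs roughly 480,000 cycles, or about 0.4 milliseconds.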
To use LUL, build with --enable-profiling --enable-optimize="-g -O2". I then start the desktop builds with the following environment variable settings:
MOZ_PROFILER_INTERVAL=1 MOZ_PROFILER_NEW=1 MOZ_PROFILER_VERBOSE=1 MOZ_PROFILER_MODE=native
Setting MOZ_PROFILER_MODE=help gives more details.
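One way to avoid retyping those settings is a small shell wrapper. This is just a convenience sketch: the function name run_with_lul is made up for illustration, and the path to the browser binary is whatever your own build produces.

```shell
#!/bin/sh
# run_with_lul is a hypothetical helper name, not part of the Mozilla tree.
# It runs the given command with the profiler settings listed in the text.
run_with_lul() {
  env MOZ_PROFILER_INTERVAL=1 \
      MOZ_PROFILER_NEW=1 \
      MOZ_PROFILER_VERBOSE=1 \
      MOZ_PROFILER_MODE=native \
      "$@"
}
```

Usage would be something like: run_with_lul /path/to/your/objdir/firefox, with the path replaced by the location of your own build's binary.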
On Android, a suitable magic incantation is:
adb logcat -c ; \
adb shell sh /system/bin/am start -S -n \
    org.mozilla.fennec_sewardj/.App \
    --es env0 MOZ_PROFILER_INTERVAL=1 \
    --es env1 MOZ_PROFILER_MODE=native \
    --es env2 MOZ_PROFILER_NEW=1 \
    --es env3 MOZ_PROFILER_VERBOSE=1 \
    --es env4 MOZ_PROFILER_STARTUP=1 ; \
adb logcat 2>&1 | tee logfile.txt
What next for LUL? I’d like to implement the space-saving schemes mentioned earlier. But more importantly, it would be nice to have developers using the SPS/LUL combination and giving real-use feedback. That will help to move it forward in the most immediately useful direction.