ETW: Event Tracing for Windows, part 2: field reporting

(See all of my ETW posts)

One use of event tracing is gathering field reports: the user notices something going horribly wrong, and is willing to help out by providing useful information. ETW is very valuable for this use, since 90% of Firefox users are on Windows and ETW logging comes standard with Windows. So here’s a guide to submitting useful trace info. If you are reporting a performance-related bug, it would probably be useful if you could offer up a .etl file with your bug report. (But you probably shouldn’t directly attach it to the bug if it’s big, and it probably will be.)

I’m going to give the “universal” option first, which will work on Windows XP on up with no additional installs needed:

  logman start "NT Kernel Logger" -o browser.etl -p "Windows Kernel Trace" 0x237f -bs 1024 -nb 10 100 -ets
  run your test scenario...
  logman stop "NT Kernel Logger" -ets

This will produce a file “browser.etl” containing all kinds of system-level tracing data. If you’re uploading it somewhere, you might want to compress it first. These files compress pretty well. I’ll explain what all the wacky options mean in a later post.

The other option is to use xperf to control the tracing. It is part of the Windows Performance Toolkit (WPT), which itself ships with the Windows 7 SDK. You do not need to install the entire SDK to get WPT! (The SDK is pretty huge.) So if you are running Vista or Windows 7 (or later, presumably):

  1. Download and run the Windows 7 SDK (click on the Download button).
  2. Select just the Windows Performance Toolkit components
  3. Install!

Here’s the important part where you select just the component you need, stolen shamelessly from a very good blog post giving precise instructions for doing exactly this.

If you are running the 64-bit version of Vista or Windows 7, you’ll need to do one more thing to get full functionality: run

REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f

This will allow you to collect stack traces.

If you are running Windows XP, it’s a little more work. You can do it, but you will need to copy things from a machine with WPT already installed. Make sure you are copying from the same architecture (64-bit binaries won’t work on 32-bit Windows XP). Copy over just the files xperf.exe and perfctrl.dll into somewhere in your $PATH. I would link to a copy of those files if I knew it was legal. On your Windows 7 machine with the WPT installed, it will probably be somewhere like C:\Program Files\Microsoft Windows Performance Toolkit\xperf.exe and you can copy it to C:\Windows on your XP machine.

Now you can run it:

  xperf -on DiagEasy -stackwalk profile
  run your scenario...
  xperf -stop -d browser.etl

This will gather more or less the same information as the logman command above, except it’ll also grab stacktraces during the profile events. I think it may even cram some additional xperf-specific details into the trace log, which make it easier to analyze the file on another machine according to the ntdebugging guy(s?).

Either way, link to this .etl file in a bug report and it will be much easier to make sense of what’s going wrong. The xperf-produced .etl files are preferred, since they supposedly have more goodies in them, but if installing xperf is too much trouble then the logman-generated ones are still quite handy.

If you want to see what you’ve collected, run

  xperf browser.etl

I’ll talk about how to interpret what you see later. This article is all about capturing traces, not analyzing them.

So what are those commands doing? To really answer this question, I’d need to go into the details of the ETW architecture, which will be a followup blog post. So I’ll just give a short version for now:

The logman command starts up a tracing session aka logger using the predefined name “NT Kernel Logger”, and connects it up to receive events from the provider named “Windows Kernel Trace”. The 0x237f option is a bitmask of what flags aka keywords to include in the trace.

  logman query providers "Windows Kernel Trace"

shows the list of available keyword bits for that provider and the descriptions of what they cover. The trace session is logged to the file browser.etl, and when the session is stopped all final events are flushed to it. The buffer settings are also configured to something reasonable.

The xperf command starts up the same logger with the DiagEasy group of keywords, which turns out to be the same as the 0x237f from the logman command. The log file is not specified, so it will default to \kernel.etl but we’re not going to directly use that file this time. Then the xperf -d command stops the logger and merges its output into browser.etl. According to the help message, it also adds “image identification information required for safe symbol decoding”, which sounds like a good thing.

Tags: , , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.