ETW: Event Tracing for Windows, part 4: collection

(See all of my ETW posts)
Enough theory. Let’s walk through examples of all this.

Providers

You can enumerate the available event providers either the old-school way:

 logman query providers

or the new way:

  xperf -providers I

(the ‘I’ is for ‘installed/known’). There is also

  xperf -providers R

(‘R’ for ‘registered’), which for me dumps out the same set of providers with less information and different formatting. Welcome to the joys of the Windows command line. The old-school logman tends to have saner output, but is mysteriously much slower when constructing the list. And finally, some people prefer

  wevtutil ep

(or you can expand ‘ep’ to ‘enum-publishers’), though that contains a bunch of other weird stuff that logman and xperf do not.

Each provider has a globally unique id (guid). The logman output and the xperf -providers I output show these together with the readable names.

Now let’s look at the flags and flag groups defined for the kernel provider:

  xperf -providers K

Notice, for instance, the kernel group ‘DiagEasy’, which maps to ‘PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER‘. If you look at the earlier part of the output, you can map each of those to a descriptive string. I don’t know how to map them to their hex values. I’ll give two untrustworthy ways, though. First, we’ll do it by starting a session using the kernel provider, then using logman to dump out its info:

  xperf -on DiagEasy
  logman query "NT Kernel Logger" -ets
  xperf -stop

In the “Provider:” section of the output, I see

  KeywordsAny: 0x237f (process,thread,img,proccntr,cswitch,dpc,isr,disk,file,hf)

which is very similar to the DiagEasy kernel group expansion if you squint a little. The second way is to first figure out what the provider’s name is. From the previous, that seems to be “Windows Kernel Trace” (not “Circular Kernel Session Provider”, though that would work too.) Now we can dump out all keywords and their corresponding hex values in a simple format:

  logman query providers "Windows Kernel Trace"

and attempt to match up these keywords with the PROC_THREAD etc. from the DiagEasy kernel group. Note that if we used “Circular Kernel Session Provider” in the above command, we’d get exactly the same set of keywords. Hm.

You can use the same logman command to investigate other providers. It will often give some additional information, such as levels (another way to tag events for selection; defaults are things like Error, Warning, and Informational).

I can’t figure out any way to get the xperf tool to display similar information, nor do I know how it stores the equivalence between its keyword-like things (“PROC_THREAD”) and logman’s understandable ones (“proc”=0x1,”thread”=0x2, with descriptive strings for each.)

Event Trace Sessions

Let’s say you have some sessions running. (And you do; Windows starts several up for you in its default configuration.) You can get a very basic listing with

  logman -ets

where it will refer to them as “Data Collector Sets”. (logman and perfmon talk about Data Collector Sets, which seem to be about the same as xperf’s event tracing sessions except that Data Collector Sets aren’t necessarily running; they can be lying in wait with Status=Stopped.) xperf, instead, refers to them as “Loggers” or more verbosely as “active logging sessions”, which seems a little more familiar. Run

  xperf -Loggers

and it will dump out each one, together with a whole bunch of information about each one’s configuration and state. Of particular interest is the “Trace Flags” field, which will give the xperf version of the flags/keyword bitmask. To experiment with this a bit, try running

  xperf -on DiagEasy
  xperf -Loggers "NT Kernel Logger"
  xperf -stop

You’ll see the Trace Flags match the DiagEasy kernel group you looked at before. You can also look at the Trace Flags for the “DiagLog” logger. It pulls data from a wide range of providers, requesting specific flags/keywords from each. You can use

  xperf -LoggersWithNames

to have it decode those settings even more.

Remember, one event tracing session (aka logger) can pull events from multiple providers, and we should be able to modify the set of providers at runtime. Let’s give that a try. First, start up a logger:

  xperf -start blogtest1 -on Microsoft-Windows-Kernel-Process:0x10

And ask what’s going on in three different ways:

  xperf -LoggersWithNames blogtest1
  logman -ets
  logman query blogtest1 -ets

xperf should tell you that you’re looking for the “WINEVENT_KEYWORD_PROCESS” keyword/flag from the “Microsoft-Windows-Kernel-Process” provider (the trailing 0xff means all levels, info+warning+error+whatever). The first logman invocation should just show you that ‘blogtest1’ is indeed running. The second tells you something similar to what xperf did, though in a format that splits out 3 things: the logger itself, something I don’t understand that seems to be the logging engine associated with the logger (it knows the buffer configuration, output file, etc.), and the provider that the logger (session) is sucking events from. Now let’s add in another provider:

  logman update blogtest1 -p "Windows Kernel Trace" -ets

I can’t figure out how to do the equivalent with xperf. I know it has an -update option, but it spits in my face whenever I try to use it. (This is a common feature in the command-line and even the GUI tools. There are invisible restrictions of various sorts, and when you do something not allowed, you get either silent failure or a seemingly unrelated error message back.)

Now when we run the querying commands, we see both providers listed. Now, let’s shut down:

  xperf -stop blogtest1

Or, if you prefer,

  logman stop blogtest1 -ets

Kernel vs non-Kernel Events

One note: the above probably only works because both providers are really within the kernel. If you tried to update with an application-based provider, you’d get back the cryptic “Property value conflict.” If you started with an application-based provider and used logman to update with a kernel provider, it would act completely happy, but only capture events from the application-based provider. And yet, if you stopped it with

  xperf -stop blogtest1 -d output.etl

it would merge relevant portions of the kernel log with the application logs to produce output.etl, or in other words, it would Do The Right Thing behind your back.

Also: at least in Windows 7, there are a number of providers named “Microsoft-Windows-Kernel-Something“. These seem to be manifest-based views onto the real kernel provider. Or something. On the web, I’ve found them mentioned as “Crimson” aka “Unified Windows Eventing 6.0” providers. At any rate, they seem to allow you to pull in events from the kernel as if it were a regular application provider. You can do things like eg logman query providers Microsoft-Windows-Kernel-Process and get a sense for which events each handles, and eventually reverse-engineer the keyword groups like ‘DiagEasy’ using a mixture of several providers. I like the idea of relays that make the kernel look exactly like any other provider, but I don’t know if that’s what’s actually happening. I’m still waiting for responses to my questions about this on ServerFault and er… the Windows forum seems to have eaten my post.

Predefined Tracing Configuration

Something I’ve been ignoring up until now is predefined tracing configuration, largely because it still baffles me. But it is a major part of all the tools.

XPerf seems to only talk about something called profiles, which are for “High-level ETW trace control” via “reusable, persisted configuration files (.xpp)”. logman has an entirely different mechanism, which seems to center around “Data Collector Sets” (or collectors for short). Collectors seem to be able to exist independently of any loggers actively using them. To see any dormant collectors you might have lying around, run logman with no options (not even the -ets that I’ve been throwing in everywhere without explaining.)

Data Collector Sets

XPerf’s profiles sound like something I’d want a GUI to generate, so let’s start with logman’s collectors:

  logman create trace blogtest1 -p Microsoft-Windows-Kernel-Process 0xffff

That creates a Data Collector Set named “blogtest1” with a specific provider (and keyword bitmask). Note that it is not running; it is just lying around somewhere ready for use. You can see it with

  logman

or dig into its details with

  logman query blogtest1

To make it not quite so trivial, let’s add some providers to it:

  logman update blogtest1 -p Microsoft-Windows-Kernel-Disk 0xffff
  logman update blogtest1 -p Microsoft-Windows-HttpService 0xffff

You probably want to query it again to see what a mess you’ve made. Now fire it up with

  logman start blogtest1

It now has status ‘Running’. The query output will tell you where it’s logging to: %systemdrive%\PerfLogs\Admin or more specifically in my case, C:\PerfLogs\Admin\blogtest1_000001.etl (I guess it really, really wants to be able to roll over from one log file to the next). Stop this running collector with

  logman stop blogtest1

So what’s the deal with the magic -ets argument? The logman -h output is mostly useless: “Send commands to Event Trace Sessions directly without saving or scheduling.” A technet article is much more illuminating: “Creates and starts an event trace session with the options specified on the command line”. By experimentation, it seems like this event trace session has no relationship to a Data Collector Set created without the -ets flag, even one with the same name. If you ran logman start blogtest1 -ets above, it would have created a trace session named ‘blogtest1’ with no providers. If you run logman (or equivalently, logman query), it shows the non-ets collectors. logman -ets or logman query -ets shows the with-ets collectors. xperf -Loggers only sees the with-ets collectors. I don’t understand the difference, but it seems to be fundamental and major.

The non-ets collectors look to be useful for automatically scheduling tracing by time of day and duration: see logman’s -b, -e, -r, and -rf options. I confess that I haven’t gotten those to work for me, though.

Multiple Providers

logman normally only allows you to specify one provider at a time. If you want to merge together the output from multiple providers, you have a few options:

  1. Use logman update to add the providers one at a time, as we did above
  2. Use the -pf option to point to a file containing providers, one per line (put the flags right after them, as you would on the -p command line)
  3. Use xperf instead of logman. XPerf lets you do -on provider1:0xff+provider2:0xffff+.... In fact, it gives you several other per-provider configuration options as well, ones that make sense in the manifest-based world.

Profiles

I haven’t used XPerf’s profiles at all, so I really can’t say anything beyond what xperf -help profiles tells you.

Data Sources

Early on, I made the claim that there are different types of providers: performance counters, event traces, configuration settings, and alerts. That may be bad terminology: those four can definitely be referred to as things you can put into Data Collector Sets, but “provider” might only refer to event trace providers. I’m not sure.

It looks to me like the configuration settings have been folded up as a provider in Windows Vista and later, at least by looking at

  logman query providers Microsoft-Windows-Kernel-Registry

It has keywords for all the different registry key operations. Note that you can see the same data in a more verbose and yuckier format with

  wevtutil gp Microsoft-Windows-Kernel-Registry

I have been bad about ignoring the wevtutil tool, but that’s mostly because I haven’t found it to be useful for much. (I use it to register my custom manifest-based provider, but even then, nothing makes very good use of that registered information.)

Alerts don’t seem to be a source of data so much as something that a logger can emit. If you look at the logman -h output, you’ll see an example of creating an alert to fire off when too much ‘Processor Time’ is used, whatever that means. Probably the best place to learn about alerts are the perfmon help pages. I’ll ignore them.

Performance counters sound much more interesting. logman can be used to automate their creation and management, but performance counters depend heavily on knowing what counters are available, so I would recommend using perfmon to deal with them. It provides a UI for selecting from the many, many counters available. It doesn’t seem to provide much assistance for the expression language involved, but once again, I don’t know enough on this topic to be talking. For now, I’ll just point you to perfmon. Perhaps when I start actively using all of this stuff, I’ll have more to say.

Other Resources

There are a lot of people talking about little bits and pieces of ETW on the web. Most weren’t useful for whatever sort of audience I happen to be, but this technet article is a better help message for logman than logman’s own -h output, and the MS forum FAQ is somewhat helpful. The Pigs Can Fly blog has some useful bits in it, from an experienced user, such as this one about using the Windows sample profiler with XPerf.

Stack Traces

You’ve probably noticed, but I tend to stick to using logman rather than xperf for most things. I just like its syntax better; it seems a little more rational. But there is one really big reason why you’ll probably end up using xperf in actual practice: stack traces. Unfortunately, I don’t have enough material on that to make up a blog post yet, so I’m going to wait until I get some more experience in analyzing traces and seeing what needs to get into them before writing that part.

Who Cares? or: How to Use Collected Traces

I must apologize for crawling through all of this boring detail on the trace collection side. The only thing interesting about any of this is how you use the collected traces, and I haven’t said anything about that yet.

Well, sorry. That’s for a later article. Partly because this one is already far too long, and partly because I haven’t figured out enough yet to be worth wasting your time reading.

Tags: , , ,

3 comments

  1. Thanks for the series. Having a somewhat long account of how all these pieces should be put together is helpful. Like you I’ve been trying to piece together how to use ETW the modern way by crawling through small blog posts and forum questions. Besides the Windows SDK, MSDN documents, and the Microsoft blog you mentioned in part 1 there is no other comprehensive overview that I know of.

    Here’s a question for you. Were you able to get further than this, and if so would you consider extending this series? I’m currently trying to get a manifest-based provider to log custom events to a channel set up for this purpose.

    One of ETW’s features is that the programmer is able to fire off events without having to manage whether or how logs are actually being generated. This is making things difficult to get started, however, as none of my events are showing up in the Event Viewer. Reading about your experience with this would be a great help to me.

  2. I am looking for sameples of how to set the “Filter” of a trace provider, in the PerfMon. Would you help writing something?
    Thanks!

  3. ETW is a bitch, with 7 different ways to do anything. Nice writing though 🙂