November 3rd, 2010
(See all of my ETW posts)
Enough theory. Let’s walk through examples of all this.
You can enumerate the available event providers either the old-school way:
logman query providers
or the new way:
xperf -providers I
(the ‘I’ is for ‘installed/known’). There is also
xperf -providers R
(‘R’ for ‘registered’), which for me dumps out the same set of providers with less information and different formatting. Welcome to the joys of the Windows command line. The old-school logman tends to have saner output, but is mysteriously much slower when constructing the list. And finally, some people prefer
(or you can expand ‘ep’ to ‘enum-publishers’), though that contains a bunch of other weird stuff that logman and xperf do not.
Each provider has a globally unique id (guid). The logman output and the
xperf -providers I output show these together with the readable names.
Now let’s look at the flags and flag groups defined for the kernel provider:
xperf -providers K
Notice, for instance, the kernel group ‘DiagEasy’, which maps to ‘
PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER‘. If you look at the earlier part of the output, you can map each of those to a descriptive string. I don’t know how to map them to their hex values. I’ll give two untrustworthy ways, though. First, we’ll do it by starting a session using the kernel provider, then using logman to dump out its info:
xperf -on DiagEasy logman query "NT Kernel Logger" -ets xperf -stop
In the “Provider:” section of the output, I see
KeywordsAny: 0x237f (process,thread,img,proccntr,cswitch,dpc,isr,disk,file,hf)
which is very similar to the DiagEasy kernel group expansion if you squint a little. The second way is to first figure out what the provider’s name is. From the previous, that seems to be “Windows Kernel Trace” (not “Circular Kernel Session Provider”, though that would work too.) Now we can dump out all keywords and their corresponding hex values in a simple format:
logman query providers "Windows Kernel Trace"
and attempt to match up these keywords with the PROC_THREAD etc. from the DiagEasy kernel group. Note that if we used “Circular Kernel Session Provider” in the above command, we’d get exactly the same set of keywords. Hm.
You can use the same logman command to investigate other providers. It will often give some additional information, such as levels (another way to tag events for selection; defaults are things like Error, Warning, and Informational).
I can’t figure out any way to get the xperf tool to display similar information, nor do I know how it stores the equivalence between its keyword-like things (“PROC_THREAD”) and logman’s understandable ones (“proc”=0x1,”thread”=0x2, with descriptive strings for each.)
Event Trace Sessions
Let’s say you have some sessions running. (And you do; Windows starts several up for you in its default configuration.) You can get a very basic listing with
where it will refer to them as “Data Collector Sets”. (logman and perfmon talk about Data Collector Sets, which seem to be about the same as xperf’s event tracing sessions except that Data Collector Sets aren’t necessarily running; they can be lying in wait with Status=Stopped.) xperf, instead, refers to them as “Loggers” or more verbosely as “active logging sessions”, which seems a little more familiar. Run
and it will dump out each one, together with a whole bunch of information about each one’s configuration and state. Of particular interest is the “Trace Flags” field, which will give the xperf version of the flags/keyword bitmask. To experiment with this a bit, try running
xperf -on DiagEasy xperf -Loggers "NT Kernel Logger" xperf -stop
You’ll see the Trace Flags match the DiagEasy kernel group you looked at before. You can also look at the Trace Flags for the “DiagLog” logger. It pulls data from a wide range of providers, requesting specific flags/keywords from each. You can use
to have it decode those settings even more.
Remember, one event tracing session (aka logger) can pull events from multiple providers, and we should be able to modify the set of providers at runtime. Let’s give that a try. First, start up a logger:
xperf -start blogtest1 -on Microsoft-Windows-Kernel-Process:0x10
And ask what’s going on in three different ways:
xperf -LoggersWithNames blogtest1 logman -ets logman query blogtest1 -ets
xperf should tell you that you’re looking for the “WINEVENT_KEYWORD_PROCESS” keyword/flag from the “Microsoft-Windows-Kernel-Process” provider (the trailing 0xff means all levels, info+warning+error+whatever). The first logman invocation should just show you that ‘blogtest1′ is indeed running. The second tells you something similar to what xperf did, though in a format that splits out 3 things: the logger itself, something I don’t understand that seems to be the logging engine associated with the logger (it knows the buffer configuration, output file, etc.), and the provider that the logger (session) is sucking events from. Now let’s add in another provider:
logman update blogtest1 -p "Windows Kernel Trace" -ets
I can’t figure out how to do the equivalent with xperf. I know it has an -update option, but it spits in my face whenever I try to use it. (This is a common feature in the command-line and even the GUI tools. There are invisible restrictions of various sorts, and when you do something not allowed, you get either silent failure or a seemingly unrelated error message back.)
Now when we run the querying commands, we see both providers listed. Now, let’s shut down:
xperf -stop blogtest1
Or, if you prefer,
logman stop blogtest1 -ets
Kernel vs non-Kernel Events
One note: the above probably only works because both providers are really within the kernel. If you tried to update with an application-based provider, you’d get back the cryptic “Property value conflict.” If you started with an application-based provider and used logman to update with a kernel provider, it would act completely happy, but only capture events from the application-based provider. And yet, if you stopped it with
xperf -stop blogtest1 -d output.etl
it would merge relevant portions of the kernel log with the application logs to produce output.etl, or in other words, it would Do The Right Thing behind your back.
Also: at least in Windows 7, there are a number of providers named “Microsoft-Windows-Kernel-Something“. These seem to be manifest-based views onto the real kernel provider. Or something. On the web, I’ve found them mentioned as “Crimson” aka “Unified Windows Eventing 6.0″ providers. At any rate, they seem to allow you to pull in events from the kernel as if it were a regular application provider. You can do things like eg
logman query providers Microsoft-Windows-Kernel-Process and get a sense for which events each handles, and eventually reverse-engineer the keyword groups like ‘DiagEasy’ using a mixture of several providers. I like the idea of relays that make the kernel look exactly like any other provider, but I don’t know if that’s what’s actually happening. I’m still waiting for responses to my questions about this on ServerFault and er… the Windows forum seems to have eaten my post.
Predefined Tracing Configuration
Something I’ve been ignoring up until now is predefined tracing configuration, largely because it still baffles me. But it is a major part of all the tools.
XPerf seems to only talk about something called profiles, which are for “High-level ETW trace control” via “reusable, persisted configuration files (.xpp)”. logman has an entirely different mechanism, which seems to center around “Data Collector Sets” (or collectors for short). Collectors seem to be able to exist independently of any loggers actively using them. To see any dormant collectors you might have lying around, run logman with no options (not even the
-ets that I’ve been throwing in everywhere without explaining.)
Data Collector Sets
XPerf’s profiles sound like something I’d want a GUI to generate, so let’s start with logman’s collectors:
logman create trace blogtest1 -p Microsoft-Windows-Kernel-Process 0xffff
That creates a Data Collector Set named “blogtest1″ with a specific provider (and keyword bitmask). Note that it is not running; it is just lying around somewhere ready for use. You can see it with
or dig into its details with
logman query blogtest1
To make it not quite so trivial, let’s add some providers to it:
logman update blogtest1 -p Microsoft-Windows-Kernel-Disk 0xffff logman update blogtest1 -p Microsoft-Windows-HttpService 0xffff
You probably want to query it again to see what a mess you’ve made. Now fire it up with
logman start blogtest1
It now has status ‘Running’. The query output will tell you where it’s logging to:
%systemdrive%\PerfLogs\Admin or more specifically in my case,
C:\PerfLogs\Admin\blogtest1_000001.etl (I guess it really, really wants to be able to roll over from one log file to the next). Stop this running collector with
logman stop blogtest1
So what’s the deal with the magic -ets argument? The
logman -h output is mostly useless: “Send commands to Event Trace Sessions directly without saving or scheduling.” A technet article is much more illuminating: “Creates and starts an event trace session with the options specified on the command line”. By experimentation, it seems like this event trace session has no relationship to a Data Collector Set created without the -ets flag, even one with the same name. If you ran
logman start blogtest1 -ets above, it would have created a trace session named ‘blogtest1′ with no providers. If you run
logman (or equivalently,
logman query), it shows the non-ets collectors.
logman -ets or
logman query -ets shows the with-ets collectors.
xperf -Loggers only sees the with-ets collectors. I don’t understand the difference, but it seems to be fundamental and major.
The non-ets collectors look to be useful for automatically scheduling tracing by time of day and duration: see logman’s
-rf options. I confess that I haven’t gotten those to work for me, though.
logman normally only allows you to specify one provider at a time. If you want to merge together the output from multiple providers, you have a few options:
logman updateto add the providers one at a time, as we did above
- Use the
-pfoption to point to a file containing providers, one per line (put the flags right after them, as you would on the
- Use xperf instead of logman. XPerf lets you do
-on provider1:0xff+provider2:0xffff+.... In fact, it gives you several other per-provider configuration options as well, ones that make sense in the manifest-based world.
I haven’t used XPerf’s profiles at all, so I really can’t say anything beyond what
xperf -help profiles tells you.
Early on, I made the claim that there are different types of providers: performance counters, event traces, configuration settings, and alerts. That may be bad terminology: those four can definitely be referred to as things you can put into Data Collector Sets, but “provider” might only refer to event trace providers. I’m not sure.
It looks to me like the configuration settings have been folded up as a provider in Windows Vista and later, at least by looking at
logman query providers Microsoft-Windows-Kernel-Registry
It has keywords for all the different registry key operations. Note that you can see the same data in a more verbose and yuckier format with
wevtutil gp Microsoft-Windows-Kernel-Registry
I have been bad about ignoring the
wevtutil tool, but that’s mostly because I haven’t found it to be useful for much. (I use it to register my custom manifest-based provider, but even then, nothing makes very good use of that registered information.)
Alerts don’t seem to be a source of data so much as something that a logger can emit. If you look at the
logman -h output, you’ll see an example of creating an alert to fire off when too much ‘Processor Time’ is used, whatever that means. Probably the best place to learn about alerts are the perfmon help pages. I’ll ignore them.
Performance counters sound much more interesting. logman can be used to automate their creation and management, but performance counters depend heavily on knowing what counters are available, so I would recommend using perfmon to deal with them. It provides a UI for selecting from the many, many counters available. It doesn’t seem to provide much assistance for the expression language involved, but once again, I don’t know enough on this topic to be talking. For now, I’ll just point you to perfmon. Perhaps when I start actively using all of this stuff, I’ll have more to say.
There are a lot of people talking about little bits and pieces of ETW on the web. Most weren’t useful for whatever sort of audience I happen to be, but this technet article is a better help message for logman than logman’s own -h output, and the MS forum FAQ is somewhat helpful. The Pigs Can Fly blog has some useful bits in it, from an experienced user, such as this one about using the Windows sample profiler with XPerf.
You’ve probably noticed, but I tend to stick to using
logman rather than
xperf for most things. I just like its syntax better; it seems a little more rational. But there is one really big reason why you’ll probably end up using
xperf in actual practice: stack traces. Unfortunately, I don’t have enough material on that to make up a blog post yet, so I’m going to wait until I get some more experience in analyzing traces and seeing what needs to get into them before writing that part.
Who Cares? or: How to Use Collected Traces
I must apologize for crawling through all of this boring detail on the trace collection side. The only thing interesting about any of this is how you use the collected traces, and I haven’t said anything about that yet.
Well, sorry. That’s for a later article. Partly because this one is already far too long, and partly because I haven’t figured out enough yet to be worth wasting your time reading.