Skip to content

Parser performance#273

Merged
swannman merged 3 commits intomicrosoft:masterfrom
jdu2600:parser_performance
Mar 27, 2026
Merged

Parser performance#273
swannman merged 3 commits intomicrosoft:masterfrom
jdu2600:parser_performance

Conversation

@jdu2600
Copy link
Copy Markdown
Contributor

@jdu2600 jdu2600 commented Mar 25, 2026

I'm writing a modern Rust ETW parsing library, and was doing some parsing benchmarking.

I was surprised at just how more performant my approach was againt existing approaches so I did a little digging.

The main issue for krabs seemed to be a linear scan over a per-event property cache. This is effectively an O(N2) scan if you grab all properties - and the cache actually only hits if you grab the same property twice. 😱

Plus wstring temporaries.

Event Type krabsetw 4.4.8 krabsetw 4.4.6 ferrisetw 1.2.0 TdhFormatProperty TdhFormatProperty (naive) TdhGetProperty
Process Start (16 fields) 1286 ns 2522 ns 2503 ns 541 ns 1218 ns 3684 ns
RPC Client Call (9 fields) 516 ns 771 ns 1213 ns 338 ns 481 ns 1755 ns
Registry CreateKey (6 fields) 331 ns 467 ns 749 ns 548 ns 700 ns 1577 ns
File Create (7 fields) 410 ns 872 ns 867 ns 294 ns 489 ns 1146 ns
Thread Start (10 fields) 485 ns 1055 ns 905 ns 201 ns 849 ns 1482 ns
  • All measurements are per-event averages (ns) over the same captured set of events run on my laptop.

jdu2600 and others added 3 commits March 25, 2026 16:59
Replace the per-event deque/linear-scan property cache with a two-level
scheme: a persistent name-to-index map in schema_locator (built once per
event type, shared across all events) and a flat vector<property_info>
in parser indexed by property position. This eliminates per-event hash
table allocation and removes string comparisons during the blob walk.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace const std::wstring& with std::wstring_view in parser's
public API (parse, try_parse, view_of) and internal helpers
(find_property, assert_valid_assignment, throw_if_invalid).

This eliminates heap-allocating std::wstring temporaries when
callers pass string literals (the common case). Property names
longer than MSVC's SSO threshold of 7 wchar_t previously caused
a heap allocation and free per field access.

Not a breaking change - std::wstring_view is implicitly
constructible from both std::wstring and const wchar_t*.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@swannman swannman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outstanding work @jdu2600 - thank you for doing this!

@swannman swannman merged commit f0934fd into microsoft:master Mar 27, 2026
2 checks passed
Copy link
Copy Markdown
Member

@kylereedmsft kylereedmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR. I know this has already been merged, but this change raises some new ideas that we should consider.

* Returns nullptr if pSchema is null or not in the cache.
* </summary>
*/
const property_name_map* get_property_names(const TRACE_EVENT_INFO* pSchema) const;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like the right place for this. schema_locator shouldn't be aware of names. That's a responsibility for the schema or maybe the parser.

* Keys are wstring_views pointing into stable TRACE_EVENT_INFO memory.
* </summary>
*/
using property_name_map = std::unordered_map<std::wstring_view, ULONG>;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just store the name->property_info directly. This still feels like something that should be on the schema instead of in the locator. Maybe on the trace_context?


// Maintain a mapping from property name to blob data index.
std::deque<std::pair<const wchar_t *, property_info>> propertyCache_;
// Persistent name to index map shared across all events of the same type.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this layer of caching even makes sense. It's just overhead.
I can't think of any reason not to just maintain a map of name->property_info from the schema.

I'm fuzzy on all the various event type schemas, but looking at this code, it seems like the property_info blobs are expected to be consistent. Just trying to think through whether we ever need any of the old fallback code if we pre-populate and cache.

@swannman
Copy link
Copy Markdown
Member

Thanks @kylereedmsft for the review - @jdu2600 if you have a moment, would you take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants