
De/serialization performance improvements and span computation improvements #1810

Draft
joshhaug wants to merge 3 commits into develop from perf/jackson

@joshhaug
Contributor

  • Review: By file
  • Merge strategy: squash and merge

Description

1. Jackson Streaming Migration

  • Moved away from the legacy javax.json parser to Jackson's streaming API; in testing, Jackson is substantially faster.
  • Added writeJson(T, JsonGenerator) to ValueMapper and writeTo(JsonGenerator) to SerializedValue so values can be written directly to a generator.
  • In-memory objects now serialize directly to JSON streams, bypassing intermediate JsonValue or JsonNode allocations.
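For illustration, a minimal sketch of what writeTo(JsonGenerator) on a SerializedValue-like tree might look like, assuming a simple map/list/string/double shape. To keep it runnable without dependencies, MiniJsonGenerator is a hypothetical stand-in that mirrors a few JsonGenerator method names (writeStartObject, writeFieldName, writeNumber, writeString, writeStartArray); ValueSketch, OfString, OfDouble, OfList and OfMap are likewise made-up names, not the PR's classes.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in mirroring a few method names of Jackson's JsonGenerator. */
final class MiniJsonGenerator {
    private final Writer out;
    private final Deque<Boolean> first = new ArrayDeque<>(); // per open container: nothing written yet?
    private boolean afterFieldName = false;

    MiniJsonGenerator(Writer out) { this.out = out; }

    // Writes a comma between siblings; a value directly after a field name needs none.
    private void beforeValue() throws IOException {
        if (afterFieldName) { afterFieldName = false; return; }
        if (!first.isEmpty()) {
            if (!first.pop()) out.write(',');
            first.push(false);
        }
    }

    void writeStartObject() throws IOException { beforeValue(); out.write('{'); first.push(true); }
    void writeEndObject() throws IOException { first.pop(); out.write('}'); }
    void writeStartArray() throws IOException { beforeValue(); out.write('['); first.push(true); }
    void writeEndArray() throws IOException { first.pop(); out.write(']'); }
    void writeFieldName(String name) throws IOException {
        beforeValue(); writeQuoted(name); out.write(':'); afterFieldName = true;
    }
    void writeString(String s) throws IOException { beforeValue(); writeQuoted(s); }
    // Assumes finite values; NaN and Infinity are not valid JSON numbers.
    void writeNumber(double d) throws IOException { beforeValue(); out.write(Double.toString(d)); }

    // Simplified escaping for the sketch: quotes and backslashes only.
    private void writeQuoted(String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') out.write('\\');
            out.write(c);
        }
        out.write('"');
    }
}

/** Hypothetical SerializedValue-like tree; each node streams itself, no JSON tree is built. */
interface ValueSketch {
    void writeTo(MiniJsonGenerator gen) throws IOException;
}

record OfString(String value) implements ValueSketch {
    public void writeTo(MiniJsonGenerator gen) throws IOException { gen.writeString(value); }
}

record OfDouble(double value) implements ValueSketch {
    public void writeTo(MiniJsonGenerator gen) throws IOException { gen.writeNumber(value); }
}

record OfList(List<ValueSketch> items) implements ValueSketch {
    public void writeTo(MiniJsonGenerator gen) throws IOException {
        gen.writeStartArray();
        for (ValueSketch item : items) item.writeTo(gen); // recurse straight into the output stream
        gen.writeEndArray();
    }
}

record OfMap(Map<String, ValueSketch> fields) implements ValueSketch {
    public void writeTo(MiniJsonGenerator gen) throws IOException {
        gen.writeStartObject();
        for (Map.Entry<String, ValueSketch> e : fields.entrySet()) {
            gen.writeFieldName(e.getKey());
            e.getValue().writeTo(gen);
        }
        gen.writeEndObject();
    }
}
```

Each node writes its own tokens and recurses into its children, so nothing like a JsonValue or JsonNode tree sits between the in-memory value and the output stream.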

2. SerializedValue Optimization

  • Added native double support to SerializedValue to avoid BigDecimal overhead for double-precision resource samples and parameters.
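A small sketch of the trade-off, with hypothetical names (NumericValue, OfReal, OfDouble, Samples are illustrative, not the PR's types): the legacy-style path wraps every sample in a BigDecimal, while a native-double variant stores the primitive and only builds a BigDecimal if a consumer asks for one.

```java
import java.math.BigDecimal;

/** Hypothetical numeric view shared by both variants. */
interface NumericValue {
    BigDecimal asBigDecimal();
    double asDouble();
}

/** Arbitrary-precision variant: every sample carries a BigDecimal. */
record OfReal(BigDecimal value) implements NumericValue {
    public BigDecimal asBigDecimal() { return value; }
    public double asDouble() { return value.doubleValue(); }
}

/** Native variant: stores the primitive; a BigDecimal is built only on request. */
record OfDouble(double value) implements NumericValue {
    // BigDecimal.valueOf goes through Double.toString, so 0.1 becomes 0.1 rather than its binary expansion.
    public BigDecimal asBigDecimal() { return BigDecimal.valueOf(value); }
    public double asDouble() { return value; }
}

final class Samples {
    /** Legacy-style path: one BigDecimal (plus the intermediate Double.toString string) per sample. */
    static NumericValue legacy(double sample) { return new OfReal(BigDecimal.valueOf(sample)); }

    /** Native path: no BigDecimal is created while recording samples. */
    static NumericValue[] capture(double[] raw) {
        NumericValue[] out = new NumericValue[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = new OfDouble(raw[i]);
        return out;
    }
}
```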

3. Simulation Engine Speedups

  • SimulationEngine now pre-indexes serializable topics in an IdentityHashMap, so topic lookups take (expected) O(1) time. I think this is OK, but I'm curious to know what the experts have to say.
  • Replaced the O(N) topic scan in computeSpanInfo with constant-time lookups.
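A minimal sketch of the pre-indexing idea, assuming topics are compared by reference. Topic and TopicIndex are hypothetical names; linearIndexOf shows the kind of O(N) scan being replaced.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical topic type; deliberately does not override equals/hashCode. */
final class Topic {
    final String name;
    Topic(String name) { this.name = name; }
}

/** Sketch: index the serializable topics once, then look them up by reference identity. */
final class TopicIndex {
    private final Map<Topic, Integer> indexByTopic = new IdentityHashMap<>();

    TopicIndex(List<Topic> serializableTopics) {
        for (int i = 0; i < serializableTopics.size(); i++) {
            // putIfAbsent keeps the first position, matching what the linear scan would return.
            indexByTopic.putIfAbsent(serializableTopics.get(i), i);
        }
    }

    /** Expected constant time: keys are hashed by identity and compared with ==. Returns -1 if absent. */
    int indexOf(Topic topic) {
        Integer i = indexByTopic.get(topic);
        return i == null ? -1 : i;
    }

    /** The O(N) scan this replaces, kept for comparison. */
    static int linearIndexOf(List<Topic> topics, Topic topic) {
        for (int i = 0; i < topics.size(); i++) {
            if (topics.get(i) == topic) return i;
        }
        return -1;
    }
}
```

Because IdentityHashMap uses reference equality, a lookup only succeeds for the exact instance that was indexed. That is probably the main property for reviewers to confirm: that computeSpanInfo always sees the same topic objects the engine registered, never equal-but-distinct copies.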

4. Result I/O Improvements

  • SimulationResultsWriter has been refactored to stream large results directly to disk or the network using JsonGenerator.
  • PostgresProfileQueryHandler now streams jsonb data directly.
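A rough sketch of the streaming idea behind the results-writer change, assuming results can be produced as an iterator of segments. Segment, StreamingResultsWriterSketch and the output layout are hypothetical; the real writer uses JsonGenerator rather than hand-written JSON.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

/** Hypothetical profile segment: start offset in microseconds plus a sampled value. */
record Segment(long startMicros, double value) {}

final class StreamingResultsWriterSketch {
    /**
     * Writes {"name":...,"segments":[{"start":...,"value":...},...]} one segment at a time,
     * so only the current segment is held in memory, never the whole document.
     * Assumes finite values (NaN/Infinity are not valid JSON).
     */
    static void writeProfile(Writer out, String name, Iterator<Segment> segments) throws IOException {
        out.write("{\"name\":");
        writeString(out, name);
        out.write(",\"segments\":[");
        boolean first = true;
        while (segments.hasNext()) {
            Segment s = segments.next();
            if (!first) out.write(',');
            first = false;
            out.write("{\"start\":" + s.startMicros() + ",\"value\":" + Double.toString(s.value()) + "}");
        }
        out.write("]}");
        out.flush();
    }

    // Simplified escaping for the sketch: quotes and backslashes only.
    private static void writeString(Writer out, String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') out.write('\\');
            out.write(c);
        }
        out.write('"');
    }
}
```

With a file- or socket-backed Writer in place of a StringWriter, peak memory stays roughly constant in the number of segments.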

Note: SerializedValue.Visitor now requires an implementation of onDouble(double), so existing visitors must add one.
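What the new requirement means for an existing visitor, sketched with hypothetical types (Value, Visitor, SumVisitor and the record variants are illustrative, not the project's classes): every implementation gains an onDouble case alongside the BigDecimal-based one.

```java
import java.math.BigDecimal;
import java.util.List;

/** Hypothetical visitor over a simplified value type; onDouble is the newly required case. */
interface Visitor<R> {
    R onNull();
    R onNumeric(BigDecimal value);
    R onDouble(double value); // new: every visitor must now handle native doubles
    R onString(String value);
    R onList(List<Value> items);
}

interface Value {
    <R> R match(Visitor<R> visitor);
}

record NullValue() implements Value {
    public <R> R match(Visitor<R> v) { return v.onNull(); }
}

record NumericValue(BigDecimal value) implements Value {
    public <R> R match(Visitor<R> v) { return v.onNumeric(value); }
}

record DoubleValue(double value) implements Value {
    public <R> R match(Visitor<R> v) { return v.onDouble(value); }
}

record StringValue(String value) implements Value {
    public <R> R match(Visitor<R> v) { return v.onString(value); }
}

record ListValue(List<Value> items) implements Value {
    public <R> R match(Visitor<R> v) { return v.onList(items); }
}

/** An existing visitor that had to gain an onDouble case: sums every number in a value. */
final class SumVisitor implements Visitor<Double> {
    public Double onNull() { return 0.0; }
    public Double onNumeric(BigDecimal value) { return value.doubleValue(); }
    public Double onDouble(double value) { return value; } // the added case
    public Double onString(String value) { return 0.0; }
    public Double onList(List<Value> items) {
        double sum = 0;
        for (Value item : items) sum += item.match(this);
        return sum;
    }
}
```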

Verification

  • SerializationBenchmark: Micro-benchmarks comparing legacy vs. streaming paths.
  • JacksonStreamingTortureTest: Verification of complex nested record serialization.
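One way to structure a check like JacksonStreamingTortureTest is a differential test: render the same deeply nested value through a tree-building path and a streaming path, and require identical output. The sketch below assumes a hypothetical Node record and TreeRenderer; it does not reproduce the PR's actual test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical nested domain record: a named node with numeric samples and child nodes. */
record Node(String name, List<Double> samples, List<Node> children) {

    /** "Legacy" path: build an intermediate tree of maps and lists first. */
    Object toTree() {
        Map<String, Object> m = new LinkedHashMap<>(); // insertion order keeps field order stable
        m.put("name", name);
        m.put("samples", new ArrayList<Object>(samples));
        List<Object> kids = new ArrayList<>();
        for (Node c : children) kids.add(c.toTree());
        m.put("children", kids);
        return m;
    }

    /** "Streaming" path: append JSON text directly, with no intermediate tree. */
    void writeTo(StringBuilder out) {
        // Names in this sketch are assumed not to need escaping.
        out.append("{\"name\":\"").append(name).append("\",\"samples\":[");
        for (int i = 0; i < samples.size(); i++) {
            if (i > 0) out.append(',');
            out.append(samples.get(i));
        }
        out.append("],\"children\":[");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) out.append(',');
            children.get(i).writeTo(out);
        }
        out.append("]}");
    }
}

final class TreeRenderer {
    /** Renders the intermediate tree produced by Node.toTree(). */
    static void render(Object v, StringBuilder out) {
        if (v instanceof Map<?, ?> m) {
            out.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : m.entrySet()) {
                if (!first) out.append(',');
                first = false;
                out.append('"').append(e.getKey()).append("\":");
                render(e.getValue(), out);
            }
            out.append('}');
        } else if (v instanceof List<?> l) {
            out.append('[');
            for (int i = 0; i < l.size(); i++) {
                if (i > 0) out.append(',');
                render(l.get(i), out);
            }
            out.append(']');
        } else if (v instanceof String s) {
            out.append('"').append(s).append('"');
        } else {
            out.append(v); // Double: rendered via String.valueOf, i.e. Double.toString
        }
    }
}
```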

 * 3. DeferredSerializedValue: lazy caching + direct streaming
 * 4. Large composite types: deeply nested records with collections
 */
class JacksonStreamingTortureTest {
Collaborator


Any objection to calling it a "stress test"? "Torture" sounds a little harsh.

Comment on lines +41 to +43
default void writeJson(T value, JsonGenerator gen) throws IOException {
    serializeValue(value).writeTo(gen);
}
Collaborator


Some context: the purpose of SerializedValue is to provide a format-agnostic interface (so you could just as easily write to Avro, ORC, Parquet, etc.).

I'm trying to decide whether or not writeJson fits that goal. I suppose I would defend it in a couple of ways:

  • it's a default method, which means you don't have to implement it unless you need to for performance reasons
  • JSON is currently the only format we use in practice
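To make the default-method argument concrete, a hedged sketch of the pattern: the default writeJson routes through serializeValue, and only a hot-path mapper overrides it. MapperSketch, Serialized, SamplesMapper and the flat-array-only ArrayGen are hypothetical stand-ins for ValueMapper, SerializedValue and JsonGenerator.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, tiny stand-in for a JSON generator; handles a single flat array only. */
final class ArrayGen {
    private final Writer out;
    private boolean first = true;
    ArrayGen(Writer out) { this.out = out; }
    void writeStartArray() throws IOException { out.write('['); first = true; }
    void writeNumber(double d) throws IOException {
        if (!first) out.write(',');
        first = false;
        out.write(Double.toString(d));
    }
    void writeEndArray() throws IOException { out.write(']'); }
}

/** Hypothetical intermediate representation standing in for SerializedValue. */
record Serialized(List<Double> numbers) {
    void writeTo(ArrayGen gen) throws IOException {
        gen.writeStartArray();
        for (double d : numbers) gen.writeNumber(d);
        gen.writeEndArray();
    }
}

interface MapperSketch<T> {
    Serialized serializeValue(T value);

    /** Default: correct for every mapper, but builds the intermediate Serialized first. */
    default void writeJson(T value, ArrayGen gen) throws IOException {
        serializeValue(value).writeTo(gen);
    }
}

/** A hot-path mapper that overrides writeJson to skip the intermediate allocation. */
final class SamplesMapper implements MapperSketch<double[]> {
    public Serialized serializeValue(double[] value) {
        List<Double> boxed = new ArrayList<>(value.length);
        for (double d : value) boxed.add(d); // one Double box per element
        return new Serialized(boxed);
    }

    @Override
    public void writeJson(double[] value, ArrayGen gen) throws IOException {
        gen.writeStartArray();
        for (double d : value) gen.writeNumber(d); // straight from the primitive array
        gen.writeEndArray();
    }
}
```

Mappers that never override writeJson keep working unchanged; the override is purely an optimization, and both paths should produce the same bytes.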

Collaborator


A thought occurred to me: the key innovation of this PR seems to be not the ability to bypass the SerializedValue interface and write JSON directly, but the ability to serialize without building up a parallel in-memory data structure.

I think it's worth exploring an interface that does effectively the same thing as this PR at runtime but abstracts away the "JSON" part at compile time. It would look very similar to writeJson, except that we'd wrap JsonGenerator in an interface.
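A rough sketch of that idea, with hypothetical names throughout: ValueWriter is the format-agnostic interface, JsonValueWriter is the JSON implementation (in practice it would forward to Jackson's JsonGenerator; here it appends to an Appendable so the sketch has no dependencies), and LeafCounter is a second, non-JSON consumer showing that a value's writeTo is written once against the interface.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical format-agnostic streaming sink; JSON is one implementation among possible others. */
interface ValueWriter {
    void beginMap() throws IOException;
    void key(String name) throws IOException;
    void endMap() throws IOException;
    void beginList() throws IOException;
    void endList() throws IOException;
    void writeDouble(double value) throws IOException;
    void writeString(String value) throws IOException;
}

/** JSON implementation; a real one might delegate each call to a JsonGenerator. */
final class JsonValueWriter implements ValueWriter {
    private final Appendable out;
    private final Deque<Boolean> first = new ArrayDeque<>(); // per open container: nothing written yet?
    private boolean afterKey = false;

    JsonValueWriter(Appendable out) { this.out = out; }

    // Comma between siblings; none directly after a key.
    private void separator() throws IOException {
        if (afterKey) { afterKey = false; return; }
        if (!first.isEmpty()) {
            if (!first.pop()) out.append(',');
            first.push(false);
        }
    }

    public void beginMap() throws IOException { separator(); out.append('{'); first.push(true); }
    public void key(String name) throws IOException { separator(); quoted(name); out.append(':'); afterKey = true; }
    public void endMap() throws IOException { first.pop(); out.append('}'); }
    public void beginList() throws IOException { separator(); out.append('['); first.push(true); }
    public void endList() throws IOException { first.pop(); out.append(']'); }
    public void writeDouble(double value) throws IOException { separator(); out.append(Double.toString(value)); }
    public void writeString(String value) throws IOException { separator(); quoted(value); }

    // Simplified escaping for the sketch: quotes and backslashes only.
    private void quoted(String s) throws IOException {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') out.append('\\');
            out.append(c);
        }
        out.append('"');
    }
}

/** A non-JSON consumer: counts scalar values without producing any text. */
final class LeafCounter implements ValueWriter {
    private int count = 0;
    int count() { return count; }
    public void beginMap() {}
    public void key(String name) {}
    public void endMap() {}
    public void beginList() {}
    public void endList() {}
    public void writeDouble(double value) { count++; }
    public void writeString(String value) { count++; }
}

/** Hypothetical record whose serializer is written once, against ValueWriter only. */
record Sample(String resource, double value) {
    void writeTo(ValueWriter w) throws IOException {
        w.beginMap();
        w.key("resource");
        w.writeString(resource);
        w.key("value");
        w.writeDouble(value);
        w.endMap();
    }
}
```

The runtime behavior matches writeJson (no parallel in-memory structure), while Sample never mentions JSON, so an Avro or Parquet ValueWriter could in principle be added without touching it.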

