Zlib compress all wasm files and decompress them during prefetch #170
Conversation
✅ Deploy Preview for webkit-jetstream-preview ready!
Very cool! I'll leave some detailed comments on the PR next, but responding first to your question 1:
No, I don't think that's worth the hassle. Some quick data / experiment: I copied all `.wasm` files and compared different compression methods.

I don't think the small savings of a better algorithm / library are worth adding another dependency for. (And we also could no longer use `DecompressionStream` in the browser.)
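For anyone wanting to reproduce a comparison like this, a sketch using only Node's built-in `zlib` module could look as follows (the script name and invocation are hypothetical); it compresses each input at the strongest deflate, gzip, and brotli settings and prints the totals:

```js
// compare-compression.mjs: hypothetical helper. Pass the files to compare
// on the command line, e.g. `node compare-compression.mjs wasm/**/*.wasm`.
import { readFileSync } from "node:fs";
import { deflateSync, gzipSync, brotliCompressSync, constants } from "node:zlib";

const totals = { raw: 0, zlib: 0, gzip: 0, brotli: 0 };

for (const file of process.argv.slice(2)) {
  const data = readFileSync(file);
  totals.raw += data.length;
  // Level 9 is the strongest zlib/gzip setting.
  totals.zlib += deflateSync(data, { level: 9 }).length;
  totals.gzip += gzipSync(data, { level: 9 }).length;
  // Brotli quality 11 is its maximum.
  totals.brotli += brotliCompressSync(data, {
    params: { [constants.BROTLI_PARAM_QUALITY]: 11 },
  }).length;
}

for (const [method, bytes] of Object.entries(totals)) {
  console.log(`${method.padEnd(6)} ${(bytes / 1024 / 1024).toFixed(2)} MiB`);
}
```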
Regarding the other points (including Camillo's):
+1 to staying in the JavaScript/npm ecosystem. Would be happy to provide a port of / alternative to `compress.py`.
One reason for this change was to make the repository smaller on disk (excluding `.git`).
I agree that it's convenient to have a single script to decompress everything (in particular given the next point by Camillo). But I would like the build scripts to be self-contained / a single step; otherwise I think it's easy to forget, or at least annoying, to have to run another script first.
Agreed; right now compression always forces blob URLs. How about disabling decompression and stripping line 85 in 5be6cdc?
Based on `compress.py` from WebKit#170, with some modifications:
- Can be run as `npm run compress` or simply `node compress.mjs`.
- Uses the best zlib compression setting.
- Takes arbitrary glob patterns for input files, defaulting to all .z files for decompression.
- Copies the file mode over to avoid spurious git diffs.
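A minimal sketch of what such a script can look like, assuming the behavior described above (this is not the actual `utils/compress.mjs` from #172; glob handling and the .z default are omitted):

```js
// compress-sketch.mjs: minimal sketch, not the real utils/compress.mjs.
// Compress:   node compress-sketch.mjs file.wasm      -> file.wasm.z
// Decompress: node compress-sketch.mjs -d file.wasm.z -> file.wasm
import { readFileSync, writeFileSync, statSync, chmodSync } from "node:fs";
import { deflateSync, inflateSync, constants } from "node:zlib";

const args = process.argv.slice(2);
const decompress = args[0] === "-d";
const files = decompress ? args.slice(1) : args;

for (const file of files) {
  const input = readFileSync(file);
  // Copy the file mode over to avoid spurious git diffs.
  const mode = statSync(file).mode;

  if (decompress) {
    const out = file.replace(/\.z$/, "");
    writeFileSync(out, inflateSync(input));
    chmodSync(out, mode);
  } else {
    // Z_BEST_COMPRESSION (level 9) is the best zlib setting.
    const out = `${file}.z`;
    writeFileSync(out, deflateSync(input, { level: constants.Z_BEST_COMPRESSION }));
    chmodSync(out, mode);
  }
}
```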
Will also be useful for WebKit#170
Thanks for the reviews! I like the idea of using node for the compression script; I'll use #172 once it has merged, and also the shared polyfill for TextDecoder.
Yeah that seems like a better path than just silently re-enabling prefetching for those files. I'll implement that.
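Illustratively, the driver-side resolution could look roughly like this; the function and parameter names are hypothetical, not the actual JetStreamDriver code:

```js
// Illustrative only: when prefetching is disabled, load the decompressed
// sibling of a compressed resource instead of decompressing the .z blob.
// Assumes `node utils/compress.mjs -d` was run beforehand.
function resolveResourcePath(path, prefetchEnabled) {
  if (!prefetchEnabled && path.endsWith(".z"))
    return path.slice(0, -".z".length);
  return path;
}
```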
As long as the uncompressed files are not used by the default runner, it is fine for them to be checked in. I can exclude them when vendoring the JS3 repo into Firefox, and only copy over the .z files. But it does seem nice to only have one canonical version of things. What might change this is if we wanted to compress JS files too (which can be diffed and inspected easily). From #154, there were three large JS files (excluding tfjs, which is disabled) that could be good candidates for this.
How do folks feel about compressing JS too? If that's okay with folks, then we probably should keep the uncompressed versions around.
That's fine with me too. I was just running out of time on Friday and wanted to have something quicker. Updating all the build scripts probably isn't too bad.
Thanks for kicking this off 👍 +1 on compressing large JS files too, given that this would just work transparently with prefetching!
#172 landed, so feel free to use / rebase this on top of it.
Edit: Re-reading/thinking about the arguments, I am not so sure about compressing source JS files any more. (JS files that are just like blobs, generated, and won't be manually modified, e.g., inputs for babel, are fine to compress.) Keeping uncompressed JS source files around for diff/code review/maintenance sounds like a good idea. In terms of transfer size during loading, there won't be any benefit to compressing in the repo, since a competent web server will use some compression scheme anyway. E.g., the Netlify preview uses brotli (see screenshot, ~2MB vs ~12MB uncompressed for waypoints.js).
Here's an alternative idea. What if we just left all of the files in this tree uncompressed, and only added support to JetStreamDriver for decompressing? It would then be up to anyone vendoring the tree to compress whatever files they want and rewrite the paths in the driver. I can have a script that does this as part of the Mozilla vendoring process. We wouldn't need to update any build scripts, or do anything for disablePrefetching+compression (we wouldn't be doing that on the vendored copy). I could probably drop all the shell polyfilling for zlib too, because we would only be running the vendored copy in the browser. We'd also continue to get good diffs for free.
Since we could also benefit from the compression when vendoring this into the Chromium code base, I'd prefer the original idea, where at least Wasm binaries and some other non-source files are compressed and stored in-tree. CC @camillobruni |
Our concern is mainly checked-out size (the git repos should compress data internally, if I'm not mistaken).
Sounds good, I'll continue with the original plan. |
Force-pushed from c3dce11 to 7b31e0e (compare).
I've updated the patch now. All of CI seems to be good except for Safari; not sure what's up with that. This patch only compresses a few files. I think it'd be good to start small and expand after this lands.
It seems the `--no-prefetch` mode is broken in v8 and jsc now, but the SpiderMonkey shell still seems to work.
V8:

```
~/JetStream$ v8 cli.js -- --no-prefetch argon2-wasm
console.warn: Disabling resource prefetching! All compressed files must have been decompressed using `node utils/compress.mjs -d`
Starting JetStream3
Running argon2-wasm:
console.error: JetStream3 failed: ReferenceError: ShellPrefetchedResources is not defined
console.error: ReferenceError: ShellPrefetchedResources is not defined
    at JetStream.getBinary ((d8):28:21)
    at Benchmark.init (./wasm/argon2/benchmark.js:80:45)
    at doRun ((d8):4:35)
    at (d8):27:9
    at globalObject.loadString (./JetStreamDriver.js:687:30)
    at ShellScripts.run (./JetStreamDriver.js:724:26)
    at WasmEMCCBenchmark.run (./JetStreamDriver.js:939:34)
    at Driver.start (./JetStreamDriver.js:293:33)
    at async runJetStream (cli.js:68:9)
cli.js:74: ReferenceError: ShellPrefetchedResources is not defined
}
^
ReferenceError: ShellPrefetchedResources is not defined
    at JetStream.getBinary ((d8):28:21)
    at Benchmark.init (./wasm/argon2/benchmark.js:80:45)
    at doRun ((d8):4:35)
    at (d8):27:9
    at globalObject.loadString (./JetStreamDriver.js:687:30)
    at ShellScripts.run (./JetStreamDriver.js:724:26)
    at WasmEMCCBenchmark.run (./JetStreamDriver.js:939:34)
    at Driver.start (./JetStreamDriver.js:293:33)
    at async runJetStream (cli.js:68:9)
1 pending unhandled Promise rejection(s) detected.
```
JSC:

```
~/JetStream$ jsc cli.js -- --no-prefetch argon2-wasm
Warn: Disabling resource prefetching! All compressed files must have been decompressed using `node utils/compress.mjs -d`
Starting JetStream3
Running argon2-wasm:
Error: JetStream3 failed: ReferenceError: Can't find variable: ShellPrefetchedResources
Error: @
@
init@/home/dlehmann/JetStream/wasm/argon2/benchmark.js:80:54
init@/home/dlehmann/JetStream/wasm/argon2/benchmark.js:79:20
doRun@
doRun@
global code@
loadString@[native code]
run@/home/dlehmann/JetStream/JetStreamDriver.js:724:36
run@/home/dlehmann/JetStream/JetStreamDriver.js:939:37
run@/home/dlehmann/JetStream/JetStreamDriver.js:887:19
start@/home/dlehmann/JetStream/JetStreamDriver.js:293:36
```
@danleh Ah, that wasn't caught because we didn't test prefetch=false with a benchmark that has compressed files. I added argon2-wasm to the test list for that configuration to address that.
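For illustration, a defensive check along these lines could turn the bare ReferenceError into an actionable message; the function name and the lookup shape of `ShellPrefetchedResources` are assumptions, not the actual driver code:

```js
// Hypothetical sketch: fail with a clear message instead of a ReferenceError
// when the shell prefetch table was never populated.
function getPrefetchedBinary(path) {
  if (typeof ShellPrefetchedResources === "undefined") {
    throw new Error(
      `No prefetched resource for ${path}. Either run with prefetching ` +
      "enabled, or decompress .z files first via `node utils/compress.mjs -d`.");
  }
  // Assumed lookup shape; the real structure may differ.
  return ShellPrefetchedResources[path];
}
```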
Partially fixes #154. A full fix would target some JS files. Opening to get early thoughts on it; there are some integration questions.

All wasm files are compressed into `.z` files and decompressed using zlib during prefetch. If prefetch is disabled, these files are still prefetched to ensure the decompression time is outside of the score. In the browser this uses DecompressionStream. In the shell this uses the zlib-wasm code to decompress the file.

Open questions: should a better compression algorithm / library than zlib be used? Also note that the `compress.py` script can automatically decompress all the files in the tree for anyone who wants to read the build artifacts.
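For reference, the browser-side decompression described above can be done by piping the fetched body through a `DecompressionStream`. A minimal sketch (the function name is illustrative), assuming the `.z` files are zlib-wrapped deflate streams:

```js
// Sketch: fetch a zlib-compressed .z file and decompress it in the browser.
// "deflate" in the Compression Streams API means the zlib-wrapped deflate
// format, which matches what Node's zlib.deflateSync produces.
async function fetchAndDecompress(url) {
  const response = await fetch(url);
  const stream = response.body.pipeThrough(new DecompressionStream("deflate"));
  return new Response(stream).arrayBuffer();
}
```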