Conversation
Force-pushed from 461db42 to 99cecf1
daanelson left a comment:
this is great! you can push to an internal H100 model on Replicate (just don't leave it running 😄) to test perf in prod; good to have solid metrics on that before we merge
```
@@ -166,12 +167,65 @@ def base_setup(
        shared_models=shared_models,
    )
```
nit - since these are just simple little flags we set during setup for the dev/schnell predictors, I don't mind adding a separate compile_ae flag
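Spelled out, the suggestion might look roughly like this; the signature and attribute names of `base_setup` here are assumptions, not the repo's actual code:

```python
import torch

# Hypothetical sketch of the suggested flag; the repo's actual base_setup
# signature and attributes may differ.
def base_setup(self, compile_ae: bool = False, **kwargs):
    ...
    # Only compile the autoencoder when explicitly requested, since
    # compilation adds noticeable startup time.
    if compile_ae:
        self.ae.decode = torch.compile(self.ae.decode)
```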
```python
# the order is important:
# torch.compile has to recompile if it makes invalid assumptions
# about the input sizes. Having higher input sizes first makes
# for fewer recompiles.
```
any way we can compile once with craftier use of dynamo.mark_dynamic - add a max=192 on dims 2 & 3? I assume you've tried this, curious how it breaks
I tried max=192, but it didn't have any effect. Setting torch.compile(dynamic=True) makes for one fewer recompile, but I should check the runtime performance of that.
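For reference, a minimal sketch of the mark_dynamic idea being discussed; the decoder stand-in, shapes, and bounds are illustrative assumptions, not the PR's actual code:

```python
import torch

# Illustrative only: a tiny conv stands in for the real VAE decoder.
decoder = torch.nn.Conv2d(16, 3, kernel_size=3, padding=1).cuda().to(torch.bfloat16)
decode = torch.compile(decoder)

latents = torch.randn(1, 16, 128, 128, device="cuda", dtype=torch.bfloat16)
# Mark the spatial dims (2 and 3) as dynamic with an upper bound of 192,
# hoping dynamo compiles once instead of once per resolution.
torch._dynamo.mark_dynamic(latents, 2, min=32, max=192)
torch._dynamo.mark_dynamic(latents, 3, min=32, max=192)
image = decode(latents)
```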
Did some H100 benchmarks:

- flux-schnell, 1 image, VAE not compiled
- flux-schnell, 4 images, VAE not compiled
- flux-schnell, 4 images, VAE compiled

The VAE speedup seems reproducible: the uncompiled VAE spends a lot of time in nchwToNhwcKernel, while the compiled version manages to avoid it. At the same time, I had a cog bug saying …
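A quick sketch of how the nchwToNhwcKernel time can be spotted with the PyTorch profiler; `vae` and `latents` are placeholders for the model and input used in the benchmarks above:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Assumed names: `vae` and `latents` stand in for the real model and input.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        vae.decode(latents)

# In the uncompiled run, nchwToNhwcKernel should show up near the top of
# this table; the compiled VAE avoids the layout conversion.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```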
Force-pushed from 99cecf1 to 0039a42
Did you figure out what the cog bug was?
@jonluca as I understand it, it was a regression in cog and should be fixed when building with 0.9.25 and later.
Compiling this takes about 80 seconds on my machine. It makes the encoding step about 50% faster on an A5000 (0.3s -> 0.2s); I haven't tried an H100.
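For concreteness, a rough sketch of the descending-size warm-up described in the quoted comment above; the encoder stand-in and the size list are assumptions, not the repo's actual module or resolutions:

```python
import torch

# Rough sketch: a tiny conv stands in for the real VAE encoder.
encoder = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda().to(torch.bfloat16)
encode = torch.compile(encoder)

# Warm up from the largest input down: torch.compile recompiles when its
# shape assumptions break, so seeing the biggest size first means fewer
# recompiles than going small-to-large.
for side in (1536, 1024, 768, 512):
    x = torch.randn(1, 3, side, side, device="cuda", dtype=torch.bfloat16)
    with torch.no_grad():
        encode(x)
```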