Skip to content

Conversation

@GuillaumeGomez
Copy link
Member

We have a not great handling of whitespace characters in our templates. The fix doesn't look great either, however the result is pretty nice. Here are some numbers:

page before with this PR diff
https://docs.rs/sysinfo/0.29.7/sysinfo/index.html 51819 44845 -13.4%
https://docs.rs/about 16603 14048 -15.4%
https://docs.rs/crate/sysinfo/0.29.7/builds 23524 19768 -15.97%

@github-actions github-actions bot added the S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed label Sep 14, 2023
@Nemo157
Copy link
Contributor

Nemo157 commented Sep 14, 2023

Does whitespace impact the parser speed much? I assume this will make very little transfer size difference post-compression.

@GuillaumeGomez
Copy link
Member Author

Isn't the parsing done once when we start the server and never again normally?

@Nemo157
Copy link
Contributor

Nemo157 commented Sep 14, 2023

I mean the web browser parsing the response, since that's the only thing I could see this really impacting.

(If we do want to reduce the page size, rather than having to write minified templates I would consider just running every request through a middleware using something like minify_html).

@GuillaumeGomez
Copy link
Member Author

I mean the web browser parsing the response, since that's the only thing I could see this really impacting.

I wonder about the data transfer size too. Even though it's compressed, if it's more than 1% diff, considering the traffic we have, I think it's still worth it.

@syphar
Copy link
Member

syphar commented Sep 15, 2023

I lean towards what @Nemo157 said here: if we want to reduce page-size, a middleware with a minimizer would be the better approach, because it would also minify the rustdoc html.

But: I could imagine even with a minimizer, the size impact (compressed) is negligible, same as the impact on parsing speed in a browser.

And this is coming with an odd (?) style in our templates, which will be difficult to maintain with future changes coming from contributors.

That being said, this doesn't hurt. When I'm back from my short trip I'll have a look & will test in detail.

@Nemo157
Copy link
Contributor

Nemo157 commented Sep 16, 2023

gzipped transfer sizes:

page before with this PR diff with minify-html diff
https://docs.rs/sysinfo/0.29.7/sysinfo/index.html 13.08kB 12.47kB -4.7% 12.03kB -8.0%
https://docs.rs/about 5.38kB 5.03kB -6.5% 4.79kB -10.9%
https://docs.rs/crate/sysinfo/0.29.7/builds 6.12kB 5.61kB -8.3% 5.37kB -12.3%

(minify-html gets -23%, -30%, -29% respectively in the uncompressed html sizes)

@GuillaumeGomez
Copy link
Member Author

So with just my changes it's already between 4% and 8% reduction. However I'm very curious about what minify_html is doing because unless they change the DOM, I don't see how they can achieve this result.

@Nemo157
Copy link
Contributor

Nemo157 commented Sep 16, 2023

I downloaded the /about page from this PR and my change, then ran them both through tidy --indent yes --wrap 0 --sort-attributes alpha to normalize them, results: https://gist.github.com/Nemo157/fccc65481dc61f3409f8865bc4f8d452

Looking at the diff the only structural differences are: removal of comments inside the font-awesome svgs and removal of type="javascript" and type="text" attributes (the latter of which may be an issue for some styles).

The other non-structural difference that probably accounts for a lot is removal of quotes around attributes that don't need them.

@GuillaumeGomez
Copy link
Member Author

That's something I was wondering about: do we need to put the license comments in all HTML pages? Otherwise we could put them into the CSS or keep them as jinja comments.

@syphar
Copy link
Member

syphar commented Oct 19, 2023

@Nemo157 @GuillaumeGomez what are both of your thoughts on this?

@GuillaumeGomez
Copy link
Member Author

About what exactly? I checked minify_html and it's generating invalid HTML, so we can't use it. As for the PR as is, rustdoc is using that to reduce generated pages size, but the requirement is different. I'm personally in favour of it (easy gain after all) but I think the readability is a good concern.

So in short: I'm personally in favour for merging it but I won't push for it. ;)

@Nemo157
Copy link
Contributor

Nemo157 commented Oct 19, 2023

and it's generating invalid HTML

What part is invalid? In my tests I was using minify_html::Cfg::spec_compliant which is meant to be 100% valid. (There are a few open bugs, but none that seem like they would cause us issues).

@GuillaumeGomez
Copy link
Member Author

It was merging together some attributes it shouldn't, modified <!doctype html> into <!doctypehtml>, removed quotes in some cases it shouldn't. I tested it on rustdoc html code and then checked the output with the w3c tool, which output even more errors.

@wilsonzlin
Copy link

Hi @GuillaumeGomez, maintainer of minify-html here. If you have any issues or feedback regarding the minifier I'd be happy to look into them. As @Nemo157 mentioned the DOCTYPE isn't minified in that way with the spec_compliant option enabled. Feel free to send any spec-failing minified HTML snippets where it shouldn't my way.

@syphar
Copy link
Member

syphar commented Nov 6, 2024

This PR at least has to be rebased.

Generally I'm still not sure about the impact when we gzip everything anyways, even between cloudfront & our webserver

@syphar syphar removed their request for review November 6, 2024 05:45
@syphar syphar added S-waiting-on-author Status: This PR is incomplete or needs to address review comments and removed S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed labels Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This PR is incomplete or needs to address review comments

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants