Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

Conversation

@n8sh
Copy link
Member

@n8sh n8sh commented Jul 8, 2020

This is achieved by splitting off the all-scalar case and by moving the at and trustedCast helper functions out of the body of core.internal.array.equality.__equals.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @n8sh!

Bugzilla references

Auto-close Bugzilla Severity Description
21030 enhancement Reduce template function instantiations related to array equality

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + druntime#3152"

@dlang-bot dlang-bot added the Enhancement New functionality label Jul 8, 2020
@n8sh
Copy link
Member Author

n8sh commented Jul 8, 2020

Pushing to fix a comment x 2

@n8sh n8sh force-pushed the issue-21030 branch 3 times, most recently from 7bae49a to 10a3de5 Compare July 8, 2020 16:27
@n8sh n8sh force-pushed the issue-21030 branch 2 times, most recently from 4baf944 to bdb3810 Compare July 8, 2020 21:20
// This would improperly allow equality of integers and pointers
// but the CTFE branch will stop this function from compiling then.
import core.stdc.string : memcmp;
return lhs.length == 0 || 0 == memcmp(lhs.ptr, rhs.ptr, lhs.length * T1.sizeof);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure you need the length == 0 case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an optimisation, for an uncommon case, therefore a slowdown for the common case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there because I am under the impression that memcmp has undefined behavior for null pointers. If not it can be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n8sh you are correct.
memcpy on null is undefined

@UplinkCoder
Copy link
Member

@n8sh Have you verified that this helps?

@n8sh
Copy link
Member Author

n8sh commented Jul 8, 2020

@UplinkCoder Yes, I wrote a long program that was nothing but array equality checks, and with the change compiling it takes about 23% as much time and about 22% as much memory.

Comment on lines 46 to 47
// exclude opaque structs due to https://issues.dlang.org/show_bug.cgi?id=20959
if (!(is(T == struct) && !is(typeof(T.sizeof))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from #3142

@kinke
Copy link
Contributor

kinke commented Jul 8, 2020

I've rebased and updated #3142 in the meantime too. I don't really want to rebase yet another time after pieces of it are ripped out into yet another separate PR (and each with its own dlang issue? Is that tracker spamming (ab)used for changelog generation?).

@n8sh
Copy link
Member Author

n8sh commented Jul 8, 2020

(and each with its own dlang issue? Is that tracker spamming (ab)used for changelog generation?)

I am under the impression that any non-trivial PR should have an associated issue.

@n8sh
Copy link
Member Author

n8sh commented Jul 8, 2020

Anyway, if this PR is merged before that one (which seems possible because this PR isn't waiting on anyone else's work), I'd be fine with rebasing #3142 for you. Regardless of which PR is pulled first the other will need to be rebased anyway.

@kinke
Copy link
Contributor

kinke commented Jul 8, 2020

It'd be nice to have a comparison with your test program, but the non-identical semantics (also wrt. memcmp application) don't allow a direct comparison. Anyway, getting rid of superfluous instantiations was obviously my main goal after seeing the original code, and the .tupleof thing an unexpected hold-up. [Edit: And AFAICT, this PR shouldn't be needed after mine.]

@n8sh
Copy link
Member Author

n8sh commented Jul 8, 2020

If I remove all of this PR except the addition of the scalar-specific __equals definition and the addition of a template constraint, that would minimize conflict while still doing something useful performance-wise.

@kinke
Copy link
Contributor

kinke commented Jul 9, 2020

For the time being, maybe, but my version, AFAICT, doesn't need any of this, so I'd basically just revert this and apply mine instead. In my version, there's no trustedCast() helper template at all anymore. at was extracted to the module-level, just like you did (but does the void->ubyte promotion in my case, instead of trustedCast and the void[] special cases for __equals). I don't use a 2nd __equals, but a little nested trustedMemcmp helper (in a static if for memcmp-able arrays only).

@n8sh
Copy link
Member Author

n8sh commented Jul 9, 2020

It'd be nice to have a comparison with your test program, but the non-identical semantics (also wrt. memcmp application) don't allow a direct comparison.

With those provisos, yours compiles in 75% as much time and using 60% as much memory as the baseline.

@n8sh
Copy link
Member Author

n8sh commented Jul 9, 2020

I don't use a 2nd __equals

Right, but doing so improves compilation speed without interfering with any of the stuff you are doing.

@kinke
Copy link
Contributor

kinke commented Jul 9, 2020

Your version doesn't use memcmp for static array element types etc. though, that's why runtime performance would be interesting as well. Please post your test code as gist or so somewhere, so that I can experiment as well.

@n8sh
Copy link
Member Author

n8sh commented Jul 9, 2020

Your version doesn't use memcmp for static array element types etc. though

Right, and static arrays are handled by the non-scalar version (since __isScalar is never true for static arrays) with your memcmp check etc.

Please post your test code as gist or so somewhere, so that I can experiment as well.

Sure. You'll see that the code does nothing but test whether there is a net benefit in compilation time for the happy path given the additional work done in template resolution. My starting assumption was that array equality checks are primarily performed between arrays of scalars so that is the case I was focused on improving. https://pastebin.com/d1A5f07s

@kinke
Copy link
Contributor

kinke commented Jul 9, 2020

Thx a lot, I'll look into it another time - I hope(d) to have covered your happy path optimization in a not-unnecessarily-complex useMemcmp check; if it doesn't, that's definitely an improvement of yours.

@n8sh n8sh force-pushed the issue-21030 branch 2 times, most recently from 84145b9 to 585fecd Compare August 10, 2020 16:06
@dlang-bot dlang-bot merged commit f4dcab0 into dlang:master Sep 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Enhancement New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants