@sjakobi
Member

@sjakobi sjakobi commented Nov 12, 2025

This is a continuation of @oberblastmeister's work done in #471.

Resolves #392.


TODO:

  • Use WW to avoid allocating Maybes
  • Move Collision-handling code out of line
  • Undo 2e7b1c4 to reduce Core size.

@sjakobi sjakobi mentioned this pull request Nov 12, 2025
@sjakobi
Member Author

sjakobi commented Nov 12, 2025

I've made alter and update inline aggressively, and at this stage alter is faster than the version on master at all sizes:

master:

$ cabal run fine-grained -- -p alter --stdev 1 -p Int
All
  HashMap.Strict
    alter (1000x)
      presentKey
        Int
          1:      OK
            20.4 μs ± 407 ns
          10:     OK
            38.4 μs ± 449 ns
          100:    OK
            52.0 μs ± 979 ns
          1000:   OK
            70.7 μs ± 1.3 μs
          10000:  OK
            81.4 μs ± 1.5 μs
          100000: OK
            98.0 μs ± 1.5 μs
      absentKey
        Int
          0:      OK
            19.9 μs ± 329 ns
          1:      OK
            24.6 μs ± 442 ns
          10:     OK
            35.1 μs ± 481 ns
          100:    OK
            44.9 μs ± 807 ns
          1000:   OK
            55.5 μs ± 897 ns
          10000:  OK
            63.2 μs ± 1.1 μs
          100000: OK
            85.8 μs ± 457 ns

65af25c:

$ cabal run fine-grained -- -p alter --stdev 1 -p Int
All
  HashMap.Strict
    alter (1000x)
      presentKey
        Int
          1:      OK
            11.7 μs ±  43 ns
          10:     OK
            29.3 μs ± 200 ns
          100:    OK
            38.8 μs ± 440 ns
          1000:   OK
            60.4 μs ± 678 ns
          10000:  OK
            71.8 μs ± 254 ns
          100000: OK
            91.1 μs ± 1.3 μs
      absentKey
        Int
          0:      OK
            10.3 μs ±  51 ns
          1:      OK
            14.6 μs ± 210 ns
          10:     OK
            24.3 μs ± 323 ns
          100:    OK
            33.7 μs ± 407 ns
          1000:   OK
            44.5 μs ± 750 ns
          10000:  OK
            55.1 μs ± 674 ns
          100000: OK
            81.7 μs ± 766 ns

The Core size for these functions is huge, though: Strict.alter now has 720 terms and Strict.update has 536.

There are still a few things to improve.
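
The comment does not say how the term counts were obtained. One common way, assumed here, is GHC's `-ddump-simpl` output, which annotates each top-level binding with a `-- RHS size: {terms: ...}` comment. A minimal sketch on a hypothetical toy function (`bump` is not part of unordered-containers):

```haskell
-- Ask GHC to write the optimised Core for this module to a .dump-simpl file.
-- Each top-level binding in that file carries a "-- RHS size: {terms: N, ...}"
-- comment, which is where per-function term counts can be read off.
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -ddump-to-file #-}

-- Hypothetical toy function, used only to have something to measure.
bump :: (Int -> Int) -> [Int] -> [Int]
bump f = map (\x -> f x + 1)
{-# INLINE bump #-}

main :: IO ()
main = print (bump (* 2) [1, 2, 3])  -- prints [3,5,7]
```

Marking a function INLINE copies its body into every call site, so a large "terms" figure for the function translates into code growth wherever it is used.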

@sjakobi
Member Author

sjakobi commented Nov 12, 2025

Core sizes at 50e2490:

  • Strict.alter: 576 terms
  • Strict.update: 434 terms
  • $walterCollision: 168 terms
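
The `$w` prefix is the name GHC gives to a worker produced by its worker/wrapper transformation, so `$walterCollision` appearing as its own binding suggests the collision path is now a separate function rather than being copied into every call site. The general shape of that split can be sketched as follows. All types and names here (`Node`, `adjustNode`, `adjustCollision`) are hypothetical simplifications, not the library's actual definitions:

```haskell
-- Hypothetical, heavily simplified node type; the real HashMap is a HAMT.
data Node k v
  = Empty
  | Leaf !Int k v            -- hash, key, value
  | Collision !Int [(k, v)]  -- several keys sharing one hash
  deriving (Eq, Show)

-- Hot path: small, so inlining it at call sites is cheap.
adjustNode :: Eq k => (v -> v) -> Int -> k -> Node k v -> Node k v
adjustNode f h k node = case node of
  Leaf h' k' v
    | h == h' && k == k' -> Leaf h' k' (f v)
  Collision h' kvs
    | h == h' -> adjustCollision f h k kvs
  _ -> node
{-# INLINE adjustNode #-}

-- Cold path: collisions are rare, so this stays out of line and its
-- code is shared by all call sites instead of being duplicated.
adjustCollision :: Eq k => (v -> v) -> Int -> k -> [(k, v)] -> Node k v
adjustCollision f h k kvs =
  Collision h [(k', if k' == k then f v else v) | (k', v) <- kvs]
{-# NOINLINE adjustCollision #-}

main :: IO ()
main = do
  print (adjustNode (+ 1) 7 "x" (Leaf 7 "x" (1 :: Int)))
  print (adjustNode (+ 1) 7 "y" (Collision 7 [("x", 1 :: Int), ("y", 2)]))
```

Only the call to `adjustCollision` is duplicated when `adjustNode` is inlined, which is one way to shrink the Core of an aggressively inlined function without slowing the common case.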

@treeowl
Collaborator

treeowl commented Nov 13, 2025

Large. Probably not something we'd want to INLINE. One trick around that is to use manual worker-wrapper to try to "unbox" the passed function. Roughly speaking,

newtype Maybe# a = Maybe# (# (##) | a #)
pattern Just# :: a -> Maybe# a
pattern Just# a = Maybe# (# | a #)
pattern Nothing# :: Maybe# a
pattern Nothing# = Maybe# (# (##) | #)
{-# COMPLETE Nothing#, Just# #-}
toMaybe :: Maybe# a -> Maybe a
fromMaybe :: Maybe a -> Maybe# a

alter f = alter# $ \m# -> fromMaybe (f (toMaybe m#))

alter# :: (Hashable k, Eq k) => (Maybe# a -> Maybe# a) -> k -> HashMap k a -> HashMap k a

In the (I believe typical) case that the passed function is known, small, and non-recursive, GHC will inline it into the function passed to alter#, getting rid of the Maybes.
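
To make the scheme concrete, here is a self-contained sketch that fills in the conversion functions and applies the same wrapper/worker split to an association list. The list is a hypothetical stand-in for HashMap so the example needs no packages beyond base; `alterList` and `alterList#` are illustrative names, not library functions.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE UnboxedSums #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UnliftedNewtypes #-}

-- An optional value represented as an unboxed sum: no heap object is
-- allocated for the "nothing" or "just" wrapper itself.
newtype Maybe# a = Maybe# (# (# #) | a #)

pattern Nothing# :: Maybe# a
pattern Nothing# = Maybe# (# (# #) | #)

pattern Just# :: a -> Maybe# a
pattern Just# a = Maybe# (# | a #)

{-# COMPLETE Nothing#, Just# #-}

toMaybe :: Maybe# a -> Maybe a
toMaybe Nothing# = Nothing
toMaybe (Just# a) = Just a
{-# INLINE toMaybe #-}

fromMaybe :: Maybe a -> Maybe# a
fromMaybe Nothing = Nothing#
fromMaybe (Just a) = Just# a
{-# INLINE fromMaybe #-}

-- Wrapper: tiny and inlined, so a known `f` meets toMaybe/fromMaybe at the
-- call site and GHC can cancel the boxed Maybes there.
alterList :: Eq k => (Maybe a -> Maybe a) -> k -> [(k, a)] -> [(k, a)]
alterList f = alterList# (\m# -> fromMaybe (f (toMaybe m#)))
{-# INLINE alterList #-}

-- Worker: the large part, kept out of line; it only sees Maybe#.
alterList# :: Eq k => (Maybe# a -> Maybe# a) -> k -> [(k, a)] -> [(k, a)]
alterList# f k = go
  where
    go [] = case f Nothing# of
      Nothing# -> []
      Just# v  -> [(k, v)]
    go ((k', v) : rest)
      | k == k' = case f (Just# v) of
          Nothing# -> rest
          Just# v' -> (k, v') : rest
      | otherwise = (k', v) : go rest
{-# NOINLINE alterList# #-}

main :: IO ()
main = do
  let m = [("a", 1), ("b", 2)] :: [(String, Int)]
  print (alterList (fmap (+ 10)) "a" m)     -- [("a",11),("b",2)]
  print (alterList (const (Just 0)) "c" m)  -- [("a",1),("b",2),("c",0)]
  print (alterList (const Nothing) "b" m)   -- [("a",1)]
```

Whether the Maybes actually disappear depends on the optimiser seeing a known function at the call site; checking the generated Core is the only way to be sure for a given use.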

@sjakobi
Member Author

sjakobi commented Nov 13, 2025

At the current state of this branch, the Maybes from the application of the function argument are eliminated by inlining the function into alter, but the resulting Core size is, of course, quite enormous.

I guess this Maybe#-scheme could possibly help recover some of the performance lost by not inlining alter.

@treeowl
Collaborator

treeowl commented Nov 13, 2025

> I guess this Maybe#-scheme could possibly help recover some of the performance lost by not inlining alter.

Exactly.

Development

Successfully merging this pull request may close these issues.

alter could be much more efficient

4 participants