@rfourquet rfourquet commented Nov 22, 2025

When calling `seed!(rng, seed)`, the `seed` is hashed into pseudorandom bytes, which are then expanded to produce the initialization state for `rng`. The purpose of hashing is to ensure that nearby seeds, such as in `MyRNG(1)` and `MyRNG(2)`, produce uncorrelated streams.

In practice, this call is implemented as `seed!(rng, SeedHasher(seed))`, assuming that `rng` implements `seed!(::AbstractRNG)` for initialization from another RNG.
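
The user-visible behavior can be illustrated with the public API only. This is a minimal sketch, not part of the PR: it uses `Xoshiro` and `Random.seed!` with integer seeds and does not touch the internal `SeedHasher`.

```julia
using Random

# Seed a generator and draw a few words from it.
rng = Xoshiro(1)
a = rand(rng, UInt64, 4)

# Reseeding with the same seed restarts the same stream.
Random.seed!(rng, 1)
b = rand(rng, UInt64, 4)

# An adjacent seed goes through the same hashing step, so its stream
# should look unrelated to the one produced by seed 1.
c = rand(Xoshiro(2), UInt64, 4)

println(a == b)  # same seed, same stream
println(a == c)  # different seeds, different streams
```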

Previously, `SeedHasher` worked in three stages:

  1. encode `seed` into bytes
  2. hash these bytes using SHA-2
  3. expand the resulting digest with an ad-hoc construction
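
To make the three stages concrete, here is a rough sketch of such a pipeline. It is an illustration only and not the code that was in `Random`: the little-endian encoding, the choice of SHA-256, the counter-based expansion, and the names `encode_seed` and `expand` are all assumptions.

```julia
using SHA

# Stage 1 (assumed encoding): an integer seed as little-endian bytes.
encode_seed(seed::UInt64) = collect(reinterpret(UInt8, [htol(seed)]))

# Stage 3 (assumed construction): stretch a digest to `nbytes` bytes by
# hashing the digest together with a one-byte block counter.
function expand(digest::Vector{UInt8}, nbytes::Integer)
    out = UInt8[]
    counter = 0x00
    while length(out) < nbytes
        append!(out, sha256(vcat(digest, counter)))
        counter += 0x01
    end
    return out[1:nbytes]
end

bytes  = encode_seed(UInt64(1))   # stage 1
digest = sha256(bytes)            # stage 2: SHA-2 hash of the encoded seed
state  = expand(digest, 100)      # stage 3: as many bytes as the RNG needs
```

Every call pays for at least two SHA-256 invocations plus several temporary arrays, which gives a feel for why a pipeline of this shape is comparatively slow for small seeds.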

This approach was functional but relatively slow.
This commit replaces stages 2 and 3 with an algorithm designed by M. E. O'Neill specifically for seed generation, described at: https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html

The implementation is adapted from O'Neill's `seed_seq_fe` C++ reference (MIT license). NumPy uses the same algorithm for its `SeedSequence`.
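
For readers unfamiliar with the scheme, the following is a loose Julia transcription of the 32-bit variant as I understand it from O'Neill's post and NumPy's `SeedSequence`. It is a sketch and not the code in this PR: the function names are hypothetical, the pool size of 4 words follows NumPy's default, and the constants should be checked against the reference before relying on them.

```julia
# Constants as used by the 32-bit reference variant (verify against the source).
const INIT_A     = 0x43b0d7e5
const MULT_A     = 0x931e8875
const INIT_B     = 0x8b51f9dd
const MULT_B     = 0x58f38ded
const MIX_MULT_L = 0xca01f9dd
const MIX_MULT_R = 0x4973f715
const XSHIFT     = 16

# Hash one word with a running multiplier; returns (hashed word, next multiplier).
function hashmix(value::UInt32, hash_const::UInt32)
    value ⊻= hash_const
    hash_const *= MULT_A          # UInt32 arithmetic wraps modulo 2^32
    value *= hash_const
    value ⊻= value >> XSHIFT
    return value, hash_const
end

# Combine two words into one.
function mix(x::UInt32, y::UInt32)
    r = MIX_MULT_L * x - MIX_MULT_R * y
    return r ⊻ (r >> XSHIFT)
end

# Absorb the seed words into a small entropy pool.
function mix_entropy(entropy::Vector{UInt32}; pool_size::Int = 4)
    pool = zeros(UInt32, pool_size)
    hc = INIT_A
    # Fill the pool, padding with zeros when the seed is short.
    for i in 1:pool_size
        word = i <= length(entropy) ? entropy[i] : zero(UInt32)
        pool[i], hc = hashmix(word, hc)
    end
    # Let every pool word influence every other pool word.
    for src in 1:pool_size, dst in 1:pool_size
        if src != dst
            h, hc = hashmix(pool[src], hc)
            pool[dst] = mix(pool[dst], h)
        end
    end
    # Fold in any seed words beyond the pool size.
    for src in (pool_size + 1):length(entropy), dst in 1:pool_size
        h, hc = hashmix(entropy[src], hc)
        pool[dst] = mix(pool[dst], h)
    end
    return pool
end

# Produce `n` output words by cycling through the pool.
function generate_state(pool::Vector{UInt32}, n::Int)
    out = Vector{UInt32}(undef, n)
    hc = INIT_B
    for i in 1:n
        data = pool[mod1(i, length(pool))] ⊻ hc
        hc *= MULT_B
        data *= hc
        out[i] = data ⊻ (data >> XSHIFT)
    end
    return out
end

state1 = generate_state(mix_entropy(UInt32[1]), 8)
state2 = generate_state(mix_entropy(UInt32[2]), 8)
```

The whole computation is a handful of 32-bit multiplies, xors and shifts per word, with no cryptographic hash involved, which is consistent with the speedups reported in this PR.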

Here are some numbers:

using BenchmarkTools, Random

@btime Xoshiro(1)
@btime Xoshiro($(rand(UInt)))
@btime Xoshiro($(rand(UInt, 4)))
@btime Xoshiro($(rand(UInt, 8)))
s = Random.SeedHasher(); @btime rand($s, UInt)

On master:

  398.755 ns (9 allocations: 448 bytes)
  403.610 ns (9 allocations: 448 bytes)
  486.254 ns (9 allocations: 448 bytes)
  998.182 ns (9 allocations: 448 bytes)
  73.213 ns (0 allocations: 63 bytes)

On PR:

  36.807 ns (3 allocations: 256 bytes)
  54.717 ns (3 allocations: 256 bytes)
  144.917 ns (3 allocations: 256 bytes)
  228.738 ns (3 allocations: 256 bytes)
  2.454 ns (0 allocations: 0 bytes)

@rfourquet added the performance (Must go faster) and randomness (Random number generation and the Random stdlib) labels Nov 22, 2025