Description
While trying to get embeddings for tokenized text, I encountered this error:
  File "/usr/local/lib/python3.11/dist-packages/staticvectors/model.py", line 297, in generate
    random = np.random.default_rng(seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/_generator.pyx", line 4957, in numpy.random._generator.default_rng
  File "_pcg64.pyx", line 123, in numpy.random._pcg64.PCG64.__init__
  File "bit_generator.pyx", line 535, in numpy.random.bit_generator.BitGenerator.__init__
  File "bit_generator.pyx", line 315, in numpy.random.bit_generator.SeedSequence.__init__
  File "bit_generator.pyx", line 389, in numpy.random.bit_generator.SeedSequence.get_assembled_entropy
  File "bit_generator.pyx", line 140, in numpy.random.bit_generator._coerce_to_uint32_array
  File "bit_generator.pyx", line 70, in numpy.random.bit_generator._int_to_uint32_array
ValueError: expected non-negative integer
The error is raised in the generate method when it handles out-of-vocabulary tokens. The hasher function can produce negative integers, which are then passed as seeds to np.random.default_rng(seed). NumPy's random generator, however, only accepts non-negative integers as seeds, so any token whose hash is negative triggers the ValueError above.
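The failure can be reproduced without staticvectors at all. The sketch below (the try_seed helper is hypothetical, written only for this illustration) seeds NumPy's default generator with a positive and a negative integer; the negative seed is rejected inside SeedSequence with the same ValueError shown in the traceback.

```python
import numpy as np


def try_seed(seed: int) -> bool:
    """Return True if NumPy accepts `seed`, False if it rejects it."""
    try:
        np.random.default_rng(seed)
    except ValueError as err:
        # SeedSequence refuses negative integers ("expected non-negative integer")
        print(f"seed={seed}: ValueError: {err}")
        return False
    return True


print(try_seed(42))   # accepted
print(try_seed(-42))  # rejected, like a negative token hash
```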
Suggested fix: make sure the seed is always non-negative before it is passed to the random number generator, for example by taking the absolute value: random = np.random.default_rng(abs(seed))
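A minimal sketch of the fix, assuming the hasher returns a signed 64-bit integer. The actual staticvectors hasher is not shown in this issue, so hypothetical_hasher below is a stand-in built on hashlib.blake2b (chosen because Python's built-in hash() is randomized per process for strings). The non_negative_seed and oov_vector names are also hypothetical. Besides the suggested abs(), the sketch offers masking to 64 bits as an alternative.

```python
import hashlib

import numpy as np

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def hypothetical_hasher(token: str) -> int:
    # Assumption: stand-in for the library's hasher; returns a signed 64-bit int,
    # so roughly half of all tokens hash to a negative value.
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little", signed=True)


def non_negative_seed(h: int, mask: bool = False) -> int:
    # abs() is the fix suggested in this issue; masking reinterprets the signed
    # 64-bit value as unsigned, which keeps h and -h distinct.
    return h & MASK64 if mask else abs(h)


def oov_vector(token: str, dim: int = 8, mask: bool = False) -> np.ndarray:
    # Deterministic pseudo-random vector for an out-of-vocabulary token.
    seed = non_negative_seed(hypothetical_hasher(token), mask=mask)
    rng = np.random.default_rng(seed)  # never receives a negative seed
    return rng.standard_normal(dim)


vec = oov_vector("unseen-token")
print(vec.shape)
```

One design note: abs() maps h and -h to the same seed, so two tokens whose hashes differ only in sign would receive identical vectors. Masking with 0xFFFF_FFFF_FFFF_FFFF avoids that collision for any value in the signed 64-bit range, at the cost of changing which vectors existing negative-hash tokens would have produced under abs().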
Labels: bug (Something isn't working)