ValueError: Negative seed for out-of-vocab embedding generation #6

@elenamuia

Description

While trying to get embeddings for tokenized text, I encountered this error:

File "/usr/local/lib/python3.11/dist-packages/staticvectors/model.py", line 297, in generate
    random = np.random.default_rng(seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/_generator.pyx", line 4957, in numpy.random._generator.default_rng
  File "_pcg64.pyx", line 123, in numpy.random._pcg64.PCG64.__init__
  File "bit_generator.pyx", line 535, in numpy.random.bit_generator.BitGenerator.__init__
  File "bit_generator.pyx", line 315, in numpy.random.bit_generator.SeedSequence.__init__
  File "bit_generator.pyx", line 389, in numpy.random.bit_generator.SeedSequence.get_assembled_entropy
  File "bit_generator.pyx", line 140, in numpy.random.bit_generator._coerce_to_uint32_array
  File "bit_generator.pyx", line 70, in numpy.random.bit_generator._int_to_uint32_array
ValueError: expected non-negative integer

The error is raised in the generate method when it handles out-of-vocabulary tokens. The hasher function can return negative integers, which are then passed as the seed to np.random.default_rng(seed). NumPy only accepts non-negative integers as seeds, so the call fails.

Suggested fix: Make sure the seed value is always non-negative before passing it to the random number generator, for example by taking the absolute value: random = np.random.default_rng(abs(seed))
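
A minimal sketch to reproduce the error and try the fix outside of staticvectors. The helper names make_rng and make_rng_masked and the example seed value are hypothetical and not part of the library. The masked variant is an alternative I have not tested against the library itself: abs() maps seed and -seed to the same random stream, while masking to an unsigned 64-bit value keeps them distinct.

```python
import numpy as np


def make_rng(seed: int) -> np.random.Generator:
    # Hypothetical helper mirroring the suggested fix: abs() guarantees a
    # non-negative seed, but seed and -seed end up with the same stream.
    return np.random.default_rng(abs(seed))


def make_rng_masked(seed: int) -> np.random.Generator:
    # Hypothetical alternative: reinterpret the value as an unsigned 64-bit
    # integer, so a negative hash maps to a different seed than its positive twin.
    return np.random.default_rng(seed & 0xFFFFFFFFFFFFFFFF)


seed = -1234567890  # stand-in for a negative value returned by the hasher

# Passing the negative value directly reproduces the ValueError from the traceback.
try:
    np.random.default_rng(seed)
    raised = False
except ValueError:
    raised = True

# With the fix, the same seed always yields the same vector.
a = make_rng(seed).random(4)
b = make_rng(seed).random(4)

# abs() collapses seed and -seed onto one stream; masking does not.
collides = np.array_equal(make_rng(seed).random(4), make_rng(-seed).random(4))
masked_differs = not np.array_equal(
    make_rng_masked(seed).random(4), make_rng_masked(-seed).random(4)
)
```

Either variant keeps generation deterministic for a given token. Which one is preferable depends on whether collisions between a hash and its negation matter for out-of-vocabulary vectors.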

Labels: bug (Something isn't working)
