Unable to read/write arrays of nested records in Python #194

@naegelejd

As of v0.6.3, I'm not sure if it's possible to serialize and deserialize NumPy arrays of nested record types.

Example model:

ChildRecord: !record
  fields:
    c: int

ParentRecord: !record
  fields:
    p: int
    child: ChildRecord

MyProtocol: !protocol
  sequence:
    records: ParentRecord[,]

Now, walking through the example as if I'm a new user of Yardl:

Step 1

If I attempt to write a NumPy array of ParentRecord, I get a Yardl error about its dtype:

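import numpy as np
import issue  # package generated by Yardl from the model above

child = issue.ChildRecord(c=42)
parent = issue.ParentRecord(p=7, child=child)
# np.tile on a plain Python object yields an array with dtype=object
records = np.tile(parent, (3, 4))
with issue.BinaryMyProtocolWriter("data.bin") as w:
    w.write_records(records)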
...
File "/workspaces/yardl/joe/issue-#194/python/issue/_binary.py", line 1129, in _write_data
    raise ValueError(message)
ValueError: Expected dtype {'names': ['p', 'child'], 'formats': ['<i4', [('c', '<i4')]], 'offsets': [0, 4], 'itemsize': 8, 'aligned': True}, got object

This is documented behavior: https://microsoft.github.io/yardl/python/language.html#arrays.
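The `object` dtype in that message comes from NumPy itself: `np.tile` wraps a plain Python object in an object array. A minimal sketch of this, using hypothetical dataclass stand-ins for the generated `ChildRecord` and `ParentRecord` classes (the real generated classes may be defined differently):

```python
from dataclasses import dataclass

import numpy as np


# Hypothetical stand-ins for the generated classes, for illustration only.
@dataclass
class ChildRecord:
    c: int


@dataclass
class ParentRecord:
    p: int
    child: ChildRecord


parent = ParentRecord(p=7, child=ChildRecord(c=42))

# NumPy cannot infer a structured dtype from an arbitrary Python object,
# so every cell of the tiled array is a reference to the same object.
records = np.tile(parent, (3, 4))
print(records.shape, records.dtype)
```

Every cell holds a reference to the same `parent` instance, so the array carries no field layout that a serializer could write directly.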

Step 2

Unfortunately, we can't just cast the object array to the correct dtype with astype, e.g.

records = np.tile(parent, (3, 4)).astype(issue.get_dtype(issue.ParentRecord))
  File "/workspaces/yardl/joe/issue-#194/python/test.py", line 60, in main
    records = np.tile(parent, (3, 4)).astype(issue.get_dtype(issue.ParentRecord))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'ParentRecord'

Step 3

So I'll manually transform my data to a NumPy structured array (noting that this is not very user-friendly).

records = np.tile(
    np.array(
        (parent.p, (parent.child.c,)), dtype=issue.get_dtype(issue.ParentRecord)
    ),
    (3, 4),
)
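The manual transformation above could be generalized with a small helper. This is only a sketch: `to_nested_tuple` and `records_to_array` are hypothetical names, not part of Yardl, the dataclasses are stand-ins for the generated classes (which may not be dataclasses), and `parent_dtype` is written out by hand on the assumption that it matches the layout shown in the Step 1 error message.

```python
from dataclasses import dataclass, fields, is_dataclass

import numpy as np


# Hypothetical stand-ins for the generated classes, for illustration only.
@dataclass
class ChildRecord:
    c: int


@dataclass
class ParentRecord:
    p: int
    child: ChildRecord


# Assumed to match the dtype reported in the Step 1 error message.
parent_dtype = np.dtype([("p", "<i4"), ("child", [("c", "<i4")])], align=True)


def to_nested_tuple(record):
    """Recursively convert a dataclass instance into nested tuples of field values."""
    if is_dataclass(record) and not isinstance(record, type):
        return tuple(to_nested_tuple(getattr(record, f.name)) for f in fields(record))
    return record


def records_to_array(records, dtype, shape):
    """Build a structured array of the given shape from a flat list of records."""
    flat = [to_nested_tuple(r) for r in records]
    return np.array(flat, dtype=dtype).reshape(shape)


parent = ParentRecord(p=7, child=ChildRecord(c=42))
arr = records_to_array([parent] * 12, parent_dtype, (3, 4))
print(arr["child"]["c"])
```

NumPy treats tuples as structured elements and lists as array dimensions, which is why the helper emits tuples at every nesting level.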

This lets me write the array successfully, but now I get an error when reading it back!

with issue.BinaryMyProtocolReader("data.bin") as r:
    records_read = r.read_records()
...
File "/workspaces/yardl/joe/issue-#194/python/test.py", line 72, in main
    records_read = r.read_records()
                   ^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/protocols.py", line 113, in read_records
    value = self._read_records()
            ^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/binary.py", line 43, in _read_records
    return _binary.NDArraySerializer(ParentRecordSerializer(), 2).read(self._stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/_binary.py", line 1251, in read
    return self._read_data(stream, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/yardl/joe/issue-#194/python/issue/_binary.py", line 1149, in _read_data
    result[i] = self.element_serializer.read_numpy(stream)
    ~~~~~~^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'ChildRecord'
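One plausible reading of this traceback is that the value assigned to `result[i]` still contains a `ChildRecord` object in the nested slot rather than a nested tuple. The sketch below reproduces that kind of failure with plain NumPy. The dataclass is a hypothetical stand-in for the generated class, and `parent_dtype` is assumed to match the dtype from the Step 1 error message; whether this is what `read_numpy` actually returns is not confirmed here.

```python
from dataclasses import dataclass

import numpy as np


# Hypothetical stand-in for the generated class, for illustration only.
@dataclass
class ChildRecord:
    c: int


# Assumed to match the dtype reported in the Step 1 error message.
parent_dtype = np.dtype([("p", "<i4"), ("child", [("c", "<i4")])], align=True)

result = np.empty((2,), dtype=parent_dtype)

# Assigning a tuple whose nested slot is a Python object: NumPy cannot
# convert the object into the integer field 'c', so the assignment fails.
try:
    result[0] = (7, ChildRecord(c=42))
    failure = None
except (TypeError, ValueError) as exc:
    failure = exc
print(type(failure).__name__, failure)

# Assigning fully nested tuples succeeds.
result[0] = (7, (42,))
print(result[0])
```

If this reading is right, the element read back for a nested record would need to be a nested tuple (or a structured scalar) before it is stored in the result array.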

Labels: bug
