Direct serialization of GufeTokenizable attributes holding GufeKey instances produces false round tripping #713

@ianmkenney

Description

Serializing a GufeTokenizable whose attribute holds a GufeKey, either directly or inside a data structure that is not itself a GufeTokenizable, is a lossy transformation. Deserialization yields a GufeTokenizable with different data types: every GufeKey becomes a str (e.g. GufeKey -> str, [GufeKey] -> [str], {GufeKey: Any} -> {str: Any}). After clearing the registry and reconstructing the object, a user loses access to GufeKey methods such as prefix and token.
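
The type decay described above can be reproduced with the standard library alone. The following is a minimal sketch, assuming a hypothetical `Key` class that subclasses `str`; it is a stand-in for illustration and not gufe's `GufeKey`. It shows why equality checks still pass after a JSON round trip even though the key type is gone.

```python
import json


class Key(str):
    """Hypothetical stand-in for a str-derived key type (not gufe's GufeKey)."""

    @property
    def prefix(self):
        # Everything before the final "-" separator.
        return self.rsplit("-", 1)[0]


def round_trip(obj):
    """Serialize to JSON text and parse it back."""
    return json.loads(json.dumps(obj))


key = Key("Blank-abc123")

bare = round_trip(key)
in_list = round_trip([key])
as_dict_key = round_trip({key: 1})

# Equality is preserved, so the round trip looks lossless...
assert bare == key
assert in_list == [key]
assert as_dict_key == {key: 1}

# ...but every Key has silently become a plain str.
assert type(bare) is str
assert type(in_list[0]) is str
assert type(next(iter(as_dict_key))) is str
assert not hasattr(bare, "prefix")
```

Because a `str` subclass compares equal to a plain `str` with the same contents, an `==` check alone cannot detect the loss; only a type check or a call to a key-specific method does.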

Given that the documentation says

These properties, in particular the stability across Python sessions, make the GufeKey a stable identifier for the object.

one could reasonably conclude that GufeKey is a serializable type suitable for referring to a GufeTokenizable.

from gufe.tokenization import (GufeTokenizable,
                               GufeKey,
                               TOKENIZABLE_REGISTRY)

class Blank(GufeTokenizable):

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _defaults():
        return {}

    def _to_dict(self):
        return {"value": self.value}

    @classmethod
    def _from_dict(cls, dct):
        return cls(**dct)

if __name__ == "__main__":
    target_key = GufeKey("fake_key")

    # we have access to the embedded GufeKey methods, such as to_dict
    gt = Blank(target_key)
    print(gt.value.to_dict())

    # serializing to json removes the information that [value] is a
    # GufeKey
    as_json = gt.to_json()
    print(as_json)

    # rebuilding the GufeTokenizable still gives us access to these
    # methods because the object is never really rebuilt, it's cached
    # in the TOKENIZABLE_REGISTRY
    reconstructed = Blank.from_json(content=as_json)
    assert reconstructed == gt
    # same object
    assert reconstructed is gt
    print(reconstructed.value.to_dict())

    # clearing the registry and rebuilding then removes access to
    # these methods because direct serialization of the GufeKey just
    # creates a string, while still /appearing/ to have round-tripped
    # correctly
    TOKENIZABLE_REGISTRY.clear()
    reconstructed = Blank.from_json(content=as_json)
    assert reconstructed == gt
    # no longer the same object
    assert reconstructed is not gt

    try:
        reconstructed.value.to_dict()
    except AttributeError as e:
        print(e)

Output:

{':gufe-key:': 'fake_key'}
[["Blank-208103cb7d0052c36db0d76769123b5c", {"value": "fake_key", "__qualname__": "Blank", "__module__": "__main__", ":version:": 1}]]
{':gufe-key:': 'fake_key'}
'str' object has no attribute 'to_dict'
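
One possible way to keep the type across a round trip is to tag keys explicitly before serialization and rebuild them on load. The sketch below uses only the standard library; the `Key` class, the `":key:"` tag, and the `encode`/`decode_hook` helpers are all hypothetical names chosen for illustration and do not describe how gufe handles this. Note that the `default=` hook of `json.dumps` would not help here, because a `str` subclass is already natively serializable and never reaches it.

```python
import json

TAG = ":key:"  # hypothetical tag name chosen for this sketch


class Key(str):
    """Hypothetical stand-in for a str-derived key type."""


def encode(obj):
    """Recursively replace Key values with a tagged dict before json.dumps."""
    if isinstance(obj, Key):
        return {TAG: str(obj)}
    if isinstance(obj, (list, tuple)):
        return [encode(item) for item in obj]
    if isinstance(obj, dict):
        # JSON object keys must be strings, so Key instances used as
        # dict keys are NOT handled here and would still decay to str.
        return {k: encode(v) for k, v in obj.items()}
    return obj


def decode_hook(dct):
    """object_hook for json.loads: rebuild Key from a tagged dict."""
    if set(dct) == {TAG}:
        return Key(dct[TAG])
    return dct


payload = {"value": Key("fake_key"), "others": [Key("a"), "plain"]}
text = json.dumps(encode(payload))
restored = json.loads(text, object_hook=decode_hook)

assert isinstance(restored["value"], Key)
assert isinstance(restored["others"][0], Key)
assert type(restored["others"][1]) is str
```

This sketch covers keys stored as values or list items; keys used as dictionary keys would need a different encoding, since JSON object keys can only be strings.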
