Support PROV-Dictionary? #129

@stain

We've been wanting to use the PROV-Dictionary extension with prov.py, but it's a bit tricky if we want to serialize in multiple formats.

Our current workaround is to record the regular prov:Collection membership, which prov.py supports, and to additionally type the collection as a prov:Dictionary:

entity = document.entity("ex:someFile")
coll = document.entity("ex:someDirectory", [
    (provM.PROV_TYPE, PROV["Collection"]),
    (provM.PROV_TYPE, PROV["Dictionary"]),
])

Then regular membership is easy:

document.membership(coll, entity)

However, prov.py does not have a dictionaryMembership method. To express the PROV Dictionary we use PROV-O compatible attributes:

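import uuid

# Membership relation: an entity typed as prov:KeyEntityPair
m_entity = document.entity(uuid.uuid4().urn, [
    (provM.PROV_TYPE, PROV["KeyEntityPair"]),
])
m_entity.add_attributes({
    PROV["pairKey"]: entry["basename"],
    PROV["pairEntity"]: entity,
})
# Attach the pair to the dictionary (the prov:hadDictionaryMember seen in the output below)
coll.add_attributes({PROV["hadDictionaryMember"]: m_entity})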

This workaround produces PROV-O statements that are correct according to PROV-Dictionary section 5:

ex:someDirectory a 
        prov:Collection,
        prov:Dictionary,
        prov:Entity ;
    prov:hadMember ex:someFile ;
    prov:hadDictionaryMember <urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> .

<urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> a 
        prov:Entity,
        prov:KeyEntityPair ;
    prov:pairEntity ex:someFile ;
    prov:pairKey "filename.txt"^^xsd:string .

However the PROV-N output does not match PROV-Dictionary section 4:

 entity(ex:someDirectory, [prov:type='prov:Dictionary', prov:type='prov:Collection', prov:hadDictionaryMember='id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8'])

  hadMember(ex:someDirectory, ex:someFile)

  entity(id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8, [prov:type='prov:KeyEntityPair', prov:pairKey="filename.txt", prov:pairEntity='ex:someFile'])

If this were supported, the membership should appear in PROV-N as:

prov:hadDictionaryMember(ex:someDirectory, ex:someFile, "filename.txt")

Is there a way to add such name-spaced statements to PROV-N with prov.py?
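As a stopgap, the statement above could be rendered by hand outside prov.py. This is only a minimal sketch in plain Python: the function name `dictionary_membership_provn` is hypothetical (it is not part of prov.py), and the escaping only covers backslashes and double quotes in the key.

```python
def dictionary_membership_provn(dictionary, entity, key):
    """Render one PROV-Dictionary membership statement in PROV-N.

    `dictionary` and `entity` are already-qualified names such as
    "ex:someDirectory"; `key` is a plain string key.
    """
    # Escape backslashes first, then double quotes, so the key
    # remains a valid double-quoted string literal.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return 'prov:hadDictionaryMember({}, {}, "{}")'.format(
        dictionary, entity, escaped
    )


statement = dictionary_membership_provn(
    "ex:someDirectory", "ex:someFile", "filename.txt"
)
print(statement)
```

This does not help with parsing the statement back in, which is why proper support in prov.py would still be needed.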

Similarly, expressed in PROV-XML according to PROV-Dictionary section 6, we would expect something like:

<prov:collection prov:id="ex:someDirectory" />
<prov:hadMember>
    <prov:collection prov:ref="ex:someDirectory"/>
    <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>


<prov:dictionary prov:id="ex:someDirectory" />

<prov:hadDictionaryMember>
    <prov:dictionary prov:ref="ex:someDirectory"/>
    <prov:keyEntityPair>
        <prov:key>filename.txt</prov:key>
        <prov:entity prov:ref="ex:someFile"/>
    </prov:keyEntityPair>
</prov:hadDictionaryMember>

but with our workaround we get:

  <prov:collection prov:id="ex:someDirectory">
    <prov:type xsi:type="xsd:QName">prov:Dictionary</prov:type>
    <prov:hadDictionaryMember xsi:type="xsd:QName">id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8</prov:hadDictionaryMember>
  </prov:collection>

<prov:hadMember>
    <prov:collection prov:ref="ex:someDirectory"/>
    <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>

  <prov:entity prov:id="id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8">
    <prov:type xsi:type="xsd:QName">prov:KeyEntityPair</prov:type>
    <prov:pairEntity xsi:type="xsd:QName">id:aa96fdb4-ecb6-4488-9a9b-00f0c17a1fbd</prov:pairEntity>
    <prov:pairKey>rsem_reference.seq</prov:pairKey>
  </prov:entity>

Note that this style seems to survive a round-trip from PROV-O via PROV-XML over to PROV-O again.
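For comparison, the prov:hadDictionaryMember element we would expect in PROV-XML can be built with the Python standard library alone. This is a sketch, not how prov.py serializes anything: the helper names `q` and `dictionary_membership_xml` are hypothetical, and the key is written as plain text without an xsi:type.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
# Ask ElementTree to use the "prov" prefix when serializing this namespace.
ET.register_namespace("prov", PROV_NS)


def q(local_name):
    """Return the ElementTree-qualified name for a term in the PROV namespace."""
    return "{%s}%s" % (PROV_NS, local_name)


def dictionary_membership_xml(dictionary, entity, key):
    """Build a prov:hadDictionaryMember element holding one key-entity pair."""
    member = ET.Element(q("hadDictionaryMember"))
    ET.SubElement(member, q("dictionary"), {q("ref"): dictionary})
    pair = ET.SubElement(member, q("keyEntityPair"))
    ET.SubElement(pair, q("key")).text = key
    ET.SubElement(pair, q("entity"), {q("ref"): entity})
    return member


element = dictionary_membership_xml(
    "ex:someDirectory", "ex:someFile", "filename.txt"
)
print(ET.tostring(element, encoding="unicode"))
```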

Obviously we can blame the PROV-Dictionary spec for not also using this PROV-O style in PROV-XML and PROV-N (which would then have been backwards compatible with all PROV syntaxes).

This issue, however, asks for some prov.py API support for making PROV-Dictionary statements across all syntaxes.

Ideally this might need some hacks to get consistent serialization and parsing, but as a first attempt I would suggest adding support for our approach, as it would not cause issues in loading/saving. I also think that a Dictionary being a Collection should be implied, for compatibility with consumers that do not understand PROV-Dictionary, but I understand that this can be harder to maintain in a mutable in-memory PROV model (e.g. multiple keys could have the same value).
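To illustrate the bookkeeping problem with multiple keys sharing a value, here is a small sketch in plain Python, independent of prov.py. The function name `implied_members` and the entries `copy.txt` and `ex:otherFile` are made up for illustration; the point is only that plain Collection membership has to be derived from the whole set of pairs, not updated per key.

```python
def implied_members(pairs):
    """Return the entities implied as plain Collection members.

    `pairs` maps dictionary keys to entity identifiers. Each entity is
    listed once, in first-seen order, even if several keys point to it.
    """
    # dict.fromkeys keeps insertion order and drops duplicates.
    return list(dict.fromkeys(pairs.values()))


# Two keys refer to the same entity (hypothetical example data).
pairs = {
    "filename.txt": "ex:someFile",
    "copy.txt": "ex:someFile",
    "other.txt": "ex:otherFile",
}
members_before = implied_members(pairs)

# Removing one key must not drop the hadMember statement,
# because another key still refers to the same entity.
del pairs["filename.txt"]
members_after = implied_members(pairs)

print(members_before)
print(members_after)
```

Recomputing the implied members from the full set of pairs, as above, avoids having to track per-entity reference counts when the model is mutated.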
