@peteromallet amazing work! I noticed you have the git branch as part of the metadata, but I feel that is not enough. To be able to train on it, people ideally need the actual environment the dataset was sourced from as well.
This work reminds me of the great agent trajectory framework built into Harbor (https://github.com/harbor-framework/harbor). That also reminds me: maybe we should bake a dataclaw version number into everyone's export, so that as we make breaking changes, downstream users can convert easily.
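
To make the version-number idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration only: the `dataclaw_version` field name, the version numbers, and the example "branch" to "git.branch" change are all made up and not how dataclaw exports currently look.

```python
from typing import Callable

# Hypothetical: the schema version the current exporter would write.
CURRENT_VERSION = 2


def _v1_to_v2(record: dict) -> dict:
    """Made-up breaking change: move top-level "branch" under a "git" object."""
    out = {k: v for k, v in record.items() if k != "branch"}
    out["git"] = {"branch": record.get("branch")}
    out["dataclaw_version"] = 2
    return out


# One migration step per old version; each step upgrades by exactly one version.
MIGRATIONS: dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}


def upgrade(record: dict) -> dict:
    """Apply migrations in order until the record reaches CURRENT_VERSION."""
    # Assumption: exports written before versioning existed count as version 1.
    version = record.get("dataclaw_version", 1)
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version = record["dataclaw_version"]
    return record
```

With something like this, a downstream user could run `upgrade(record)` on every exported record and not care which exporter version produced it.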