Replies: 1 comment
I think this attention to traceability / reproducibility is definitely warranted. A couple of specific things come to mind, though you may have already considered these within the items in the original post.
There are some interesting differences in how much provenance information is available for different inputs. For some, e.g. the EDB or BADA files, there are clear, robust version numbers that can be reported, while in other cases (the TASOPT performance models) there's much less structure. I'm curious where the airport location list fits into this. I think it would be useful to keep both some human-interpretable provenance information and some checksums, so you can tell what a run was after the fact and also have some idea of whether you should expect to be able to repeat it with a given set of inputs.
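To make that concrete, here's a minimal sketch (Python, purely illustrative; the `InputRecord` name and fields aren't anything AEIC defines today) of pairing a human-readable version string with a checksum for each input:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum an input file so a run can be matched to the exact bytes it used."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class InputRecord:
    name: str     # human-interpretable label, e.g. "BADA performance data"
    version: str  # reported version where one exists, else "unversioned"
    source: str   # path or URL the input was read from
    sha256: str   # checksum, for telling whether the bytes have changed since the run


if __name__ == "__main__":
    # Illustrative only: a tiny local file standing in for the airport location list.
    airport_file = Path("airports_example.csv")
    airport_file.write_text("icao,lat,lon\nKBOS,42.3643,-71.0052\n")
    record = InputRecord(
        name="airport locations",
        version="unversioned",
        source=str(airport_file),
        sha256=sha256_of_file(airport_file),
    )
    print(json.dumps(asdict(record), indent=2))
```

For inputs with robust version numbers (EDB, BADA) the version string does most of the work; for unversioned inputs like the airport list, the checksum is what tells you whether a rerun actually used the same bytes.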
It seems to me that there are two kinds of data that we're going to be producing from AEIC:
- intermediate trajectory/emissions output from the simulations themselves, and
- post-processed data products (annual gridded inventories and the like) built on top of that output.
As described in the user stories that Adi wrote, different kinds of users will be interested in different kinds of data products. Some users will be running trajectory-level simulations themselves with self-selected performance data, mission profiles, etc., while some (most?) users will be using only post-processed products.
I think that we need to make some efforts to ensure traceability of the data products that we provide to users. From what I've seen, many users ask for "the AEIC emissions data for 2024", meaning the annual emissions inventories based on OAG mission data, ERA-5 weather and unspecified performance data. From the perspective of that kind of user, they don't really care about the details of exactly how we generate that emissions data: they want to think of what we give them as the AEIC emissions. That means that we need to standardize on a single set of choices for missions, weather, performance and other data for generating these "consumer" data products. For other users, the situation is more fluid: some users will produce post-processed data of one kind or another themselves, some users will run trajectory simulations themselves using custom performance data, and so on.
The range of choices is wide, and we need to make it as easy as possible for users to keep track of how AEIC data was generated and what data choices went into it. I think that means we need to embed provenance metadata in all data products that we produce. (If users write their own post-processing code, it's up to them to handle this, but we should at least make it easy for them to get hold of provenance metadata to use themselves.)
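Just to make "embed" concrete (this is not a proposal for how we'd actually do it, and the attribute name, record fields and netCDF/xarray choice are all just assumptions for the example), something along these lines would let the metadata travel inside the file itself:

```python
import json

import numpy as np
import xarray as xr

# Made-up provenance record; in a real run this would be assembled from the actual
# inputs (mission data, weather, performance models, code version, ...).
provenance = {
    "aeic_version": "0.0.0-example",
    "mission_data": {"source": "example schedule", "version": "unversioned"},
    "weather": {"source": "ERA-5", "detail": "example"},
    "performance": {"source": "example performance model"},
}

# Toy gridded field standing in for a post-processed emissions inventory.
ds = xr.Dataset(
    {"fuel_burn": (("lat", "lon"), np.zeros((3, 4)))},
    coords={"lat": [-10.0, 0.0, 10.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)

# Store the record as a single JSON string in the global attributes, so it travels
# inside the file and can be parsed back out by whoever ends up with the data.
ds.attrs["provenance"] = json.dumps(provenance)
ds.to_netcdf("inventory_example.nc")

# Anyone holding the file can recover the provenance without asking us:
reopened = xr.open_dataset("inventory_example.nc")
print(json.loads(reopened.attrs["provenance"])["weather"]["source"])
```

Then a user who is handed "the AEIC emissions data for 2024" can recover exactly what went into it without needing anything else from us.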
Here's a list of the things that go into producing AEIC intermediate trajectory/emissions output:
- mission/schedule data (e.g. OAG)
- weather data (e.g. ERA-5)
- aircraft performance models (BADA, TASOPT)
- engine emissions data (the EDB)
- airport locations
(For emissions calculations performed after trajectory simulations, the trajectory stores maintain a link from the emissions data to the trajectories that generated them. That could be viewed as an additional data source.)
In addition, post-processed data products will have more configuration choices (mission filtering, grid parameters, parameters related to climate/impacts calculations, and so on).
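For those products, the provenance record probably wants to carry both the post-processing configuration and a reference back to the trajectory/emissions run it was built from. A rough sketch, with purely illustrative names and options:

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PostProcessingConfig:
    # Illustrative knobs only; the real post-processing step will have its own options.
    mission_filter: str = "all"
    grid_resolution_deg: float = 1.0
    species: tuple = ("CO2", "NOx")


@dataclass
class ProductProvenance:
    # Reference back to, and a copy of, the provenance of the trajectory/emissions
    # run this product was built from...
    upstream_run_id: str
    upstream_provenance: dict
    # ...plus the choices made at the post-processing stage itself.
    post_processing: PostProcessingConfig = field(default_factory=PostProcessingConfig)


upstream = {"mission_data": "example schedule", "weather": "ERA-5 (example)"}
record = ProductProvenance(upstream_run_id="run-0001", upstream_provenance=upstream)
print(json.dumps(asdict(record), indent=2))
```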
I have ideas for how to handle most of these items, but I'd like to leave the discussion here open without prejudicing people with those ideas. Have I missed anything from the above list? How would you handle this whole provenance question? What level of traceability do we actually need in AEIC?
(Note: the reason for including "airport locations" in the list above is that these do change over time. Airports close, so historical mission data may not be usable with current airport data, and some of the airport location data from the source we use is not quite stable, which can lead to small differences in simulation output.)