
File Processing

Chrisymonds edited this page Aug 19, 2022 · 1 revision

The original output files from the GLAM modelling runs were numerous: over 6 million individual files, varying over lat, lon, time, irrigation, ygp, rcp, crop and country. To plot this data in real time on the iFEED website, the files needed to be combined so that a single graph didn't trigger thousands of data requests.

As such, the [DIMCool Data Tool](https://github.com/cemac/DIMCool_data_tool) was created. The plan was to combine all the data for each country into a single file. The small ASCII files that existed before had two dimensions, lat and lon; these were to be expanded to 7 dimensions: lat, lon, time, irrigation, ygp, crop and rcp. The files were originally organised into directories with the following hierarchy:

```
country
|___crop
    |___rcp
        |___year
            |___120 files with all combinations of irrigation level and ygp
```
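As a rough illustration of working with this layout, the sketch below walks a tree shaped like the one above, groups files by (country, crop, rcp, year), and flags any year directory that does not hold the expected 120 files. It assumes the four levels are plain directory names and that nothing else sits at that depth; `group_year_dirs` and `incomplete_years` are hypothetical names, not part of the DIMCool Data Tool.

```python
from pathlib import Path

# Expected files per year directory: 10 ygp levels x 12 irrigation levels.
FILES_PER_YEAR = 10 * 12


def group_year_dirs(root):
    """Map (country, crop, rcp, year) to the sorted files in that year directory.

    Assumes the layout root/country/crop/rcp/year/<files> described above.
    """
    root = Path(root)
    groups = {}
    # "*/*/*/*" matches everything exactly four levels below root.
    for year_dir in sorted(root.glob("*/*/*/*")):
        if not year_dir.is_dir():
            continue
        country, crop, rcp, year = year_dir.relative_to(root).parts
        groups[(country, crop, rcp, year)] = sorted(
            p for p in year_dir.iterdir() if p.is_file()
        )
    return groups


def incomplete_years(groups, expected=FILES_PER_YEAR):
    """Return the keys whose year directory does not hold exactly `expected` files."""
    return [key for key, files in groups.items() if len(files) != expected]
```

Checking completeness up front means a missing ygp/irrigation combination is caught before any combining starts, rather than surfacing as a gap in a plot.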

Unfortunately the amount of data was too large to combine all at once using iris, so an iterative process was used instead, with nco handling the later stages. The first stage was to combine, for each year, all of the ygp and irrigation-level data in the iFEED project. Each year comprised 120 files (10 production (ygp) levels × 12 irrigation levels). Data from all 120 individual ASCII files (each with 49 columns, and each row relating to a single 0.5° × 0.5° gridcell) was combined into a set of 100 NetCDF files using iris cubes. Because this combination process can be time-consuming, the program was also set up to use multiprocessing. Data was then combined across years using nco.
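A minimal sketch of the per-year stacking step, assuming whitespace-delimited ASCII tables with one row per gridcell and the same gridcell ordering in every file. The original process built iris cubes and wrote NetCDF; this sketch uses plain NumPy instead and only shows how the 120 two-dimensional tables gain ygp and irrigation axes. `stack_year` and its parameters are hypothetical names.

```python
import numpy as np

N_YGP = 10   # production (ygp) levels
N_IRR = 12   # irrigation levels
N_COLS = 49  # columns per ASCII file; each row is one 0.5° x 0.5° gridcell


def stack_year(sources, n_ygp=N_YGP, n_irr=N_IRR, n_cols=N_COLS):
    """Combine one year's ASCII tables into a (ygp, irrigation, cell, column) array.

    `sources` maps (ygp_index, irrigation_index) to a path or file-like object
    readable by numpy.loadtxt. Every (ygp, irrigation) pair must be present.
    """
    combined = None
    for ygp in range(n_ygp):
        for irr in range(n_irr):
            # ndmin=2 keeps a single-row file two-dimensional.
            table = np.loadtxt(sources[(ygp, irr)], ndmin=2)
            if table.shape[1] != n_cols:
                raise ValueError(
                    f"expected {n_cols} columns, got {table.shape[1]} for {(ygp, irr)}"
                )
            if combined is None:
                # Allocate once the number of gridcells is known from the first file.
                combined = np.empty((n_ygp, n_irr, table.shape[0], n_cols))
            elif table.shape[0] != combined.shape[2]:
                raise ValueError(f"gridcell count mismatch for {(ygp, irr)}")
            combined[ygp, irr] = table
    return combined
```

Because each year is handled independently, a function like this could be mapped over years with `multiprocessing.Pool.map`, which is one way to get the kind of parallelism described above.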

Once this stage had completed successfully, a second stage of processing used nco to walk iteratively up the directory tree, combining files along the rcp axis and then the crop axis so that only 4 files remained. This proved difficult for two reasons. First, nco had trouble changing the concatenation dimension from time to something else. Second, beyond a certain point there was not enough memory to hold the entire cube, so data started being lost. This resulted in some interesting data being plotted on the iFEED website, for example a prediction that potato crops in Malawi would fail dramatically in 2077 and never recover.

After some testing it was determined that combining the data further was far more difficult than simply working with a set of 32 files instead of 4. The files produced by the first round of processing were therefore subsetted to keep only the required fields (biomass, planting date, duration and yield), and the function used to read the data was modified to read data split across many files.
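The nco steps can be sketched as command builders. This assumes each file already carries an rcp dimension of length 1 and uses hypothetical variable names for the four retained fields; the helper names are illustrative and not taken from the DIMCool Data Tool. `ncrcat` concatenates along the record (unlimited) dimension, which was time in the per-year files, so one way to concatenate along another axis is to make that axis the record dimension first with `ncks --mk_rec_dmn`.

```python
import shlex

# Hypothetical variable names for the four fields that were kept;
# the real names in the GLAM output files may differ.
KEEP_FIELDS = ("biomass", "planting_date", "duration", "yield")


def subset_cmd(src, dst, fields=KEEP_FIELDS):
    """ncks command that copies only the listed variables from src to dst (-O overwrites dst)."""
    return ["ncks", "-O", "-v", ",".join(fields), src, dst]


def make_record_dim_cmd(src, dst, dim):
    """ncks command that makes `dim` the record (unlimited) dimension.

    Assumes `dim` already exists as a dimension in src.
    """
    return ["ncks", "-O", "--mk_rec_dmn", dim, src, dst]


def concat_cmd(inputs, dst):
    """ncrcat command that joins `inputs` along their shared record dimension."""
    return ["ncrcat", "-O", *inputs, dst]


if __name__ == "__main__":
    # Print the commands rather than run them; paths here are placeholders.
    print(shlex.join(subset_cmd("in.nc", "subset.nc")))
    print(shlex.join(make_record_dim_cmd("subset.nc", "rec.nc", "rcp")))
    print(shlex.join(concat_cmd(["rcp45.nc", "rcp85.nc"], "combined.nc")))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; building argument lists rather than shell strings avoids quoting problems with file paths.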
