At the moment the `CombineGeneticDatasets` step has some strong input requirements. In particular, all three GenOMICC genotyping arrays need to be present, with WGS data optional. It was mentioned here that:
- Some of our collaborators (and our other open-source inputs) will / do use other, different types of arrays than our core GenOMICC UK arrays.
- It would be more useful if the Genotype QC pipeline could take a list or table of array paths to pass into the Combine Genetic Variants step.
We should therefore make these inputs more flexible, for instance by providing them as a list. However, each entry needs to carry enough information for it to be processed correctly, for instance:
- file prefix
- genome assembly
- file format
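As a rough illustration of what such a list could look like, here is a minimal sketch that reads a tab-separated table with one row per input and the three fields above. The column names (`prefix`, `assembly`, `format`), the allowed format labels and the example paths are all hypothetical; the issue does not fix a schema. The point is only that each row is validated up front, so a malformed entry fails early rather than partway through the combine step.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical allowed values: the actual set would be decided by the pipeline.
ALLOWED_ASSEMBLIES = {"GRCh37", "GRCh38"}
ALLOWED_FORMATS = {"plink_bed", "pgen", "bgen", "vcf"}


@dataclass(frozen=True)
class GenotypeInput:
    prefix: str       # path prefix of the genotype files
    assembly: str     # genome assembly the coordinates refer to
    file_format: str  # how the files should be read


def parse_inputs(tsv_text: str) -> list[GenotypeInput]:
    """Parse a tab-separated table with (hypothetical) columns prefix, assembly, format."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = {"prefix", "assembly", "format"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    inputs = []
    for row in reader:
        if row["assembly"] not in ALLOWED_ASSEMBLIES:
            raise ValueError(f"unsupported assembly {row['assembly']!r} for {row['prefix']}")
        if row["format"] not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format {row['format']!r} for {row['prefix']}")
        inputs.append(GenotypeInput(row["prefix"], row["assembly"], row["format"]))
    return inputs


# Usage with hypothetical paths: one array in PLINK format and one WGS VCF.
table = (
    "prefix\tassembly\tformat\n"
    "data/array_a\tGRCh37\tplink_bed\n"
    "data/wgs\tGRCh38\tvcf\n"
)
inputs = parse_inputs(table)
```

Keeping the assembly as an explicit field means a later step can decide whether liftover is needed per input instead of assuming a single build for all of them.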
With this information, the workflow could handle a variable number of inputs. One current constraint is that a single participant may appear in multiple files, and priority has been given to the existing genotyping technologies when resolving such duplicates. Supporting a variable number of file formats makes this process more complicated.
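One possible way to keep the duplicate-participant rule working with an arbitrary list is to replace the hard-coded technology order with an explicit rank per input. The sketch below assumes a hypothetical integer `priority` field (lower means preferred) and that the sample IDs of each file have already been extracted; reading IDs from each format, and any ID harmonisation across files, is not covered. Each participant is kept only from the highest-priority input it appears in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    prefix: str               # identifies the input (assumed unique)
    priority: int             # hypothetical rank: lower value = preferred source
    samples: frozenset[str]   # participant IDs present in this input


def assign_samples(sources: list[SourceFile]) -> dict[str, str]:
    """Map each participant to the prefix of the single input it is taken from."""
    assignment: dict[str, str] = {}
    # Visit inputs from most to least preferred; the first input containing a
    # participant wins. Ties in priority are broken by prefix for determinism.
    for src in sorted(sources, key=lambda s: (s.priority, s.prefix)):
        for sample in src.samples:
            assignment.setdefault(sample, src.prefix)
    return assignment


def samples_to_keep(sources: list[SourceFile]) -> dict[str, set[str]]:
    """Invert the assignment: for each input, the participants to retain from it."""
    keep: dict[str, set[str]] = {src.prefix: set() for src in sources}
    for sample, prefix in assign_samples(sources).items():
        keep[prefix].add(sample)
    return keep


# Usage with hypothetical inputs: WGS preferred over two arrays.
sources = [
    SourceFile("wgs", 0, frozenset({"P1", "P2"})),
    SourceFile("array_a", 1, frozenset({"P2", "P3"})),
    SourceFile("array_b", 2, frozenset({"P3", "P4"})),
]
keep = samples_to_keep(sources)
```

Here P2 is kept only from `wgs` and P3 only from `array_a`. The per-input keep lists could then be used to filter each file before merging.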