Thoughts on changes - Lucie #2

@nathanielmki

I would go with /biocore/scratch/reference/ensembl/release-93/... instead of /biocore/ref_expanded/ensembl/release-93/... for consistency. However, the proposed changes will affect all parts of our automation (the cloud image, data downloads, ...), so I would suggest we do this incrementally:

1. Rename /data/internal to /data/raw_data
2. Rename /data/projects to /data/analysis
3. Rename /data/external to /data/reference
4. Reorganize the expanded reference data under /data/scratch into /data/scratch/reference
5. Replicate this in the cloud
6. Run regression testing for:
   - data downloads automation
   - pipeline analysis
7. Reorganize /data/scratch/reference by source_name/release-version
8. Reorganize the information under /data/transformed/tool-version by source_name/release-version
9. Update the reference pre-indexing automation
10. Run regression testing for the reference pre-indexing automation
11. Proceed with changing /data to /biocore
12. Update the /data mount in the cloud and generate a new image
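
Steps 1 to 3 of the plan above are plain directory renames, so they can be scripted and dry-run before anything is moved. A minimal Python sketch, assuming the three directories sit directly under a single root such as /data; the function names are hypothetical, and step 4 is left out because which directories under /data/scratch count as expanded reference data is not specified here:

```python
from pathlib import Path

# Renames from steps 1-3 of the plan (old name -> new name), relative to the data root.
RENAMES = [
    ("internal", "raw_data"),
    ("projects", "analysis"),
    ("external", "reference"),
]


def plan_renames(root: Path) -> list[tuple[Path, Path]]:
    """Return (source, destination) pairs without touching the filesystem."""
    return [(root / old, root / new) for old, new in RENAMES]


def apply_renames(root: Path, dry_run: bool = True) -> list[str]:
    """Print and return what happens; only rename on disk when dry_run is False."""
    log = []
    for src, dst in plan_renames(root):
        if dst.exists():
            # Never overwrite a directory that already has the new name.
            raise FileExistsError(f"refusing to overwrite {dst}")
        if not src.exists():
            log.append(f"skip {src} (missing)")
        elif dry_run:
            log.append(f"would rename {src} -> {dst}")
        else:
            src.rename(dst)
            log.append(f"renamed {src} -> {dst}")
    for line in log:
        print(line)
    return log
```

Calling `apply_renames(Path("/data"))` only reports the plan; `apply_renames(Path("/data"), dry_run=False)` performs it, after which the regression testing in step 6 would apply.
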

I personally do not see the need to rename /data to /biocore, for the following reasons:

1. It is redundant, since we are already using biocore servers.
2. It makes running the same application on both the on-premise and cloud servers more of a hassle, since it needs more configuration.
3. It is not portable (see 2).
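
Whichever root name is chosen, the configuration burden described in points 2 and 3 is smaller when scripts resolve every path from one configurable root instead of hard-coding /data or /biocore. A minimal Python sketch, assuming a hypothetical `BIOCORE_DATA_ROOT` environment variable (defaulting to /data) and the source_name/release-version layout proposed above for expanded reference data:

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical variable name; on-premise servers keep the /data default,
# and a machine such as the cloud image could override it.
ROOT_ENV_VAR = "BIOCORE_DATA_ROOT"
DEFAULT_ROOT = "/data"


def data_root() -> Path:
    """Resolve the data root from the environment, falling back to /data."""
    return Path(os.environ.get(ROOT_ENV_VAR, DEFAULT_ROOT))


def reference_dir(source_name: str, release: str, root: Optional[Path] = None) -> Path:
    """Expanded reference data laid out as <root>/scratch/reference/<source_name>/release-<release>."""
    base = root if root is not None else data_root()
    return base / "scratch" / "reference" / source_name / f"release-{release}"


if __name__ == "__main__":
    print(reference_dir("ensembl", "93"))
```

With the variable unset this prints /data/scratch/reference/ensembl/release-93; with `BIOCORE_DATA_ROOT=/biocore` it prints /biocore/scratch/reference/ensembl/release-93, so the directory layout stays identical and only one setting differs between machines.
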
I also do not think raw_data is intuitive or representative of the experimental data. I would suggest "experiments" instead, which requires less cognitive load since it relies on recognition rather than recall.

Those are my two cents.