Formalize YODA principles

YODA has also been proposed as a standard/best practice for ReproNim: https://github.com/ReproNim/repronim.org/issues/206.

IMO, YODA should clearly separate the principles from the suggestions and should be fully decoupled from DataLad.

"Standards speak" would need to be explained to make sense to readers unfamiliar with it, but this is what I have in mind for the formal part. wdyt @yarikoptic?

## YODA IDEALS

- "YODA compliant datasets" contain well-defined, portable computational environments to compute analysis results.
- "YODA compliant datasets" preserve provenance of the computational procedures that produce or alter derivative data.
- "YODA compliant datasets" strive for reproducibility.

## YODA PRINCIPLES:

- All assets essential to replicate computational execution MUST be included
- All assets essential to replicate computational execution MUST be version controlled
- All assets essential to replicate computational execution SHOULD be version controlled using the
    same version control system
- All assets essential to replicate computational execution MAY be linked (as subdatasets) or included directly in the dataset
- Provenance of all modifications to the assets MUST be annotated
- Dataset structure SHOULD accommodate domain standards
- Assets SHOULD be organized in a modular structure
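
The principles above use RFC 2119 requirement levels (MUST, SHOULD, MAY). To make that "standards speak" concrete, here is a minimal sketch of how a hypothetical compliance checker could treat each level. A failed MUST makes a dataset non-compliant, a failed SHOULD only produces a warning, and MAY items never affect the outcome. The `Principle` type, the shortened principle wording, and `evaluate()` are illustrative assumptions, not an existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Requirement levels in the RFC 2119 sense."""
    MUST = "MUST"
    SHOULD = "SHOULD"
    MAY = "MAY"


@dataclass(frozen=True)
class Principle:
    level: Level
    text: str


# Hypothetical, shortened encoding of the principles listed above.
# The MAY principle (link vs. include) grants a permission, so it is not checked.
PRINCIPLES = [
    Principle(Level.MUST, "All essential assets are included"),
    Principle(Level.MUST, "All essential assets are version controlled"),
    Principle(Level.SHOULD, "All essential assets use the same version control system"),
    Principle(Level.MUST, "Provenance of all modifications is annotated"),
    Principle(Level.SHOULD, "Dataset structure accommodates domain standards"),
    Principle(Level.SHOULD, "Assets are organized in a modular structure"),
]


def evaluate(results: dict) -> tuple:
    """Return (compliant, errors, warnings) for per-principle check results.

    `results` maps principle text to True/False; a principle that was not
    checked at all is treated as failed.
    """
    errors, warnings = [], []
    for p in PRINCIPLES:
        if results.get(p.text, False):
            continue
        if p.level is Level.MUST:
            errors.append(p.text)      # hard failure: not compliant
        elif p.level is Level.SHOULD:
            warnings.append(p.text)    # soft failure: compliant, but flagged
    return (not errors, errors, warnings)
```

Keeping the requirement level next to each principle lets the formal text and any tooling share one source of truth, which fits the goal of decoupling the principles from a specific tool such as DataLad.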

## YODA ASSETS:

(This part could probably be left out of the formal section and discussed in the detailed explanation)

MUST:
- input data
- analysis code/scripts (upstream or custom)
- computational environments (e.g. as container images)
- documentation

SHOULD:
- Test scripts
- Automation 
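
The MUST/SHOULD asset lists lend themselves to a simple checklist. The sketch below assumes a dataset declares its assets as a mapping from category to paths. How a dataset would declare this is not specified here, so both the category names and the manifest shape are hypothetical. The function reports which required and recommended categories are missing or empty.

```python
# Hypothetical category names mirroring the MUST / SHOULD lists above.
REQUIRED = {"input data", "analysis code", "computational environments", "documentation"}
RECOMMENDED = {"test scripts", "automation"}


def check_assets(declared: dict) -> dict:
    """Report missing asset categories.

    `declared` maps a category name to a list of paths; a category with an
    empty path list counts as missing.
    """
    present = {category for category, paths in declared.items() if paths}
    return {
        "missing_must": sorted(REQUIRED - present),
        "missing_should": sorted(RECOMMENDED - present),
    }


# Example: all MUST assets declared, no SHOULD assets.
report = check_assets({
    "input data": ["inputs/raw"],
    "analysis code": ["code/"],
    "computational environments": ["envs/analysis.sif"],
    "documentation": ["README.md"],
})
```

Because the checklist only looks at declared categories, it does not impose a directory layout, which is consistent with the "Dataset structure SHOULD accommodate domain standards" principle.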

# NOTES

### Original Organigram: https://f1000research.com/posters/7-1965

#### Top level

    Track all input data, code, and computational environments needed to produce analysis outputs in
    version controlled datasets — and reproducibility you will achieve!

    Learn control you must.
    Size matters not!

    - Subdataset references in a dataset are
      extremely lightweight yet guarantee data identity via cryptographic hashes.  Subdatasets can be
      detached without losing this information, yielding massively improved storage efficiency and
      reduced archive costs.

    - Publicly shared data compliant with a common standard are an optimal element in a modular study
      setup. From mid-2018 OpenNeuro (previously OpenFMRI) will offer DataLad datasets for direct
      download
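
The claim that references "guarantee data identity via cryptographic hashes" rests on content addressing: a reference stores a hash of the content, so any change to the content changes the identifier. The sketch below reproduces how Git (with its default SHA-1 object format) derives a blob ID. A subdataset reference in Git is a gitlink that records the subdataset's commit ID, which is computed the same way over the commit object. The function name is illustrative.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the Git (SHA-1) object ID of a file's content.

    Git hashes a header "blob <size>\\0" followed by the raw bytes, so the
    ID depends only on the content, never on the file name or location.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

For any file, the result should match `git hash-object <file>`; for example, `git_blob_id(b"hello\n")` gives `ce013625030ba8dba906f756967f9e9ca394464a`.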

#### Principles

    *P1* Use well-defined, portable computational environments to compute analysis results

    *P2* Exhaustively track ALL analysis inputs in the same version control system
    as the computed results, including:
    - input data
    - custom analysis code/scripts
    - required computational environments (e.g. as container images)

    *P3* Structure study elements (data, code, environments) in modular
    components to facilitate reuse within or outside the context of the
    original study

#### Dataset Layout

    Dataset structure is fully flexible to be able to accommodate domain standards (e.g. BIDS).  Element
    location/name can be discovered from configuration.
    
    Required (3rd-party) code repositories can be referenced as subdatasets just like datasets with data
    files. The repository state is an unambiguous version record.
    
    Images of containerized computational environments are tracked in version control just like any
    other data file. Actual storage can be local or in the cloud.
    
    Any input data is referenced via the dataset that contains it.  Dataset state provides unambiguous
    version specification for any data dependency.
    
    DataLad can obtain required subdataset content on demand.  Only content elements actually required
    for an analysis are present. Directory structure is expanded recursively as needed.
    
    Test scripts can be used to check analysis code, verify data integrity, and assess computational
    reproducibility.
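
As one example of a test script that verifies data integrity, here is a minimal sketch that compares files against a recorded manifest of SHA-256 checksums. The manifest format (relative path mapped to hex digest) and the function names are assumptions for illustration. In a version-controlled dataset, the version control system already records content hashes, so a check like this mainly helps when data are stored or transferred outside of it.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose content has changed.

    `manifest` maps paths relative to `root` to expected SHA-256 hex digests.
    """
    problems = []
    for rel, expected in sorted(manifest.items()):
        path = root / rel
        if not path.is_file() or sha256_file(path) != expected:
            problems.append(rel)
    return problems
```

An empty list means every listed file is present and unchanged. A CI job could fail whenever the list is non-empty.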


### DataLad Handbook

https://handbook.datalad.org/en/latest/basics/101-127-yoda.html

#### Principles

    P1: One thing, one dataset
    P2: Record where you got it from, and where it is now
    P3: Record what you did to it, and with what
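
P2 and P3 ask for a record of inputs, the command that was run, and its results. DataLad offers `datalad run` for this. The sketch below is a tool-agnostic illustration, not DataLad's run-record format. It runs a command and captures the command, content hashes of inputs and outputs, the exit code, and a minimal note on the environment. The record layout and `run_with_provenance()` are hypothetical.

```python
import hashlib
import platform
import subprocess
from pathlib import Path
from typing import Optional


def _digest(path: Path) -> Optional[str]:
    """SHA-256 of a file's content, or None if the file does not exist."""
    return hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None


def run_with_provenance(cmd: list, inputs: list, outputs: list) -> dict:
    """Run `cmd` and return a JSON-serializable provenance record."""
    record = {
        "cmd": [str(c) for c in cmd],                          # what was done
        "inputs": {str(p): _digest(p) for p in inputs},       # to what (before)
        "environment": {"platform": platform.platform()},     # with what (minimal)
    }
    proc = subprocess.run(cmd, capture_output=True, text=True)
    record["exit_code"] = proc.returncode
    record["outputs"] = {str(p): _digest(p) for p in outputs}  # results (after)
    return record
```

The record could be serialized with `json.dumps(record, indent=2)` and committed alongside the outputs. A fuller version would also identify the exact software environment, for example by recording the container image's content hash.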

