What training and test folds were used for models in paper #33

@BDEvan5

Hello Borzoi Team

Firstly, I am impressed with the quality of the repositories and tutorials. I was able to generate the data (both to replicate the paper and using the new 393kbp sequences) and I could train the micro, mini and full models. It is rare, and thus very impressive, to find open-source repositories that work out of the box. Thank you for your hard work and effort in making replication easy.

I am confused regarding which train/test folds were used to produce the models in the paper. The paper states:

We trained four models, each with distinct held out test and validation folds. (Page 14, Methods/Data)

However, the repo readme, supported by the answer to issue #11 (Clarification of model fold data splits) says:

We trained a total of 4 model replicates with identical train, validation and test splits (test = fold3, validation = fold4 from sequences_human.bed.gz).

This appears to be contradictory, unless you trained 4 models per fold set and only released the models for (test = fold3, validation = fold4). If this is the case, which models did you use in the results reported in the paper?

I downloaded the four models (links from the readme, e.g. https://storage.googleapis.com/seqnn-share/borzoi/f0/model0_best.h5). I tested them on the K562 RNA-seq tracks (ENCSR000AEL, plus and minus strands), processed with the Makefile for 524kbp sequences from the borzoi-paper repo.

The image below shows my results from testing each model on each fold. I measure a Pearson correlation above 0.83 on every fold except folds 3 and 4, where the scores are around 0.6 to 0.7.
The lower scores fall on exactly the two folds that the readme lists as held out, which would indicate that the repo is correct and that all four models were trained on the same train/validation/test split.

image
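
For reference, here is a minimal sketch of how a per-fold Pearson comparison like the one above can be computed. It is not the Borzoi evaluation code: the function name `pearson_by_fold`, the array layout (one row per sequence, one fold label per row) and the synthetic data are all assumptions for illustration only.

```python
import numpy as np


def pearson_by_fold(preds, targets, folds):
    """Return {fold: Pearson r}, pooling all values of the sequences in each fold.

    preds, targets: arrays of shape (n_sequences, n_bins) (hypothetical layout).
    folds: array of length n_sequences with the fold label of each sequence.
    """
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    folds = np.asarray(folds)
    scores = {}
    for fold in np.unique(folds):
        mask = folds == fold
        # Flatten all bins of all sequences in this fold into one vector.
        p = preds[mask].ravel()
        t = targets[mask].ravel()
        scores[fold.item()] = float(np.corrcoef(p, t)[0, 1])
    return scores


# Synthetic stand-in data (NOT real model output): folds 3 and 4 get more
# noise, mimicking folds that a model did not see during training.
rng = np.random.default_rng(0)
n_sequences, n_bins = 80, 64
fold_labels = np.arange(n_sequences) % 8
true_signal = rng.normal(size=(n_sequences, n_bins))
noise_scale = np.where(np.isin(fold_labels, [3, 4]), 1.0, 0.3)
fake_preds = true_signal + noise_scale[:, None] * rng.normal(size=(n_sequences, n_bins))

for fold, r in pearson_by_fold(fake_preds, true_signal, fold_labels).items():
    print(f"fold {fold}: r = {r:.3f}")
```

A drop in correlation that is confined to the same folds for every model is the pattern one would expect if all models share the same held-out folds.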

Could you please help me understand which folds were used?
