Skip to content

Update benchmarking script with new CLI#56

Draft
victorlin wants to merge 19 commits intoababaian:masterfrom
victorlin:master
Draft

Update benchmarking script with new CLI#56
victorlin wants to merge 19 commits intoababaian:masterfrom
victorlin:master

Conversation

@victorlin
Copy link
Collaborator

@victorlin victorlin commented Apr 25, 2020

Opening this PR to track progress for #27. Keeping as draft until ready.

TODO:

  • split into 2 scripts
    • cov_benchmark.py
      • design high-level CLI
      • implement simulation parameters
      • implement alignment parameters
      • implement custom subprocess parameters (msbar,art_illumina, bowtie2)
      • convert counts to proportions
    • pangenome_cov_benchmark.py
      • per genome in pangenome: generate single-record FASTA, run cov_benchmark.py with predefined parameters
      • aggregate results in .csv file
  • update README
  • write containerized unit tests (?)

@mathemage
Copy link
Collaborator

mathemage commented Apr 26, 2020

PR closes #27

@mathemage mathemage linked an issue Apr 26, 2020 that may be closed by this pull request
@victorlin
Copy link
Collaborator Author

I was able to do most of the finalizing work for this today. A few more unit tests that I'm figuring out how to implement since it doesn't seem like msbar supports a random seed to generate reproducible output.

The updated README is mostly technical without much insight on the biological context; @ababaian if you have any comments/suggestions feel free to modify directly or let me know!

@ababaian
Copy link
Owner

ababaian commented May 3, 2020

Hey @victorlin, can you remove the fq files from the repo and instead host them on the s3 bucket. There is a test_data folder which these would fit in.

--

I'm a bit tied up at the moment, I'll do a proper review tomorrow

@victorlin
Copy link
Collaborator Author

@ababaian done - moved all the test input files to s3://serratus-public/test-data/benchmarker/. Also added a setup.sh script to copy the files into test/benchmarker/local/. Using local for the folder name since it's targeted by our repo's .gitignore.

@victorlin
Copy link
Collaborator Author

A lot has happened since this was started. Is the benchmarker still needed?

The last task is unit testing msbar, but it seems non-trivial. We could merge this PR in and add a separate task, or just close out to make room for other exciting things.

@rcedgar
Copy link
Collaborator

rcedgar commented May 24, 2020

IMO we don't need a mapping benchmark for the main Serratus search, we have settled on how to run bowtie2 and are not likely to revisit benchmarking of mappers.

@ababaian
Copy link
Owner

Ironically enough I do think we /should/ revisit this as there will be ways to improve on --very-sensitive-local I just think on the totem pole of priorites this has gotten bumped down quite a bit as more pressing things always seem to be there. I'd say let's leave this open and swing back to it when time is a luxury we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmarking statistics for divergence simulation tests

4 participants