avoid grepping for each line... by bpow · Pull Request #2 · cbuhay/ExCID

bpow · 2015-04-03T13:01:13Z

Calling a system grep process for each line (or two grep processes per line in the case of VEGADB) of the reference databases is inefficient.

On my desktop, setup.sh takes almost an hour, and most of that time is spent in the check_HGNC_individual_VEGADB.pl script. This patch does the equivalent work in a matter of seconds.
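The patch itself is Perl and is not reproduced here. To illustrate the idea, here is a minimal Python sketch of replacing per-line grep calls with an index that is built once. It assumes the reference database is plain text with fields separated by whitespace or commas, matching the [\s,] split mentioned below. The helper names `build_word_index` and `find_lines` are hypothetical.

Launching one grep per query rescans the whole file each time, so the cost grows roughly with (number of queries × file size). A single pass followed by dictionary lookups grows roughly with (file size + number of queries).

```python
import re
from collections import defaultdict

# Assumption: tokens are separated by whitespace or commas (the [\s,] split).
_SPLIT = re.compile(r"[\s,]+")

def build_word_index(lines):
    """Hypothetical helper: read the database once, mapping each token to
    the set of line numbers on which it appears."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in _SPLIT.split(line.strip()):
            if token:
                index[token].add(lineno)
    return index

def find_lines(index, symbol):
    """Hypothetical helper: one dictionary lookup stands in for one
    external `grep` process. .get() avoids inserting missing keys."""
    return sorted(index.get(symbol, ()))

db = ["BRCA1 chr17 VEGA001", "TP53,chr17 VEGA002", "hsa-mir-511-1 chr10 VEGA003"]
idx = build_word_index(db)
print(find_lines(idx, "TP53"))   # [1]
print(find_lines(idx, "chr17"))  # [0, 1]
```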

The overall setup.sh still takes more than 20 minutes, because some other step becomes the rate-limiting one. Something similar could be done for the other check_HGNC_individual*pl scripts, but you would have to be careful: building an index by splitting on [\s,] is not the same as grep -w. For example, hsa-mir-511 matches hsa-mir-511-1 when using grep -w, but it would not match with a simple index of "words" as done here.
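The mismatch can be seen in a small Python sketch that compares the two matching rules. The function `grep_w_matches` only approximates `grep -wF`: a match must not be preceded or followed by a word character, meaning a letter, digit, or underscore. The function `token_matches` mimics an exact-key lookup in an index built by splitting on [\s,]. Both names are hypothetical.

```python
import re

def grep_w_matches(pattern, line):
    """Approximate `grep -wF pattern`: the fixed string must not be preceded
    or followed by an ASCII word character (letter, digit, underscore)."""
    regex = re.compile(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", re.ASCII)
    return regex.search(line) is not None

def token_matches(pattern, line):
    """Exact-token test: split on whitespace/commas, then compare whole tokens,
    as a hash keyed on those tokens would."""
    return pattern in re.split(r"[\s,]+", line.strip())

line = "hsa-mir-511-1 chr10"
print(grep_w_matches("hsa-mir-511", line))  # True: '-' is not a word character
print(token_matches("hsa-mir-511", line))   # False: the token is 'hsa-mir-511-1'
```

This is the difference the comment warns about. An index-based rewrite of another script would need extra handling wherever symbols contain hyphens or other non-word characters.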

So this could just be considered an example of how to address an inefficiency in the setup process.

Rashesh7 · 2015-04-09T19:29:15Z

Hi,

This is really a great suggestion. Yes, the patch will need to be adapted for each database, since creating a hash generates a strict (exact-match) key for each gene.

Thank you.

Commit 1a89366: avoid grepping for each line...


bpow wants to merge 1 commit into cbuhay:master from bpow:faster-vega
