Dealing with memory and speed issues in permutation analysis for large datasets #2

Open

areyesq89 wants to merge 4 commits into tengmx:master from areyesq89:integerFix

Conversation

@areyesq89

Hi @tengmx,

I ran into the problem reported here and dug into debugging the error. It is related to the gcapcPeaks function. It turns out that, for some datasets, the vector of permuted values can be extremely large (>50 billion elements), and the functions quantile and density simply break on it. I solved this by adding an option to draw a uniform random sample of the permuted values; a new parameter permsamp= specifies the fraction of permuted values to use as the sample size.
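
Roughly, the idea looks like the sketch below (a hypothetical subsamplePerms helper shown for illustration only, not the exact code in this PR):

```r
## Simplified sketch of the subsampling idea: instead of running
## quantile()/density() on the full vector of permuted values, draw a
## uniform random subsample whose size is a fraction `permsamp` of the
## full length, and compute the summaries on that subsample.
subsamplePerms <- function(perms, permsamp = 0.05) {
  n <- length(perms)
  k <- max(1L, floor(n * permsamp))
  perms[sample.int(n, size = k)]
}

## quantile() and density() then operate on a vector of manageable size:
perms <- runif(1e6)  # stand-in for the permuted values
cutoff <- quantile(subsamplePerms(perms), probs = 0.95)
```

Since the subsample is drawn uniformly at random, the quantile and density estimates should be essentially unchanged for any reasonable permsamp, while the memory footprint drops by that same fraction.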

I also modified some lines of code to speed them up. For the example dataset, these changes shorten the run by only ~5 seconds, but for larger datasets they make a more substantial difference.

This version passes R CMD check without problems. Let me know if these suggestions make sense!

Alejandro
