Optimizations to R ETL process (esp. loading). #4

concretevitamin · 2015-03-28T19:13:32Z

- Reads directly from the .txt file instead of first saving to .Rdata and then reading it back. Prototyped for the Regression query.
- Even if the .Rdata step is desired, fread() performs much better than read.csv() when parsing the text input.

I've found this to be much more efficient for benchmarking (tested on an EC2 instance). If this approach looks good, I could certainly make corresponding changes for all queries.
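To make the two loading paths concrete, here is a minimal sketch. It is not the code from this PR: the temp files, the comma separator, and the tiny 4x5 matrix are assumptions chosen only so the example runs on its own. It requires the data.table package for fread().

```r
library(data.table)  # provides fread()

# Hypothetical input: a small comma-separated matrix written to a temp file.
# The real benchmark files and their separator are not shown in this thread.
txt_path <- tempfile(fileext = ".txt")
set.seed(1)
m <- matrix(rnorm(20), nrow = 4)
write.table(m, txt_path, sep = ",", row.names = FALSE, col.names = FALSE)

# Path A: parse the text, save it to .Rdata, then load it back.
rdata_path <- tempfile(fileext = ".Rdata")
dat <- read.csv(txt_path, header = FALSE)
save(dat, file = rdata_path)
rm(dat)
load(rdata_path)              # restores the object named `dat`
via_rdata <- as.matrix(dat)

# Path B: parse the text file directly with fread(), skipping the round trip.
direct <- as.matrix(fread(txt_path, header = FALSE))

# Both paths should produce the same numeric values.
stopifnot(isTRUE(all.equal(unname(via_rdata), unname(direct))))
```

Path B removes one write and one read from the pipeline. Whether that is a net win depends on how often the same file is reloaded.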


rytaft · 2015-07-28T06:04:53Z

Sorry this slipped through the cracks and I am only looking at this now. Thanks for submitting your code!

Regarding the changes to vanilla_R_benchmark.R, I have done a bit of testing on the 5000x5000 dataset, and it seems that load() on a binary file is faster than fread() on a text file (6.5 seconds vs. 11.8 seconds). Under what conditions did you find fread() to be faster?
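A comparison like this can be reproduced with a small timing harness. The sketch below is an assumption-laden stand-in: it uses a 500x500 random matrix in temp files rather than the 5000x5000 dataset, so its timings will not match the numbers above.

```r
library(data.table)  # provides fread()

# Hypothetical stand-in data, much smaller than the 5000x5000 benchmark set.
n <- 500
set.seed(42)
m <- matrix(rnorm(n * n), nrow = n)

txt_path <- tempfile(fileext = ".txt")
rdata_path <- tempfile(fileext = ".Rdata")
write.table(m, txt_path, sep = ",", row.names = FALSE, col.names = FALSE)
save(m, file = rdata_path)    # binary format, compressed by default

# Time each loading strategy; "elapsed" is wall-clock seconds.
t_load  <- system.time(load(rdata_path))[["elapsed"]]
t_fread <- system.time(x <- fread(txt_path, header = FALSE))[["elapsed"]]

cat(sprintf("load(): %.3fs  fread(): %.3fs\n", t_load, t_fread))
```

Results are likely to vary with disk speed, whether the file is already in the OS cache, the compression setting passed to save(), and the data.table version, so it is worth repeating each measurement several times on the target machine.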

Regarding the changes to generate_Rdata.R, fread() is certainly faster than read.csv(), but it seems to leave the data in a format that doesn't work with the code in vanilla_R_benchmark.R. I haven't done much debugging, but if you have any ideas I'd definitely appreciate them!
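The thread does not identify the cause. One plausible candidate, offered here only as an assumption, is that fread() returns a data.table by default, whereas read.csv() returns a plain data.frame, and code written for a data.frame or matrix may not behave the same way on a data.table. The sketch below uses a hypothetical temp file to show how to check the class and convert it.

```r
library(data.table)  # provides fread() and is.data.table()

# Hypothetical 2x3 input file, used only for illustration.
txt_path <- tempfile(fileext = ".txt")
write.table(matrix(1:6, nrow = 2), txt_path, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Default: fread() returns a data.table (which also inherits from data.frame).
dt <- fread(txt_path, header = FALSE)
print(class(dt))

# Option 1: ask fread() for a plain data.frame instead.
df <- fread(txt_path, header = FALSE, data.table = FALSE)

# Option 2: convert to a matrix if the downstream code does matrix algebra.
mat <- as.matrix(dt)

stopifnot(is.data.table(dt), !is.data.table(df), is.matrix(mat))
```

If the benchmark code expects a matrix or data.frame, either conversion might be enough, but that would need to be confirmed against vanilla_R_benchmark.R.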

Commit 04e4b39: Optimizations to R ETL process (esp. loading).
