
Fixing timed runs #8

Open

bmorris3 wants to merge 3 commits into lesliehebb:master from bmorris3:timedrunfix


bmorris3 (Contributor) commented Dec 7, 2015

I'm experimenting with timed runs, and the job terminates at unexpected times. I've been testing with short runs that should last a few minutes, and they terminate after tens of seconds. It looks like the code estimates how long the next steps will take and tries to anticipate when to stop.


Here is my understanding of what the code does before this pull request:

  1. Set avgtime=0 (here)
  2. For the first step, set avgtime=(avgtime*msi+time1-time0)/(msi+1) (see here), which reduces to avgtime=(time1-time0)/(msi+1) when avgtime=0. Here time1 is the time at the current step, time0 is the time at the start of the most recent step, and msi is the MCMC step index.
  3. For later steps, avgtime != 0, so avgtime is set to the previous average time per step times the step index, plus the duration of the latest step, all divided by (msi+1), i.e. (avgtime*msi + (time1-time0))/(msi+1), which is equivalent to avgtime*msi/(msi+1) + (time1-time0)/(msi+1). What is this supposed to represent?
  4. Then if time1+2*avgtime+300>maxtime (here), end the run. How was this expression chosen?
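To make steps 2 to 4 concrete, here is a minimal Python sketch of the pre-PR logic as described above. It is not the project's actual code: the helper names `update_avgtime` and `should_stop_old` are hypothetical, and the sketch assumes that `time0` is reset at the start of each step (so `time1 - time0` is the duration of one step) and that `msi` counts the steps already folded into `avgtime`. Under those assumptions the update is the standard incremental-mean formula, so `avgtime` works out to the plain average duration per step so far.

```python
def update_avgtime(avgtime, msi, time0, time1):
    """Pre-PR update as described: (avgtime*msi + time1 - time0) / (msi + 1).

    Hypothetical helper. Assumes time1 - time0 is the duration of the latest
    step and msi is the number of steps already averaged into avgtime.
    """
    return (avgtime * msi + (time1 - time0)) / (msi + 1)


def should_stop_old(time1, avgtime, maxtime):
    """Pre-PR stopping rule as described: time1 + 2*avgtime + 300 > maxtime."""
    return time1 + 2 * avgtime + 300 > maxtime


# Made-up step durations in seconds, fed through the update one step at a time.
durations = [2.0, 4.0, 6.0]
avgtime = 0.0
t = 0.0
for msi, d in enumerate(durations):
    time0, time1 = t, t + d  # time0 reset at the start of each step (assumption)
    avgtime = update_avgtime(avgtime, msi, time0, time1)
    t = time1

print(avgtime)  # 4.0, the plain mean of 2, 4 and 6
```

If `time1` is measured from the start of the run (another assumption), the fixed 300-second term alone exceeds any `maxtime` shorter than five minutes, so under that reading short timed runs would stop after their first step.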

It would make more sense to me if avgtime = (time1 - time0)/msi always, with time0 set once at the beginning of the MCMC run rather than reset at each step, and if the terminating condition were time1 - time0 + avgtime > maxtime. That way, if the elapsed time plus the expected time to complete the next step exceeds the requested maximum time, the run terminates.
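A minimal Python sketch of the proposed rule follows, again an illustration rather than the code in this pull request. The names `run_timed`, `step`, `clock` and `FakeClock` are hypothetical, and `msi` is taken here to count completed steps so that the division never hits zero. A fake clock in which every step "takes" 10 seconds keeps the example deterministic.

```python
import time


def run_timed(step, maxtime, clock=time.monotonic):
    """Call step() repeatedly until the next step would push past maxtime seconds.

    time0 is recorded once, avgtime is the elapsed time divided by the number
    of completed steps, and the run stops when time1 - time0 + avgtime > maxtime.
    Returns the number of completed steps.
    """
    time0 = clock()  # set once, at the start of the run
    msi = 0
    while True:
        step()
        msi += 1
        time1 = clock()
        avgtime = (time1 - time0) / msi  # mean time per completed step
        # Stop if one more step of average length would overshoot the budget.
        if time1 - time0 + avgtime > maxtime:
            return msi


class FakeClock:
    """Deterministic stand-in for time.monotonic (hypothetical, for testing)."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()


def fake_step():
    clock.t += 10.0  # each step "takes" 10 seconds


n = run_timed(fake_step, maxtime=80, clock=clock)
print(n, clock.t)  # 8 80.0
```

With an 80-second budget and 10-second steps, the run completes 8 steps and stops exactly at 80 seconds: after the eighth step, finishing a ninth would require 90 seconds.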

I've implemented that change in this pull request. I tested it locally: for a run with the MCMC steps parameter set to -80 (i.e. 80 seconds), the output files were last modified 120 seconds after they were created. That suggests roughly 40 seconds of overhead before the chains begin running, but otherwise shows that this pull request works as intended.

I've also updated the README to increment the apparent version number to V4.5.1.

bmorris3 (Contributor, Author) commented Dec 9, 2015

@lesliehebb -- I've added the five minute buffer in 3dca69d.
