Hello,
I want to point you to an error in your implementation of the metrics with relaxed boundaries.
For example, let's look at the lines 36-40 in Evaluate.m:
L36 % relaxed boundary
L37 % revised for cholec80 dataset !!!!!!!!!!!
L38 if(iPhase == 4 || iPhase == 5) % Gallbladder dissection and packaging might jump between two phases
L39 curDiff(curDiff(1:t)==-1) = 0; % late transition
L40 curDiff(curDiff(end-t+1:end)==1 | curDiff(end-t+1:end)==2) = 0; % early transition
The instruction in line 39 is clear: any of the first t elements of the curDiff array that equals -1 is set to 0.
The instruction in line 40 contains the bug (as do lines 43 and 47): the last t elements of the array are tested for equality to 1 or 2, but the corresponding elements among the first (not last!) t elements are set to 0. More specifically, if the element at position (end - t + i) equals 1 or 2, then the element at position i is set to 0.
The reason for this behavior is that curDiff(end-t+1:end)==1 returns a Boolean array of length t, which is then used as a logical index into curDiff, whose length is >= t. Matlab applies a t-length logical index to the first t elements of curDiff, even though it was meant to apply to the last t elements.
It's a bit cumbersome to explain it in words, but you'll see the problem clearly when running the following example lines of code in Matlab or Octave:
>> t = 3
>> curDiff = [-2, -2, -2, 0, 0, 0, 0, 1, 2]
>> curDiff(curDiff(1:t)==-1) = 0;
>> curDiff(curDiff(end-t+1:end)==1 | curDiff(end-t+1:end)==2) = 0;
>> curDiff
You will see that curDiff equals [-2, 0, 0, 0, 0, 0, 0, 1, 2] afterwards instead of the expected [-2, -2, -2, 0, 0, 0, 0, 0, 0].
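One possible way to get the expected outcome is to map the logical mask back onto the positions it was computed from. The following is only a minimal sketch on the same toy data, not a patch for Evaluate.m; the variable names n, head and tail are introduced here for illustration and do not appear in the original code.

```matlab
t = 3;
curDiff = [-2, -2, -2, 0, 0, 0, 0, 1, 2];

n    = numel(curDiff);
head = 1:t;          % positions of the first t elements
tail = n-t+1:n;      % positions of the last t elements

% late transition: the mask is built from the head and applied to the head
curDiff(head(curDiff(head) == -1)) = 0;

% early transition: the mask is built from the tail and applied to the tail
curDiff(tail(curDiff(tail) == 1 | curDiff(tail) == 2)) = 0;

disp(curDiff)
```

Indexing the position vector tail with the mask yields the absolute positions (here 8 and 9), so the assignment reaches the last t elements rather than the first t.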
Besides this problem, I think that the formulas for relaxed precision and relaxed recall are ill-defined, because they can actually exceed 1 (or 100%).
To conclude, it seems that these relaxed metrics are not suited for evaluating phase recognition algorithms, at least not in their current state. We found further inconsistencies in the comparison of surgical phase recognition algorithms on Cholec80 - kindly check our report on arXiv if you are interested.
Best regards,
Isabel