Bug in implementation of evaluation metrics #8

@IsabelFunke

Hello,
I want to point you to an error in your implementation of the metrics with relaxed boundaries.

For example, let's look at the lines 36-40 in Evaluate.m:

L36        % relaxed boundary
L37        % revised for cholec80 dataset !!!!!!!!!!!
L38        if(iPhase == 4 || iPhase == 5) % Gallbladder dissection and packaging might jump between two phases
L39            curDiff(curDiff(1:t)==-1) = 0; % late transition
L40            curDiff(curDiff(end-t+1:end)==1 | curDiff(end-t+1:end)==2) = 0; % early transition 

The instruction in line 39 is clear: if any of the first t elements of the curDiff array equals -1, it is set to 0.
The instruction in line 40 contains the bug (as do lines 43 and 47): it checks the last t elements of the array for equality to 1 or 2, but then sets the matching positions among the first (not last!) t elements to 0. More specifically, if the element at position (end - t + i) equals 1 or 2, then the element at position i is set to 0.

The reason for this behavior is that curDiff(end-t+1:end)==1 returns a logical array of length t, which is then used to index into curDiff, whose length is >= t. Matlab applies a logical index of length t to the first t elements of curDiff, even though it was meant to apply to the last t elements.
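
The indexing rule can be seen in isolation with a small sketch. The names x and mask are made up for illustration and do not appear in Evaluate.m:

```matlab
% A logical index shorter than the array is matched against the
% leading elements of the array, not the trailing ones.
x = [10, 20, 30, 40, 50];
mask = logical([0, 1]);   % length 2, x has length 5
x(mask) = 0;              % affects x(2), never x(4) or x(5)
disp(x)
```

Only the second element is changed here, no matter which part of x the mask was computed from.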

It's a bit cumbersome to explain it in words, but you'll see the problem clearly when running the following example lines of code in Matlab or Octave:

>> t = 3
>> curDiff = [-2, -2, -2, 0, 0, 0, 0, 1, 2]
>> curDiff(curDiff(1:t)==-1) = 0; 
>> curDiff(curDiff(end-t+1:end)==1 | curDiff(end-t+1:end)==2) = 0;
>> curDiff

You will see that curDiff equals [-2, 0, 0, 0, 0, 0, 0, 1, 2] afterwards instead of [-2, -2, -2, 0, 0, 0, 0, 0, 0] as would be the expected outcome.
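
One possible fix, as a minimal sketch rather than a tested patch for Evaluate.m, is to apply each mask to the same slice it was computed from. The temporary variables head and tail are hypothetical names introduced here:

```matlab
t = 3;
curDiff = [-2, -2, -2, 0, 0, 0, 0, 1, 2];

% late transition: relax -1 within the first t elements
head = curDiff(1:t);
head(head == -1) = 0;
curDiff(1:t) = head;

% early transition: relax 1 and 2 within the last t elements
tail = curDiff(end-t+1:end);
tail(tail == 1 | tail == 2) = 0;
curDiff(end-t+1:end) = tail;

disp(curDiff)
```

With the example input from above, this should give [-2, -2, -2, 0, 0, 0, 0, 0, 0]. The same pattern would presumably apply to lines 43 and 47.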

Besides this problem, I think that the formulas for relaxed precision and relaxed recall are ill-defined, because they can actually exceed 1 (or 100%).

To conclude, it seems that these relaxed metrics are not suited for evaluating phase recognition algorithms, at least not in their current state. We found further inconsistencies in the comparison of surgical phase recognition algorithms on Cholec80 - kindly check our report on arXiv if you are interested.

Best regards,
Isabel
