I found this stale issue that touches on this same topic: #671
Currently the losses.FocalLoss interface doesn't allow passing probabilities (unlike other losses, which have both a probs and a logits implementation) and expects logits, which are then passed through a sigmoid in the losses._functional.focal_loss_with_logits function. Since a sigmoid only handles the binary case, the loss is calculated per class in a one-vs-rest binary fashion and then summed (sketched below).
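To make that concrete, here's a minimal sketch of what the sigmoid path boils down to. This is my own simplified reading, not the library's actual code; the real function in losses._functional has a different signature and extra options.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss_with_logits(logits, targets, gamma=2.0, alpha=0.25):
    # Sketch only: a plain binary (sigmoid) focal loss per element.
    # Numerically stable BCE computed directly from logits
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t = p where target == 1, else 1 - p
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # Focal term down-weights easy (high p_t) examples
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Multiclass usage: one-vs-rest per class, then summed, as the wrapper does
logits = torch.randn(4, 3, 8, 8)           # (N, C, H, W)
targets = torch.randint(0, 3, (4, 8, 8))   # class indices
total = sum(
    binary_focal_loss_with_logits(logits[:, c], (targets == c).float())
    for c in range(logits.shape[1])
)
```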
Interestingly, there's also a softmax version in losses._functional.softmax_focal_loss_with_logits, which is closer to how FocalLoss works in segmentation_models for tensorflow/keras. With it, the loss wouldn't need to be calculated per class in the wrapper.
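For comparison, a sketch of the softmax variant under the same caveat: this is a hypothetical simplification, not the actual softmax_focal_loss_with_logits implementation.

```python
import torch
import torch.nn.functional as F

def softmax_focal_loss(logits, targets, gamma=2.0):
    # Sketch only: multiclass focal loss from softmax probabilities.
    log_p = F.log_softmax(logits, dim=1)                        # (N, C, H, W)
    # log-probability of the true class at each position
    log_p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # (N, H, W)
    p_t = log_p_t.exp()
    # Classes compete in a single softmax, so no per-class loop is needed
    return (-((1 - p_t) ** gamma) * log_p_t).mean()

logits = torch.randn(4, 3, 8, 8)
targets = torch.randint(0, 3, (4, 8, 8))
loss = softmax_focal_loss(logits, targets)
```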
Is there a reason why FocalLoss is calculated with this function, using a sigmoid? Are the results supposed to be equivalent to softmax somehow? I would think softmax is the more appropriate choice for the multiclass case, but I'm all ears in case the sigmoid approach is actually nicer.