Description
First of all, compliments on the results achieved with this piece of software!
Bug
When I was using the software to register WSIs I encountered an error that was difficult to trace back, but I think I found the little bug. When registering the WSIs, most of them worked perfectly fine, but for a very few slides I got this error:
```text
Exception: Argument #6: Padding size should be less than the corresponding input dimension, but got: padding (1, 1) at dimension 2 of input [1, 1, 1, 2]
Traceback (most recent call last):
  File "script.py", line 83, in <module>
    main()
  File "script.py", line 79, in main
    deeperhistreg_caller(args)
  File "script.py", line 55, in deeperhistreg_caller
    deeperhistreg.run_registration(**dhr_config)
  File "DeeperHistReg/deeperhistreg/run.py", line 55, in run_registration
    warped_name = [item for item in os.listdir(pathlib.Path(save_path) / experiment_name / "Results_Final") if "warped_source" in item][0]
IndexError: list index out of range
```
It seemed to happen for only a very small proportion of the slides, but it kept failing to perform the registration for those same slides. Unfortunately, the Python error and traceback above are raised only because the nonrigid registration was not performed and hence the final results could not be produced. The actual error has no traceback since it is caught here:
DeeperHistReg/deeperhistreg/run.py
Lines 39 to 46 in 42e7c9d
```python
### Run Registration ###
try:
    registration_parameters['logging_path'] = pathlib.Path(save_path) / "logs.txt"
    registration_parameters['case_name'] = experiment_name
    pipeline = fr.DeeperHistReg_FullResolution(registration_parameters)
    pipeline.run_registration(source_path, target_path, save_path)
except Exception as e:
    print(f"Exception: {e}")
```
Since the program will fail anyway, and the try/except block only makes it harder to trace back the actual error, I would suggest removing it. That leaves it to the user to decide whether they want to catch errors when registering a batch of WSIs, and removes possible confusion about errors. Also in:
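A middle ground for the suggestion above could be to make the behavior opt-in. The sketch below is purely illustrative: `run_pipeline_safely` and its `raise_on_error` parameter are hypothetical names, not part of the DeeperHistReg API.

```python
# Hypothetical sketch: re-raise by default so the full traceback survives,
# and only swallow exceptions when the caller opts in (e.g. during batch
# registration of many WSIs). Not part of the actual DeeperHistReg API.
def run_pipeline_safely(pipeline, source_path, target_path, save_path, raise_on_error=True):
    try:
        pipeline.run_registration(source_path, target_path, save_path)
    except Exception as e:
        if raise_on_error:
            raise  # keep the original traceback intact
        print(f"Exception: {e}")
```

A batch caller would pass raise_on_error=False to keep processing the remaining slides, while a single-slide run would fail loudly with the real traceback.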
After disabling the try/except block above, I obtained the actual traceback:
```text
Traceback (most recent call last):
  File "script.py", line 86, in <module>
    main()
  File "script.py", line 82, in main
    deeperhistreg_caller(args)
  File "script.py", line 56, in deeperhistreg_caller
    deeperhistreg.run_registration(**dhr_config)
  File "DeeperHistReg/deeperhistreg/run.py", line 44, in run_registration
    pipeline.run_registration(source_path, target_path, save_path)
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_pipeline/full_resolution.py", line 299, in run_registration
    self.nonrigid_registration()
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_pipeline/full_resolution.py", line 240, in nonrigid_registration
    self.run_nonrigid_registration()
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_pipeline/full_resolution.py", line 189, in run_nonrigid_registration
    self.nonrigid_displacement_field = nonrigid_registration_function(self.pre_source, self.pre_target, self.current_displacement_field, nonrigid_registration_params)
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_registration/nonrigid_registration_methods.py", line 66, in instance_optimization_nonrigid_registration
    return ion.instance_optimization_nonrigid_registration(source, target, initial_displacement_field, params)
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_registration/./dhr_nonrigid_registration/io_nonrigid.py", line 67, in instance_optimization_nonrigid_registration
    displacement_field = io.nonrigid_registration(warped_source, resampled_target, num_levels, used_levels, iterations, learning_rates, alphas,
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_registration/./dhr_building_blocks/instance_optimization.py", line 172, in nonrigid_registration
    source_pyramid = u.create_pyramid(source, num_levels=num_levels)
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_utils/utils.py", line 462, in create_pyramid
    new_tensor = resample_tensor_to_size(gaussian_smoothing(pyramid[i+1], 1), new_size, mode=mode)
  File "DeeperHistReg/deeperhistreg/dhr_pipeline/../dhr_utils/utils.py", line 90, in gaussian_smoothing
    return tr.GaussianBlur(kernel_size, sigma)(tensor)
  File "venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "venv/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 1812, in forward
    return F.gaussian_blur(img, self.kernel_size, [sigma, sigma])
  File "venv/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 1380, in gaussian_blur
    output = F_t.gaussian_blur(t_img, kernel_size, sigma)
  File "envs/venv/lib/python3.10/site-packages/torchvision/transforms/_functional_tensor.py", line 760, in gaussian_blur
    img = torch_pad(img, padding, mode="reflect")
  File "venv/lib/python3.10/site-packages/torch/nn/functional.py", line 5096, in pad
    return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (1, 1) at dimension 2 of input [1, 1, 1, 2]
```
After debugging for a long time, I think the problem is here:
Origin of bug
For certain dimensions I obtained the error described above: a tensor was resampled to a size of (1, 1, 255, width) instead of (1, 1, 256, width) (as set in params["preprocessing_params"]["initial_resolution"]), which after further downsampling results in Exception: Argument #6: Padding size should be less than the corresponding input dimension, but got: padding (1, 1) at dimension 2 of input [1, 1, 1, 2].
```python
source = u.resample(u.gaussian_smoothing(source, initial_smoothing), initial_resample_ratio)
target = u.resample(u.gaussian_smoothing(target, initial_smoothing), initial_resample_ratio)
```
DeeperHistReg/deeperhistreg/dhr_utils/utils.py
Lines 60 to 70 in 42e7c9d
```python
def resample(tensor : tc.Tensor, resample_ratio : float, mode: str="bilinear") -> tc.Tensor:
    """
    TODO
    """
    return F.interpolate(tensor, scale_factor = 1 / resample_ratio, mode=mode, recompute_scale_factor=False, align_corners=False)

def resample_tensor_to_size(tensor: tc.Tensor, new_size: tc.Tensor, mode: str='bilinear') -> tc.Tensor:
    """
    TODO
    """
    return F.interpolate(tensor, size=new_size, mode=mode, align_corners=False)
```
This seems to happen because the resampling is done with the correctly calculated initial_resample_ratio, but due to floating-point precision and the PyTorch implementation the resulting height dimension becomes 255.99999..., which PyTorch casts to int (i.e. floors) to 255, producing the error described above.
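The mechanism can be illustrated with plain Python arithmetic. With recompute_scale_factor=False, the output size is computed roughly as floor(input_size * scale_factor); this is a sketch of that mechanism, not PyTorch's exact code path:

```python
import math

# Sketch of how a ratio that is "exactly right" on paper can still floor
# to 255: 1 / ratio is not exactly representable as a double, so
# input_size * scale_factor may land just below 256 before the floor.
h = 16051
ratio = h / 256            # the correctly calculated resample ratio
scale_factor = 1 / ratio   # what resample() passes to F.interpolate
out_h = math.floor(h * scale_factor)
print(out_h)  # 255 or 256, depending on how 1 / ratio rounds
```

Whether the product lands just below or just above the target size depends on the rounding of the reciprocal, which is why only a few unlucky dimensions trigger the bug.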
See also:
- Most relevant PyTorch GitHub issue: pytorch/pytorch#62237: torch.nn.functional.interpolate nearest behaviour changed in PyTorch 1.9 and appears incorrect
Proposed bugfix
As the output size is what matters most in this case, my suggestion would be to use your resample_tensor_to_size method instead, as this resolves the behavior. A PyTorch forum page suggested that using torch.nn.functional.interpolate with recompute_scale_factor=True should also work, but when I tested it (see the test below) it still produced a tensor with height 255 instead of 256.
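A drop-in replacement along those lines could compute the target size up front and delegate to size-based interpolation. This is only a sketch of the idea; resample_fixed is my name for it, not part of the library:

```python
import torch as tc
import torch.nn.functional as F

def resample_fixed(tensor: tc.Tensor, resample_ratio: float, mode: str = "bilinear") -> tc.Tensor:
    # Compute the output size explicitly instead of passing a scale factor,
    # so floating-point rounding can no longer shrink a dimension by one pixel.
    new_size = (
        max(round(tensor.shape[2] / resample_ratio), 1),
        max(round(tensor.shape[3] / resample_ratio), 1),
    )
    return F.interpolate(tensor, size=new_size, mode=mode, align_corners=False)
```

For the failing case (height 16051, ratio 16051/256), the division 16051 / ratio is exact, so the rounded height is 256 as intended.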
I implemented this change in:
Setup
- OS: Rocky Linux 8.7 (Green Obsidian) - Slurm 23.02.8
- GPUs: 3x RTX8000 (VRAM: 48GB) CUDA-compatible setup
- Python: 3.10.16
- Pip: pip 23.0.1 from .../venv/lib/python3.10/site-packages/pip (python 3.10)
- torch 2.5.1+cu118
- torchvision 0.20.1+cu118
- torchio 0.21.0
- torchstain 1.4.1
- torchsummary 1.5.1
Stand-alone test
Below is a stand-alone test to reproduce the bug and test bugfixes:
```python
import pathlib
import logging
import unittest

import torch as tc
import torch.nn.functional as F

from deeperhistreg.dhr_utils import utils as u


def resample_modified(tensor: tc.Tensor, ratio: float, mode="bilinear"):
    """Modified resample method: use `recompute_scale_factor=True`"""
    return F.interpolate(tensor, scale_factor=1 / ratio, mode=mode, recompute_scale_factor=True, align_corners=False)


def init_logger():
    """Basic logger"""
    log_path = pathlib.Path(__file__).with_suffix(".log")
    fmt = "[%(levelname)s] - %(asctime)s - %(name)s: %(message)s - %(pathname)s:%(lineno)d"
    logging.basicConfig(
        format=fmt,
        level=logging.INFO,
        handlers=[
            logging.FileHandler(log_path),
            logging.StreamHandler()
        ],
        force=True,  # override pytest/unittest handlers
    )
    return logging.getLogger("Resampling-test")


logger = init_logger()


class TestResampling(unittest.TestCase):
    """NOTE: This test mainly focuses on the height, but the same might apply to the width"""

    def setUp(self):
        # Deterministic behavior
        tc.manual_seed(0)
        tc.backends.cudnn.deterministic = True
        tc.backends.cudnn.benchmark = False

        # Use CUDA if available
        self.device = tc.device("cuda" if tc.cuda.is_available() else "cpu")
        logger.info(f"Using device: {self.device}")

        self.initial_resolution = 256
        self.H = 16051
        self.W = 17600
        self.src = tc.rand(1, 3, self.H, self.W, device=self.device)
        self.trg = tc.rand(1, 3, self.H, self.W, device=self.device)
        logger.info(f"Initial shapes - source: {self.src.shape} | target: {self.trg.shape}")

        # Resampling parameters (taken from production code)
        self.ratio = u.calculate_resampling_ratio(
            (self.W, self.W),
            (self.H, self.H),
            self.initial_resolution,
        )
        self.sigma = max(self.ratio - 1, 0.1)
        logger.info(f"Resampling ratio: {self.ratio} with Gaussian smoothing sigma: {self.sigma}")

    def test_resample_without_recompute(self):
        logger.info("TEST RESAMPLE (current implementation)")
        src_smooth = u.gaussian_smoothing(self.src, self.sigma)
        trg_smooth = u.gaussian_smoothing(self.trg, self.sigma)
        logger.info(f"Gaussian-smoothed shapes (src|trg): {src_smooth.shape} | {trg_smooth.shape}")
        out_src = u.resample(src_smooth, self.ratio)
        out_trg = u.resample(trg_smooth, self.ratio)
        logger.info(f"Resampled shapes (src|trg): {out_src.shape} | {out_trg.shape}")
        # Check for mismatch bug
        expected_h = int(round(self.H / self.ratio))
        self.assertEqual(out_src.shape[2], out_trg.shape[2])
        self.assertIn(out_src.shape[2], [expected_h, expected_h - 1, expected_h + 1])

    def test_resample_with_recompute(self):
        logger.info("TEST RESAMPLE WITH RECOMPUTE SCALE RATIO (possible suggestion)")
        src_smooth = u.gaussian_smoothing(self.src, self.sigma)
        trg_smooth = u.gaussian_smoothing(self.trg, self.sigma)
        logger.info(f"Gaussian-smoothed shapes (src|trg): {src_smooth.shape} | {trg_smooth.shape}")
        out_src = resample_modified(src_smooth, self.ratio)
        out_trg = resample_modified(trg_smooth, self.ratio)
        logger.info(f"Resampled shapes (src|trg): {out_src.shape} | {out_trg.shape}")
        # Check output dimensions
        expected_h = int(round(self.H / self.ratio))
        self.assertEqual(out_src.shape[2], out_trg.shape[2])
        self.assertIn(out_src.shape[2], [expected_h])  # height is still 255, so this test might fail

    def test_resample_to_explicit_size(self):
        logger.info("TEST RESAMPLE TO EXPLICIT SIZE (best bugfix)")
        resample_dim = (self.initial_resolution, round(self.W * (1 / self.ratio)))
        logger.info(f"Target size: {resample_dim}")
        src_smooth = u.gaussian_smoothing(self.src, self.sigma)
        trg_smooth = u.gaussian_smoothing(self.trg, self.sigma)
        logger.info(f"Gaussian-smoothed shapes (src|trg): {src_smooth.shape} | {trg_smooth.shape}")
        out_src = u.resample_tensor_to_size(src_smooth, resample_dim)
        out_trg = u.resample_tensor_to_size(trg_smooth, resample_dim)
        logger.info(f"Resampled shapes (src|trg): {out_src.shape} | {out_trg.shape}")
        # Check resampled height
        self.assertEqual(self.initial_resolution, round(self.H * (1 / self.ratio)))
        # Check output dimensions
        self.assertEqual(tuple(out_src.shape[2:]), resample_dim)
        self.assertEqual(tuple(out_trg.shape[2:]), resample_dim)


if __name__ == "__main__":
    unittest.main()
```

Slide dimensions at which this problem occurs
It happens only at specific and unfortunate slide dimensions. I ran a more extensive test on 2 slides:
| SRC W | SRC H | TRG W | TRG H | Ratio | W | H | Resampled H | Err |
|---|---|---|---|---|---|---|---|---|
| 88000 | 79744 | 87872 | 80256 | 0.2 | 17600 | 16051 | 255 | Yes |
| 88000 | 79744 | 87872 | 80256 | 0.2 | 17600 | 16052 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.2 | 17600 | 16053 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.2 | 17600 | 16054 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.2 | 17600 | 16055 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8026 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8027 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8028 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8029 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8030 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8031 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8032 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8033 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8034 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8035 | 255 | Yes |
| … | … | … | … | … | … | … | … | … |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8039 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.1 | 8800 | 8040 | 255 | Yes |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12038 | 256 | No |
| … | … | … | … | … | … | … | … | … |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12047 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12048 | 255 | Yes |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12049 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12050 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12051 | 256 | No |
| 88000 | 79744 | 87872 | 80256 | 0.15 | 13200 | 12052 | 256 | No |
Abbreviations:
- SRC: source
- TRG: target
- W: width
- H: height
- Err: error
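The per-height column of the table can be scanned (approximately) without PyTorch by mirroring the floor-based size computation. This is a sketch under the assumption that the production ratio is height / initial_resolution; scale_factor_height is a hypothetical helper, not library code:

```python
import math

def scale_factor_height(h: int, initial_resolution: int = 256) -> int:
    # Mirrors the scale_factor code path: out = floor(h * (1 / ratio)),
    # which is what trips on unfortunate heights such as 16051.
    ratio = h / initial_resolution
    return math.floor(h * (1 / ratio))

# Scan a range of candidate heights and flag the ones that shrink by a pixel
for h in range(16051, 16056):
    out_h = scale_factor_height(h)
    print(h, out_h, "Err" if out_h != 256 else "OK")
```

Which exact heights trip the floor depends on the platform's floating-point rounding, so the scan output may differ slightly from the table above; the point is that the result is always 255 or 256, never anything in between.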