RuntimeError: Cannot re-initialize CUDA in forked subprocess #35

@bhjolly

I cloned the repo, added a training LAS file, ran python train.py, and got the following:

(fsct) [jollyb@styx scripts]$ python train.py
Using default number of CPU cores (all of them).
Processing using  128 / 128  CPU cores.
/home/jollyb/code/FSCT/data directory found.
/home/jollyb/code/FSCT/data/train directory found.
/home/jollyb/code/FSCT/data/train/sample_dir directory created.
Preprocessing train_dataset point clouds...
/home/jollyb/code/FSCT/data/train/example.las
Loading file... /home/jollyb/code/FSCT/data/train/example.las
Pre-processing point cloud...
/home/jollyb/code/FSCT/data/train/all_points_train.las
Loading file... /home/jollyb/code/FSCT/data/train/all_points_train.las
Pre-processing point cloud...
/home/jollyb/code/FSCT/data directory found.
/home/jollyb/code/FSCT/data/validation directory found.
/home/jollyb/code/FSCT/data/validation/sample_dir directory created.
Preprocessing train_dataset point clouds...
/home/jollyb/code/FSCT/data/validation/example.las
Loading file... /home/jollyb/code/FSCT/data/validation/example.las
Pre-processing point cloud...
/home/jollyb/code/FSCT/data/validation/all_points_validate.las
Loading file... /home/jollyb/code/FSCT/data/validation/all_points_validate.las
Pre-processing point cloud...
Running deep learning using  1 / 128  CPU cores.
Loading existing model...
File not found, creating new model...
=====================================================================
EPOCH  0
Traceback (most recent call last):
  File "/home/jollyb/code/FSCT/scripts/train.py", line 389, in <module>
    run_training.run_training()
  File "/home/jollyb/code/FSCT/scripts/train.py", line 284, in run_training
    for data in self.train_loader:
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jollyb/code/FSCT/scripts/train_datasets.py", line 33, in __getitem__
    x = torch.from_numpy(x.copy()).type(torch.float).to(self.device)
  File "/home/jollyb/miniconda3/envs/fsct/lib/python3.9/site-packages/torch/cuda/__init__.py", line 162, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I've had a quick look at the code, but the solution is not obvious to me. Any ideas?
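
For context, the innermost frame of the traceback is train_datasets.py line 33, where `__getitem__` calls `.to(self.device)`. That call runs inside a forked DataLoader worker, which is where CUDA refuses to initialize. One common workaround is to return CPU tensors from the dataset and move each batch to the device in the main process. The sketch below illustrates that pattern only; `CpuPointDataset` and `iterate_batches` are hypothetical names, not FSCT code, and the real dataset in train_datasets.py may need further changes.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class CpuPointDataset(Dataset):
    """Hypothetical stand-in for the dataset in train_datasets.py."""

    def __init__(self, samples):
        # samples: a list of NumPy arrays, one per training sample (assumed layout).
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx]
        # Return a CPU tensor. There is no .to(device) call here, so
        # DataLoader worker processes never touch CUDA.
        return torch.from_numpy(x.copy()).type(torch.float)


def iterate_batches(loader, device):
    """Yield batches moved to `device` in the main process."""
    for batch in loader:
        yield batch.to(device)
```

In the training loop this would look roughly like `for data in iterate_batches(train_loader, device): ...`, with `device` chosen by `torch.device("cuda" if torch.cuda.is_available() else "cpu")`. Two alternatives that may also work, depending on how the rest of the script is written: passing `num_workers=0` to the `DataLoader` so that no worker processes are forked, or passing `multiprocessing_context="spawn"` so that workers are spawned rather than forked, as the error message suggests.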
