Hi, I was trying to re-run my notebook, but I’m getting the following error when I run a segmentation model (HD-BET) inside Docker:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=999 : unknown error
Traceback (most recent call last):
  File "/usr/local/bin/hd-bet", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/HD-BET/HD_BET/hd-bet", line 119, in <module>
    run_hd_bet(input_files, output_files, mode, config_file, device, pp, tta, save_mask, overwrite_existing)
  File "/HD-BET/HD_BET/run.py", line 63, in run_hd_bet
    net.cuda(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 190, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (999) : unknown error at /pytorch/aten/src/THC/THCGeneral.cpp:47
Using contrast T1 as reference
Traceback (most recent call last):
  File "scripts/run.py", line 505, in <module>
    not args.no_permissions
  File "scripts/run.py", line 280, in run
    output1 = subp.check_output(["hd-bet", "-i", file_, "-device", "0"])
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['hd-bet', '-i', '/output/T1_r2s.nii.gz', '-device', '0']' returned non-zero exit status 1.
I’ve looked for solutions online, but nothing I tried worked. I didn’t make any changes to the environment or to the virtual machine, and the same code works fine on another virtual machine in the same Google Cloud project.
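Since the traceback fails inside torch._C._cuda_init(), that is, before any HD-BET code touches the GPU, one way to compare the two machines is a small check like the one below. It is only a rough sketch: the helper name cuda_diagnostics is made up for illustration, and it assumes nvidia-smi is on the PATH and that PyTorch is importable wherever it is run (host or container). It does not identify the cause, it only shows whether the driver and PyTorch can see the GPU at all.

```python
import shutil
import subprocess


def cuda_diagnostics():
    """Collect basic GPU visibility facts.

    `cuda_diagnostics` is a hypothetical helper, not part of HD-BET or PyTorch.
    """
    report = {
        "nvidia_smi_found": False,     # is the nvidia-smi binary on the PATH?
        "nvidia_smi_ok": None,         # did nvidia-smi exit with status 0?
        "torch_importable": False,     # could `import torch` succeed?
        "torch_cuda_available": None,  # result of torch.cuda.is_available()
    }

    # Step 1: ask the driver directly, independent of PyTorch.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_found"] = True
        try:
            result = subprocess.run(
                [smi], stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            report["nvidia_smi_ok"] = False

    # Step 2: ask PyTorch whether it can see a CUDA device.
    try:
        import torch
    except Exception:
        # PyTorch missing or broken in this environment; report what we have.
        return report
    report["torch_importable"] = True
    try:
        report["torch_cuda_available"] = bool(torch.cuda.is_available())
    except Exception:
        report["torch_cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

Running this on the host and inside the container on both VMs should show at which layer the working and the failing machine start to differ.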
Any idea what the issue could be?
Thanks!