Description
I am looking to leverage torch.nn.parallel.DistributedDataParallel, per the documentation you have written, to integrate dual 3090s into a workflow. I am using the automatic repo, and after trying several approaches to update the code below to match what is in the torch wiki, I have been unable to switch the current CUDA device to follow the methodology outlined in your documentation and in Stack Overflow examples. Do you have any recommendations on what I can read or experiment with to test further? I know Meta has been releasing some wonderful tools, which I have been using to support the Stable Diffusion project, so I hope this is in your purview. If it is not, feel free to ignore.
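For context, here is the minimal per-process device selection pattern I have been trying to adapt from the DDP docs. This is only a sketch: torchrun supplies LOCAL_RANK, and the Linear module is a stand-in, not the Stable Diffusion model from the automatic repo.

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun --nproc_per_node=2 spawns one process per GPU and sets LOCAL_RANK.
    local_rank = int(os.environ["LOCAL_RANK"])
    # Bind this process to its GPU before creating the process group or any
    # tensors, so torch.cuda.current_device() reports the intended card.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    model = torch.nn.Linear(16, 16).to(local_rank)  # placeholder module
    ddp_model = DDP(model, device_ids=[local_rank])

    x = torch.randn(8, 16, device=local_rank)
    ddp_model(x).sum().backward()  # gradients are all-reduced across both GPUs

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launched with torchrun --nproc_per_node=2 script.py, each of the two processes stays pinned to its own 3090.

Here is the allocator code I have been modifying: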
from typing import Union

import torch
from torch.types import Device
# _get_device_index is the private helper the upstream torch.cuda.memory module uses.
from torch.cuda._utils import _get_device_index


def caching_allocator_alloc(size, device: Union[Device, int] = None, stream=None):
r"""Performs a memory allocation using the CUDA memory allocator.
Memory is allocated for a given device and a stream, this
function is intended to be used for interoperability with other
frameworks. Allocated memory is released through
:func:`~torch.cuda.caching_allocator_delete`.
Args:
size (int): number of bytes to be allocated.
device (torch.device or int, optional): selected device. If it is
``None`` the default CUDA device is used.
stream (torch.cuda.Stream or int, optional): selected stream. If is ``None`` then
the default stream for the selected device is used.
.. note::
See :ref:`cuda-memory-management` for more details about GPU memory
management.
"""
    if device is None:
        device = torch.cuda.current_device()
    # Normalize the requested device (torch.device or int) to an integer index;
    # passing a literal 0 here would pin every allocation to the first GPU.
    device = _get_device_index(device)
    if stream is None:
        stream = torch.cuda.current_stream(device)
    if isinstance(stream, torch.cuda.streams.Stream):
        stream = stream.cuda_stream
    if not isinstance(stream, int):
        raise TypeError('Invalid type for stream argument, must be '
                        '`torch.cuda.Stream` or `int` representing a pointer '
                        'to an existing stream')
    with torch.cuda.device(device):
        return torch._C._cuda_cudaCachingAllocator_raw_alloc(size, stream)
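For reference, this is the call pattern I would expect given that docstring, as a sketch only; the 1 MiB size and the cuda:1 index are arbitrary choices for a dual-GPU box:

import torch

device = torch.device("cuda:1")  # the second 3090
# Ask the caching allocator for 1 MiB on that device; the returned value is a
# raw pointer suitable for handing to another framework.
ptr = torch.cuda.caching_allocator_alloc(1024 * 1024, device=device)
try:
    pass  # interoperate with the other framework here
finally:
    # Return the memory to the caching allocator when the other side is done.
    torch.cuda.caching_allocator_delete(ptr)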