
Unnecessary warning when using TorchIO inside Dataset __getitem__ without returning Subject objects #1247

Open
acqxi opened this issue Dec 5, 2024 · 3 comments
Labels
bug Something isn't working

Comments


acqxi commented Dec 5, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Bug summary

When using TorchIO within the __getitem__ method of a custom Dataset, and returning a tuple of (torch.Tensor, dict) instead of a torchio.Subject, I receive a warning suggesting to use SubjectsLoader. However, replacing DataLoader with SubjectsLoader is not feasible in this case and leads to errors, as SubjectsLoader expects Subject instances. The warning seems unnecessary and cannot be easily suppressed.

Code for reproduction

import torch
import torchio as tio
from torch.utils.data import Dataset, DataLoader
from typing import Tuple, Dict
import numpy as np

class CustomDataset(Dataset):
    def __init__(self, data_list, transform=None):
        self.data_list = data_list
        self.transform = transform

    def __len__(self):
        return len(self.data_list)

    def __getitem__(self, idx) -> Tuple[torch.Tensor, Dict]:
        # Simulate loading image and mask arrays
        image_array = np.random.rand(1, 128, 128, 128).astype(np.float32)
        mask_array = np.random.randint(0, 2, (1, 128, 128, 128)).astype(np.int16)
        metadata = {'label': 0}

        # Create TorchIO Images
        image = tio.ScalarImage(tensor=image_array)
        mask = tio.LabelMap(tensor=mask_array)

        # Create a Subject
        subject = tio.Subject(image=image, mask=mask)

        # Apply transforms if any
        if self.transform:
            subject = self.transform(subject)

        # Process the image (e.g., apply masking)
        processed_image = subject['image'].data.float().contiguous()

        # Return tensor and metadata dictionary
        return processed_image, metadata

# Define any transforms (optional)
transform = tio.Compose([
    tio.RandomAffine(),
    tio.RandomFlip(),
])

# Create the dataset
dataset = CustomDataset(data_list=[0, 1, 2], transform=transform)

# Using a standard DataLoader works, but triggers the SubjectsLoader warning:
# loader = DataLoader(dataset, batch_size=2)

# Attempting to replace `DataLoader` with `SubjectsLoader` as suggested:
from torchio import SubjectsLoader
loader = SubjectsLoader(dataset, batch_size=2)

# Iterate through the DataLoader
for batch in loader:
    images, labels = batch
    print(type(images), type(labels))
    break

Actual outcome

Running the provided code with a standard DataLoader results in a warning message from TorchIO, even though the data returned by the custom Dataset is a tuple of (torch.Tensor, dict) and does not involve torchio.Subject objects in the final output. The warning is unnecessary because the actual batch structure is fully compatible with PyTorch's DataLoader.

When attempting to follow the warning's suggestion to replace DataLoader with SubjectsLoader, the program fails with an AttributeError because SubjectsLoader expects each dataset item to be a torchio.Subject, but the Dataset returns a tuple instead. This makes SubjectsLoader unusable for this scenario.

Error messages

Traceback (most recent call last):
  File "example.py", line XX, in <module>
    batch = next(iter(loader))
  File "path_to_python/lib/python3.X/site-packages/torch/utils/data/dataloader.py", line XXX, in __next__
    data = self._next_data()
  File "path_to_python/lib/python3.X/site-packages/torch/utils/data/dataloader.py", line XXX, in _next_data
    data = self._dataset_fetcher.fetch(index)
  File "path_to_python/lib/python3.X/site-packages/torch/utils/data/_utils/fetch.py", line XX, in fetch
    return self.collate_fn(data)
  File "path_to_python/lib/python3.X/site-packages/torchio/data/loader.py", line XX, in _collate
    for key in first_subject.keys():
AttributeError: 'tuple' object has no attribute 'keys'

Expected outcome

I expect that when using TorchIO inside the __getitem__ method but returning standard PyTorch data structures (e.g., torch.Tensor and dict), the warning about using SubjectsLoader should not be displayed. Alternatively, there should be a way to suppress this warning when it's not applicable.
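
As a stopgap until the warning logic is fixed, one could filter the warning out with Python's standard `warnings` machinery. This is only a sketch: the message pattern below is an assumption about what TorchIO emits and may need adjusting to match the actual warning text on a given version.

```python
import warnings

# Workaround sketch (assumption: TorchIO emits the SubjectsLoader suggestion
# as a warning whose message mentions "SubjectsLoader"). Filtering by message
# pattern silences it without touching unrelated warnings.
warnings.filterwarnings("ignore", message=r".*SubjectsLoader.*")
```

Filtering by message keeps the suppression narrow; a blanket `warnings.simplefilter("ignore")` would also hide genuinely useful warnings from TorchIO and other libraries.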

System info

Platform:   Linux-5.15.0-107-generic-x86_64-with-glibc2.27
TorchIO:    0.20.2
PyTorch:    2.3.1
SimpleITK:  2.2.1 (ITK 5.3)
NumPy:      1.26.4
Python:     3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
@acqxi acqxi added the bug Something isn't working label Dec 5, 2024

acqxi commented Dec 5, 2024

Related Discussion

This issue seems to be related to the behavior described in #1179 , where the use of torchio.Subject in PyTorch >= 2.3 caused unintended behavior with DataLoader and led to the introduction of SubjectsLoader. However, unlike the examples in #1179, my use case does not involve returning Subject instances from the Dataset. The warning in this scenario becomes a false positive and creates confusion.

While SubjectsLoader was introduced as a solution for handling Subject instances in batches, it does not work in my case because my data pipeline ultimately returns PyTorch-compatible types (torch.Tensor and dict), making SubjectsLoader incompatible.

@lorinczszabolcs

Another problem is that the same warning is emitted even when the dataset is fully compatible with SubjectsLoader and SubjectsDataset; see #1261.

@fepegar
Member

fepegar commented Jan 27, 2025

Thanks for reporting, @acqxi . Unfortunately I won't be able to pick this up any time soon. Feel free to give it a try.
