PCAM: fix DataLoader pickling error by avoiding module on self #9200
PCAM: lazy-import h5py via `_get_h5py` to keep dataset picklable

Fixes: #9195
Problem
PCAM previously stored the `h5py` module on the dataset instance, making it unpicklable. `DataLoader(num_workers > 0)` and spawn-based DDP pickle the dataset, causing `TypeError: cannot pickle 'module' object`.

Change
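The described change swaps a stored module attribute for an on-demand import. A minimal, self-contained sketch of both patterns (the `json` module stands in for `h5py` so this runs without the dependency; class names are hypothetical):

```python
import pickle

# Broken pattern (pre-PR): the dataset keeps a module object on self.
class StoresModule:
    def __init__(self):
        import json
        self.h5py = json  # module object stored on the instance

# Fixed pattern: fetch the module at each call site, never store it.
def _get_h5py():
    # In the real PR this imports h5py and raises a clear RuntimeError
    # if it is missing; json is substituted to keep the sketch runnable.
    import json
    return json

class LazyImport:
    def __len__(self):
        h5py = _get_h5py()  # imported on demand, not kept on self
        # Real code would open the HDF5 file here, e.g.
        # with h5py.File(path, "r") as f: return f["y"].shape[0]
        return 3

try:
    pickle.dumps(StoresModule())
    broken_pickles = True
except TypeError:  # "cannot pickle 'module' object"
    broken_pickles = False

fixed_pickles = isinstance(pickle.dumps(LazyImport()), bytes)
```

Only the lazy variant survives the pickling that `DataLoader` workers and spawn-based DDP perform.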
- Add a `_get_h5py()` helper that imports `h5py` on demand with a clear error message.
- Call `h5py = _get_h5py()` in `__len__` and `__getitem__` (and any other call sites) instead of `self.h5py`.
- `__init__` no longer stores the module on `self`.
- The error text for a missing `h5py` remains unchanged.

Tests
`test_update_PCAM.py`
- `PCAMTestCase` (inherits `ImageDatasetTestCase`).
- `inject_fake_data(...)` writes tiny HDF5 files for each split (`train`, `val`, `test`) with datasets `x` and `y`; requires `h5py`.
- Exercises core dataset behaviors on synthetic data (offline).

`test_update_PCAM_multiprocessing.py`
- `torch.multiprocessing.spawn` with `world_size=2`, backend `gloo`.
- `DistributedSampler` and `DataLoader(num_workers=2, persistent_workers=True)`.
- `all_reduce` sanity check to ensure workers progress without pickling errors.
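The core claim of the multiprocessing test, that workers receive the dataset via pickle and can then index into it, can be simulated without torch or h5py. A sketch of what each `DataLoader` worker effectively does (`FakeDataset` is hypothetical; `json` again stands in for `h5py`):

```python
import pickle

class FakeDataset:
    """Hypothetical stand-in for the patched PCAM: only plain state on self."""

    def __init__(self, targets):
        self._targets = targets  # picklable payload instead of HDF5 paths

    def _get_module(self):
        # Mirrors the PR's _get_h5py(): imported at call time, never stored.
        import json
        return json

    def __len__(self):
        return len(self._targets)

    def __getitem__(self, idx):
        m = self._get_module()
        # Round-trip through the lazily imported module, much as the real
        # __getitem__ would read a sample out of an h5py.File.
        return m.loads(m.dumps(self._targets[idx]))

# A DataLoader worker receives the dataset as a pickle from the parent
# process, then indexes into its own copy.
parent_ds = FakeDataset([10, 20, 30])
worker_ds = pickle.loads(pickle.dumps(parent_ds))
items = [worker_ds[i] for i in range(len(worker_ds))]
```

If the module were still stored on `self`, the `pickle.dumps` step above is exactly where `num_workers > 0` loading would fail.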