PCAM: fix DataLoader pickling error by avoiding module on self #9200
PCAM: lazy-import `h5py` via `_get_h5py` to keep the dataset picklable

Fixes: #9195
## Problem
`PCAM` previously stored the `h5py` module on the dataset instance, making it unpicklable. `DataLoader(num_workers > 0)` and spawn-based DDP pickle the dataset, causing `TypeError: cannot pickle 'module' object`.
## Change
- Add a `_get_h5py()` helper that imports `h5py` on demand with a clear error message.
- Call `h5py = _get_h5py()` in `__len__` and `__getitem__` (and any other call sites) instead of using `self.h5py`.
- `__init__` no longer stores the module on `self`. The error text for a missing `h5py` remains unchanged.
## Tests
- `test_update_PCAM.py`: adds `PCAMTestCase` (inherits `ImageDatasetTestCase`). Its `inject_fake_data(...)` writes tiny HDF5 files for each split (`train`, `val`, `test`) with datasets `x` and `y`; requires `h5py`. Exercises core dataset behaviors on synthetic data (offline).
- `test_update_PCAM_multiprocessing.py`: runs a two-process group (`torch.multiprocessing.spawn` with `world_size=2`, backend `gloo`) using a `DistributedSampler` and `DataLoader(num_workers=2, persistent_workers=True)`, with an `all_reduce` sanity check to ensure workers progress without pickling errors.
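The property both tests rely on can be checked without HDF5 data or worker processes: a dataset that keeps only plain data on `self` round-trips through `pickle`. A toy stand-in (hypothetical `LazyDataset`, not the real `PCAM` class):

```python
import pickle


class LazyDataset:
    """Stand-in for the fixed PCAM: only plain data lives on self."""

    def __init__(self, path):
        self._path = path  # a str pickles fine; no module reference stored

    def __len__(self):
        import h5py  # resolved per call inside the worker, never cached on self

        with h5py.File(self._path, "r") as f:
            return f["y"].shape[0]


ds = LazyDataset("dummy.h5")
# What spawn-based DataLoader workers do when they receive the dataset:
restored = pickle.loads(pickle.dumps(ds))
```

The round trip succeeds because the instance `__dict__` contains only picklable values; `h5py` is re-imported lazily in whatever process ends up calling `__len__`.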