Description
Is your feature request related to a problem? Please describe.
I want to know how to accomplish the following. It doesn't need to require zero effort on the user's part, but there needs to be a clear Hangar best-practices path to a workable setup.
Take a source data format that is complex (e.g. DICOM, JPEG) and that is infeasible to reconstitute bit-exact from the tensor+metadata form. Each instance of the raw data produces a sample that consists of 2 (or more) tensors: an image tensor and a 1D tensor that encodes things like lat/long or age/sex/etc. (to be concatenated with the output of the convolutional layers prior to the fully connected layers). To be clear, this is intended to be an illustrative example, not a concrete use case.
Per my reading of the docs, right now these two tensors wouldn't qualify as being in the same hangar dataset (it's not clear if that's problematic or not).
Let's express the above conversion as:
f_v1(raw) -> (t1, t2)
Users will need to:
- Update the conversion function to f_v2 and repopulate t1 and t2.
- Update the conversion function to f_v3, which outputs (t1, t2, t3).
- Update the raw data for a sample and repopulate t1 and t2.
- Be handed the pair of t1 and t2 for training/validation (including when training is randomized).
- Retrieve the raw data given IDs/tags/metadata included with the training sample (for use in an external viewer, manual investigation, etc.).
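To make the list above concrete, here is a rough sketch of the behaviour I'm after, written in plain Python rather than against any Hangar API. Everything in it (`Record`, `repopulate`, `training_batches`, the toy `f_v1`/`f_v3` converters, and the idea of tracking staleness with a converter name plus a hash of the raw bytes) is a hypothetical stand-in for illustration only, and lists of floats stand in for tensors.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class Record:
    raw: bytes              # original blob (e.g. DICOM/JPEG bytes), kept bit-exact
    tensors: tuple = ()     # derived tensors, e.g. (t1, t2)
    converter: str = ""     # name of the f_vN that produced `tensors`
    raw_digest: str = ""    # hash of the raw bytes the tensors were derived from


def f_v1(raw: bytes):
    """Toy converter: raw -> (t1, t2)."""
    return [float(b) for b in raw], [float(len(raw))]


def f_v3(raw: bytes):
    """Toy converter with an extra output: raw -> (t1, t2, t3)."""
    t1, t2 = f_v1(raw)
    return t1, t2, [float(sum(raw))]


def repopulate(store, convert):
    """Recompute tensors for every record that is stale with respect to `convert`."""
    updated = []
    for sample_id, rec in store.items():
        digest = hashlib.sha256(rec.raw).hexdigest()
        if rec.converter != convert.__name__ or rec.raw_digest != digest:
            rec.tensors = tuple(convert(rec.raw))
            rec.converter = convert.__name__
            rec.raw_digest = digest
            updated.append(sample_id)
    return updated


def training_batches(store, seed):
    """Yield (sample_id, tensors) in shuffled order so the ID travels with the tensors."""
    ids = sorted(store)
    random.Random(seed).shuffle(ids)
    for sample_id in ids:
        yield sample_id, store[sample_id].tensors


store = {"a": Record(raw=b"\x01\x02"), "b": Record(raw=b"\x03")}
repopulate(store, f_v1)             # initial population
store["a"].raw = b"\x09\x09\x09"    # raw data for one sample changes
changed = repopulate(store, f_v1)   # only the changed sample is recomputed
repopulate(store, f_v3)             # new converter: every sample gets three tensors
```

Because each training item carries its sample ID, looking the raw blob back up (`store[sample_id].raw`) is trivial here. The open question is what the equivalent would be in Hangar.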
Describe the solution you'd like
I think that changing the definition of a sample to be a tuple of binary blobs plus a tuple of tensors plus metadata would work, but I haven't considered the potential impacts from that kind of change. Seems potentially large.
Describe alternatives you've considered
Another option would be to have separate datasets for t1 and t2 and combine them manually, plus manage the binary blobs separately. That seems like a lot of infra work, and it risks drift both between the t1 and t2 samples and between the samples and the blobs.
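As an illustration of the kind of bookkeeping this alternative would push onto users, here is a minimal sketch of a drift check, assuming the blobs, t1, and t2 each live in their own mapping keyed by a shared sample ID. The `find_drift` helper and the dict-based layout are hypothetical and not part of Hangar.

```python
def find_drift(blobs, t1_set, t2_set):
    """Map each sample ID that is missing somewhere to the collections lacking it."""
    collections = (("blobs", blobs), ("t1", t1_set), ("t2", t2_set))
    all_ids = set(blobs) | set(t1_set) | set(t2_set)
    drift = {}
    for sample_id in sorted(all_ids):
        missing = [name for name, coll in collections if sample_id not in coll]
        if missing:
            drift[sample_id] = missing
    return drift


# Toy data: "b" was never converted to t2, and "c" lost its t1.
blobs = {"a": b"...", "b": b"...", "c": b"..."}
t1_set = {"a": [1.0], "b": [2.0]}
t2_set = {"a": [0.5], "c": [0.7]}
drift = find_drift(blobs, t1_set, t2_set)
```

This only catches missing keys. It would not notice a tensor that was derived from an older version of a blob, which is the harder part of the drift problem.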
Additional context
I suspect that I want/expect Hangar to solve a larger slice of the problem than it's intended to, but it's not clear at first glance what the intended approach would be for more complicated setups like the above.