Open
Labels: enhancement (New feature or request)
Description
🚀 Feature
The ability to pass length as an input argument to CombinedStreamingDataset, so that the epoch length is decoupled from the number of samples in the underlying datasets. ParallelStreamingDataset already supports this.
Motivation
I want to create a CombinedStreamingDataset that is a weighted combination of StreamingDatasets, while being able to set the number of training steps per cycle of the CombinedStreamingDataset to an arbitrary value. As discussed with @tchaton.
Related to #524
Alternatives
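Not sure if this would work, but conceptually one workaround might be to wrap the CombinedStreamingDataset in a ParallelStreamingDataset, e.g.:
from litdata import CombinedStreamingDataset, ParallelStreamingDataset, StreamingDataset

ds1 = StreamingDataset(...)
ds2 = StreamingDataset(...)
# Pass weights by keyword (the second positional parameter is seed);
# weighted sampling also needs iterate_over_all=False.
cds = CombinedStreamingDataset([ds1, ds2], weights=weights, iterate_over_all=False)
pds = ParallelStreamingDataset([cds], length=100)  # untested: may not work as intended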
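To make the requested behaviour concrete, here is a minimal, library-independent sketch of a weighted mix whose epoch length is fixed by a length argument rather than by the dataset sizes. The class name FixedLengthWeightedMix and its parameters are hypothetical and are not part of litdata; it only illustrates the semantics I have in mind, assuming sources restart from the beginning once exhausted.

```python
import itertools
import random
from typing import Any, Iterable, Iterator, Sequence


class FixedLengthWeightedMix:
    """Hypothetical illustration only; not a litdata class."""

    def __init__(
        self,
        datasets: Sequence[Iterable[Any]],
        weights: Sequence[float],
        length: int,
        seed: int = 42,
    ) -> None:
        if len(datasets) != len(weights):
            raise ValueError("datasets and weights must have the same length")
        self.datasets = datasets
        self.weights = weights
        self.length = length
        self.seed = seed

    def __len__(self) -> int:
        # Epoch length is whatever the caller asked for.
        return self.length

    def __iter__(self) -> Iterator[Any]:
        rng = random.Random(self.seed)
        # Cycle each source so a short dataset restarts instead of ending the epoch.
        # Note: itertools.cycle caches items in memory, so this is for illustration only.
        streams = [itertools.cycle(ds) for ds in self.datasets]
        for _ in range(self.length):
            # Pick the source for this sample according to the weights.
            (idx,) = rng.choices(range(len(streams)), weights=self.weights)
            yield next(streams[idx])


mix = FixedLengthWeightedMix([[1, 2, 3], ["a", "b"]], weights=[0.8, 0.2], length=10)
samples = list(mix)
```

Here the epoch yields exactly 10 samples even though the two sources hold only 5 items in total, which is the dissociation between epoch length and sample count that this request is about.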
bhimrazy