Video support, new datasets and models
This release adds support for video models and datasets, and brings several improvements.
Note: torchvision 0.4 requires PyTorch 1.2 or newer
Highlights
Video and IO
Video is now a first-class citizen in torchvision. The 0.4 release includes:
- efficient IO primitives for reading and writing video files
- Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with `torch.utils.data.DataLoader`
- Pre-trained models for action recognition, trained on Kinetics-400
- Training and evaluation scripts for reproducing the training results
Writing your own video dataset is easy. We provide a utility class `VideoClips` that simplifies enumerating all possible fixed-size clips in a list of video files by building an index of all clips in a set of videos. It also allows specifying a fixed frame rate for the videos.
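Under the hood, the clip index is essentially a sliding window over each video's frames. A rough, self-contained sketch of that arithmetic (the helper names below are illustrative, not part of torchvision):

```python
def num_clips(num_frames, clip_length_in_frames=16, frames_between_clips=1):
    """Number of fixed-size clips that fit in a video of `num_frames` frames,
    starting a new clip every `frames_between_clips` frames."""
    if num_frames < clip_length_in_frames:
        return 0
    return (num_frames - clip_length_in_frames) // frames_between_clips + 1

def clip_ranges(num_frames, clip_length_in_frames=16, frames_between_clips=1):
    """(start, end) frame indices of each clip, with `end` exclusive."""
    return [(start, start + clip_length_in_frames)
            for start in range(0,
                               num_frames - clip_length_in_frames + 1,
                               frames_between_clips)]
```

For example, an 18-frame video yields three 16-frame clips with a step of 1, and two non-overlapping clips when the step equals the clip length in a 32-frame video.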
```python
from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio

    def __len__(self):
        return self.video_clips.num_clips()
```

We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results of the papers in which they were first introduced, as well as the corresponding training scripts.
| model | clip @ 1 (accuracy, %) |
|---|---|
| r3d_18 | 52.748 |
| mc3_18 | 53.898 |
| r2plus1d_18 | 57.498 |
Bugfixes
- change aspect ratio calculation formula in `references/detection` (#1194)
- bug fixes in ImageNet (#1149)
- fix save_image when height or width equals 1 (#1059)
- Fix STL10 `__repr__` (#969)
- Fix wrong behavior of `GeneralizedRCNNTransform` in Python2 (#960)
Datasets
New
- Add USPS dataset (#961)(#1117)
- Added support for the QMNIST dataset (#995)
- Add HMDB51 and UCF101 datasets (#1156)
- Add Kinetics400 dataset (#1077)
Improvements
- Miscellaneous dataset fixes (#1174)
- Standardize str argument verification in datasets (#1167)
- Always pass `transform` and `target_transform` to abstract dataset (#1126)
- Remove duplicate transform assignment in FakeDataset (#1125)
- Automatic extraction for Cityscapes Dataset (#1066) (#1068)
- Use joint transform in Cityscapes (#1024)(#1045)
- CelebA: track attr names, support split="all", code cleanup (#1008)
- Add folds option to STL10 (#914)
Models
New
- Add pretrained Wide ResNet (#912)
- Memory efficient densenet (#1003) (#1090)
- Implementation of the MNASNet family of models (#829)(#1043)(#1092)
- Add VideoModelZoo models (#1130)
Improvements
- Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
- Add checks to `roi_heads` in detection module (#1091)
- Make shallow copy of input list in `GeneralizedRCNNTransform` (#1085)(#1111)(#1084)
- Make MobileNetV2 number of channels divisible by 8 (#1005)
- typo fix: ouput -> output in Inception and GoogleNet (#1034)
- Remove empty proposals from the RPN (#1026)
- Remove empty boxes before NMS (#1019)
- Reduce code duplication in segmentation models (#1009)
- allow user to define residual settings in MobileNetV2 (#965)
- Use `flatten` instead of `view` (#1134)
Documentation
- Consistency in detection box format (#1110)
- Fix Mask R-CNN docs (#1089)
- Add paper references to VGG and Resnet variants (#1088)
- Doc, Test Fixes in `Normalize` (#1063)
- Add transforms doc to more datasets (#1038)
- Corrected typo: 5 to 0.5 (#1041)
- Update doc for `torchvision.transforms.functional.perspective` (#1017)
- Improve documentation for `fillcolor` option in `RandomAffine` (#994)
- Fix `COCO_INSTANCE_CATEGORY_NAMES` (#991)
- Added models information to documentation (#985)
- Add missing import in `faster_rcnn.py` documentation (#979)
- Improve `make_grid` docs (#964)
Tests
- Add test for SVHN (#1086)
- Add tests for Cityscapes Dataset (#1079)
- Update CI to Python 3.6 (#1044)
- Make `test_save_image` more robust (#1037)
- Add a generic test for the datasets (#1015)
- moved fakedata generation to separate module (#1014)
- Create imagenet fakedata on-the-fly (#1012)
- Minor test refactorings (#1011)
- Add test for CIFAR10(0) (#1010)
- Mock MNIST download for less flaky tests (#1004)
- Add test for ImageNet (#976)(#1006)
- Add tests for datasets (#966)
Transforms
Improvements
- Allowing 'F' mode for 1 channel FloatTensor in `ToPILImage` (#1100)
- Add shear parallel to y-axis (#1070)
- fix error message in `to_tensor` (#1000)
- Fix TypeError in `RandomResizedCrop.get_params` (#1036)
- Fix `normalize` for `dtype` other than `float32` (#1021)
Ops
- Renamed `vision.h` files to `vision_cpu.h` and `vision_cuda.h` (#1051)(#1052)
- Optimize `nms_cuda` by avoiding extra `torch.cat` call (#945)
Reference scripts
- Expose data-path in the detection reference scripts (#1109)
- Make `utils.py` work with pytorch-cpu (#1023)
- Add mixed precision training with Apex (#972)(#1124)
- Add reference code for similarity learning (#1101)
Build
- Add windows build steps and wheel build scripts (#998)
- add packaging scripts (#996)
- Allow forcing GPU build with `FORCE_CUDA=1` (#927)