Commit b37bee7: Merge pull request #179 from theabhirath/mobilenet-fixes
Miscellaneous fixes for MobileNet
2 parents: 099c1a5 + c19eda0

File tree: 8 files changed (+52, -43 lines)


Project.toml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 name = "Metalhead"
 uuid = "dbeba491-748d-5e0e-a39e-b530a07fa0cc"
-version = "0.7.3-DEV"
+version = "0.7.3"

 [deps]
 Artifacts = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

src/convnets/convmixer.jl

Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@ Creates a ConvMixer model.

 - `planes`: number of planes in the output of each block
 - `depth`: number of layers
-- `inchannels`: number of channels in the input
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `kernel_size`: kernel size of the convolutional layers
 - `patch_size`: size of the patches
 - `activation`: activation function used after the convolutional layers
@@ -45,7 +45,7 @@ Creates a ConvMixer model.
 # Arguments

 - `mode`: the mode of the model, either `:base`, `:small` or `:large`
-- `inchannels`: number of channels in the input
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `activation`: activation function used after the convolutional layers
 - `nclasses`: number of classes in the output
 """

src/convnets/convnext.jl

Lines changed: 3 additions & 3 deletions

@@ -33,8 +33,8 @@ Creates the layers for a ConvNeXt model.
 - `depths`: list with configuration for depth of each block
 - `planes`: list with configuration for number of output channels in each block
 - `drop_path_rate`: Stochastic depth rate.
-- `λ`: Initial value for [`LayerScale`](#)
-  ([reference](https://arxiv.org/abs/2103.17239))
+- `λ`: Initial value for [`LayerScale`](#)
+  ([reference](https://arxiv.org/abs/2103.17239))
 - `nclasses`: number of output classes
 """
 function convnext(depths, planes; inchannels = 3, drop_path_rate = 0.0, λ = 1.0f-6,
@@ -92,7 +92,7 @@ Creates a ConvNeXt model.

 # Arguments:

-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `drop_path_rate`: Stochastic depth rate.
 - `λ`: Init value for [LayerScale](https://arxiv.org/abs/2103.17239)
 - `nclasses`: number of output classes

(The `λ` lines in the first hunk differ only in leading whitespace, which this rendering does not preserve.)

src/convnets/inception.jl

Lines changed: 7 additions & 7 deletions

@@ -326,7 +326,7 @@ Creates an Inceptionv4 model.
 # Arguments

 - `pretrain`: set to `true` to load the pre-trained weights for ImageNet
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.

@@ -426,7 +426,7 @@ Creates an InceptionResNetv2 model.

 # Arguments

-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 """
@@ -459,12 +459,12 @@ Creates an InceptionResNetv2 model.
 # Arguments

 - `pretrain`: set to `true` to load the pre-trained weights for ImageNet
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.

 !!! warning
-
+
     `InceptionResNetv2` does not currently support pretrained weights.
 """
 struct InceptionResNetv2
@@ -496,7 +496,7 @@ Create an Xception block.

 # Arguments

-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `outchannels`: number of output channels.
 - `nrepeats`: number of repeats of depthwise separable convolution layers.
 - `stride`: stride by which to downsample the input.
@@ -540,7 +540,7 @@ Creates an Xception model.

 # Arguments

-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 """
@@ -571,7 +571,7 @@ Creates an Xception model.
 # Arguments

 - `pretrain`: set to `true` to load the pre-trained weights for ImageNet.
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.

(The blank line under `!!! warning` changes only in trailing whitespace.)

src/convnets/mobilenet.jl

Lines changed: 36 additions & 27 deletions

(Indentation below is reconstructed; several hunks change only leading or trailing whitespace and render identically here.)

@@ -4,8 +4,8 @@
     mobilenetv1(width_mult, config;
                 activation = relu,
                 inchannels = 3,
-                nclasses = 1000,
-                fcsize = 1024)
+                fcsize = 1024,
+                nclasses = 1000)

 Create a MobileNetv1 model ([reference](https://arxiv.org/abs/1704.04861v1)).

@@ -21,23 +21,24 @@ Create a MobileNetv1 model ([reference](https://arxiv.org/abs/1704.04861v1)).
   + `s`: The stride of the convolutional kernel
   + `r`: The number of time this configuration block is repeated
 - `activate`: The activation function to use throughout the network
-- `inchannels`: The number of input feature maps``
+- `inchannels`: The number of input channels. The default value is 3.
 - `fcsize`: The intermediate fully-connected size between the convolution and final layers
 - `nclasses`: The number of output classes
 """
 function mobilenetv1(width_mult, config;
                      activation = relu,
                      inchannels = 3,
-                     nclasses = 1000,
-                     fcsize = 1024)
+                     fcsize = 1024,
+                     nclasses = 1000)
     layers = []
     for (dw, outch, stride, nrepeats) in config
         outch = Int(outch * width_mult)
         for _ in 1:nrepeats
             layer = dw ?
                     depthwise_sep_conv_bn((3, 3), inchannels, outch, activation;
                                           stride = stride, pad = 1, bias = false) :
-                    conv_bn((3, 3), inchannels, outch, activation; stride = stride, pad = 1)
+                    conv_bn((3, 3), inchannels, outch, activation; stride = stride, pad = 1,
+                            bias = false)
             append!(layers, layer)
             inchannels = outch
         end
@@ -51,7 +52,7 @@ function mobilenetv1(width_mult, config;
 end

 const mobilenetv1_configs = [
-    # dw, c, s, r
+    # dw, c, s, r
     (false, 32, 2, 1),
     (true, 64, 1, 1),
     (true, 128, 2, 1),
@@ -65,7 +66,7 @@ const mobilenetv1_configs = [
 ]

 """
-    MobileNetv1(width_mult = 1; pretrain = false, nclasses = 1000)
+    MobileNetv1(width_mult = 1; inchannels = 3, pretrain = false, nclasses = 1000)

 Create a MobileNetv1 model with the baseline configuration
 ([reference](https://arxiv.org/abs/1704.04861v1)).
@@ -76,6 +77,7 @@ Set `pretrain` to `true` to load the pretrained weights for ImageNet.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1.0 being the default in the paper;
   this is usually a value between 0.1 and 1.4)
+- `inchannels`: The number of input channels. The default value is 3.
 - `pretrain`: Whether to load the pre-trained weights for ImageNet
 - `nclasses`: The number of output classes

@@ -85,10 +87,10 @@ struct MobileNetv1
     layers::Any
 end

-function MobileNetv1(width_mult::Number = 1; pretrain = false, nclasses = 1000)
-    layers = mobilenetv1(width_mult, mobilenetv1_configs; nclasses = nclasses)
+function MobileNetv1(width_mult::Number = 1; inchannels = 3, pretrain = false,
+                     nclasses = 1000)
+    layers = mobilenetv1(width_mult, mobilenetv1_configs; inchannels, nclasses)
     pretrain && loadpretrain!(layers, string("MobileNetv1"))
-
     return MobileNetv1(layers)
 end

@@ -102,7 +104,7 @@ classifier(m::MobileNetv1) = m.layers[2]
 # MobileNetv2

 """
-    mobilenetv2(width_mult, configs; max_width = 1280, nclasses = 1000)
+    mobilenetv2(width_mult, configs; inchannels = 3, max_width = 1280, nclasses = 1000)

 Create a MobileNetv2 model.
 ([reference](https://arxiv.org/abs/1801.04381)).
@@ -119,14 +121,15 @@ Create a MobileNetv2 model.
   + `n`: The number of times a block is repeated
   + `s`: The stride of the convolutional kernel
   + `a`: The activation function used in the bottleneck layer
+- `inchannels`: The number of input channels. The default value is 3.
 - `max_width`: The maximum number of feature maps in any layer of the network
 - `nclasses`: The number of output classes
 """
-function mobilenetv2(width_mult, configs; max_width = 1280, nclasses = 1000)
+function mobilenetv2(width_mult, configs; inchannels = 3, max_width = 1280, nclasses = 1000)
     # building first layer
     inplanes = _round_channels(32 * width_mult, width_mult == 0.1 ? 4 : 8)
     layers = []
-    append!(layers, conv_bn((3, 3), 3, inplanes; stride = 2))
+    append!(layers, conv_bn((3, 3), inchannels, inplanes; pad = 1, stride = 2))
     # building inverted residual blocks
     for (t, c, n, s, a) in configs
         outplanes = _round_channels(c * width_mult, width_mult == 0.1 ? 4 : 8)
@@ -165,7 +168,7 @@ struct MobileNetv2
 end

 """
-    MobileNetv2(width_mult = 1.0; pretrain = false, nclasses = 1000)
+    MobileNetv2(width_mult = 1.0; inchannels = 3, pretrain = false, nclasses = 1000)

 Create a MobileNetv2 model with the specified configuration.
 ([reference](https://arxiv.org/abs/1801.04381)).
@@ -176,13 +179,15 @@ Set `pretrain` to `true` to load the pretrained weights for ImageNet.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1.0 being the default in the paper;
   this is usually a value between 0.1 and 1.4)
+- `inchannels`: The number of input channels. The default value is 3.
 - `pretrain`: Whether to load the pre-trained weights for ImageNet
 - `nclasses`: The number of output classes

 See also [`Metalhead.mobilenetv2`](#).
 """
-function MobileNetv2(width_mult::Number = 1; pretrain = false, nclasses = 1000)
-    layers = mobilenetv2(width_mult, mobilenetv2_configs; nclasses = nclasses)
+function MobileNetv2(width_mult::Number = 1; inchannels = 3, pretrain = false,
+                     nclasses = 1000)
+    layers = mobilenetv2(width_mult, mobilenetv2_configs; inchannels, nclasses)
     pretrain && loadpretrain!(layers, string("MobileNetv2"))
     return MobileNetv2(layers)
 end
@@ -197,7 +202,7 @@ classifier(m::MobileNetv2) = m.layers[2]
 # MobileNetv3

 """
-    mobilenetv3(width_mult, configs; max_width = 1024, nclasses = 1000)
+    mobilenetv3(width_mult, configs; inchannels = 3, max_width = 1024, nclasses = 1000)

 Create a MobileNetv3 model.
 ([reference](https://arxiv.org/abs/1905.02244)).
@@ -216,14 +221,17 @@ Create a MobileNetv3 model.
   + `r::Integer` - The reduction factor (`>= 1` or `nothing` to skip) for squeeze and excite layers
   + `s::Integer` - The stride of the convolutional kernel
   + `a` - The activation function used in the bottleneck (typically `hardswish` or `relu`)
+- `inchannels`: The number of input channels. The default value is 3.
 - `max_width`: The maximum number of feature maps in any layer of the network
 - `nclasses`: the number of output classes
 """
-function mobilenetv3(width_mult, configs; max_width = 1024, nclasses = 1000)
+function mobilenetv3(width_mult, configs; inchannels = 3, max_width = 1024, nclasses = 1000)
     # building first layer
     inplanes = _round_channels(16 * width_mult, 8)
     layers = []
-    append!(layers, conv_bn((3, 3), 3, inplanes, hardswish; stride = 2))
+    append!(layers,
+            conv_bn((3, 3), inchannels, inplanes, hardswish; pad = 1, stride = 2,
+                    bias = false))
     explanes = 0
     # building inverted residual blocks
     for (k, t, c, r, a, s) in configs
@@ -249,7 +257,7 @@ end

 # Configurations for small and large mode for MobileNetv3
 mobilenetv3_configs = Dict(:small => [
-    # k, t, c, SE, a, s
+    # k, t, c, SE, a, s
     (3, 1, 16, 4, relu, 2),
     (3, 4.5, 24, nothing, relu, 2),
     (3, 3.67, 24, nothing, relu, 1),
@@ -263,7 +271,7 @@ mobilenetv3_configs = Dict(:small => [
     (5, 6, 96, 4, hardswish, 1),
 ],
 :large => [
-    # k, t, c, SE, a, s
+    # k, t, c, SE, a, s
     (3, 1, 16, nothing, relu, 1),
     (3, 4, 24, nothing, relu, 2),
     (3, 3, 24, nothing, relu, 1),
@@ -287,7 +295,7 @@ struct MobileNetv3
 end

 """
-    MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; pretrain = false, nclasses = 1000)
+    MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; inchannels = 3, pretrain = false, nclasses = 1000)

 Create a MobileNetv3 model with the specified configuration.
 ([reference](https://arxiv.org/abs/1905.02244)).
@@ -299,17 +307,18 @@ Set `pretrain = true` to load the model with pre-trained weights for ImageNet.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1.0 being the default in the paper;
   this is usually a value between 0.1 and 1.4)
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `pretrain`: whether to load the pre-trained weights for ImageNet
 - `nclasses`: the number of output classes

 See also [`Metalhead.mobilenetv3`](#).
 """
-function MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; pretrain = false,
-                     nclasses = 1000)
+function MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; inchannels = 3,
+                     pretrain = false, nclasses = 1000)
     @assert mode in [:large, :small] "`mode` has to be either :large or :small"
     max_width = (mode == :large) ? 1280 : 1024
-    layers = mobilenetv3(width_mult, mobilenetv3_configs[mode]; max_width = max_width,
-                         nclasses = nclasses)
+    layers = mobilenetv3(width_mult, mobilenetv3_configs[mode]; inchannels, max_width,
+                         nclasses)
     pretrain && loadpretrain!(layers, string("MobileNetv3", mode))
     return MobileNetv3(layers)
 end
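Taken together, the mobilenet.jl changes thread a user-facing `inchannels` keyword through all three MobileNet constructors, so non-RGB inputs no longer hit the previously hard-coded 3-channel stem in v2/v3. A minimal usage sketch of the new keyword, assuming Metalhead at this release (0.7.3) and Flux's WHCN array layout; the input size, batch size, and class count below are illustrative only:

```julia
using Metalhead

# Build a MobileNetv1 for single-channel (grayscale) input instead of
# the default RGB three channels.
model = MobileNetv1(0.5; inchannels = 1, nclasses = 10)

# Dummy batch in width × height × channels × batch (WHCN) order.
x = rand(Float32, 224, 224, 1, 4)
y = model(x)   # class logits, one column per batch element

# The same keyword is now accepted by MobileNetv2 and MobileNetv3:
m2 = MobileNetv2(1.0; inchannels = 1)
m3 = MobileNetv3(:small, 1.0; inchannels = 1)
```

Note that `pretrain = true` remains tied to the ImageNet weights and therefore still assumes the default `inchannels = 3`.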

src/convnets/resnext.jl

Lines changed: 1 addition & 1 deletion

@@ -112,7 +112,7 @@ Create a ResNeXt model with specified configuration. Currently supported values
 Set `pretrain = true` to load the model with pre-trained weights for ImageNet.

 !!! warning
-
+
     `ResNeXt` does not currently support pretrained weights.

 See also [`Metalhead.resnext`](#).

(The blank line under `!!! warning` changes only in trailing whitespace.)

src/layers/embeddings.jl

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ patches.
 # Arguments:

 - `imsize`: the size of the input image
-- `inchannels`: the number of channels in the input image
+- `inchannels`: the number of channels in the input. The default value is 3.
 - `patch_size`: the size of the patches
 - `embedplanes`: the number of channels in the embedding
 - `norm_layer`: the normalization layer - by default the identity function but otherwise takes a

src/vit-based/vit.jl

Lines changed: 1 addition & 1 deletion

@@ -80,7 +80,7 @@ Creates a Vision Transformer (ViT) model.
 # Arguments

 - `mode`: the model configuration, one of
-  `[:tiny, :small, :base, :large, :huge, :giant, :gigantic]`
+  `[:tiny, :small, :base, :large, :huge, :giant, :gigantic]`
 - `imsize`: image size
 - `inchannels`: number of input channels
 - `patch_size`: size of the patches

(The changed line differs only in leading whitespace, which this rendering does not preserve.)
