Merged
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
name = "Metalhead"
uuid = "dbeba491-748d-5e0e-a39e-b530a07fa0cc"
version = "0.7.3-DEV"
version = "0.7.3"

[deps]
Artifacts = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
4 changes: 2 additions & 2 deletions src/convnets/convmixer.jl
@@ -9,7 +9,7 @@ Creates a ConvMixer model.

- `planes`: number of planes in the output of each block
- `depth`: number of layers
- `inchannels`: number of channels in the input
- `inchannels`: The number of channels in the input. The default value is 3.
Member

Is it necessary to specify this? It's already in the docstring above, and it seems inconsistent with the other bullets (either re-specify all + capitalize all or leave it as is).

Member Author

I'm working on making this happen for all the docstrings in the new PRs (that is, having the default values in there). I need to push a larger documentation change sometime in the next few weeks, and that should probably cover everything.

- `kernel_size`: kernel size of the convolutional layers
- `patch_size`: size of the patches
- `activation`: activation function used after the convolutional layers
@@ -45,7 +45,7 @@ Creates a ConvMixer model.
# Arguments

- `mode`: the mode of the model, either `:base`, `:small` or `:large`
- `inchannels`: number of channels in the input
- `inchannels`: The number of channels in the input. The default value is 3.
- `activation`: activation function used after the convolutional layers
- `nclasses`: number of classes in the output
"""
6 changes: 3 additions & 3 deletions src/convnets/convnext.jl
@@ -33,8 +33,8 @@ Creates the layers for a ConvNeXt model.
- `depths`: list with configuration for depth of each block
- `planes`: list with configuration for number of output channels in each block
- `drop_path_rate`: Stochastic depth rate.
- `λ`: Initial value for [`LayerScale`](#)
([reference](https://arxiv.org/abs/2103.17239))
- `λ`: Initial value for [`LayerScale`](#)
([reference](https://arxiv.org/abs/2103.17239))
- `nclasses`: number of output classes
"""
function convnext(depths, planes; inchannels = 3, drop_path_rate = 0.0, λ = 1.0f-6,
@@ -92,7 +92,7 @@ Creates a ConvNeXt model.

# Arguments:

- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `drop_path_rate`: Stochastic depth rate.
- `λ`: Init value for [LayerScale](https://arxiv.org/abs/2103.17239)
- `nclasses`: number of output classes
14 changes: 7 additions & 7 deletions src/convnets/inception.jl
@@ -326,7 +326,7 @@ Creates an Inceptionv4 model.
# Arguments

- `pretrain`: set to `true` to load the pre-trained weights for ImageNet
- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `dropout`: rate of dropout in classifier head.
- `nclasses`: the number of output classes.

@@ -426,7 +426,7 @@ Creates an InceptionResNetv2 model.

# Arguments

- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `dropout`: rate of dropout in classifier head.
- `nclasses`: the number of output classes.
"""
@@ -459,12 +459,12 @@ Creates an InceptionResNetv2 model.
# Arguments

- `pretrain`: set to `true` to load the pre-trained weights for ImageNet
- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `dropout`: rate of dropout in classifier head.
- `nclasses`: the number of output classes.

!!! warning

`InceptionResNetv2` does not currently support pretrained weights.
"""
struct InceptionResNetv2
@@ -496,7 +496,7 @@ Create an Xception block.

# Arguments

- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `outchannels`: number of output channels.
- `nrepeats`: number of repeats of depthwise separable convolution layers.
- `stride`: stride by which to downsample the input.
@@ -540,7 +540,7 @@ Creates an Xception model.

# Arguments

- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `dropout`: rate of dropout in classifier head.
- `nclasses`: the number of output classes.
"""
@@ -571,7 +571,7 @@ Creates an Xception model.
# Arguments

- `pretrain`: set to `true` to load the pre-trained weights for ImageNet.
- `inchannels`: number of input channels.
- `inchannels`: The number of channels in the input. The default value is 3.
- `dropout`: rate of dropout in classifier head.
- `nclasses`: the number of output classes.

63 changes: 36 additions & 27 deletions src/convnets/mobilenet.jl
@@ -4,8 +4,8 @@
mobilenetv1(width_mult, config;
activation = relu,
inchannels = 3,
nclasses = 1000,
fcsize = 1024)
fcsize = 1024,
nclasses = 1000)

Create a MobileNetv1 model ([reference](https://arxiv.org/abs/1704.04861v1)).

@@ -21,23 +21,24 @@ Create a MobileNetv1 model ([reference](https://arxiv.org/abs/1704.04861v1)).
+ `s`: The stride of the convolutional kernel
+ `r`: The number of time this configuration block is repeated
- `activate`: The activation function to use throughout the network
- `inchannels`: The number of input feature maps``
- `inchannels`: The number of input channels. The default value is 3.
- `fcsize`: The intermediate fully-connected size between the convolution and final layers
- `nclasses`: The number of output classes
"""
function mobilenetv1(width_mult, config;
activation = relu,
inchannels = 3,
nclasses = 1000,
fcsize = 1024)
fcsize = 1024,
nclasses = 1000)
layers = []
for (dw, outch, stride, nrepeats) in config
outch = Int(outch * width_mult)
for _ in 1:nrepeats
layer = dw ?
depthwise_sep_conv_bn((3, 3), inchannels, outch, activation;
stride = stride, pad = 1, bias = false) :
conv_bn((3, 3), inchannels, outch, activation; stride = stride, pad = 1)
conv_bn((3, 3), inchannels, outch, activation; stride = stride, pad = 1,
bias = false)
append!(layers, layer)
inchannels = outch
end
@@ -51,7 +52,7 @@ function mobilenetv1(width_mult, config;
end

const mobilenetv1_configs = [
# dw, c, s, r
# dw, c, s, r
(false, 32, 2, 1),
(true, 64, 1, 1),
(true, 128, 2, 1),
@@ -65,7 +66,7 @@ const mobilenetv1_configs = [
]

"""
MobileNetv1(width_mult = 1; pretrain = false, nclasses = 1000)
MobileNetv1(width_mult = 1; inchannels = 3, pretrain = false, nclasses = 1000)

Create a MobileNetv1 model with the baseline configuration
([reference](https://arxiv.org/abs/1704.04861v1)).
@@ -76,6 +77,7 @@ Set `pretrain` to `true` to load the pretrained weights for ImageNet.
- `width_mult`: Controls the number of output feature maps in each block
(with 1.0 being the default in the paper;
this is usually a value between 0.1 and 1.4)
- `inchannels`: The number of input channels. The default value is 3.
- `pretrain`: Whether to load the pre-trained weights for ImageNet
- `nclasses`: The number of output classes

@@ -85,10 +87,10 @@ struct MobileNetv1
layers::Any
end

function MobileNetv1(width_mult::Number = 1; pretrain = false, nclasses = 1000)
layers = mobilenetv1(width_mult, mobilenetv1_configs; nclasses = nclasses)
function MobileNetv1(width_mult::Number = 1; inchannels = 3, pretrain = false,
nclasses = 1000)
layers = mobilenetv1(width_mult, mobilenetv1_configs; inchannels, nclasses)
pretrain && loadpretrain!(layers, string("MobileNetv1"))

return MobileNetv1(layers)
end

@@ -102,7 +104,7 @@ classifier(m::MobileNetv1) = m.layers[2]
# MobileNetv2

"""
mobilenetv2(width_mult, configs; max_width = 1280, nclasses = 1000)
mobilenetv2(width_mult, configs; inchannels = 3, max_width = 1280, nclasses = 1000)

Create a MobileNetv2 model.
([reference](https://arxiv.org/abs/1801.04381)).
@@ -119,14 +121,15 @@ Create a MobileNetv2 model.
+ `n`: The number of times a block is repeated
+ `s`: The stride of the convolutional kernel
+ `a`: The activation function used in the bottleneck layer
- `inchannels`: The number of input channels. The default value is 3.
- `max_width`: The maximum number of feature maps in any layer of the network
- `nclasses`: The number of output classes
"""
function mobilenetv2(width_mult, configs; max_width = 1280, nclasses = 1000)
function mobilenetv2(width_mult, configs; inchannels = 3, max_width = 1280, nclasses = 1000)
# building first layer
inplanes = _round_channels(32 * width_mult, width_mult == 0.1 ? 4 : 8)
layers = []
append!(layers, conv_bn((3, 3), 3, inplanes; stride = 2))
append!(layers, conv_bn((3, 3), inchannels, inplanes; pad = 1, stride = 2))
# building inverted residual blocks
for (t, c, n, s, a) in configs
outplanes = _round_channels(c * width_mult, width_mult == 0.1 ? 4 : 8)
@@ -165,7 +168,7 @@ struct MobileNetv2
end

"""
MobileNetv2(width_mult = 1.0; pretrain = false, nclasses = 1000)
MobileNetv2(width_mult = 1.0; inchannels = 3, pretrain = false, nclasses = 1000)

Create a MobileNetv2 model with the specified configuration.
([reference](https://arxiv.org/abs/1801.04381)).
@@ -176,13 +179,15 @@ Set `pretrain` to `true` to load the pretrained weights for ImageNet.
- `width_mult`: Controls the number of output feature maps in each block
(with 1.0 being the default in the paper;
this is usually a value between 0.1 and 1.4)
- `inchannels`: The number of input channels. The default value is 3.
- `pretrain`: Whether to load the pre-trained weights for ImageNet
- `nclasses`: The number of output classes

See also [`Metalhead.mobilenetv2`](#).
"""
function MobileNetv2(width_mult::Number = 1; pretrain = false, nclasses = 1000)
layers = mobilenetv2(width_mult, mobilenetv2_configs; nclasses = nclasses)
function MobileNetv2(width_mult::Number = 1; inchannels = 3, pretrain = false,
nclasses = 1000)
layers = mobilenetv2(width_mult, mobilenetv2_configs; inchannels, nclasses)
pretrain && loadpretrain!(layers, string("MobileNetv2"))
return MobileNetv2(layers)
end
@@ -197,7 +202,7 @@ classifier(m::MobileNetv2) = m.layers[2]
# MobileNetv3

"""
mobilenetv3(width_mult, configs; max_width = 1024, nclasses = 1000)
mobilenetv3(width_mult, configs; inchannels = 3, max_width = 1024, nclasses = 1000)

Create a MobileNetv3 model.
([reference](https://arxiv.org/abs/1905.02244)).
@@ -216,14 +221,17 @@ Create a MobileNetv3 model.
+ `r::Integer` - The reduction factor (`>= 1` or `nothing` to skip) for squeeze and excite layers
+ `s::Integer` - The stride of the convolutional kernel
+ `a` - The activation function used in the bottleneck (typically `hardswish` or `relu`)
- `inchannels`: The number of input channels. The default value is 3.
- `max_width`: The maximum number of feature maps in any layer of the network
- `nclasses`: the number of output classes
"""
function mobilenetv3(width_mult, configs; max_width = 1024, nclasses = 1000)
function mobilenetv3(width_mult, configs; inchannels = 3, max_width = 1024, nclasses = 1000)
# building first layer
inplanes = _round_channels(16 * width_mult, 8)
layers = []
append!(layers, conv_bn((3, 3), 3, inplanes, hardswish; stride = 2))
append!(layers,
conv_bn((3, 3), inchannels, inplanes, hardswish; pad = 1, stride = 2,
bias = false))
explanes = 0
# building inverted residual blocks
for (k, t, c, r, a, s) in configs
@@ -249,7 +257,7 @@ end

# Configurations for small and large mode for MobileNetv3
mobilenetv3_configs = Dict(:small => [
# k, t, c, SE, a, s
# k, t, c, SE, a, s
(3, 1, 16, 4, relu, 2),
(3, 4.5, 24, nothing, relu, 2),
(3, 3.67, 24, nothing, relu, 1),
@@ -263,7 +271,7 @@ mobilenetv3_configs = Dict(:small => [
(5, 6, 96, 4, hardswish, 1),
],
:large => [
# k, t, c, SE, a, s
# k, t, c, SE, a, s
(3, 1, 16, nothing, relu, 1),
(3, 4, 24, nothing, relu, 2),
(3, 3, 24, nothing, relu, 1),
@@ -287,7 +295,7 @@ struct MobileNetv3
end

"""
MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; pretrain = false, nclasses = 1000)
MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; inchannels = 3, pretrain = false, nclasses = 1000)

Create a MobileNetv3 model with the specified configuration.
([reference](https://arxiv.org/abs/1905.02244)).
@@ -299,17 +307,18 @@ Set `pretrain = true` to load the model with pre-trained weights for ImageNet.
- `width_mult`: Controls the number of output feature maps in each block
(with 1.0 being the default in the paper;
this is usually a value between 0.1 and 1.4)
- `inchannels`: The number of channels in the input. The default value is 3.
- `pretrain`: whether to load the pre-trained weights for ImageNet
- `nclasses`: the number of output classes

See also [`Metalhead.mobilenetv3`](#).
"""
function MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; pretrain = false,
nclasses = 1000)
function MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; inchannels = 3,
pretrain = false, nclasses = 1000)
@assert mode in [:large, :small] "`mode` has to be either :large or :small"
max_width = (mode == :large) ? 1280 : 1024
layers = mobilenetv3(width_mult, mobilenetv3_configs[mode]; max_width = max_width,
nclasses = nclasses)
layers = mobilenetv3(width_mult, mobilenetv3_configs[mode]; inchannels, max_width,
nclasses)
pretrain && loadpretrain!(layers, string("MobileNetv3", mode))
return MobileNetv3(layers)
end
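The net effect of the `mobilenet.jl` changes above is that the input-channel count is no longer hard-coded to 3 in the stem convolution. A minimal usage sketch (hypothetical, not part of the diff; it assumes the `Metalhead` package with the constructors shown here and Flux's WHCN array layout):

```julia
using Metalhead

# Grayscale input: 1 channel instead of the previously hard-coded 3.
model = MobileNetv2(1.0; inchannels = 1, nclasses = 10)

# A batch of four 224×224 single-channel images (width × height × channels × batch).
x = rand(Float32, 224, 224, 1, 4)
y = model(x)  # classifier output of size (nclasses, batchsize)
```

Before this PR, passing a 1-channel batch would fail at the first `conv_bn` layer, since its input channel count was fixed at 3.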
2 changes: 1 addition & 1 deletion src/convnets/resnext.jl
@@ -112,7 +112,7 @@ Create a ResNeXt model with specified configuration. Currently supported values
Set `pretrain = true` to load the model with pre-trained weights for ImageNet.

!!! warning

`ResNeXt` does not currently support pretrained weights.

See also [`Metalhead.resnext`](#).
2 changes: 1 addition & 1 deletion src/layers/embeddings.jl
@@ -11,7 +11,7 @@ patches.
# Arguments:

- `imsize`: the size of the input image
- `inchannels`: the number of channels in the input image
- `inchannels`: the number of channels in the input. The default value is 3.
- `patch_size`: the size of the patches
- `embedplanes`: the number of channels in the embedding
- `norm_layer`: the normalization layer - by default the identity function but otherwise takes a
2 changes: 1 addition & 1 deletion src/vit-based/vit.jl
@@ -80,7 +80,7 @@ Creates a Vision Transformer (ViT) model.
# Arguments

- `mode`: the model configuration, one of
`[:tiny, :small, :base, :large, :huge, :giant, :gigantic]`
`[:tiny, :small, :base, :large, :huge, :giant, :gigantic]`
- `imsize`: image size
- `inchannels`: number of input channels
- `patch_size`: size of the patches
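The `embeddings.jl` and `vit.jl` tweaks are docstring-only, but the `inchannels` argument they document follows the same pattern. A sketch of how it is used (hypothetical; assumes Metalhead's exported `ViT` constructor accepts these keywords as listed in its docstring):

```julia
using Metalhead

# :tiny is one of the documented modes; patch embedding consumes `inchannels`.
model = ViT(:tiny; imsize = (256, 256), inchannels = 1, patch_size = (16, 16))

# Two 256×256 single-channel images in WHCN layout.
x = rand(Float32, 256, 256, 1, 2)
y = model(x)
```

The `(256, 256)` image with `(16, 16)` patches yields a 16 × 16 = 256-token sequence (plus the class token) into the transformer, independent of `inchannels`.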