- EPOCHS: Number of training iterations, set to 100.
- LEARNING_RATE: Learning rate for the optimizer, set to 0.001.
- DEVICE: Specifies whether the model will run on the GPU (`'cuda'`) or the CPU.
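A minimal configuration block consistent with these values might look like the following sketch. The constant names come from the description above; selecting the device with `torch.cuda.is_available()` is an assumption, though it is the standard PyTorch idiom:

```python
import torch

EPOCHS = 100           # number of training iterations
LEARNING_RATE = 0.001  # learning rate for the optimizer
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```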
This is a custom class for the colorization model, inheriting from `nn.Module`, the base class for all neural networks in PyTorch.
The constructor defines the layers for both the encoder (downsampling) and decoder (upsampling) parts of the autoencoder, which are organized in a symmetric manner similar to a U-Net architecture.
- The downsampling layers are `Conv2d` layers with stride 2, reducing the spatial resolution of the image while increasing the number of feature maps.
- The input is a grayscale image (1 channel), and through successive convolutional layers, the number of channels (filters) increases:
- First layer: 1 -> 64 filters
- Second layer: 64 -> 128 filters
- Third layer: 128 -> 256 filters
- Fourth layer: 256 -> 512 filters
- Fifth layer: 512 -> 1024 filters (new deeper layer)
- The decoder mirrors the encoder using `ConvTranspose2d` (transposed convolutions, also called deconvolutions), which upsample the feature maps back to the original image size.
- The layers work as follows:
- First layer: 1024 -> 512 filters
- Second layer: 512 -> 256 filters (after concatenating with the corresponding encoder feature map, which doubles the layer's input channel count)
- Third layer: 256 -> 128 filters
- Fourth layer: 128 -> 64 filters
- Fifth layer: 64 -> 3 filters (RGB color image output)
- `ReLU` is used as the activation function for all layers except the final layer.
- The final layer uses `Sigmoid` to output pixel values between 0 and 1 for each of the three RGB channels (producing a colored image). A sketch of the constructor follows this list.
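A minimal sketch of such a constructor, assuming 4x4 kernels with stride 2 and padding 1 (a common choice that halves or doubles each spatial dimension; the exact kernel sizes are not given in the text, and the class name `ColorizationNet` is illustrative). The doubled input channels in `up2` through `up5` account for the skip-connection concatenation described above:

```python
import torch
import torch.nn as nn

class ColorizationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: each Conv2d halves the spatial resolution (stride 2).
        self.down1 = nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1)
        self.down2 = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
        self.down3 = nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1)
        self.down4 = nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1)
        self.down5 = nn.Conv2d(512, 1024, kernel_size=4, stride=2, padding=1)

        # Decoder: each ConvTranspose2d doubles the spatial resolution.
        # From up2 onward the inputs are concatenations of a decoder feature
        # map and the matching encoder feature map (skip connections).
        self.up1 = nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(512 + 512, 256, kernel_size=4, stride=2, padding=1)
        self.up3 = nn.ConvTranspose2d(256 + 256, 128, kernel_size=4, stride=2, padding=1)
        self.up4 = nn.ConvTranspose2d(128 + 128, 64, kernel_size=4, stride=2, padding=1)
        self.up5 = nn.ConvTranspose2d(64 + 64, 3, kernel_size=4, stride=2, padding=1)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
```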
The `forward` method defines how the input image flows through the network.
The input image (grayscale) is passed through the downsampling layers, progressively reducing its size and increasing the depth of feature maps. The intermediate feature maps are stored as `d1`, `d2`, `d3`, `d4`, and `d5`.
The decoder upsamples the feature maps back to the original image size. Each upsampling step is concatenated with the corresponding feature map from the encoder (skip connections), which is a key characteristic of the U-Net architecture.
The final output is a color image (with 3 channels) that is the same spatial size as the input image.
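Continuing the sketch above, the `forward` method could be written as follows (indented as a method of the class sketched earlier; `torch.cat` along the channel dimension implements the skip connections):

```python
    def forward(self, x):
        # Encoder: keep the intermediate feature maps for the skip connections.
        d1 = self.relu(self.down1(x))
        d2 = self.relu(self.down2(d1))
        d3 = self.relu(self.down3(d2))
        d4 = self.relu(self.down4(d3))
        d5 = self.relu(self.down5(d4))

        # Decoder: upsample, concatenating with the matching encoder maps.
        u1 = self.relu(self.up1(d5))
        u2 = self.relu(self.up2(torch.cat([u1, d4], dim=1)))
        u3 = self.relu(self.up3(torch.cat([u2, d3], dim=1)))
        u4 = self.relu(self.up4(torch.cat([u3, d2], dim=1)))
        # Final layer: Sigmoid keeps the 3 RGB channels in [0, 1].
        return self.sigmoid(self.up5(torch.cat([u4, d1], dim=1)))
```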
- Optimizer (`Adam`): An Adam optimizer is used for training the model, adjusting the model parameters based on the gradients and the learning rate.
- Loss Function (`MSELoss`): Mean Squared Error (MSE) is used as the loss function to compare the generated color image with the target color image.
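In code, this setup might look like the following (variable names are illustrative):

```python
model = ColorizationNet().to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.MSELoss()
```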
- The grayscale image goes through the encoder to extract high-level features.
- These features are progressively upsampled back to the original resolution in the decoder while using skip connections to combine fine details from the encoder layers.
- The final output is a colorized image in RGB format (3 channels). The `sigmoid` activation ensures the pixel values lie between 0 and 1.
- The model is trained using backpropagation with the Adam optimizer and MSE loss, as sketched in the loop below.
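Putting the pieces together, a minimal training loop under the assumptions above (the `dataloader` yielding `(grayscale, color)` batches is hypothetical and not described here):

```python
for epoch in range(EPOCHS):
    for gray, color in dataloader:  # hypothetical DataLoader of (grayscale, color) pairs
        gray, color = gray.to(DEVICE), color.to(DEVICE)

        optimizer.zero_grad()            # clear accumulated gradients
        output = model(gray)             # predicted RGB image in [0, 1]
        loss = criterion(output, color)  # MSE against the target color image
        loss.backward()                  # backpropagate
        optimizer.step()                 # update the model parameters
```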