Color AutoEncoder Model

Model Diagram

[Figure: model architecture diagram]

Key Parameters


  • EPOCHS: Number of full passes over the training set, set to 100.
  • LEARNING_RATE: Learning rate for the optimizer, set to 0.001.
  • DEVICE: Whether the model runs on the GPU ('cuda') or the CPU ('cpu').
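
A minimal sketch of how these constants might be defined. The values match the description above; the torch.cuda availability check is a common pattern, not confirmed from the repo:

```python
import torch

EPOCHS = 100           # number of full passes over the training set
LEARNING_RATE = 0.001  # step size for the Adam optimizer
# Use the GPU when one is available, otherwise fall back to the CPU.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```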

Model Class: ColorAutoEncoder

This custom class implements the colorization model and inherits from nn.Module, the base class for all neural network modules in PyTorch.

1. Constructor (__init__)

The constructor defines the layers for both the encoder (downsampling) and decoder (upsampling) halves of the autoencoder, arranged symmetrically in the style of a U-Net.

Downsample (Encoder)

  • The downsampling layers are Conv2d layers with stride 2, reducing the spatial resolution of the image while increasing the number of feature maps.
  • The input is a grayscale image (1 channel); successive convolutions increase the channel count as follows (a code sketch follows the list):
    • First layer: 1 -> 64 filters
    • Second layer: 64 -> 128 filters
    • Third layer: 128 -> 256 filters
    • Fourth layer: 256 -> 512 filters
    • Fifth layer: 512 -> 1024 filters (the deepest layer, i.e. the bottleneck)
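
A sketch of the constructor's encoder stack under these channel counts. The kernel size of 4 with padding 1 is an assumption; any stride-2 configuration that halves the resolution fits the description:

```python
import torch
import torch.nn as nn

class ColorAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: each stride-2 convolution halves H and W while
        # increasing the number of feature maps.
        self.down1 = nn.Conv2d(1,   64,   kernel_size=4, stride=2, padding=1)
        self.down2 = nn.Conv2d(64,  128,  kernel_size=4, stride=2, padding=1)
        self.down3 = nn.Conv2d(128, 256,  kernel_size=4, stride=2, padding=1)
        self.down4 = nn.Conv2d(256, 512,  kernel_size=4, stride=2, padding=1)
        self.down5 = nn.Conv2d(512, 1024, kernel_size=4, stride=2, padding=1)
```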

Upsample (Decoder)

  • The decoder mirrors the encoder using ConvTranspose2d (transposed convolution) layers, which upsample the feature maps back to the original image size.
  • The output channel counts shrink stage by stage (a code sketch follows the list). Note that every decoder stage except the first concatenates its input with the matching encoder feature map (a skip connection), doubling the number of channels it actually receives:
    • First layer: 1024 -> 512 filters
    • Second layer: 512 -> 256 filters
    • Third layer: 256 -> 128 filters
    • Fourth layer: 128 -> 64 filters
    • Fifth layer: 64 -> 3 filters (the RGB color output)
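
Continuing the constructor from the encoder sketch, a decoder consistent with these counts. The doubled input channels on stages 2-5 follow from the skip-connection concatenation noted above; like the kernel settings, they are assumptions rather than values confirmed from the repo:

```python
        # Decoder (continuing __init__ from the encoder sketch).
        # Stages 2-5 consume [upsampled map, encoder map] concatenated along
        # the channel axis, hence the doubled input channel counts (assumed).
        self.up1 = nn.ConvTranspose2d(1024,      512, kernel_size=4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(512 + 512, 256, kernel_size=4, stride=2, padding=1)
        self.up3 = nn.ConvTranspose2d(256 + 256, 128, kernel_size=4, stride=2, padding=1)
        self.up4 = nn.ConvTranspose2d(128 + 128, 64,  kernel_size=4, stride=2, padding=1)
        self.up5 = nn.ConvTranspose2d(64 + 64,   3,   kernel_size=4, stride=2, padding=1)
```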

Activation Functions

  • ReLU is used as the activation function for all layers except the final layer.
  • The final layer uses Sigmoid to output pixel values between 0 and 1 for each of the three RGB channels (producing a colored image).

2. Forward Pass (forward)

The forward method defines how the input image flows through the network.

Encoder

The input image (grayscale) is passed through the downsampling layers, progressively reducing its size and increasing the depth of feature maps. The intermediate feature maps are stored as d1, d2, d3, d4, and d5.

Decoder

The decoder upsamples the feature maps back to the original image size. Each upsampling step is concatenated with the corresponding feature map from the encoder (skip connections), which is a key characteristic of the U-Net architecture.

Output

The final output is a color image (3 channels) with the same spatial dimensions as the input image. The complete forward pass is sketched below.
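
Completing the sketch, a forward method consistent with this description (assuming torch and torch.nn.functional as F are imported; torch.cat joins each upsampled map with its encoder counterpart along the channel dimension):

```python
    # Forward pass (continuing the ColorAutoEncoder sketch above).
    def forward(self, x):
        # Encoder: keep each intermediate map for the skip connections.
        d1 = F.relu(self.down1(x))    # 1    -> 64,   H/2  x W/2
        d2 = F.relu(self.down2(d1))   # 64   -> 128,  H/4  x W/4
        d3 = F.relu(self.down3(d2))   # 128  -> 256,  H/8  x W/8
        d4 = F.relu(self.down4(d3))   # 256  -> 512,  H/16 x W/16
        d5 = F.relu(self.down5(d4))   # 512  -> 1024, H/32 x W/32

        # Decoder: every stage after the first concatenates its input with
        # the matching encoder map before upsampling.
        u1 = F.relu(self.up1(d5))
        u2 = F.relu(self.up2(torch.cat([u1, d4], dim=1)))
        u3 = F.relu(self.up3(torch.cat([u2, d3], dim=1)))
        u4 = F.relu(self.up4(torch.cat([u3, d2], dim=1)))
        # Sigmoid keeps each of the three RGB channels in [0, 1].
        return torch.sigmoid(self.up5(torch.cat([u4, d1], dim=1)))
```

With kernel_size=4, stride=2, padding=1, each transposed convolution exactly doubles the height and width, so for example a 256x256 grayscale input comes back as a 256x256 RGB image (the input sides must be divisible by 32 for the shapes to line up).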

Optimizer and Loss Function

  • Optimizer (Adam): An Adam optimizer is used for training the model, adjusting the model parameters based on the gradients and the learning rate.
  • Loss Function (MSELoss): Mean Squared Error (MSE) is used as the loss function to compare the generated color image with the target color image.
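
A typical setup matching this description, reusing the names from the sketches above (model, DEVICE, LEARNING_RATE):

```python
import torch.nn as nn
import torch.optim as optim

model = ColorAutoEncoder().to(DEVICE)
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.MSELoss()  # mean squared error between predicted and target RGB
```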

Overall Workflow

  1. The grayscale image goes through the encoder to extract high-level features.
  2. These features are progressively upsampled back to the original resolution in the decoder while using skip connections to combine fine details from the encoder layers.
  3. The final output is a colorized image in RGB format (3 channels). The sigmoid activation ensures the pixel values lie between 0 and 1.
  4. The model is trained end to end with backpropagation, using the Adam optimizer and MSE loss (a minimal training-loop sketch follows).
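
Tying the workflow together, a minimal training-loop sketch; train_loader here is a hypothetical DataLoader yielding (grayscale, RGB) tensor batches, not a name taken from the repo:

```python
for epoch in range(EPOCHS):
    for gray, color in train_loader:
        gray, color = gray.to(DEVICE), color.to(DEVICE)

        optimizer.zero_grad()          # reset gradients from the previous step
        pred = model(gray)             # colorize the grayscale batch
        loss = criterion(pred, color)  # MSE against the ground-truth RGB image
        loss.backward()                # backpropagate
        optimizer.step()               # Adam parameter update
```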

User Interface

[Screenshot: user interface]

Output Example

[Screenshot: example colorization output]
