Skip to content

Attempt to add color to Auto-Encoder #10

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 28 additions & 7 deletions AutoEncoder_TrainingData/AutoEncoder_TrainingData.pde
Original file line number Diff line number Diff line change
@@ -1,31 +1,52 @@
int counter = 0;
int W = 28;


void setup() {
size(280, 280);

}

void draw() {
translate(width/2, height/2);
background(255);
//frameRate(5);
float r = random(0, width);
strokeWeight(16);
rectMode(CENTER);
//float x = random(width);
//float y = random(height);
//star(0, 0, r/5, r/2, 6);
//star(width/2, height/2, r, r, 4);
if (random(1) < 0.5) {
square(width/2, width/2, r);
stroke(255,0,255);
star(0, 0, r/5, r/2, 6);
//square(width/2, height/2, r);
} else {
circle(width/2, width/2, r);
stroke(0,0,255);
circle(0, 0, r);
}

PImage img = get();
//img.resize(28, 28);
img.resize(W, W);
img.save("data/square" + nf(counter, 4) + ".png");
//img.save("Auto-Encoder-Demo/public/data/shape" + nf(counter, 4) + ".png");
counter++;
if (counter == 5100) {
if (counter == 1100) {
exit();
}
// noLoop();
//noLoop();
}
//star function taken from Processing website
function star(float x, float y, float radius1, float radius2, int npoints) {
float angle = TWO_PI / npoints;
float halfAngle = angle/2.0;
beginShape();
for (float a = 0; a < TWO_PI; a += angle) {
float sx = x + cos(a) * radius2;
float sy = y + sin(a) * radius2;
vertex(sx, sy);
sx = x + cos(a+halfAngle) * radius1;
sy = y + sin(a+halfAngle) * radius1;
vertex(sx, sy);
}
endShape(CLOSE);
}
109 changes: 109 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,112 @@ Exploring autoencoders with p5 and tensorflow js
## Related topics
* [Coding train tensorflow.js playlist](https://www.youtube.com/playlist?list=PLRqwX-V7Uu6YIeVA3dNxbR9PYj4wV31oQ)
* [Coding train neural network playlist](https://www.youtube.com/playlist?list=PLRqwX-V7Uu6aCibgK1PTWWu9by6XFdCfh)

## Introduction

This definition of machine learning, adapted from "Deep Learning with Javascript: Neural Networks in Tensorflow.js", encapsulates the purpose of the autoencoder nicely: "finding an appropriate transformation that turns the old representation of the input data into a new one". In this case, the data is the pixel values in a (28*28 or 56*56) image of randomly sized circles or aquares generated in Processing. The role of the autoencoder is to develop a model which will learn how to produce new images based on a new input (a new shape, random noise, etc.). The model consists of weights that will be applied to each neuron (pixel) in each layer. Starting at the final layer, the weights are determined by calculating the difference between the predicted pixel array and the actual pixel array (error), for each image in a random batch of 32 images in the training data. Epochs specify how many times this process is repeated. This calculation is repeated in backwards fashion for each layer of the autoencoder.

Once the weights are calculated, they are stored in a json file in the model folder. Given the model weights, the model is then tested by imputing the test images. Once the model is built, the weights are frozen, and the decoder model can be used to regerate new images based on the input fed to the decoder. This input could be images of circles or squares (with or without noise) or pure random noise. A practical example of this is an activity tracker.

Key concepts:

1. Neural Network: composed of an input layer, one or more hidden layers, and an output layer. Our input layer is the array containing the normalized r values for each pixel

2. Sequential model: the layers are architacked in a linear sequence.

``const autoencoder = tf.sequential();``

3. Fullly-Connected or Dense Layer: defined as "deeply connected" to the preceding layer (ie. every neuron of the layer is connected to every neuron in its preceding layer.) The input to a dense layer must be a 1d array.

`` autoencoder.add(tf.layers.dense{}));``

3. Convolutional Layer: this algorithm takes subsets of the tensor (based on the size of the kernal). Only the nodes within the subsample are fully connected.

``autoencoder.add(tf.layers.conv2d({}));``

4. Pooling layer, which downsamples the nodes, is often used is conjuction with the convolutional layer

A convolutional layer and pooling layer used together are referred to as a convolutional block.

``autoencoder.add(tf.layers.maxPooling2d({poolSize: 2, strides: 2}));``

If your images are grey scale (and therefore you are only using the r channel), it is OK to start with dense layers. However, if you want to include 3 channels, you need to use a convolutional network. Because the images each have an input shape of W*W*3, the number of neurons gets very large as the resolution of the image increases.


4. Gradient Descent: A really good analogy is descending a steep mountain trail in the dark, when you only have the narrow beam of a flashlight to guide you. You will make small steps trying to get to the bottom of the mountain, taken from "Make Your Own Neural Network". Each small step is analogous to the learning rate. If there is only one mountain peak, given enough time you will eventually reach the bottom. However, if there are many peaks with valleys, it is possible you will not reach the local value but not reach the base of the mountain. For this reason, it is a best practice to train on small groups of samples (batches) and shuffle the samples.

5. Normalization: The inputs into the last layer of the model are passed throught the sigmoid activation function, which is bounded between -1 and 1, therefore we divide by 255:

``rawData[n*c+cidx] = p[n*c+cidx] / 255.0;``

## Key Model Hyperparameters for neural network

1. Activation Function The activation function is a nonlinear, differentiable function that is used to improve the efficiency of the model(relu, sigmoid) over plain linear regression.
2. Learning Rate: the rate at which the weights are adjusted.
3. Optimizer: adam
4. Kernal size: subset of the image tensor
5. Input Shape: number of nodes going into each layer
6. Units: number of nodes coming out of each layer.
7. Accuracy: perceptange of correct predictions
Note: can be misleading which is why loss is used to monitor how well the model is doing
8. Loss: "MeanSquaredError", a measure of the difference between the actual image and predicted image.
9. Drop out rate - percentage of inputs that are dropped from a layer to add randomness to the training and help prevent overfitting

## Hyperparameters that control output size (units) of convolutional blocks

1. depth: corresponds the the number of filters (can be thought of as separate parameters, i.e. shape, radius, colors)
2. stride: number of pixels by which the filter is moved across the array to sample
3. zero-padding: do you want to pad with zeros around the outside of the tensor. This is helpful if you want to ensure that you are capturing the information at the perimeter of the tensor.
4. The formula to determine output size of convolutional layer, W1*W1*D1, where W1 = (W - F + 2 * P)/S + 1

W1 = width of image
F = number of filters
P = amount of zero padding
S = strides
+1 accounts for the bias
D = depth (# of colors)

5. The formula to determine the output shape of the pooling layer is W2*W2*D, where

W2 = (W1 - F)/S + 1

A detailed explanation of these formulas and a very nice animation can be found here at https://cs231n.github.io/convolutional-networks/#conv

## Training with grey scale versus color images

Each image is a 3D tensor with (height-width-channel). A batch of images is a 4D tensor, where the additional dimension is the number of images (NHWC).

--If you are training/testing grey scale images, declare 2d tensors.

``const x_train = tf.tensor2d(images.slice(0, 1000), [ 1, W, W, c ]);``

--If you are training/testing color images, declare 4d tensors.

``const x_train = tf.tensor4d(images.slice(1000), [ 1, W , W, c ]);``

## p5.js implementation

createGraphics(w, h, [renderer]) creates an off-screen graphics buffer. (In example in p5.js appearance of object changes.

## Libraries

1. tensorflow.js

1. Jimp: used to read in pixel data from the images created in Processing/P5JS and write new images to a file.

2. Numeral: used to efficiently refer to each image during the read/write process


## Important

1. Each time the program is run, tensors are created. You need to make sure that you are disposing of them using tidy() or your computer memory will fill up with tensors and slow down your computer.

2. Each layer of stacking convolutional blocks will reduce the number of nodes. It is possible to go too far (especially if you start with a small resolution) and you will get a somewhat cryptic error that you have passed a null tensor. The output of the last convolutional layer needs to be [1,1, number of filters], ie you must pass a 1d array to the first dense layer.

## Additional Reading

Deep Learning with Javascript: Neural Networks in Tensorflow.js, Shanging Cai, Stanley Bileschi, Eric Nielsen, Fransois Chollet

Make Your Own Neural Network, Tariq Rashid

Understanding and implementing a fully convolutional network (FCN), Himanshu Rawiani, https://towardsdatascience.com/implementing-a-fully-convolutional-network-fcn-in-tensorflow-2-3c46fb61de3b
Loading