AvatarGAN — Generate Cartoon Images using GAN

Have you ever wondered how to generate a bitmoji that doesn't belong to any human face? Check out how GANs ace at creating such images.

Aakash Jhawar
Towards Data Science

Most of us have created our own customized bitmoji and used them across different social media apps. Those bitmoji are personalized for a particular user. But have you ever wondered how to generate bitmoji that doesn't belong to any human face? Well, let's explore how GANs do the job for us.

Prediction of AvatarGAN — Image by Author

Generative Adversarial Networks (GANs) are one of the most interesting ideas in computer science today. Given a training dataset, a GAN can generate brand-new images out of random noise. GANs were introduced by Ian J. Goodfellow in 2014. A GAN consists of two neural networks that compete with each other, each becoming more accurate in the process.

The generative model learns the features of the input data: it analyzes, captures, and copies the variations within the dataset and generates new images that look similar to the input set, all in an unsupervised fashion. For instance, GANs can create images that resemble photographs of human faces.

Source: https://www.thispersondoesnotexist.com/

All the images generated by this GAN share common patterns: every face's eyes lie at the same coordinates, the background is just a blurred random texture, and if there is a second face, its shape is heavily distorted.

Before we get our hands dirty while diving into the training part, let's understand how GANs work.

The generator, as the name suggests, tries to generate fake images that look like real ones. It learns the probability of the features X. The generator takes noise (random features) as its input.

The discriminator is a binary classifier that tries to discriminate between real images and the images created by the generator. It learns the probability of the class Y (real or fake) given the features X. These probabilities are the feedback for the generator.

Generator learns to make fakes that look real. Discriminator learns to distinguish real from fake.

Steps involved in training GANs:

  1. Define Generator and Discriminator network architecture
  2. Train the Generator model to generate the fake data that can fool Discriminator
  3. Train the Discriminator model to distinguish real vs fake data
  4. Continue the training for several epochs and save the Generator model

In essence, we take random noise and pass it through the generator, which produces a fake image. This output image is passed to the discriminator along with a stream of images from the real dataset. The discriminator receives both real and fake images and returns the probability that each image is authentic. We then compute the cost function from the discriminator's output and update both models' weights.

Noise → Generator → Features → Discriminator → Output → Cost (output)

GAN Architecture — Image by Author

Training of GAN

Now that we have gone through the basics of GAN, it's time to do the heavy lifting and train the model.

1. Dataset

We will train our GAN on Cartoon Set, a collection of random 2D cartoon avatar images. The cartoons vary across 10 artwork categories, 4 colour categories, and 4 proportion categories, so there is a huge number of possible combinations. We will use the dataset of 100,000 randomly chosen cartoon images.

The next step is to read all the images. Since we have a lot of images to read and process, this task can take a while. So we will read all the images, convert them to JPG format, resize them, normalize them, and store the preprocessed images as a binary file. It is more efficient to perform this series of steps only once; afterwards we can simply read the processed image data and use it right away. We will create a NumPy array of all the images and save it as a .npy file. We use the NumPy binary format instead of Pickle because the file is very large and may cause problems with some versions of Pickle.
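A minimal sketch of this preprocessing step (the directory layout, file names, and the 28x28 target size are illustrative assumptions; Cartoon Set ships its avatars as PNG files):

import os
import glob

import numpy as np
from PIL import Image

DATA_PATH = "data"
IMG_SIZE = 28  # must match the resolution the generator will output

images = []
for path in glob.glob(os.path.join(DATA_PATH, "cartoonset", "*.png")):
    img = Image.open(path).convert("RGB")  # drop the alpha channel; use "L" for the single-channel setup below
    img = img.resize((IMG_SIZE, IMG_SIZE), Image.LANCZOS)
    images.append(np.asarray(img, dtype=np.float32))

# Normalize to [-1, 1] so the pixel range matches the generator's tanh output.
data = (np.stack(images) - 127.5) / 127.5
np.save(os.path.join(DATA_PATH, "cartoon_images.npy"), data)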

Now, to feed the images to the model, we will use TensorFlow's tf.data.Dataset. A Dataset object is used to write descriptive and efficient input pipelines: iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
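As a sketch, the saved array can be wrapped in such a pipeline like this (BUFFER_SIZE and BATCH_SIZE are typical choices, not values prescribed here):

import os

import numpy as np
import tensorflow as tf

DATA_PATH = "data"    # same location used in the preprocessing step
BUFFER_SIZE = 100000  # shuffle buffer covering the whole dataset
BATCH_SIZE = 256

train_images = np.load(os.path.join(DATA_PATH, "cartoon_images.npy"))
train_dataset = (tf.data.Dataset.from_tensor_slices(train_images)
                 .shuffle(BUFFER_SIZE)
                 .batch(BATCH_SIZE))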

2. Build the models

Both models are built with the Keras Sequential class.

Generator

The generator needs upsampling layers to generate an image from the noise, i.e., the seed. We can use UpSampling2D() or Conv2DTranspose() for upsampling.

UpSampling2D simply scales up the image matrix using an upsampling technique, usually nearest-neighbour or bilinear interpolation, so no learning happens here; its benefit is that it is cheap. Conv2DTranspose, on the other hand, is a convolution operation that learns several filters, similar to a regular Conv2D layer. A transpose layer simply swaps the backward and forward passes, keeping the rest of the operation the same. Conv2DTranspose also upsamples its input, but the key difference is that the model learns the best upsampling filters for the job.

The first layer is a Dense layer whose input is the seed noise. Then we upsample multiple times until the size is 28x28x1. We will use the LeakyReLU activation function throughout the generator, and tanh for the last layer.

Source: DCGAN
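Following this recipe, a DCGAN-style generator might look as follows (the 100-dimensional seed and the layer widths are common choices, not mandated here):

import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM = 100
CHANNELS = 1  # set to 3 if the avatars were preprocessed as RGB

def make_generator_model():
    return tf.keras.Sequential([
        # Project the seed and reshape it into a small feature map.
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(NOISE_DIM,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        # 7x7 -> 7x7
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # 7x7 -> 14x14
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # 14x14 -> 28x28; tanh keeps the pixels in [-1, 1]
        layers.Conv2DTranspose(CHANNELS, (5, 5), strides=(2, 2), padding="same",
                               use_bias=False, activation="tanh"),
    ])

generator = make_generator_model()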

Let's try to plot an image generated by the untrained generator network.
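Something like this does the trick (reusing generator and NOISE_DIM from the sketch above; the rescaling undoes the [-1, 1] normalization):

import matplotlib.pyplot as plt
import tensorflow as tf

noise = tf.random.normal([1, NOISE_DIM])
generated_image = generator(noise, training=False)

# Map the tanh output from [-1, 1] back to [0, 1] for display.
plt.imshow((generated_image[0, :, :, 0] + 1) / 2, cmap="gray")
plt.axis("off")
plt.show()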

Image generated by Generator from random noise before training — Image by Author

Discriminator

The discriminator network is a simple convolutional neural network (CNN) image classifier.
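A sketch of such a classifier, mirroring the generator above (layer sizes are again illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def make_discriminator_model():
    return tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding="same",
                      input_shape=(28, 28, CHANNELS)),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding="same"),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        # Sigmoid so the output is a probability that the image is real.
        layers.Dense(1, activation="sigmoid"),
    ])

discriminator = make_discriminator_model()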

Let’s check out the output of our Discriminator model.
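Passing the untrained generator's image through it (continuing the sketches above):

decision = discriminator(generated_image)
print(decision)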

Output: tf.Tensor([[0.50059265]], shape=(1, 1), dtype=float32)

It returns the probability score.

3. Loss Function

We will use the Binary Cross-Entropy (BCE) loss function. The BCE cost has two parts, one relevant to each class. Its value is close to zero when the label and the prediction are similar, and it approaches infinity when they differ.

Binary Cross-Entropy loss function — Image by Author
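For reference, the textbook form of this cost over m examples, where h(x; θ) is the predicted probability and y the true label (a reconstruction; the image above may use slightly different symbols), is:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h\big(x^{(i)}; \theta\big) + \big(1 - y^{(i)}\big) \log\Big(1 - h\big(x^{(i)}; \theta\big)\Big) \right]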

Let’s break down the equation and analyze each part.

Breakdown of the two terms of the BCE loss — Images by Author

Discriminator loss quantifies how well the discriminator model can distinguish real and fake images. It compares the discriminator’s prediction on real images to an array of 1s, and the discriminator’s prediction on fake images to an array of 0s.

Generator loss quantifies how well it was able to trick the discriminator. Intuitively, if the generator is performing well, the discriminator will classify the fake images as real (or 1). Here, we will compare the discriminator's decision on the generated images to an array of 1s.

Both the generator and the discriminator use the Adam optimizer with the same learning rate and momentum. Their optimizer instances are nevertheless separate, since we train the two networks separately.
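A sketch of the losses and optimizers described above (the 1e-4 learning rate is a common DCGAN default, assumed here rather than quoted from the article):

import tensorflow as tf

# from_logits defaults to False, matching the discriminator's sigmoid output.
cross_entropy = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output, fake_output):
    # Real images should score 1, generated images should score 0.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # The generator wins when its fakes are classified as real (1).
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)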

4. Training Pipeline

Now that we have defined the major components of the training pipeline, let's move to the training section. The following function is where the magic happens.

Notice the use of the tf.function annotation. This compiles the function into a TensorFlow graph and improves performance.

The two neural networks must be trained independently, in two separate passes, which is why we define two separate loss functions and two separate gradient updates. During backpropagation for the discriminator, its gradients must be applied only to reduce the discriminator's loss, and only that model's weights may be updated. The setup will not learn properly if the generator's weights are updated at the same time.
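A sketch of such a training step, tying together the models, losses, and optimizers from the earlier snippets:

@tf.function  # compile the step into a TensorFlow graph for speed
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, NOISE_DIM])

    # Separate tapes so each network's gradients are computed independently.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    # Each optimizer updates only its own model's weights.
    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))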

5. Train the model

The training dataset should be normalized, and an equal number of samples for both classes is a must. For the discriminator's training set, x is the input images and y contains the value 1 for real images and 0 for generated ones. For the generator's training set, x contains the random noise (seed) and y is always 1: the generator's aim is to produce images so convincing that the discriminator is fooled into assigning them a probability close to 1.

Now since we have everything in place, let's start the training.
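A minimal training loop might look like this (progress images, checkpoints, and timing are left out of the sketch):

def train(dataset, epochs):
    for epoch in range(epochs):
        for image_batch in dataset:
            train_step(image_batch)
        print(f"Epoch {epoch + 1}/{epochs} done")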

train(train_dataset, EPOCHS)

Check out the model being trained to generate cartoon images.

Image by Author

A pat on the back! Our model is finally trained and it's time to save it so that we can use it in the future.

generator.save(os.path.join(DATA_PATH, "face_generator.h5"))
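Later, the saved generator can be loaded on its own to mint fresh avatars, along these lines (reusing DATA_PATH and NOISE_DIM from the earlier sketches):

import os

import matplotlib.pyplot as plt
import tensorflow as tf

generator = tf.keras.models.load_model(os.path.join(DATA_PATH, "face_generator.h5"))
avatars = generator(tf.random.normal([16, NOISE_DIM]), training=False)

for i in range(16):
    plt.subplot(4, 4, i + 1)
    plt.imshow((avatars[i, :, :, 0] + 1) / 2, cmap="gray")
    plt.axis("off")
plt.show()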

Applications of GAN

Now that we know the functioning of GAN, it’s time to check the fascinating applications of it. There is a plethora of usage of GAN regularly published in research.

Image-to-image Translation

With the help of GANs, we can translate photos from one domain to another. Phillip Isola et al. demonstrated the pix2pix approach for many image-to-image translation tasks in the paper Image-to-Image Translation with Conditional Adversarial Networks. For example, using GANs we can turn horses into zebras, create colour photographs from sketches, colourize black-and-white images, and the list goes on.

Source: CycleGAN

GANs for Security

With the rise of AI, the risk of fraud and cyber threats has also increased, and huge amounts of confidential information can be leaked. GANs can help defend against "adversarial attacks", which use a variety of techniques to fool deep learning architectures. A GAN can generate more such adversarial examples, and training the model on them makes these attacks much easier to flag.

Source: SSGAN

SSGAN is used to perform Steganalysis and detect hidden encodings in images that ideally should not be there. GANs can also be used to generate synthetic data for supervision.

Photo Inpainting

GANs can be used to perform photograph inpainting, or spot filling, i.e., filling in a missing area of a photograph that was removed or destroyed for some reason. The paper Context Encoders: Feature Learning by Inpainting describes the use of context encoders to perform photo inpainting.

Source: Generative Image Inpainting with Contextual Attention

GAN for 3D Object Generation

Jiajun Wu et al. proposed a GAN that can generate three-dimensional objects such as guns, chairs, cars, sofas, and tables.

Source: Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Conclusion

The generator's goal is to fool the discriminator, whereas the discriminator tries to distinguish real from fake. Both models learn from competing with each other, and in the end the fakes look real. The idea of generating data opens up new potential, but unfortunately great dangers too.

If you’ve hung in this long… thanks! I hope this has been a learning experience for you. If you enjoyed this article, share it with your friends and colleagues. Drop me a note if you find it useful or have any follow-up questions.

For the lazy among you who have skipped reading or performing the tutorial yourselves, here’s a link to the source code.
