Generate novel artistic images using neural style transfer algorithm
Neural Style Transfer is an algorithm that given a content image C and a style image S can generate novel artistic image
Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.
Following the original NST paper, I have used the VGG network. Specifically, VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and thus has learned to recognize a variety of low level features (at the earlier layers) and high level features (at the deeper layers)
Most of the algorithms optimize a cost function to get a set of parameter values. In NST, optimize a cost function to get pixel values!
Building the content cost function
“Generated” image G should have similar content as the input image C. The most visually pleasing results will be generated if a layer is chosen in the middle of the network—neither too shallow nor too deep. Set the image C as the input to the pretrained VGG network, and run forward propagation. a(C) be the hidden layer activation in the layer you had chosen. Set G as the input, and run forward propagation. a(G) be the corresponding hidden layer activation. The cost function will be:
When minimizing the content cost later, it helps make sure G has similar content as C
Building the style cost function
Gram matrix (or style matrix) which serves as the basic building block of style cost function computes the correlation between filters. This matrix is of dimensions (nC,nC) where nC is the number of filters. The value G(i,j) measures how similar the activation of filter i are to the activation of filter j. The Style matrix G measures the style of an image. After generating the Style matrix (Gram matrix), goal is to minimize the distance between the Gram matrix of the “style” image S and that of the “generated” image G
where G(S) and G(G) are respectively the Gram matrices of the “style” image and the “generated” image, computed using the hidden layer activations for a particular hidden layer in the network.
Better results are obtained when style costs from several different layers are merged. This is in contrast to the content representation, where usually using just a single hidden layer is sufficient. λ[l] is the weights given to different layers
Minimizing the style cost will cause the image G to follow the style of the image S
Putting it together to get total cost
Cost function that minimizes both the style and the content cost
The total cost is a linear combination of the content cost and the style cost. α and β are hyperparameters that control the relative weighting between content and style
Now, reduce the cost function and “generated” image will be a combination of content of content image and style of style image
The following steps are to be followed to synthesize new images. Find correlation between the steps mentioned below and nst_main.py
pretrained_model
folder. Change the config.py
file to point to VGG19 model path nst_main.py
on different style and content images placed in images
folder. Change the config.py
accordinglyimages
directory. Corresponding output images can be found in output
directory