ReLUs
ReLU, the rectified linear unit, is one of the most commonly used activation functions in deep neural networks. It is a simple non-linearity: the ReLU function is 0 for negative inputs and $x$ for all inputs $x>0$. In TensorFlow it can be defined as:
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_bias)
hidden_layer = tf.nn.relu(hidden_layer)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_bias)
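To see the clipping behaviour numerically, here is a minimal standalone sketch (not part of the snippet above) that evaluates tf.nn.relu on a few values:

import tensorflow as tf

# ReLU zeroes out negative inputs and passes positive inputs through unchanged
x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(x)))  # [0.  0.  0.  0.5 2. ]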
An example of a DNN in TF
In the previous post, Introduction to TensorFlow, we saw an example of a logistic classifier in TF. Here, we are going to build a 2-layer DNN with a logistic classifier as the output layer.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Hyperparameters
learning_rate = 0.001
batch_size = 128
n_input = 784         # MNIST data shape: 28*28
n_classes = 10        # digits 0-9
n_hidden_layer = 256  # number of features in the hidden layer

# Import MNIST data (reshape=False keeps images as 28x28x1 to match the placeholder below)
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True, reshape=False)

# Weights and biases
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# TF graph inputs
features = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, n_classes])
x_flat = tf.reshape(features, [-1, n_input])

# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)

# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

# Define loss, optimizer, and accuracy
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
training_epochs = 128
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})
        # Print status every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(epoch, valid_accuracy))
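Once training finishes, you could also check how the network generalizes by evaluating the accuracy tensor on the MNIST test set. A minimal sketch, assuming it runs inside the same with tf.Session() block, after the epoch loop:

    # Still inside the session, after the training loop
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})
    print('Test Accuracy: {}'.format(test_accuracy))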
Save and Restore Models
Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you were to reuse the model in the future, you would have to train it all over again!
Fortunately, TensorFlow gives you the ability to save your progress with a class called tf.train.Saver, which provides the functionality to save any tf.Variable to your file system. Saving and restoring are easy to write in TF:
# Save model
save_file = './model.ckpt'
saver = tf.train.Saver()
# ... model definition omitted ...
with tf.Session() as sess:
    # ... training code omitted ...
    saver.save(sess, save_file)

# Restore model
save_file = './model.ckpt'
saver = tf.train.Saver()
# ... model definition omitted ...
with tf.Session() as sess:
    saver.restore(sess, save_file)
    # ... evaluation code omitted ...
Finetuning
Sometimes you might want to adjust, or "finetune," a model that you have already trained and saved. However, loading saved tf.Variable values directly into a modified model can generate errors. Let's go over how to avoid these problems.
TensorFlow uses a string identifier called 'name' for Tensors and Operations. If a 'name' is not given, TensorFlow will create one automatically, giving the first node a default name and numbering subsequent nodes of the same type (Variable, Variable_1, and so on). If the variables are created in a different order when you restore the model, the automatically generated names no longer match the checkpoint, and you can get an error like:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.
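The mismatch comes from those automatically generated names. Here is a minimal sketch that prints the .name property of two variables created without explicit names (the exact suffixes depend on what else has been added to the graph):

import tensorflow as tf

tf.reset_default_graph()
# No explicit names: TensorFlow assigns them in creation order
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
print('Weights name: {}'.format(weights.name))  # e.g. Variable:0
print('Bias name: {}'.format(bias.name))        # e.g. Variable_1:0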
Instead of letting TensorFlow set the name property, let’s set it manually:
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
Regularization and Dropout
Regularization means applying artificial constraints on the network that implicitly reduce the number of free parameters while not making it harder to optimize. One commonly used regularization technique in deep learning is L2 regularization, which can be written as
$\mathcal{L}' = \mathcal{L} + \beta \frac{1}{2} \lVert w \rVert_2^2$
The idea is to add another term to the loss that penalizes large weights. It is typically achieved by adding the L2 norm of the weights, multiplied by a small constant $\beta$, to the loss.
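In TensorFlow, one way to do this is with tf.nn.l2_loss, which computes half the sum of the squared entries of a tensor. A minimal sketch, assuming the logits, labels, and weights dictionary from the DNN example above and a hypothetical regularization constant beta:

beta = 0.01  # small regularization constant (hypothetical value)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
# Penalize large weights by adding their (scaled) L2 norm to the loss
l2 = tf.nn.l2_loss(weights['hidden_layer']) + tf.nn.l2_loss(weights['out'])
loss = cross_entropy + beta * l2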
Another overfitting prevention technique is called dropout, which temporarily drops units (artificial neurons) from the network, along with all of those units' incoming and outgoing connections. This prevents units from co-adapting too much. The figure below illustrates the idea of dropout.
In TensorFlow, dropout can be used in this fashion:
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_bias)
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, output_weights), output_bias)
A rule of thumb for dropout: during training, a good starting value for keep_prob is 0.5; during testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
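Concretely, keep_prob is just another placeholder fed through feed_dict, so the same graph can be run with different values. A minimal sketch, reusing the optimizer, accuracy, and placeholders from the examples above:

# Training step: drop roughly half of the hidden units
sess.run(optimizer, feed_dict={
    features: batch_features,
    labels: batch_labels,
    keep_prob: 0.5})

# Validation: keep every unit
valid_accuracy = sess.run(accuracy, feed_dict={
    features: mnist.validation.images,
    labels: mnist.validation.labels,
    keep_prob: 1.0})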
Disclaimer: This post includes my personal reflections and notes on learning Deep Learning Nanodegree from Udacity. Some texts and images are from the learning materials for better educational purposes.