ReLUs
ReLU, the rectified linear unit, is one of the most commonly used activation functions in deep neural networks. It is a simple non-linearity: the ReLU function is 0 for negative inputs and $x$ for all inputs $x>0$. In TensorFlow it can be defined as:
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_bias)
hidden_layer = tf.nn.relu(hidden_layer)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_bias)
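To see the clipping behaviour numerically, here is a minimal standalone sketch (not part of the snippet above) that evaluates tf.nn.relu on a few values:

import tensorflow as tf

# ReLU zeroes out negative inputs and passes positive inputs through unchanged
x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(x)))  # [0.  0.  0.  0.5 2. ]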
An example of a DNN in TF
In the previous post, Introduction to TensorFlow, we saw an example of a logistic classifier in TF. Here, we are going to build a 2-layer DNN with a logistic classifier as the output layer.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Hyperparameters
learning_rate = 0.001
batch_size = 128
n_input = 784         # MNIST data shape: 28*28
n_classes = 10        # digits 0-9
n_hidden_layer = 256  # number of features in the hidden layer

# Import MNIST data (reshape=False keeps images as 28x28x1 to match the placeholder below)
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True, reshape=False)

# Weights and biases
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# TF graph inputs
features = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, n_classes])
x_flat = tf.reshape(features, [-1, n_input])

# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)

# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

# Define loss, optimizer, and accuracy
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
training_epochs = 128
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})
        # Print status every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(epoch, valid_accuracy))
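Once training finishes, you could also check how the network generalizes by evaluating the accuracy tensor on the MNIST test set. A minimal sketch, assuming it runs inside the same with tf.Session() block, after the epoch loop:

    # Still inside the session, after the training loop
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})
    print('Test Accuracy: {}'.format(test_accuracy))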
Save and Restore Models
Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you were to reuse the model in the future, you would have to train it all over again!
Fortunately, TensorFlow gives you the ability to save your progress with a class called tf.train.Saver, which provides the functionality to save any tf.Variable to your file system. Saving and restoring are easy to write in TF:
# Save model
save_file = './model.ckpt'
saver = tf.train.Saver()
# ... model definition omitted ...
with tf.Session() as sess:
    # ... training code omitted ...
    saver.save(sess, save_file)

# Restore model
save_file = './model.ckpt'
saver = tf.train.Saver()
# ... model definition omitted ...
with tf.Session() as sess:
    saver.restore(sess, save_file)
    # ... evaluation code omitted ...
Finetuning
Sometimes you might want to adjust, or "finetune," a model that you have already trained and saved. However, loading saved tf.Variable values directly into a modified model can generate errors. Let's go over how to avoid these problems.
TensorFlow uses a string identifier called 'name' for Tensors and Operations. If a 'name' is not given, TensorFlow will create one automatically, giving the first node a default name and numbering subsequent nodes of the same type (Variable, Variable_1, and so on). If the variables are created in a different order when you restore the model, the automatically generated names no longer match the checkpoint, and you can get an error like:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.
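The mismatch comes from those automatically generated names. Here is a minimal sketch that prints the .name property of two variables created without explicit names (the exact suffixes depend on what else has been added to the graph):

import tensorflow as tf

tf.reset_default_graph()
# No explicit names: TensorFlow assigns them in creation order
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
print('Weights name: {}'.format(weights.name))  # e.g. Variable:0
print('Bias name: {}'.format(bias.name))        # e.g. Variable_1:0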
Instead of letting TensorFlow set the name property, let’s set it manually:
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
Regularization and Dropout
Regularization means applying artificial constraints on the network that implicitly reduce the number of free parameters while not making it harder to optimize. One commonly used regularization technique in deep learning is L2 regularization, which can be written as
$\mathcal{L}' = \mathcal{L} + \beta \frac{1}{2} \lVert w \rVert_2^2$
The idea is to add another term to the loss that penalizes large weights. It is typically achieved by adding the L2 norm of the weights, multiplied by a small constant $\beta$, to the loss.
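In TensorFlow, one way to do this is with tf.nn.l2_loss, which computes half the sum of the squared entries of a tensor. A minimal sketch, assuming the logits, labels, and weights dictionary from the DNN example above and a hypothetical regularization constant beta:

beta = 0.01  # small regularization constant (hypothetical value)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
# Penalize large weights by adding their (scaled) L2 norm to the loss
l2 = tf.nn.l2_loss(weights['hidden_layer']) + tf.nn.l2_loss(weights['out'])
loss = cross_entropy + beta * l2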
Another overfitting prevention technique is called dropout, which temporarily drops units (artificial neurons) from the network, along with all of those units' incoming and outgoing connections. This prevents units from co-adapting too much. The figure below illustrates the idea of dropout.
In TensorFlow, dropout can be used in this fashion:
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_bias)
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, output_weights), output_bias)
A rule of thumb for dropout: during training, a good starting value for keep_prob is 0.5; during testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
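Concretely, keep_prob is just another placeholder fed through feed_dict, so the same graph can be run with different values. A minimal sketch, reusing the optimizer, accuracy, and placeholders from the examples above:

# Training step: drop roughly half of the hidden units
sess.run(optimizer, feed_dict={
    features: batch_features,
    labels: batch_labels,
    keep_prob: 0.5})

# Validation: keep every unit
valid_accuracy = sess.run(accuracy, feed_dict={
    features: mnist.validation.images,
    labels: mnist.validation.labels,
    keep_prob: 1.0})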
Disclaimer: This post includes my personal reflections and notes on learning Deep Learning Nanodegree from Udacity. Some texts and images are from the learning materials for better educational purposes.