Blog Post 5 - TensorFlow

We begin our blog post by using the template code provided by Professor Chodrow. This code initializes the datasets we will use for training and validation of our model. This code block also uses the tf.data module to speed up the loading of images.

Importing Packages and Dataset

import os
from tensorflow.keras import utils
import tensorflow as tf 
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Conv2D, Flatten, Dense, Dropout, MaxPooling2D, InputLayer, ReLU, Softmax, RandomFlip, RandomRotation, Rescaling
# location of data
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'

# download the data and extract it
path_to_zip = utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)

# construct paths
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

# parameters for datasets
BATCH_SIZE = 32
IMG_SIZE = (160, 160)

# construct train and validation datasets 
train_dataset = utils.image_dataset_from_directory(train_dir,
                                                   shuffle=True,
                                                   batch_size=BATCH_SIZE,
                                                   image_size=IMG_SIZE)

validation_dataset = utils.image_dataset_from_directory(validation_dir,
                                                        shuffle=True,
                                                        batch_size=BATCH_SIZE,
                                                        image_size=IMG_SIZE)

class_names = train_dataset.class_names
# construct the test dataset by carving off 20% of the validation batches
val_batches = tf.data.experimental.cardinality(validation_dataset)
test_dataset = validation_dataset.take(val_batches // 5)
validation_dataset = validation_dataset.skip(val_batches // 5)

# prefetch batches in the background so data loading overlaps with training
AUTOTUNE = tf.data.AUTOTUNE

train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
validation_dataset = validation_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)
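As an aside, when a dataset is this small, prefetching is often paired with caching. A minimal sketch of the idea, using a hypothetical cached_train variable that we do not use anywhere else in this post:

# optional (illustration only): caching keeps decoded images in memory so
# that every epoch after the first skips file reads and JPEG decoding
cached_train = train_dataset.cache().prefetch(buffer_size=AUTOTUNE)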

Label Counts

We want to verify that we have equal counts of cats and dogs in our training data.

# Count how many training images carry label 0 ("cats")
labels_iterator = train_dataset.unbatch().map(lambda image, label: label == 0).as_numpy_iterator()
sum(labels_iterator)
1000

We see that 1000 of the 2000 training images carry label 0, so the two classes are perfectly balanced, and a baseline model that always guesses one class would reach an accuracy of 0.5.
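Equivalently, we can count both labels in one pass; a quick sketch using numpy (an extra import not otherwise needed in this post):

import numpy as np

# collect every label and count occurrences; index 0 is "cats", index 1 is "dogs"
labels = np.array(list(train_dataset.unbatch().map(lambda image, label: label).as_numpy_iterator()))
print(np.bincount(labels))  # expect [1000 1000]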

Visualize Images

We would also like to see a sample of the images in our dataset to better understand the task. Let's arrange the figure into two rows, one for cats and one for dogs.

def visualize_images():
    plt.figure(figsize=(10, 10))
    # grab one batch and show the first 3 cats and the first 3 dogs
    for images, labels in train_dataset.take(1):
        cat_count = 0
        dog_count = 0
        for i in range(len(labels)):
            if class_names[labels[i]] == "cats" and cat_count < 3:
                ax = plt.subplot(2, 3, cat_count + 1)
                plt.imshow(images[i].numpy().astype("uint8"))
                plt.title(class_names[labels[i]])
                plt.axis("off")
                cat_count += 1
            elif class_names[labels[i]] == "dogs" and dog_count < 3:
                ax = plt.subplot(2, 3, dog_count + 4)
                plt.imshow(images[i].numpy().astype("uint8"))
                plt.title(class_names[labels[i]])
                plt.axis("off")
                dog_count += 1

visualize_images()

These images vary a lot in pose, lighting, and background, so we expect we may need a fairly complex model to distinguish the two classes accurately.

Visualizing Training History

We will often want to see how our models train, so we write a function to quickly plot training and validation performance over the epochs.

def visualize_history(history):
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    loss = history.history['loss']
    val_loss = history.history['val_loss']

    plt.figure(figsize=(8, 8))
    plt.subplot(2, 1, 1)
    plt.plot(acc, label='Training Accuracy')
    plt.plot(val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.ylabel('Accuracy')
    plt.ylim([min(plt.ylim()),1])
    plt.title('Training and Validation Accuracy')

    plt.subplot(2, 1, 2)
    plt.plot(loss, label='Training Loss')
    plt.plot(val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.ylabel('Cross Entropy')
    plt.ylim([0,1.0])
    plt.title('Training and Validation Loss')
    plt.xlabel('epoch')
    plt.show()

Model 1

Our first model uses only the original 2000 training images, without any data augmentation or preprocessing. We have a lenient goal of 52% validation accuracy, so we imitate a simple model from lecture in the hope of limiting overfitting.

# Create our sequential model, similar to the one from lecture
model1 = tf.keras.models.Sequential([
    Conv2D(32, 9, activation='relu', input_shape=(160, 160, 3)),
    Dropout(0.3),
    MaxPooling2D((2, 2)),
    Conv2D(32, 6, activation='relu'),
    Dropout(0.3),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation="sigmoid") 
])

# Compile using adam optimizer and BinaryCrossentropy loss
model1.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

# We train for 20 epochs as prescribed and visualize the training process
history = model1.fit(train_dataset, 
                     epochs=20, 
                     validation_data=validation_dataset)
visualize_history(history)

We see gross overfitting and very poor performance on the validation data. We experimented with many other architectures, and all of them tended to overfit, which we attribute to the very small size of our dataset and the relative complexity of the task.

Data Augmentation

Data augmentation is the process of taking existing data and slightly perturbing it in such a way that the distinguishing features stay intact, letting our model learn from a wider variety of images.

RandomFlip('horizontal') and RandomRotation(0.1) are layers we can add to our model that increase the variety of the training data by randomly flipping and rotating our images, respectively. Note that these layers only perturb images during training; at inference time they pass inputs through unchanged. We demonstrate them here:

flipper = tf.keras.Sequential([
    RandomFlip('horizontal')
])
rotater = tf.keras.Sequential([
    RandomRotation(0.1)
])

for image, _ in train_dataset.take(1):
    plt.figure(figsize=(10, 10))
    first_image = image[0]
    # in the first row of panels we show random flips
    for i in range(3):
        ax = plt.subplot(2, 3, i + 1)
        # pass training=True so the random augmentation actually fires
        augmented_image = flipper(tf.expand_dims(first_image, 0), training=True)
        plt.imshow(augmented_image[0] / 255)
        plt.axis('off')
    # in the second row of panels we show random rotations
    for i in range(3, 6):
        ax = plt.subplot(2, 3, i + 1)
        augmented_image = rotater(tf.expand_dims(first_image, 0), training=True)
        plt.imshow(augmented_image[0] / 255)
        plt.axis('off')

Model 2

By adding data augmentation, we can afford a larger model that is better able to learn distinguishing features.

model2 = tf.keras.Sequential([
    InputLayer(input_shape=(160, 160, 3)),
    # Data Augmentation Layers
    RandomFlip('horizontal'),
    RandomRotation(0.2),     

    # Two Sets of Dual Convolutions
    Conv2D(32, 5, activation='relu'),
    Conv2D(32, 3, activation='relu'),
    MaxPooling2D((2, 2)),

    Conv2D(32, 3, activation='relu'),
    Conv2D(32, 3, activation='relu'),
    MaxPooling2D((2, 2)),

    Flatten(),
    # Large Dense layer to learn more features
    Dense(2048, activation='relu'),
    # a high dropout rate at the end improved accuracy and reduced overfitting
    Dropout(0.75),
    Dense(1, activation="sigmoid") 
])
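To confirm that this model really is larger than model 1, we can compare total parameter counts (a quick check; both models are already built at definition time, so this works without compiling them):

# compare weight counts between the two architectures
print(model1.count_params())
print(model2.count_params())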

model2.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

history = model2.fit(train_dataset, 
                     epochs=20, 
                     validation_data=validation_dataset)
visualize_history(history)

Model 3

The next step is to normalize our input values so the pixel data (and, in turn, the model's weights) stay within a smaller range. This code block creates a preprocessing layer that will transform our input images before they reach the rest of the network.

i = tf.keras.Input(shape=(160, 160, 3))
x = tf.keras.applications.mobilenet_v2.preprocess_input(i)
preprocessor = tf.keras.Model(inputs = [i], outputs = [x], name='preprocessor')
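Concretely, mobilenet_v2.preprocess_input rescales pixel values from [0, 255] to [-1, 1] (it divides by 127.5 and subtracts 1); a quick sanity check:

# 0 maps to -1, 127.5 maps to 0, and 255 maps to 1
sample = tf.constant([0.0, 127.5, 255.0])
print(tf.keras.applications.mobilenet_v2.preprocess_input(sample))
# tf.Tensor([-1.  0.  1.], shape=(3,), dtype=float32)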

Now, the relevant model:

model3 = tf.keras.Sequential([
    # Only change is the preprocessor layer
    preprocessor,

    RandomFlip('horizontal'),
    RandomRotation(0.2),     

    Conv2D(32, 5, activation='relu'),
    Conv2D(32, 3, activation='relu'),
    MaxPooling2D((2, 2)),

    Conv2D(32, 3, activation='relu'),
    Conv2D(32, 3, activation='relu'),
    MaxPooling2D((2, 2)),

    Flatten(),
    Dense(2048, activation='relu'),
    Dropout(0.75),
    Dense(1, activation="sigmoid") 
])

model3.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

history = model3.fit(train_dataset, 
                     epochs=20, 
                     validation_data=validation_dataset)
visualize_history(history)

Here we see almost no overfitting and much stronger accuracy on the validation data.

Model 4

Our next model tries a completely different technique called transfer learning. We take an existing model named MobileNetV2, freeze its weights, and train a small layer on top of it that focuses only on cats and dogs.

IMG_SHAPE = IMG_SIZE + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False

i = tf.keras.Input(shape=IMG_SHAPE)
x = base_model(i, training = False)
base_model_layer = tf.keras.Model(inputs = [i], outputs = [x])

This code turns MobileNetV2 into a frozen layer that we can insert into a simple model below. The frozen base maps each 160x160x3 image to a 5x5x1280 feature map; GlobalAveragePooling2D then collapses that into a single 1280-length vector for our one-unit sigmoid head to classify.
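Before training on top of it, a quick optional sanity check that the base really is frozen:

# with trainable = False, the base contributes no trainable variables
print(len(base_model.trainable_variables))      # 0
print(len(base_model.non_trainable_variables))  # a few hundred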

model4 = tf.keras.Sequential([
    preprocessor,
    RandomFlip('horizontal'),
    RandomRotation(0.2),     

    base_model_layer,
    tf.keras.layers.GlobalAveragePooling2D(),
    Dropout(0.75),
    Dense(1, activation="sigmoid") 
])

model4.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

history = model4.fit(train_dataset, 
                     epochs=20, 
                     validation_data=validation_dataset)
visualize_history(history)

Evaluation on Test Set

Since this transfer learning model built on MobileNetV2 is by far our best performer, we evaluate it on our held-out test dataset.

model4.evaluate(test_dataset)
6/6 [==============================] - 1s 69ms/step - loss: 0.1047 - accuracy: 0.9740
[0.10467348247766495, 0.9739583134651184]
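If we want per-image predictions rather than aggregate metrics, a minimal sketch: the sigmoid output is the predicted probability of class 1 ("dogs"), which we threshold at 0.5.

# predict on one test batch; outputs above 0.5 mean class 1 ("dogs")
for images, labels in test_dataset.take(1):
    probs = model4.predict(images)
    preds = (probs.flatten() > 0.5).astype("int32")
    print(preds[:10])
    print(labels.numpy()[:10])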

Here we see an accuracy above 97%, which is fantastic considering our first models were only reaching about 60%. Of course, MobileNetV2 is a much more complicated network trained on far more data, so it was much more expensive to train in the first place. In the end, we have seen that many different CNN architectures exist, along with many techniques that help our models succeed.

Written on November 11, 2021