Deep Learning with PyTorch: Training, Validation, and Accuracy

Introduction

Most deep learning frameworks do not ship with prepackaged training, validation, and accuracy functions or methods, so getting started with these functionalities can be a challenge for engineers first tackling data science problems. In most cases, these processes have to be implemented manually, and that is where things get tricky: in order to write these functions, one needs to actually understand what the processes entail. In this beginner's tutorial article, we will examine each of these processes at a high level, implement them in PyTorch, and then put them together to train a convolutional neural network for a classification task.

Prerequisites

  • Dataset: Prepare and preprocess your dataset (e.g., normalization, resizing).
  • DataLoader: Use DataLoader for batching and shuffling training and validation data.
  • Model: Define a neural network model using torch.nn.Module.
  • Loss Function: Choose a suitable loss function (e.g., nn.CrossEntropyLoss for classification).
  • Optimizer: Select an optimizer (e.g., Adam, SGD).
  • Training Loop: Implement the loop that computes the loss and updates model weights.
  • Validation Loop: Evaluate model performance on the validation dataset.
  • Accuracy Metric: Compute accuracy using torch.argmax for classification tasks (see the sketch after this list).
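
Before unpacking each piece, the sketch below shows how these components slot together. It is a minimal, self-contained toy (random tensors and an illustrative linear model, not the convnet we will build later), purely to fix the overall shape of the workflow:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

#  toy data and model standing in for a real dataset and network
data = TensorDataset(torch.randn(256, 4), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)
model = nn.Linear(4, 2)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

for features, labels in loader:  #  one pass over the data = one epoch
  optimizer.zero_grad()          #  clearing old gradients
  loss = loss_function(model(features), labels)
  loss.backward()                #  computing gradients
  optimizer.step()               #  updating weights

The rest of this article unpacks each of these steps in turn.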

Imports and Setup

Below are the libraries we will use for this task. Each comes pre-installed in Gradient Notebook's Deep Learning runtimes, so you can quick-start this tutorial there on a free GPU.

 
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  import torchvision
  import torchvision.transforms as transforms
  import torchvision.datasets as Datasets
  from torch.utils.data import Dataset, DataLoader
  import numpy as np
  import matplotlib.pyplot as plt
  import cv2
  from tqdm.notebook import tqdm
  if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print('Running on the GPU')
  else:
    device = torch.device('cpu')
    print('Running on the CPU')

Anatomy of Neural Networks

Firstly, any model built from a neural network, be it a multi-layer perceptron, convolutional neural network, or generative adversarial network, is simply made up of 'numbers': weights and biases, collectively called parameters. A neural network with 20 million parameters is simply one with 20 million numbers, each influencing any instance of data that passes through the network (multiplicatively for weights, additively for biases). Following this logic, when a 28 x 28 pixel image is passed through a convolutional neural network with 20 million parameters, all 784 pixels will, directly or through the intermediate representations derived from them, be transformed by those 20 million parameters in some way.
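
As a tiny concrete illustration of this (a single linear layer, purely for demonstration), the parameters really are just numbers that multiply and shift whatever passes through them:

layer = nn.Linear(3, 2)  #  3 inputs, 2 outputs: 6 weights + 2 biases = 8 parameters
x = torch.ones(1, 3)
y = layer(x)             #  computes x @ layer.weight.T + layer.bias
print(sum(p.numel() for p in layer.parameters()))  #  8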

Model Objective

Consider the custom-built ConvNet below. The output layer returns a two-element vector representation, so it is safe to conclude that its objective is to help solve a binary classification task.

class ConvNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
    self.batchnorm1 = nn.BatchNorm2d(8)
    self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
    self.batchnorm2 = nn.BatchNorm2d(8)
    self.pool2 = nn.MaxPool2d(2)
    self.conv3 = nn.Conv2d(8, 32, 3, padding=1)
    self.batchnorm3 = nn.BatchNorm2d(32)
    self.conv4 = nn.Conv2d(32, 32, 3, padding=1)
    self.batchnorm4 = nn.BatchNorm2d(32)
    self.pool4 = nn.MaxPool2d(2)
    self.conv5 = nn.Conv2d(32, 128, 3, padding=1)
    self.batchnorm5 = nn.BatchNorm2d(128)
    self.conv6 = nn.Conv2d(128, 128, 3, padding=1)
    self.batchnorm6 = nn.BatchNorm2d(128)
    self.pool6 = nn.MaxPool2d(2)
    self.conv7 = nn.Conv2d(128, 2, 1)
    self.pool7 = nn.AvgPool2d(3)

  def forward(self, x):
    #-------------
    # INPUT
    #-------------
    x = x.view(-1, 3, 32, 32)
    
    #-------------
    # LAYER 1
    #-------------
    output_1 = self.conv1(x)
    output_1 = F.relu(output_1)
    output_1 = self.batchnorm1(output_1)

    #-------------
    # LAYER 2
    #-------------
    output_2 = self.conv2(output_1)
    output_2 = F.relu(output_2)
    output_2 = self.pool2(output_2)
    output_2 = self.batchnorm2(output_2)

    #-------------
    # LAYER 3
    #-------------
    output_3 = self.conv3(output_2)
    output_3 = F.relu(output_3)
    output_3 = self.batchnorm3(output_3)

    #-------------
    # LAYER 4
    #-------------
    output_4 = self.conv4(output_3)
    output_4 = F.relu(output_4)
    output_4 = self.pool4(output_4)
    output_4 = self.batchnorm4(output_4)

    #-------------
    # LAYER 5
    #-------------
    output_5 = self.conv5(output_4)
    output_5 = F.relu(output_5)
    output_5 = self.batchnorm5(output_5)

    #-------------
    # LAYER 6
    #-------------
    output_6 = self.conv6(output_5)
    output_6 = F.relu(output_6)
    output_6 = self.pool6(output_6)
    output_6 = self.batchnorm6(output_6)

    #--------------
    # OUTPUT LAYER
    #--------------
    output_7 = self.conv7(output_6)
    output_7 = self.pool7(output_7)
    output_7 = output_7.view(-1, 2)

    return F.softmax(output_7, dim=1)

Assume we would like to train this convnet to correctly distinguish cats, labelled 0, from dogs, labelled 1. Essentially, what we are trying to do at a low level is ensure that whenever an image of a cat is passed through the network, and all its pixels interact with all 197,898 parameters in this convnet, the first element of the output vector (index 0) is greater than the second (index 1), e.g., [0.65, 0.35]. Otherwise, if an image of a dog is passed through, the second element (index 1) is expected to be greater, e.g., [0.20, 0.80].
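
As a quick sanity check (a random tensor stands in for an image here, so the exact numbers will differ from run to run), we can pass a dummy input through the untrained network and inspect the output vector:

network = ConvNet()
dummy_image = torch.randn(1, 3, 32, 32)  #  one 32 x 32 RGB image
print(network(dummy_image))              #  e.g. tensor([[0.49, 0.51]]); elements sum to 1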

Now we can begin to perceive the enormity of the model objective. And yes, among the innumerable possible combinations, there exists a setting of these 197,898 numbers/parameters that allows us to do just what we have described above. Searching for that combination is precisely what the training process entails.
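
We can verify that parameter count directly (reusing the network instantiated above) by summing the number of elements in each of its parameter tensors:

print(sum(p.numel() for p in network.parameters()))  #  197898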

The Right Combination

Whenever a neural network is instantiated, its parameters are all randomized; or, if you are initializing parameters with a particular technique, they will not be random but will follow initializations specific to that technique. Either way, at initialization the network's parameters will not fit the model objective at hand, so using the model in that state yields essentially random classifications.

The goal now is to find the right combination of those 197,898 parameters, the one that allows us to achieve our objective. To do this, we break the training images into batches, pass a batch through the convnet and measure how wrong our classifications are (forward propagation), adjust all 197,898 parameters slightly in the direction that best fits our objective (backpropagation), and then repeat for all remaining batches until the training data is exhausted. This process is called optimization.

def train(network, training_set, batch_size, optimizer, loss_function):
  """
  This function optimizes the convnet weights
  """
  #  creating list to hold loss per batch
  loss_per_batch = []

  #  setting network to training mode (matters for layers like batchnorm)
  network.train()

  #  defining dataloader (shuffling so each epoch sees a different batch order)
  train_loader = DataLoader(training_set, batch_size, shuffle=True)

  #  iterating through batches
  print('training...')
  for images, labels in tqdm(train_loader):
    #---------------------------
    #  sending images to device
    #---------------------------
    images, labels = images.to(device), labels.to(device)

    #-----------------------------
    #  zeroing optimizer gradients
    #-----------------------------
    optimizer.zero_grad()

    #-----------------------
    #  classifying instances
    #-----------------------
    classifications = network(images)

    #---------------------------------------------------
    #  computing loss/how wrong our classifications are
    #---------------------------------------------------
    loss = loss_function(classifications, labels)
    loss_per_batch.append(loss.item())

    #------------------------------------------------------------
    #  computing gradients/the direction that fits our objective
    #------------------------------------------------------------
    loss.backward()

    #---------------------------------------------------
    #  optimizing weights/slightly adjusting parameters
    #---------------------------------------------------
    optimizer.step()
  print('all done!')

  return loss_per_batch

Proper Generalization

To ensure that the optimized parameters work on data outside the training set, we evaluate them on a different dataset called the validation set. This tells us whether performance on unseen data is comparable to performance on the training data; a widening gap between the two is a sign of overfitting.

def validate(network, validation_set, batch_size, loss_function):
  """
  This function validates convnet parameter optimizations
  """
  #  creating a list to hold loss per batch
  loss_per_batch = []

  #  defining model state
  network.eval()

  #  defining dataloader
  val_loader = DataLoader(validation_set, batch_size)

  print('validating...')
  #  preventing gradient calculations since we will not be optimizing
  with torch.no_grad():
    #  iterating through batches
    for images, labels in tqdm(val_loader):
      #--------------------------------------
      #  sending images and labels to device
      #--------------------------------------
      images, labels = images.to(device), labels.to(device)

      #--------------------------
      #  making classifications
      #--------------------------
      classifications = network(images)

      #-----------------
      #  computing loss
      #-----------------
      loss = loss_function(classifications, labels)
      loss_per_batch.append(loss.item())

  print('all done!')

  return loss_per_batch

Measuring Performance

When dealing with a classification task on a balanced dataset, model performance is best measured using accuracy as the metric of choice. Since labels are integers that point to the index which should hold the highest probability/value, deriving accuracy amounts to comparing the index of the maximum value in the output vector produced when an image passes through the convnet with the image's label. Accuracy is measured on both the training and validation sets.
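
To make the comparison concrete, here is a toy example with hand-picked numbers, ahead of the full function below:

outputs = torch.tensor([[0.7, 0.3],
                        [0.2, 0.8],
                        [0.6, 0.4]])
labels = torch.tensor([0, 1, 1])
predictions = torch.argmax(outputs, dim=1)   #  tensor([0, 1, 0])
print((predictions == labels).sum().item())  #  2 correct out of 3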

def accuracy(network, dataset):
  """
  This function computes accuracy
  """
  #  setting model state
  network.eval()
  
  #  instantiating counters
  total_correct = 0
  total_instances = 0

  #  creating dataloader
  dataloader = DataLoader(dataset, 64)

  #  iterating through batches
  with torch.no_grad():
    for images, labels in tqdm(dataloader):
      images, labels = images.to(device), labels.to(device)

      #-------------------------------------------------------------------------
      #  making classifications and deriving indices of maximum value via argmax
      #-------------------------------------------------------------------------
      classifications = torch.argmax(network(images), dim=1)

      #-------------------------------------------------
      #  comparing indices of maximum values and labels
      #-------------------------------------------------
      correct_predictions = (classifications==labels).sum().item()

      #------------------------
      #  incrementing counters
      #------------------------
      total_correct+=correct_predictions
      total_instances+=len(images)
  return round(total_correct/total_instances, 3)

Dataset

In order to observe all of these processes working in synergy, we will now apply them to an actual dataset: CIFAR-10. This dataset contains 32 x 32 pixel images from 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

#  loading training data
training_set = Datasets.CIFAR10(root='./', download=True,
                              transform=transforms.ToTensor())

#  loading validation data
validation_set = Datasets.CIFAR10(root='./', download=True, train=False,
                                transform=transforms.ToTensor())

Convnet Architecture

Since this is a 10-class classification task, we need to modify our ConvNet to output a 10-element vector, as shown in the following code:

class ConvNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
    self.batchnorm1 = nn.BatchNorm2d(8)
    self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
    self.batchnorm2 = nn.BatchNorm2d(8)
    self.pool2 = nn.MaxPool2d(2)
    self.conv3 = nn.Conv2d(8, 32, 3, padding=1)
    self.batchnorm3 = nn.BatchNorm2d(32)
    self.conv4 = nn.Conv2d(32, 32, 3, padding=1)
    self.batchnorm4 = nn.BatchNorm2d(32)
    self.pool4 = nn.MaxPool2d(2)
    self.conv5 = nn.Conv2d(32, 128, 3, padding=1)
    self.batchnorm5 = nn.BatchNorm2d(128)
    self.conv6 = nn.Conv2d(128, 128, 3, padding=1)
    self.batchnorm6 = nn.BatchNorm2d(128)
    self.pool6 = nn.MaxPool2d(2)
    self.conv7 = nn.Conv2d(128, 10, 1)
    self.pool7 = nn.AvgPool2d(3)

  def forward(self, x):
    #-------------
    # INPUT
    #-------------
    x = x.view(-1, 3, 32, 32)
    
    #-------------
    # LAYER 1
    #-------------
    output_1 = self.conv1(x)
    output_1 = F.relu(output_1)
    output_1 = self.batchnorm1(output_1)

    #-------------
    # LAYER 2
    #-------------
    output_2 = self.conv2(output_1)
    output_2 = F.relu(output_2)
    output_2 = self.pool2(output_2)
    output_2 = self.batchnorm2(output_2)

    #-------------
    # LAYER 3
    #-------------
    output_3 = self.conv3(output_2)
    output_3 = F.relu(output_3)
    output_3 = self.batchnorm3(output_3)

    #-------------
    # LAYER 4
    #-------------
    output_4 = self.conv4(output_3)
    output_4 = F.relu(output_4)
    output_4 = self.pool4(output_4)
    output_4 = self.batchnorm4(output_4)

    #-------------
    # LAYER 5
    #-------------
    output_5 = self.conv5(output_4)
    output_5 = F.relu(output_5)
    output_5 = self.batchnorm5(output_5)

    #-------------
    # LAYER 6
    #-------------
    output_6 = self.conv6(output_5)
    output_6 = F.relu(output_6)
    output_6 = self.pool6(output_6)
    output_6 = self.batchnorm6(output_6)

    #--------------
    # OUTPUT LAYER
    #--------------
    output_7 = self.conv7(output_6)
    output_7 = self.pool7(output_7)
    output_7 = output_7.view(-1, 10)

    return F.softmax(output_7, dim=1)
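
One caveat worth flagging before training: nn.CrossEntropyLoss, which we use below, applies log-softmax internally and therefore expects raw logits. Feeding it softmax probabilities, as this forward pass does, still trains but compresses the gradients somewhat. A common variant ends forward() with return output_7 (the raw logits) and converts to probabilities only at inference time:

logits = torch.tensor([[2.0, -1.0]])  #  raw scores, as a logits-returning forward() would produce
probs = F.softmax(logits, dim=1)      #  tensor([[0.9526, 0.0474]])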

Joining Processes

To train the defined ConvNet, we instantiate the model, utilize the training function to optimize the ConvNet weights, and record the losses during training and validation. The code below combines these processes.

#  instantiating model and sending it to device
model = ConvNet().to(device)
#  defining optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

#  training/optimizing parameters
training_losses = train(network=model, training_set=training_set, 
                        batch_size=64, optimizer=optimizer, 
                        loss_function=nn.CrossEntropyLoss())
                        
#  validating optimizations                        
validation_losses = validate(network=model, validation_set=validation_set, 
                             batch_size=64, loss_function=nn.CrossEntropyLoss())

#  deriving model accuracy on the training set
training_accuracy = accuracy(model, training_set)
print(f'training accuracy: {training_accuracy}')

#  deriving model accuracy on the validation set
validation_accuracy = accuracy(model, validation_set)
print(f'validation accuracy: {validation_accuracy}')
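
Since both functions return per-batch losses, a quick plot (using the matplotlib import from the setup section) helps us eyeball how optimization progressed; the same can be done for the validation losses:

plt.plot(training_losses)
plt.xlabel('batch')
plt.ylabel('training loss')
plt.show()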

Epochs and Synchronization

Training for multiple epochs improves model performance. This means repeating the training and validation steps in a loop, while keeping the processes in sync: the weights optimized in one epoch must carry over to the next, and the network must be switched back to training mode after each evaluation. Packaging all of the processes into a class is a neat way to handle this, as shown below:

class ConvolutionalNeuralNet():
  def __init__(self, network):
    self.network = network.to(device)
    self.optimizer = torch.optim.Adam(self.network.parameters(), lr=1e-3)

  def train(self, loss_function, epochs, batch_size, 
            training_set, validation_set):
    
    #  creating log
    log_dict = {
        'training_loss_per_batch': [],
        'validation_loss_per_batch': [],
        'training_accuracy_per_epoch': [],
        'validation_accuracy_per_epoch': []
    } 

    #  defining weight initialization function
    def init_weights(module):
      if isinstance(module, nn.Conv2d):
        torch.nn.init.xavier_uniform_(module.weight)
        module.bias.data.fill_(0.01)
      elif isinstance(module, nn.Linear):
        torch.nn.init.xavier_uniform_(module.weight)
        module.bias.data.fill_(0.01)

    #  defining accuracy function
    def accuracy(network, dataloader):
      network.eval()
      total_correct = 0
      total_instances = 0
      for images, labels in tqdm(dataloader):
        images, labels = images.to(device), labels.to(device)
        predictions = torch.argmax(network(images), dim=1)
        correct_predictions = (predictions==labels).sum().item()
        total_correct+=correct_predictions
        total_instances+=len(images)
      return round(total_correct/total_instances, 3)

    #  initializing network weights
    self.network.apply(init_weights)

    #  creating dataloaders (shuffling the training set each epoch)
    train_loader = DataLoader(training_set, batch_size, shuffle=True)
    val_loader = DataLoader(validation_set, batch_size)

    for epoch in range(epochs):
      print(f'Epoch {epoch+1}/{epochs}')

      #  setting convnet to training mode at the start of each epoch,
      #  since accuracy() and the validation pass switch it to eval mode
      self.network.train()

      train_losses = []

      #  training
      print('training...')
      for images, labels in tqdm(train_loader):
        #  sending data to device
        images, labels = images.to(device), labels.to(device)
        #  resetting gradients
        self.optimizer.zero_grad()
        #  making predictions
        predictions = self.network(images)
        #  computing loss
        loss = loss_function(predictions, labels)
        log_dict['training_loss_per_batch'].append(loss.item())
        train_losses.append(loss.item())
        #  computing gradients
        loss.backward()
        #  updating weights
        self.optimizer.step()
      with torch.no_grad():
        print('deriving training accuracy...')
        #  computing training accuracy
        train_accuracy = accuracy(self.network, train_loader)
        log_dict['training_accuracy_per_epoch'].append(train_accuracy)

      #  validation
      print('validating...')
      val_losses = []

      #  setting convnet to evaluation mode
      self.network.eval()

      with torch.no_grad():
        for images, labels in tqdm(val_loader):
          #  sending data to device
          images, labels = images.to(device), labels.to(device)
          #  making predictions
          predictions = self.network(images)
          #  computing loss
          val_loss = loss_function(predictions, labels)
          log_dict['validation_loss_per_batch'].append(val_loss.item())
          val_losses.append(val_loss.item())
        #  computing accuracy
        print('deriving validation accuracy...')
        val_accuracy = accuracy(self.network, val_loader)
        log_dict['validation_accuracy_per_epoch'].append(val_accuracy)

      train_losses = np.array(train_losses).mean()
      val_losses = np.array(val_losses).mean()

      print(f'training_loss: {round(train_losses, 4)}  training_accuracy: '+
      f'{train_accuracy}  validation_loss: {round(val_losses, 4)} '+  
      f'validation_accuracy: {val_accuracy}\n')
      
    return log_dict

  def predict(self, x):
    #  switching to evaluation mode and disabling gradient tracking for inference
    self.network.eval()
    with torch.no_grad():
      return self.network(x)
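
With everything packaged, multi-epoch training now takes just a couple of lines (10 epochs here is an arbitrary choice for illustration):

#  instantiating the wrapper around a fresh ConvNet
model = ConvolutionalNeuralNet(ConvNet())

#  training for 10 epochs and collecting the logs
log_dict = model.train(nn.CrossEntropyLoss(), epochs=10, batch_size=64,
                       training_set=training_set, validation_set=validation_set)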

Final Remarks

In this article, we explored the key processes in training neural networks: training, validation, and accuracy. These processes were implemented in PyTorch, and we demonstrated their integration within a class for efficiency. The practical example provided insights into training a convolutional neural network with the CIFAR-10 dataset. These concepts can now be applied to real-world deep learning projects.
