Deep Learning with PyTorch: Training, Validation, and Accuracy

Introduction

Most deep learning frameworks do not ship with prepackaged training, validation, and accuracy functions or methods, so getting started with these functionalities can be a challenge for engineers first tackling data science problems. In most cases, these processes have to be implemented manually, and that is where things get tricky: in order to write these functions, one needs to actually understand what the processes entail. In this beginner's tutorial article, we will examine each of these processes at a high level, implement them in PyTorch, and then put them together to train a convolutional neural network for a classification task.

Prerequisites

  • Dataset: Prepare and preprocess your dataset (e.g., normalization, resizing).
  • DataLoader: Use DataLoader for batching and shuffling training and validation data.
  • Model: Define a neural network model using torch.nn.Module.
  • Loss Function: Choose a suitable loss function (e.g., nn.CrossEntropyLoss for classification).
  • Optimizer: Select an optimizer (e.g., Adam, SGD).
  • Training Loop: Implement the loop that computes the loss and updates model weights.
  • Validation Loop: Evaluate model performance on the validation dataset.
  • Accuracy Metric: Compute accuracy using torch.argmax for classification tasks (see the sketch after this list).
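
Before unpacking each piece, the sketch below shows how these components slot together. It is a minimal, self-contained toy (random tensors and an illustrative linear model, not the convnet we will build later), purely to fix the overall shape of the workflow:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

#  toy data and model standing in for a real dataset and network
data = TensorDataset(torch.randn(256, 4), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)
model = nn.Linear(4, 2)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

for features, labels in loader:  #  one pass over the data = one epoch
  optimizer.zero_grad()          #  clearing old gradients
  loss = loss_function(model(features), labels)
  loss.backward()                #  computing gradients
  optimizer.step()               #  updating weights

The rest of this article unpacks each of these steps in turn.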

Imports and Setup

Below are the libraries we will use for this task. Each comes pre-installed in Gradient Notebook's Deep Learning runtimes, so you can quick-start this tutorial there on a free GPU.

 
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  import torchvision
  import torchvision.transforms as transforms
  import torchvision.datasets as Datasets
  from torch.utils.data import Dataset, DataLoader
  import numpy as np
  import matplotlib.pyplot as plt
  import cv2
  from tqdm.notebook import tqdm
  if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print('Running on the GPU')
  else:
    device = torch.device('cpu')
    print('Running on the CPU')

Anatomy of Neural Networks

Firstly, any model built from a neural network, be it a multi-layer perceptron, convolutional neural network, or generative adversarial network, is simply made up of 'numbers': weights and biases, collectively called parameters. A neural network with 20 million parameters is simply one with 20 million numbers, each influencing any instance of data that passes through the network (multiplicatively for weights, additively for biases). Following this logic, when a 28 x 28 pixel image is passed through a convolutional neural network with 20 million parameters, all 784 pixels will, directly or through the intermediate representations derived from them, be transformed by those 20 million parameters in some way.
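
As a tiny concrete illustration of this (a single linear layer, purely for demonstration), the parameters really are just numbers that multiply and shift whatever passes through them:

layer = nn.Linear(3, 2)  #  3 inputs, 2 outputs: 6 weights + 2 biases = 8 parameters
x = torch.ones(1, 3)
y = layer(x)             #  computes x @ layer.weight.T + layer.bias
print(sum(p.numel() for p in layer.parameters()))  #  8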

Model Objective

Consider the custom-built ConvNet below. The output layer returns a two-element vector representation, so it is safe to conclude that its objective is to help solve a binary classification task.

class ConvNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
    self.batchnorm1 = nn.BatchNorm2d(8)
    self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
    self.batchnorm2 = nn.BatchNorm2d(8)
    self.pool2 = nn.MaxPool2d(2)
    self.conv3 = nn.Conv2d(8, 32, 3, padding=1)
    self.batchnorm3 = nn.BatchNorm2d(32)
    self.conv4 = nn.Conv2d(32, 32, 3, padding=1)
    self.batchnorm4 = nn.BatchNorm2d(32)
    self.pool4 = nn.MaxPool2d(2)
    self.conv5 = nn.Conv2d(32, 128, 3, padding=1)
    self.batchnorm5 = nn.BatchNorm2d(128)
    self.conv6 = nn.Conv2d(128, 128, 3, padding=1)
    self.batchnorm6 = nn.BatchNorm2d(128)
    self.pool6 = nn.MaxPool2d(2)
    self.conv7 = nn.Conv2d(128, 2, 1)
    self.pool7 = nn.AvgPool2d(3)

  def forward(self, x):
    #-------------
    # INPUT
    #-------------
    x = x.view(-1, 3, 32, 32)
    
    #-------------
    # LAYER 1
    #-------------
    output_1 = self.conv1(x)
    output_1 = F.relu(output_1)
    output_1 = self.batchnorm1(output_1)

    #-------------
    # LAYER 2
    #-------------
    output_2 = self.conv2(output_1)
    output_2 = F.relu(output_2)
    output_2 = self.pool2(output_2)
    output_2 = self.batchnorm2(output_2)

    #-------------
    # LAYER 3
    #-------------
    output_3 = self.conv3(output_2)
    output_3 = F.relu(output_3)
    output_3 = self.batchnorm3(output_3)

    #-------------
    # LAYER 4
    #-------------
    output_4 = self.conv4(output_3)
    output_4 = F.relu(output_4)
    output_4 = self.pool4(output_4)
    output_4 = self.batchnorm4(output_4)

    #-------------
    # LAYER 5
    #-------------
    output_5 = self.conv5(output_4)
    output_5 = F.relu(output_5)
    output_5 = self.batchnorm5(output_5)

    #-------------
    # LAYER 6
    #-------------
    output_6 = self.conv6(output_5)
    output_6 = F.relu(output_6)
    output_6 = self.pool6(output_6)
    output_6 = self.batchnorm6(output_6)

    #--------------
    # OUTPUT LAYER
    #--------------
    output_7 = self.conv7(output_6)
    output_7 = self.pool7(output_7)
    output_7 = output_7.view(-1, 2)

    return F.softmax(output_7, dim=1)

Assume we would like to train this convnet to correctly distinguish cats, labelled 0, from dogs, labelled 1. Essentially, what we are trying to do at a low level is ensure that whenever an image of a cat is passed through the network, and all its pixels interact with all 197,898 parameters in this convnet, the first element of the output vector (index 0) is greater than the second (index 1), e.g., [0.65, 0.35]. Otherwise, if an image of a dog is passed through, the second element (index 1) is expected to be greater, e.g., [0.20, 0.80].
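
As a quick sanity check (a random tensor stands in for an image here, so the exact numbers will differ from run to run), we can pass a dummy input through the untrained network and inspect the output vector:

network = ConvNet()
dummy_image = torch.randn(1, 3, 32, 32)  #  one 32 x 32 RGB image
print(network(dummy_image))              #  e.g. tensor([[0.49, 0.51]]); elements sum to 1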

Now we can begin to perceive the enormity of the model objective. And yes, among the innumerable possible combinations, there exists a setting of these 197,898 numbers/parameters that allows us to do just what we have described above. Searching for that combination is precisely what the training process entails.
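
We can verify that parameter count directly (reusing the network instantiated above) by summing the number of elements in each of its parameter tensors:

print(sum(p.numel() for p in network.parameters()))  #  197898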

The Right Combination

Whenever a neural network is instantiated, its parameters are all randomized; or, if you are initializing parameters with a particular technique, they will not be random but will follow initializations specific to that technique. Either way, at initialization the network's parameters will not fit the model objective at hand, so using the model in that state yields essentially random classifications.

The goal now is to find the right combination of those 197,898 parameters, the one that allows us to achieve our objective. To do this, we break the training images into batches, pass a batch through the convnet and measure how wrong our classifications are (forward propagation), adjust all 197,898 parameters slightly in the direction that best fits our objective (backpropagation), and then repeat for all remaining batches until the training data is exhausted. This process is called optimization.

def train(network, training_set, batch_size, optimizer, loss_function):
  """
  This function optimizes the convnet weights
  """
  #  creating list to hold loss per batch
  loss_per_batch = []

  #  setting network to training mode (matters for layers like batchnorm)
  network.train()

  #  defining dataloader (shuffling so each epoch sees a different batch order)
  train_loader = DataLoader(training_set, batch_size, shuffle=True)

  #  iterating through batches
  print('training...')
  for images, labels in tqdm(train_loader):
    #---------------------------
    #  sending images to device
    #---------------------------
    images, labels = images.to(device), labels.to(device)

    #-----------------------------
    #  zeroing optimizer gradients
    #-----------------------------
    optimizer.zero_grad()

    #-----------------------
    #  classifying instances
    #-----------------------
    classifications = network(images)

    #---------------------------------------------------
    #  computing loss/how wrong our classifications are
    #---------------------------------------------------
    loss = loss_function(classifications, labels)
    loss_per_batch.append(loss.item())

    #------------------------------------------------------------
    #  computing gradients/the direction that fits our objective
    #------------------------------------------------------------
    loss.backward()

    #---------------------------------------------------
    #  optimizing weights/slightly adjusting parameters
    #---------------------------------------------------
    optimizer.step()
  print('all done!')

  return loss_per_batch

Proper Generalization

To ensure that the optimized parameters work on data outside the training set, we evaluate them on a different dataset called the validation set. This tells us whether performance on unseen data is comparable to performance on the training data; a widening gap between the two is a sign of overfitting.

def validate(network, validation_set, batch_size, loss_function):
  """
  This function validates convnet parameter optimizations
  """
  #  creating a list to hold loss per batch
  loss_per_batch = []

  #  defining model state
  network.eval()

  #  defining dataloader
  val_loader = DataLoader(validation_set, batch_size)

  print('validating...')
  #  preventing gradient calculations since we will not be optimizing
  with torch.no_grad():
    #  iterating through batches
    for images, labels in tqdm(val_loader):
      #--------------------------------------
      #  sending images and labels to device
      #--------------------------------------
      images, labels = images.to(device), labels.to(device)

      #--------------------------
      #  making classifications
      #--------------------------
      classifications = network(images)

      #-----------------
      #  computing loss
      #-----------------
      loss = loss_function(classifications, labels)
      loss_per_batch.append(loss.item())

  print('all done!')

  return loss_per_batch

Measuring Performance

When dealing with a classification task on a balanced dataset, model performance is best measured using accuracy as the metric of choice. Since labels are integers that point to the index which should hold the highest probability/value, deriving accuracy amounts to comparing the index of the maximum value in the output vector produced when an image passes through the convnet with the image's label. Accuracy is measured on both the training and validation sets.
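
To make the comparison concrete, here is a toy example with hand-picked numbers, ahead of the full function below:

outputs = torch.tensor([[0.7, 0.3],
                        [0.2, 0.8],
                        [0.6, 0.4]])
labels = torch.tensor([0, 1, 1])
predictions = torch.argmax(outputs, dim=1)   #  tensor([0, 1, 0])
print((predictions == labels).sum().item())  #  2 correct out of 3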

def accuracy(network, dataset):
  """
  This function computes accuracy
  """
  #  setting model state
  network.eval()
  
  #  instantiating counters
  total_correct = 0
  total_instances = 0

  #  creating dataloader
  dataloader = DataLoader(dataset, 64)

  #  iterating through batches
  with torch.no_grad():
    for images, labels in tqdm(dataloader):
      images, labels = images.to(device), labels.to(device)

      #-------------------------------------------------------------------------
      #  making classifications and deriving indices of maximum value via argmax
      #-------------------------------------------------------------------------
      classifications = torch.argmax(network(images), dim=1)

      #-------------------------------------------------
      #  comparing indices of maximum values and labels
      #-------------------------------------------------
      correct_predictions = (classifications==labels).sum().item()

      #------------------------
      #  incrementing counters
      #------------------------
      total_correct+=correct_predictions
      total_instances+=len(images)
  return round(total_correct/total_instances, 3)

Dataset

In order to observe all of these processes working in synergy, we will now apply them to an actual dataset: CIFAR-10. This dataset contains 32 x 32 pixel images from 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

#  loading training data
training_set = Datasets.CIFAR10(root='./', download=True,
                              transform=transforms.ToTensor())

#  loading validation data
validation_set = Datasets.CIFAR10(root='./', download=True, train=False,
                                transform=transforms.ToTensor())

Convnet Architecture

Since this is a 10-class classification task, we need to modify our ConvNet to output a 10-element vector, as shown in the following code:

class ConvNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
    self.batchnorm1 = nn.BatchNorm2d(8)
    self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
    self.batchnorm2 = nn.BatchNorm2d(8)
    self.pool2 = nn.MaxPool2d(2)
    self.conv3 = nn.Conv2d(8, 32, 3, padding=1)
    self.batchnorm3 = nn.BatchNorm2d(32)
    self.conv4 = nn.Conv2d(32, 32, 3, padding=1)
    self.batchnorm4 = nn.BatchNorm2d(32)
    self.pool4 = nn.MaxPool2d(2)
    self.conv5 = nn.Conv2d(32, 128, 3, padding=1)
    self.batchnorm5 = nn.BatchNorm2d(128)
    self.conv6 = nn.Conv2d(128, 128, 3, padding=1)
    self.batchnorm6 = nn.BatchNorm2d(128)
    self.pool6 = nn.MaxPool2d(2)
    self.conv7 = nn.Conv2d(128, 10, 1)
    self.pool7 = nn.AvgPool2d(3)

  def forward(self, x):
    #-------------
    # INPUT
    #-------------
    x = x.view(-1, 3, 32, 32)
    
    #-------------
    # LAYER 1
    #-------------
    output_1 = self.conv1(x)
    output_1 = F.relu(output_1)
    output_1 = self.batchnorm1(output_1)

    #-------------
    # LAYER 2
    #-------------
    output_2 = self.conv2(output_1)
    output_2 = F.relu(output_2)
    output_2 = self.pool2(output_2)
    output_2 = self.batchnorm2(output_2)

    #-------------
    # LAYER 3
    #-------------
    output_3 = self.conv3(output_2)
    output_3 = F.relu(output_3)
    output_3 = self.batchnorm3(output_3)

    #-------------
    # LAYER 4
    #-------------
    output_4 = self.conv4(output_3)
    output_4 = F.relu(output_4)
    output_4 = self.pool4(output_4)
    output_4 = self.batchnorm4(output_4)

    #-------------
    # LAYER 5
    #-------------
    output_5 = self.conv5(output_4)
    output_5 = F.relu(output_5)
    output_5 = self.batchnorm5(output_5)

    #-------------
    # LAYER 6
    #-------------
    output_6 = self.conv6(output_5)
    output_6 = F.relu(output_6)
    output_6 = self.pool6(output_6)
    output_6 = self.batchnorm6(output_6)

    #--------------
    # OUTPUT LAYER
    #--------------
    output_7 = self.conv7(output_6)
    output_7 = self.pool7(output_7)
    output_7 = output_7.view(-1, 10)

    return F.softmax(output_7, dim=1)
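
One caveat worth flagging before training: nn.CrossEntropyLoss, which we use below, applies log-softmax internally and therefore expects raw logits. Feeding it softmax probabilities, as this forward pass does, still trains but compresses the gradients somewhat. A common variant ends forward() with return output_7 (the raw logits) and converts to probabilities only at inference time:

logits = torch.tensor([[2.0, -1.0]])  #  raw scores, as a logits-returning forward() would produce
probs = F.softmax(logits, dim=1)      #  tensor([[0.9526, 0.0474]])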

Joining Processes

To train the defined ConvNet, we instantiate the model, utilize the training function to optimize the ConvNet weights, and record the losses during training and validation. The code below combines these processes.

#  instantiating model and sending it to device
model = ConvNet().to(device)
#  defining optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

#  training/optimizing parameters
training_losses = train(network=model, training_set=training_set, 
                        batch_size=64, optimizer=optimizer, 
                        loss_function=nn.CrossEntropyLoss())
                        
#  validating optimizations                        
validation_losses = validate(network=model, validation_set=validation_set, 
                             batch_size=64, loss_function=nn.CrossEntropyLoss())

#  deriving model accuracy on the training set
training_accuracy = accuracy(model, training_set)
print(f'training accuracy: {training_accuracy}')

#  deriving model accuracy on the validation set
validation_accuracy = accuracy(model, validation_set)
print(f'validation accuracy: {validation_accuracy}')
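
Since both functions return per-batch losses, a quick plot (using the matplotlib import from the setup section) helps us eyeball how optimization progressed; the same can be done for the validation losses:

plt.plot(training_losses)
plt.xlabel('batch')
plt.ylabel('training loss')
plt.show()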

Epochs and Synchronization

Training for multiple epochs improves model performance. This means repeating the training and validation steps in a loop, while keeping the processes in sync: the weights optimized in one epoch must carry over to the next, and the network must be switched back to training mode after each evaluation. Packaging all of the processes into a class is a neat way to handle this, as shown below:

class ConvolutionalNeuralNet():
  def __init__(self, network):
    self.network = network.to(device)
    self.optimizer = torch.optim.Adam(self.network.parameters(), lr=1e-3)

  def train(self, loss_function, epochs, batch_size, 
            training_set, validation_set):
    
    #  creating log
    log_dict = {
        'training_loss_per_batch': [],
        'validation_loss_per_batch': [],
        'training_accuracy_per_epoch': [],
        'validation_accuracy_per_epoch': []
    } 

    #  defining weight initialization function
    def init_weights(module):
      if isinstance(module, nn.Conv2d):
        torch.nn.init.xavier_uniform_(module.weight)
        module.bias.data.fill_(0.01)
      elif isinstance(module, nn.Linear):
        torch.nn.init.xavier_uniform_(module.weight)
        module.bias.data.fill_(0.01)

    #  defining accuracy function
    def accuracy(network, dataloader):
      network.eval()
      total_correct = 0
      total_instances = 0
      for images, labels in tqdm(dataloader):
        images, labels = images.to(device), labels.to(device)
        predictions = torch.argmax(network(images), dim=1)
        correct_predictions = (predictions==labels).sum().item()
        total_correct+=correct_predictions
        total_instances+=len(images)
      return round(total_correct/total_instances, 3)

    #  initializing network weights
    self.network.apply(init_weights)

    #  creating dataloaders (shuffling the training set each epoch)
    train_loader = DataLoader(training_set, batch_size, shuffle=True)
    val_loader = DataLoader(validation_set, batch_size)

    for epoch in range(epochs):
      print(f'Epoch {epoch+1}/{epochs}')

      #  setting convnet to training mode at the start of each epoch,
      #  since accuracy() and the validation pass switch it to eval mode
      self.network.train()

      train_losses = []

      #  training
      print('training...')
      for images, labels in tqdm(train_loader):
        #  sending data to device
        images, labels = images.to(device), labels.to(device)
        #  resetting gradients
        self.optimizer.zero_grad()
        #  making predictions
        predictions = self.network(images)
        #  computing loss
        loss = loss_function(predictions, labels)
        log_dict['training_loss_per_batch'].append(loss.item())
        train_losses.append(loss.item())
        #  computing gradients
        loss.backward()
        #  updating weights
        self.optimizer.step()
      with torch.no_grad():
        print('deriving training accuracy...')
        #  computing training accuracy
        train_accuracy = accuracy(self.network, train_loader)
        log_dict['training_accuracy_per_epoch'].append(train_accuracy)

      #  validation
      print('validating...')
      val_losses = []

      #  setting convnet to evaluation mode
      self.network.eval()

      with torch.no_grad():
        for images, labels in tqdm(val_loader):
          #  sending data to device
          images, labels = images.to(device), labels.to(device)
          #  making predictions
          predictions = self.network(images)
          #  computing loss
          val_loss = loss_function(predictions, labels)
          log_dict['validation_loss_per_batch'].append(val_loss.item())
          val_losses.append(val_loss.item())
        #  computing accuracy
        print('deriving validation accuracy...')
        val_accuracy = accuracy(self.network, val_loader)
        log_dict['validation_accuracy_per_epoch'].append(val_accuracy)

      train_losses = np.array(train_losses).mean()
      val_losses = np.array(val_losses).mean()

      print(f'training_loss: {round(train_losses, 4)}  training_accuracy: '+
      f'{train_accuracy}  validation_loss: {round(val_losses, 4)} '+  
      f'validation_accuracy: {val_accuracy}\n')
      
    return log_dict

  def predict(self, x):
    #  switching to evaluation mode and disabling gradient tracking for inference
    self.network.eval()
    with torch.no_grad():
      return self.network(x)
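
With everything packaged, multi-epoch training now takes just a couple of lines (10 epochs here is an arbitrary choice for illustration):

#  instantiating the wrapper around a fresh ConvNet
model = ConvolutionalNeuralNet(ConvNet())

#  training for 10 epochs and collecting the logs
log_dict = model.train(nn.CrossEntropyLoss(), epochs=10, batch_size=64,
                       training_set=training_set, validation_set=validation_set)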

Final Remarks

In this article, we explored the key processes in training neural networks: training, validation, and accuracy. These processes were implemented in PyTorch, and we demonstrated their integration within a class for efficiency. The practical example provided insights into training a convolutional neural network with the CIFAR-10 dataset. These concepts can now be applied to real-world deep learning projects.
