Models That Shaped Deep Learning: 2015-2016

ResNet

As deep neural networks are both time-consuming to train and difficult to optimize at great depth, a team at Microsoft Research introduced a residual learning framework to ease the training of networks that are substantially deeper than those used previously. This research was published in the 2015 paper titled Deep Residual Learning for Image Recognition. And so, the famous ResNet (short for “Residual Network”) was born.

When training deep networks, there comes a point where an increase in depth causes accuracy to saturate, then degrade rapidly. This is called the “degradation problem.” This highlights that not all neural network architectures are equally easy to optimize.

ResNet uses a technique called “residual mapping” to combat this issue. Instead of hoping that every few stacked layers directly fit a desired underlying mapping, the Residual Network explicitly lets these layers fit a residual mapping. Below is the building block of a Residual network.

// Building block of a Residual Network

The formulation F(x) + x can be realized by feedforward neural networks with shortcut connections, where F(x) is the residual mapping learned by the stacked layers and x is the input carried over by the identity shortcut.
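To make the shortcut concrete, below is a minimal sketch of such a residual block, assuming PyTorch; the layer sizes and the BatchNorm/ReLU placement are illustrative rather than taken verbatim from the paper.

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # Two stacked 3x3 convolutions learn the residual F(x); the shortcut adds x back.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)  # F(x) + x via the identity shortcut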

ResNets address these problems: they are easy to optimize and gain accuracy from increased depth, producing results better than earlier networks. ResNet was first trained and evaluated on ImageNet, which contains over 1.2 million training images belonging to 1,000 different classes.

ResNet Architecture

Compared to conventional neural network architectures, ResNets are relatively easy to understand. Below is an image of a VGG network, a plain 34-layer network, and a 34-layer residual network. In the plain network, for the same output feature map size, the layers have the same number of filters; if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer.

// ResNet Architecture Visualization

Meanwhile, as we can see, the 34-layer network has far fewer filters and lower complexity than VGG. Shortcut connections are added to turn the plain network into its residual counterpart. When the dimensions match, the shortcut performs identity mapping; when the dimensions increase, the identity shortcut is padded with extra zero entries, which introduces no additional parameters. Alternatively, a projection shortcut, written as F(x, {Wᵢ}) + Wₛx, uses 1 × 1 convolutions (Wₛ) to match dimensions.

// Table of ResNet Architectures

Each ResNet block is either two layers deep (used in smaller networks like ResNet-18 and ResNet-34) or three layers deep (the bottleneck blocks used in ResNet-50, 101, and 152).
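For reference, here is a rough sketch of the three-layer bottleneck block together with the projection shortcut described above, again assuming PyTorch; the channel arguments are placeholders, not the exact values from the paper.

import torch.nn as nn

class BottleneckBlock(nn.Module):
    # 1x1 -> 3x3 -> 1x1 convolutions; a 1x1 projection shortcut (W_s) matches dimensions.
    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut: W_s x, realized by a strided 1x1 convolution.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()  # identity shortcut, no extra parameters
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))  # F(x, {W_i}) + W_s x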

ResNet Training and Results

The samples from the ImageNet dataset are cropped to 224 × 224 and normalized by per-pixel mean subtraction. Stochastic gradient descent is used for optimization with a mini-batch size of 256. The learning rate starts at 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10⁴ iterations. The weight decay and momentum are set to 0.0001 and 0.9 respectively. Dropout layers are not used.
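These settings translate directly into code. The snippet below is a hedged PyTorch sketch of the optimizer and learning-rate schedule; the tiny placeholder model merely stands in for a ResNet built from the blocks shown earlier.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU())  # stand-in for a ResNet

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate of 0.1
    momentum=0.9,       # momentum of 0.9
    weight_decay=1e-4,  # weight decay of 0.0001
)
# Divide the learning rate by 10 whenever the validation error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

# Inside the training loop (mini-batch size 256):
#   loss.backward(); optimizer.step(); scheduler.step(validation_error)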

ResNet performs extremely well with deeper architectures. Below is an image showing the error rates of 18-layer and 34-layer networks. The graph on the left shows plain networks, while the graph on the right shows their ResNet equivalents. Thin curves represent the training error, and bold curves represent the validation error.

// Graphs of ResNet Training

Below is the table showing the Top-1 error (%, 10-crop testing) on ImageNet validation.

// ImageNet Validation Error Table

ResNet has played a significant role in defining the field of deep learning as we know it today.

Wide ResNet

The Wide Residual Network is a more recent improvement on the original Deep Residual Networks. Rather than relying on increasing the depth of a network to improve its accuracy, the authors showed that a network could be made shallower and wider without compromising its performance. This approach was presented in the paper Wide Residual Networks, published in 2016.

Wide ResNet Architecture

A Wide ResNet has a group of ResNet blocks stacked together, where each ResNet block follows the BatchNormalization-ReLU-Conv structure. This structure is depicted as follows:

// Wide ResNet Architecture Diagram
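A minimal sketch of such a block is shown below, assuming PyTorch. The pre-activation BatchNorm-ReLU-Conv order, the optional dropout between the two convolutions, and the widening factor follow the description above; the exact channel counts are illustrative.

import torch.nn as nn

class WideResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, dropout=0.3):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.dropout = nn.Dropout(dropout)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.shortcut = (
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False)
            if stride != 1 or in_channels != out_channels
            else nn.Identity()
        )

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))          # BN-ReLU-Conv
        out = self.conv2(self.dropout(self.relu(self.bn2(out))))
        return out + self.shortcut(x)

# Widening: a WRN multiplies the base channel counts (e.g. 16, 32, 64) by a factor k,
# so WRN-28-10 uses 160, 320, and 640 channels across its three groups.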

Wide ResNet Training and Results

Wide ResNet was trained on CIFAR-10. The following configuration choices resulted in the lowest error rates:

  • Convolution type: B(3, 3), i.e., a block with two 3 × 3 convolutions
  • Convolution layers per residual block: 2
  • Width of residual blocks: a depth of 28 with a widening factor of 10 (WRN-28-10) gave the lowest error
  • Dropout: adding dropout between the convolutions further reduced the error rate

The following table compares the complexity and performance of Wide ResNet with several other models, including the original ResNet, on both CIFAR-10 and CIFAR-100:

// CIFAR-10 and CIFAR-100 Results Table

Inception v3

Inception v3 mainly focuses on consuming less computational power by modifying the previous Inception architectures. This idea was proposed in the paper Rethinking the Inception Architecture for Computer Vision, published in 2015. It was co-authored by Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, and Jonathon Shlens.

In comparison to VGGNet, Inception Networks (GoogLeNet/Inception v1) have proved to be more computationally efficient, both in terms of the number of parameters and the cost incurred (memory and other resources). However, if any changes are made to an Inception network, care needs to be taken to ensure that these computational advantages are not lost. Adapting an Inception network to different use cases thus becomes a problem, due to the uncertainty about the new network’s efficiency.

In an Inception v3 model, several techniques for optimizing the network have been suggested to loosen the constraints for easier model adaptation. The techniques include:

  • Factorized convolutions
  • Regularization
  • Dimension reduction
  • Parallelized computations

Inception v3 Architecture

The architecture of an Inception v3 network is progressively built, step-by-step, as explained below:

  1. Factorized Convolutions: Factorizing convolutions reduces the number of parameters in the network and, with it, the computational cost, while keeping a check on the network’s efficiency.
  2. Smaller Convolutions: Replacing bigger convolutions with smaller convolutions leads to faster training. For example, a 5 × 5 filter with 25 parameters can be replaced by two 3 × 3 filters, which have only 18 parameters in total.
  3. Asymmetric Convolutions: A 3 × 3 convolution can be replaced by a 1 × 3 convolution followed by a 3 × 1 convolution, reducing the parameters from 9 to 6 (both factorizations are sketched in code after the visualization below).
  4. Auxiliary Classifier: An auxiliary classifier is a small CNN inserted between layers during training. The loss it incurs is added to the main network loss.
  5. Grid Size Reduction: Grid size reduction is usually done with pooling operations. To avoid a representational bottleneck while keeping the computational cost in check, the paper proposes a more efficient reduction block that applies strided convolution and pooling in parallel and concatenates their outputs.

// Visualization of Inception v3 Techniques
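The factorizations from steps 2 and 3 can be written down in a few lines. The sketch below assumes PyTorch, and the channel count of 64 is purely illustrative.

import torch.nn as nn

channels = 64

# Two stacked 3x3 convolutions cover the same 5x5 receptive field with
# 2 * 3 * 3 = 18 weights per filter instead of 5 * 5 = 25.
factorized_5x5 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

# Asymmetric factorization: a 1x3 convolution followed by a 3x1 convolution
# (6 weights per filter instead of 9).
asymmetric_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1)),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)),
    nn.ReLU(inplace=True),
)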

Inception v3 Training and Results

Inception v3 was trained on ImageNet and compared with other contemporary models. As shown in the table below, when augmented with an auxiliary classifier, factorization of convolutions, RMSProp, and Label Smoothing, Inception v3 achieves the lowest error rates compared to its contemporaries.

// Inception v3 Training Results Table
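Label smoothing, one of the regularization tweaks listed above, is simple enough to sketch. The function below is an illustrative PyTorch-style implementation, not the paper’s code; epsilon = 0.1 is the value commonly used with Inception v3.

import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, num_classes=1000, epsilon=0.1):
    # Cross-entropy against a target distribution that mixes the one-hot label
    # with a uniform distribution over all classes.
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = (1.0 - epsilon) * one_hot + epsilon / num_classes
    return -(smoothed * log_probs).sum(dim=-1).mean()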

SqueezeNet

SqueezeNet is a smaller network designed as a more compact replacement for AlexNet. It has almost 50x fewer parameters than AlexNet, yet it performs 3x faster. This architecture was proposed by researchers at DeepScale, The University of California, Berkeley, and Stanford University in 2016. It was first published in their paper titled SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.

Below are the key ideas behind SqueezeNet:

  • Use 1 × 1 filters instead of 3 × 3
  • Decrease the number of input channels to 3 × 3 filters
  • Downsample late in the network so that convolution layers have large activation maps

SqueezeNet Architecture and Results

The SqueezeNet architecture is composed of “squeeze” and “expand” layers. A squeeze convolutional layer has only 1 × 1 filters. These are fed into an expand layer that has a mix of 1 × 1 and 3 × 3 convolution filters. This is shown below:

// SqueezeNet Fire Module Visualization

The authors of the paper use the term “fire module” to describe a squeeze layer and an expand layer together. An input image is first sent into a standalone convolutional layer. This layer is followed by eight fire modules, named “fire2” through “fire9”.
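A minimal sketch of a fire module, assuming PyTorch, is given below; the channel counts in the usage example follow the fire2 configuration reported in the paper.

import torch
import torch.nn as nn

class FireModule(nn.Module):
    def __init__(self, in_channels, squeeze_channels, expand1x1_channels, expand3x3_channels):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand1x1_channels, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand3x3_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))            # squeeze: 1x1 filters only
        return torch.cat(
            [self.relu(self.expand1x1(x)),        # expand: 1x1 filters
             self.relu(self.expand3x3(x))],       # expand: 3x3 filters
            dim=1,
        )

# Example: fire2 squeezes 96 input channels down to 16, then expands to 64 + 64.
fire2 = FireModule(96, 16, 64, 64)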

Below is an image showing how SqueezeNet compares with the original AlexNet:

// SqueezeNet vs AlexNet Comparison

SqueezeNet makes the deployment process easier due to its small size. Initially, this network was implemented in Caffe, but the model has since gained popularity and has been adopted on many different platforms.

Conclusion

The models discussed here—ResNet, Wide ResNet, Inception v3, and SqueezeNet—played a significant role in shaping the field of deep learning as we know it today. Each brought forward innovative ideas that improved both performance and computational efficiency, pushing the boundaries of what neural networks can achieve.
