CNN architectures
Saved for my (TiiL) research purposes only. The following content is kept for the author's later reference (Source: https://www.datasciencecentral.com/lenet-5-a-clasic-cnn-architecture/)
The most popular CNN architectures are:
- LeNet – used for recognizing handwritten digits; not very deep
- AlexNet – a deeper network, and the first to use ReLU as an activation function
- VGGNet – simplified network design by stacking uniform, small convolution layers
- GoogLeNet – uses techniques such as 1×1 convolutions and global average pooling, which make a deeper architecture possible
- ResNet – residual connections allow training of very deep neural networks
- Inception – applies several pooling and convolution operations in parallel and concatenates their outputs
These architectures can be implemented from scratch, but open-source pre-trained models that use these architectures are also available and can be used directly for building computer vision applications. This approach is called transfer learning.
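As a quick, hedged sketch of this transfer-learning approach (not from the source article; it assumes TensorFlow/Keras and the ImageNet weights shipped with keras.applications, and the 10-class head is just a placeholder):

```python
# Minimal transfer-learning sketch (assumes TensorFlow/Keras is installed).
from tensorflow import keras

# Load a VGG16 backbone pre-trained on ImageNet, without its classifier head.
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional features

# Add a small task-specific head (here, a hypothetical 10-class problem).
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```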
Main contents
REFERENCE ARTICLE 1
What is LeNet-5?
LeNet-5 is one of the earliest convolutional neural network models, proposed by Yann LeCun and others in 1998 in the research paper Gradient-Based Learning Applied to Document Recognition. They used this architecture for recognizing handwritten and machine-printed characters.
The main reason behind the popularity of this model was its simple and straightforward architecture. It is a multi-layer convolutional neural network for image classification.
The Architecture of the Model
Let's understand the architecture of LeNet-5. The network has 5 layers with learnable parameters, hence the name LeNet-5. It has three convolution layers combined with average pooling. After the convolution and average pooling layers, we have two fully connected layers. Finally, a softmax classifier assigns each image to its respective class.
The input to this model is a 32X32 grayscale image, hence the number of channels is one.
We then apply the first convolution operation with 6 filters of size 5X5. As a result, we get a feature map of size 28X28X6. Here the number of channels is equal to the number of filters applied.
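The 28X28 size follows from the standard convolution output-size formula, output = (input − filter + 2·padding) / stride + 1; a tiny sketch (not from the source article):

```python
# Output spatial size of a convolution: (input - filter + 2*padding) / stride + 1.
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(32, 5))  # 28: a 5x5 filter over a 32x32 input, stride 1
```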
After the first convolution operation, we apply average pooling, and the size of the feature map is reduced by half, to 14X14X6. Note that the number of channels stays the same.
Next comes a second convolution layer with 16 filters of size 5X5, giving a 10X10X16 feature map, followed by another average pooling layer that again halves the spatial size, to 5X5X16.
Then we have a final convolution layer of size 5X5 with 120 filters, leaving a feature map of size 1X1X120. Flattening this result gives 120 values.
After these convolution layers, we have a fully connected layer with eighty-four neurons. Finally, we have an output layer with ten neurons, since the data has ten classes.
Here is the final architecture of the LeNet-5 model.
Architecture Details
Let’s understand the architecture in more detail.
The first layer is the input layer with feature map size 32X32X1.
Then we have the first convolution layer with 6 filters of size 5X5 and a stride of 1. The activation function used at this layer is tanh. The output feature map is 28X28X6.
Next, we have an average pooling layer with filter size 2X2 and a stride of 2. The resulting feature map is 14X14X6, since the pooling layer doesn't affect the number of channels.
After this comes the second convolution layer with 16 filters of size 5X5 and a stride of 1. Again, the activation function is tanh. Now the output size is 10X10X16.
Then comes another average pooling layer of 2X2 with a stride of 2. As a result, the size of the feature map is reduced to 5X5X16.
The final convolution layer has 120 filters of size 5X5 with a stride of 1 and the tanh activation function, leaving an output of 120 values.
Next is a fully connected layer with 84 neurons, which produces 84 output values; the activation function used here is again tanh.
The last layer is the output layer with 10 neurons and a softmax function. The softmax gives the probability that a data point belongs to each class, and the class with the highest probability is predicted.
This is the entire architecture of the LeNet-5 model. The number of trainable parameters in this architecture is around sixty thousand.
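As a sketch only (not part of the source article), the architecture just described can be written with the Keras sequential API as follows. Note that Keras's Conv2D connects every filter to all input channels, so the reported total is about 61,700 parameters, a little above the paper's roughly 60,000, which relies on partial connectivity in the second convolution layer.

```python
# Sketch of LeNet-5 as described above (assumes TensorFlow/Keras).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),                        # 32x32 grayscale input
    layers.Conv2D(6, kernel_size=5, activation="tanh"),    # C1 -> 28x28x6
    layers.AveragePooling2D(pool_size=2, strides=2),       # S2 -> 14x14x6
    layers.Conv2D(16, kernel_size=5, activation="tanh"),   # C3 -> 10x10x16
    layers.AveragePooling2D(pool_size=2, strides=2),       # S4 -> 5x5x16
    layers.Conv2D(120, kernel_size=5, activation="tanh"),  # C5 -> 1x1x120
    layers.Flatten(),                                      # 120 values
    layers.Dense(84, activation="tanh"),                   # F6
    layers.Dense(10, activation="softmax"),                # output: 10 classes
])
model.summary()  # ~61.7K parameters (Keras uses full connectivity in C3)
```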
End Notes
This was all about the LeNet-5 architecture. To summarize, the network has:
- 5 layers with learnable parameters.
- The input to the model is a grayscale image.
- 3 convolution layers, two average pooling layers, and two fully connected layers with a softmax classifier.
- The number of trainable parameters is about 60,000.
REFERENCE ARTICLE 2:
LeNet-5 - A Classic CNN Architecture
Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner proposed a neural network architecture for handwritten and machine-printed character recognition in the 1990s, which they called LeNet-5. The architecture is straightforward and simple to understand, which is why it is often used as a first step in teaching convolutional neural networks.
The LeNet-5 architecture consists of two sets of convolutional and average pooling layers, followed by a flattening convolutional layer, then two fully-connected layers and finally a softmax classifier.
First Layer:
The input for LeNet-5 is a 32×32 grayscale image, which passes through the first convolutional layer with 6 feature maps or filters of size 5×5 and a stride of one. The image dimensions change from 32x32x1 to 28x28x6.
Second Layer:
Then LeNet-5 applies an average pooling layer, or sub-sampling layer, with a filter size of 2×2 and a stride of two. The resulting image dimensions are reduced to 14x14x6.
Third Layer:
Next, there is a second convolutional layer with 16 feature maps having size 5×5 and a stride of 1. In this layer, each of the 16 feature maps is connected to only a subset of the 6 feature maps of the previous layer (6 maps take input from 3 of them, 9 maps from 4, and 1 map from all 6), following the connection table in the original paper.
The main reason is to break the symmetry in the network and to keep the number of connections within reasonable bounds. That is why the number of trainable parameters in this layer is 1,516 instead of 2,400, and similarly the number of connections is 151,600 instead of 240,000.
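These counts can be checked directly (a sketch, not from the source article), using the connection scheme described above: each C3 map has one 5×5 kernel per connected input map plus one bias, and every parameter is reused at all 10×10 output positions.

```python
# Parameter and connection count for layer C3, assuming the connection scheme
# of the original paper (6 maps see 3 input maps, 9 see 4, 1 sees all 6).
kernel = 5 * 5                      # one 5x5 kernel per connected input map

params = (6 * (3 * kernel + 1)      # 6 maps with 3 input maps each, plus a bias
          + 9 * (4 * kernel + 1)    # 9 maps with 4 input maps each, plus a bias
          + 1 * (6 * kernel + 1))   # 1 map connected to all 6 input maps
connections = params * 10 * 10      # each parameter reused at every 10x10 position
print(params, connections)          # -> 1516 151600

# Full connectivity would need 16 * 6 * 25 = 2400 weights (2416 with biases),
# i.e. roughly 240,000 connections.
```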
Fourth Layer:
The fourth layer (S4) is again an average pooling layer with filter size 2×2 and a stride of 2. This layer is the same as the second layer (S2) except it has 16 feature maps so the output will be reduced to 5x5x16.
Fifth Layer:
The fifth layer (C5) is a fully connected convolutional layer with 120 feature maps each of size 1×1. Each of the 120 units in C5 is connected to all the 400 nodes (5x5x16) in the fourth layer S4.
Sixth Layer:
The sixth layer is a fully connected layer (F6) with 84 units.
Output Layer:
Finally, there is a fully connected softmax output layer ŷ with 10 possible values corresponding to the digits from 0 to 9.
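To connect these layer descriptions to the roughly 60,000 trainable parameters quoted earlier, here is a rough tally (a sketch based on the original paper's counting, not part of either reference article). It assumes the paper's partial C3 connectivity and one trainable coefficient plus one bias per subsampling map, and it leaves out the output layer, whose parameters were fixed in the paper; the dense softmax output described above would add about 850 more.

```python
# Rough per-layer tally of LeNet-5's trainable parameters (paper-style counting).
k = 5 * 5
layer_params = {
    "C1": 6 * (1 * k + 1),                                      # 156
    "S2": 6 * 2,                                                # 12: coefficient + bias per map
    "C3": 6 * (3 * k + 1) + 9 * (4 * k + 1) + 1 * (6 * k + 1),  # 1516: partial connectivity
    "S4": 16 * 2,                                               # 32
    "C5": 120 * (16 * k + 1),                                   # 48120: all 400 S4 nodes + bias
    "F6": 84 * (120 + 1),                                       # 10164
}
print(sum(layer_params.values()))                               # -> 60000
```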
Summary of LeNet-5 Architecture
Implementation of LeNet-5 Using Keras [Code]
Download Data Set & Normalize
We will download the MNIST dataset under the Keras API and normalize it as we did in the earlier post.
… code
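The elided code was presumably along these lines (a sketch assuming TensorFlow/Keras; the earlier post it refers to is not available here, so the normalization shown is simply scaling to [0, 1] and one-hot encoding the labels):

```python
# Load MNIST through the Keras API and normalize it (sketch).
from tensorflow import keras
import numpy as np

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and add a channel dimension (28x28x1).
x_train = x_train.astype("float32")[..., np.newaxis] / 255.0
x_test = x_test.astype("float32")[..., np.newaxis] / 255.0

# One-hot encode the labels for categorical_crossentropy.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
```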
Define LeNet-5 Model
Create a new instance of a model object using the sequential model API, then add layers to the neural network as per the LeNet-5 architecture discussed earlier. Finally, compile the model with the 'categorical_crossentropy' loss function and the 'SGD' optimization algorithm. When compiling the model, add metrics=['accuracy'] as one of the parameters to calculate the accuracy of the model.
It is important to highlight that each image in the MNIST data set has a size of 28 X 28 pixels so we will use the same dimensions for LeNet-5 input instead of 32 X 32 pixels.
… code
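A sketch of what the elided model definition might look like, assuming the 28×28 input noted above and the tanh/average-pooling layers from the architecture walkthrough. Because a 28×28 input leaves only a 4×4 feature map at the C5 stage, the final 5×5 convolution is replaced here by Flatten plus a Dense(120) layer; that substitution is an assumption, not something stated in the original.

```python
# Sketch of a LeNet-5-style model for 28x28 MNIST input.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, kernel_size=5, activation="tanh"),   # C1 -> 24x24x6
    layers.AveragePooling2D(pool_size=2),                  # S2 -> 12x12x6
    layers.Conv2D(16, kernel_size=5, activation="tanh"),  # C3 -> 8x8x16
    layers.AveragePooling2D(pool_size=2),                  # S4 -> 4x4x16
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),                  # C5 as a dense layer (assumption)
    layers.Dense(84, activation="tanh"),                   # F6
    layers.Dense(10, activation="softmax"),                # output: digits 0-9
])
model.compile(loss="categorical_crossentropy", optimizer="SGD",
              metrics=["accuracy"])
```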
We can train the model by calling the model.fit function and passing in the training data, the expected output, the number of epochs, and the batch size. Additionally, Keras provides a facility to evaluate the loss and accuracy at the end of each epoch. For this purpose, we can split the training data using the 'validation_split' argument or use another dataset with the 'validation_data' argument. We will use our training dataset to evaluate the loss and accuracy after every epoch.
… code
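The elided training call was probably something like the following (a sketch; the epoch count and batch size are placeholders, not values from the original):

```python
# Train the model; as described above, the training set itself is also used
# for per-epoch evaluation. epochs and batch_size are placeholder values.
history = model.fit(x_train, y_train,
                    epochs=10, batch_size=128,
                    validation_data=(x_train, y_train))
```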
Evaluate the Model
We can test the model by calling model.evaluate and passing in the testing dataset and the expected output.
… code
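A sketch of the elided evaluation call:

```python
# Evaluate loss and accuracy on the held-out test set (sketch).
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
```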
Visualize the Training Process
We will visualize the training process by plotting the training accuracy and loss after each epoch.
… code
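A sketch of the elided plotting code (assumes matplotlib; the history keys are 'accuracy' and 'loss' in recent Keras versions, 'acc' in older ones):

```python
# Plot training accuracy and loss per epoch from the history object (sketch).
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["accuracy"], label="accuracy")
ax1.set_xlabel("epoch"); ax1.set_ylabel("accuracy"); ax1.legend()
ax2.plot(history.history["loss"], label="loss")
ax2.set_xlabel("epoch"); ax2.set_ylabel("loss"); ax2.legend()
plt.show()
```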