R-CNN (regions with convolutional neural networks) is a deep learning approach for detecting objects in an image. It is applied in autonomous vehicles, smart surveillance systems, and facial expression recognition.
Object detection models based on R-CNN work in three steps. The first step finds regions in the image that may contain an object; these are called region proposals. The second step extracts CNN features from those regions, and the third classifies the objects using the extracted features.
There are three variants of R-CNN. These are R-CNN, Fast R-CNN, and Faster R-CNN. As the names suggest, the speed of training and detection improves from the first to the last.
In this tutorial, we are going to use the basic R-CNN. We will learn how to detect stop signs using an R-CNN in both real-time and offline mode using Matlab. We will use the image labeler app to create our labeled image database. We apply transfer learning: a pre-trained R-CNN is re-trained on our own image dataset.
To follow along with this tutorial, you’ll need:
Models for object detection using R-CNNs are based on the following three processes:
- Finding regions in the image that may contain an object. These regions are called region proposals.
- Extracting CNN features from the region proposals.
- Classifying objects using the extracted features.
There are three variants of R-CNN. These are R-CNN, Fast R-CNN, and Faster R-CNN.
In this tutorial, we will implement the basic R-CNN.
Scheme of R-CNN
The basic R-CNN detector first generates region proposals using an algorithm such as Edge Boxes or selective search. Unfortunately, there are about 2000 region proposals per image, and each one must be processed, which makes detection slow.
Each region is then cropped out of the image and resized to a square. It is then fed to the CNN for classification. Finally, an SVM trained on the CNN features refines the region proposal bounding boxes.
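The crop-and-resize step can be sketched in a few lines. This is only an illustration under assumptions: the image is random placeholder data, the two proposal boxes are made up, and the 32-by-32 input size is an assumed value rather than something stated in this tutorial.

```matlab
% Illustrative sketch: crop hypothetical region proposals out of an image
% and warp each one to a fixed square size before it goes to the CNN.
img = uint8(randi([0 255], 240, 320, 3));     % placeholder image (rows x cols x channels)
proposals = [10 20 100 60; 150 40 80 120];    % hypothetical boxes as [x y width height]
inputSize = [32 32];                          % assumed CNN input size

numProposals = size(proposals, 1);
regions = zeros([inputSize 3 numProposals], 'uint8');
for k = 1:numProposals
    box = proposals(k, :);
    rows = box(2):(box(2) + box(4) - 1);      % pixel rows covered by the box
    cols = box(1):(box(1) + box(3) - 1);      % pixel columns covered by the box
    patch = img(rows, cols, :);               % crop the proposal
    regions(:, :, :, k) = imresize(patch, inputSize);   % warp it to a square
end
```

Each slice `regions(:, :, :, k)` is then the same size, whatever the shape of the original proposal.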
The proposed scheme in image form is:
Database creation using Matlab’s image labeler
Although we are using a pre-trained R-CNN for this work, it must be fine-tuned on our own data, so we need an image database. The database does not need to contain many images; a few are sufficient to fine-tune the pre-trained R-CNN.
We must mark the ROI (region of interest) in each image with a bounding box. This image database with ROIs can be created using Matlab’s image labeler app.
For this work, we use a total of `60` images, downloaded at random from the internet. Since the images are of different sizes, we resize them to a common dimension of `640px`. This reduces the training and testing time.
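A resizing pass over a folder of images might look like the sketch below. The folder names are hypothetical, and it assumes that `640px` refers to the image width, with the height scaled to keep the aspect ratio; adjust this if your target dimension is different.

```matlab
% Illustrative sketch: resize every JPEG in a folder to a common width.
srcDir = 'rawImages';          % hypothetical folder with the downloaded images
dstDir = 'resizedImages';      % hypothetical folder for the resized copies
targetWidth = 640;             % assumption: 640px is the target width

if ~exist(dstDir, 'dir')
    mkdir(dstDir);
end

files = dir(fullfile(srcDir, '*.jpg'));
for k = 1:numel(files)
    img = imread(fullfile(srcDir, files(k).name));
    img = imresize(img, [NaN targetWidth]);   % NaN lets imresize pick the height
    imwrite(img, fullfile(dstDir, files(k).name));
end
```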
These images are then imported into the image labeler app for bounding box marking. The labels are then exported to Matlab’s workspace or saved in a folder as a `.mat` file.
How to use Matlab’s image labeler
- Open the image labeler app.
- A new window opens. Click on the load tab and define the data source to import your dataset.
- Import all `60` images. Click on the ROI label definition to define your ROI using bounding boxes.
- A new window asks you to give the label a name. Since we are using a pre-trained network, the name should be `stopSign`, as it is defined in the network.
- Click `ok` and then draw a bounding box around the stop sign in every image. This takes a lot of effort if you have thousands of images.
Once you are done labeling the images, click on export and specify where you want to export the labels. In our case, we export to the workspace, so we use the `label` format and give the variable a name. The variable name should be `stopSigns`, the name the training code expects.
Once all this is done, we see the labeled data in the workspace.
This is how you can use the image labeler to prepare your datasets. After exporting, you can use the labeled data directly from the workspace or save it as a `.mat` file in the current directory.
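For orientation, training data for an R-CNN detector in Matlab is typically a table whose first column holds image file names and whose remaining columns hold bounding boxes as `[x y width height]` rows. The sketch below builds a tiny table in that layout by hand; the file names and box coordinates are made up for illustration and are not taken from the tutorial's dataset.

```matlab
% Illustrative sketch: a hand-built table in the layout of labeled training data.
imageFilename = {'stop1.jpg'; 'stop2.jpg'};      % hypothetical image files
stopSign = {[120 80 60 60]; ...                  % one stop sign in the first image
            [200 150 45 45; 400 160 50 50]};     % two stop signs in the second image
stopSignsExample = table(imageFilename, stopSign);

% Each cell in the stopSign column is an M-by-4 matrix, one row per box.
numBoxesSecondImage = size(stopSignsExample.stopSign{2}, 1);
```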
Matlab code for training
We first load the layers of the pre-trained R-CNN and the image database. For the pre-trained R-CNN, we use Matlab's built-in file `rcnnStopSigns.mat`.
```matlab
% Program to train an R-CNN to detect stop signs
load('rcnnStopSigns', 'layers')   % Load the layers of the pre-trained R-CNN
load stopsign.mat;                % Load the image database for training
```
The `rcnnStopSigns.mat` file contains several variables, but we only need `layers`. To display those layers, we use the code below:
```matlab
lgraph = layerGraph(layers);   % Build a layer graph from the layers
lgraph.Layers                  % Display the layers
```
We then define the training options and train our dataset.
```matlab
% Define training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10);

% Train the R-CNN detector
rcnn = trainRCNNObjectDetector(stopSigns, layers, options, ...
    'NegativeOverlapRange', [0 0.1]);
```
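The `NegativeOverlapRange` argument controls which region proposals are treated as negative (background) samples during training, based on how much they overlap the labeled boxes. As a rough illustration, the sketch below computes that overlap as intersection over union for two made-up boxes and checks whether it falls in the `[0 0.1]` range. The boxes are hypothetical, and the exact handling of the range limits inside the training function is not shown here.

```matlab
% Illustrative sketch: overlap ratio (intersection over union) of two boxes.
groundTruth = [100 100 50 50];   % hypothetical labeled box as [x y width height]
proposal    = [140 140 50 50];   % hypothetical region proposal
negativeRange = [0 0.1];         % same range as in the training call

intersectionArea = rectint(groundTruth, proposal);   % area shared by both boxes
unionArea = groundTruth(3) * groundTruth(4) ...
          + proposal(3) * proposal(4) - intersectionArea;
iou = intersectionArea / unionArea;

% A proposal whose overlap lies inside the range counts as a negative sample.
isNegative = iou >= negativeRange(1) && iou <= negativeRange(2);
```

Here the boxes share only a 10-by-10 corner, so the overlap is small and the proposal would be used as a background example.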
When training is complete, the trained detector, including its network layers, is stored in the `rcnn` variable. You can save `rcnn` to a file for future testing, or use it right away from the workspace.
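Saving and reloading the detector could be done along these lines. The file name is a hypothetical choice, and the `exist` checks are only there so the snippet also runs in a session where training has not happened yet.

```matlab
% Illustrative sketch: save the trained detector and load it again later.
detectorFile = 'trainedStopSignDetector.mat';   % hypothetical file name

if exist('rcnn', 'var')
    save(detectorFile, 'rcnn');                 % store the detector after training
end

% In a later session, restore the detector if the file is present.
if exist(detectorFile, 'file')
    data = load(detectorFile, 'rcnn');
    rcnn = data.rcnn;
end
```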
Matlab code for testing images (offline mode)
Here, the user provides the input image for detection. The image is read before detection.
```matlab
% Read the input image
[filename, pathname] = uigetfile('*.*', 'Select test image');
filewithpath = strcat(pathname, filename);
img = imread(filewithpath);
```
This test image is stored in the variable `img`. The image is then passed to the `detect` function as an argument to perform detection.
```matlab
[bbox, score, label] = detect(rcnn, img, 'MiniBatchSize', 32);   % Stop sign detection
nobox = size(score, 1);   % Number of detected bounding boxes
```
The detected bounding box coordinates are stored in the `bbox` variable, the confidence scores in the `score` variable, and the labels in the `label` variable.
We find the number of bounding boxes to know how many stop signs were detected. We then combine `score` and `bbox` into a single matrix and sort it so that the entry with the maximum score is at the top.
```matlab
scorebox = [score, bbox];                   % One row per detection: [score x y width height]
scorebox = sortrows(scorebox, 'descend');   % Highest score first
```
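To see what the sort does, here is a small sketch with made-up scores and boxes (not real detector output). It sorts explicitly on the first column, which holds the score.

```matlab
% Illustrative sketch: sort hypothetical detections by confidence score.
score = [0.70; 0.95; 0.80];                               % made-up confidence scores
bbox  = [10 20 30 30; 50 60 40 40; 90 15 25 25];          % made-up boxes [x y width height]

scorebox = [score, bbox];                     % one row per detection
scorebox = sortrows(scorebox, 1, 'descend');  % sort on column 1, highest first

bestScore = scorebox(1, 1);                   % highest confidence score
bestBox   = scorebox(1, 2:end);               % box that belongs to that score
```

After sorting, each row still pairs a score with its own box, so the first row gives the most confident detection.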
Next, insert the annotation to the image.
```matlab
img = insertObjectAnnotation(img, 'rectangle', scorebox(1, 2:end), ...
    ['Stop Sign: Conf. Score: ', num2str(scorebox(1, 1))]);
```
An image may contain more than one stop sign, in which case we need more than one annotation. We handle this with the loop below:
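```matlab
for i = 2:nobox
    % Annotate only the remaining detections whose confidence score equals 1.
    % Use the sorted scorebox rows so each score matches its own box.
    if scorebox(i, 1) == 1
        img = insertObjectAnnotation(img, 'rectangle', scorebox(i, 2:end), ...
            ['Stop Sign: Conf. Score: ', num2str(scorebox(i, 1))]);
    end
end
```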
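Requiring a score of exactly 1 is strict, so a possible variant is to keep every detection above a chosen threshold. The sketch below is an assumption-laden illustration, not the tutorial's method: the scores and boxes are made up, and the `0.75` threshold is an arbitrary value you would tune yourself.

```matlab
% Illustrative sketch: select detections above a threshold and build labels.
score = [0.95; 0.60; 0.88];                               % made-up confidence scores
bbox  = [50 60 40 40; 10 20 30 30; 90 15 25 25];          % made-up boxes [x y width height]
minScore = 0.75;                                          % hypothetical threshold

keep = score >= minScore;                                 % logical mask of kept detections
keptBoxes = bbox(keep, :);
labels = arrayfun(@(s) sprintf('Stop Sign: Conf. Score: %.2f', s), ...
    score(keep), 'UniformOutput', false);                 % one label per kept box

% With the Computer Vision Toolbox, the kept boxes could be drawn in one call:
% img = insertObjectAnnotation(img, 'rectangle', keptBoxes, labels);
```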
We then finally show the output:
Matlab code for testing videos (offline mode)
First input the video:
```matlab
% Read the input video
[filename, pathname] = uigetfile('*.*', 'Select test video');
filewithpath = strcat(pathname, filename);
```
Then, define the video reader object that you will use to read the video and initialize the video player.
```matlab
v = VideoReader(filewithpath);        % Declare the video reader object
videoplayer = vision.VideoPlayer();   % Initialize the video player
```
We then use a `while` loop to read the video frames one by one. The loop reads a frame, detects the stop sign, and inserts the annotation.
```matlab
runloop = true;   % Condition for the while loop
while runloop && hasFrame(v)           % Stop at the end of the video
    img = readFrame(v);                % Read one frame
    [bbox, score, ~] = detect(rcnn, img, 'MiniBatchSize', 32);   % Detect stop signs
    if ~isempty(bbox)                  % Annotate only when something was detected
        [score1, idx] = max(score);    % Get the maximum score
        bbox1 = bbox(idx, :);          % Bounding box with the maximum score
        img = insertObjectAnnotation(img, 'rectangle', bbox1, ...
            ['Stop Sign: Conf. Score: ', num2str(score1)]);      % Insert the annotation
    end
    step(videoplayer, img);            % Display the frame in the video player
    runloop = isOpen(videoplayer);     % Stop when the video player window is closed
end
```
The call `step(videoplayer, img)` passes each frame, with any bounding box and annotation, to the video player object for display. When you run the program, the video plays with the detected stop sign marked.
Matlab code for testing video (realtime)
In this case, we use the webcam to capture live video. This is the only difference between the offline and the real-time case: instead of reading frames from a file, we take snapshots from the webcam.
The rest of the code is the same in both cases.
```matlab
% Read live video from the webcam
cam = webcam;
videoplayer = vision.VideoPlayer();    % Initialize the video player
runloop = true;                        % Condition for the while loop
while runloop
    img = snapshot(cam);               % Capture one frame
    [bbox, score, ~] = detect(rcnn, img, 'MiniBatchSize', 32);   % Detect stop signs
    if ~isempty(bbox)                  % Annotate only when something was detected
        [score1, idx] = max(score);    % Get the maximum score
        bbox1 = bbox(idx, :);          % Bounding box with the maximum score
        img = insertObjectAnnotation(img, 'rectangle', bbox1, ...
            ['Stop Sign: Conf. Score: ', num2str(score1)]);      % Insert the annotation
    end
    step(videoplayer, img);            % Display the frame in the video player
    runloop = isOpen(videoplayer);     % Stop when the video player window is closed
end
clear cam;                             % Release the webcam
```
When we run the training program, we see all the information about our pre-trained network.
Stop sign detection is a useful application of Matlab's deep learning tools. It can be implemented in vehicles to help avoid accidents.
As we have seen, the training code is short, and the same detector works on images, recorded video, and live video. Matlab's built-in functions make each of these steps easy to implement.
Peer Review Contributions by: Miller Juma