
CALIFORNIA STATE UNIVERSITY SAN MARCOS

PROJECT SIGNATURE PAGE

PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE

MASTER OF SCIENCE

IN

COMPUTER SCIENCE

PROJECT TITLE: Face Mask Detection using YOLOv5 for COVID-19

AUTHOR: Vinay Sharma

DATE OF SUCCESSFUL DEFENSE: 11/24/2020

THE PROJECT HAS BEEN ACCEPTED BY THE PROJECT COMMITTEE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE.

PROJECT COMMITTEE CHAIR: Dr. Xin Ye (signature date: Nov 26, 2020)

PROJECT COMMITTEE MEMBER: Dr. Ahmad R. Hadaegh (signature date: Nov 25, 2020)

PROJECT COMMITTEE MEMBER: Name of Committee Member (signature, date)


Face Mask Detection using YOLOv5 for COVID-19

In affiliation with

California State University, San Marcos

In partial fulfillment of the Requirements for the Degree of

Master of Science in Computer Science

By

Vinay Sharma

November 24, 2020


ACKNOWLEDGEMENT

I express my deep sense of gratitude to my advisor Dr. Xin Ye and to my committee member and program coordinator Dr. Ahmad Hadaegh for their continued support of my project. Their guidance and motivation helped me throughout the project and led me to a successful implementation, as planned. I thank both professors for being with me and helping me on my journey through the Master of Science in Computer Science program at California State University San Marcos.


Table of Contents

ACKNOWLEDGEMENT

LIST OF FIGURES

ABSTRACT

1. INTRODUCTION

1.1 SCOPE AND OBJECTIVES

1.2 INTRODUCTION TO SYSTEM

2. LITERATURE SURVEY

2.1 Image processing

2.2 TensorFlow

2.3 Object Detection

2.4 Face Mask Recognition

3. METHODOLOGY AND RESULTS

3.1 Libraries used

3.2 Data collection

3.3 Model Development

3.5 Model Training

3.6 Testing the Model

3.7 Result and Analysis

3.8 Recommendations

4. CONCLUSION AND FUTURE ENHANCEMENTS

REFERENCES


LIST OF FIGURES

Figure 1: TensorFlow sample graph
Figure 2: Detected results
Figure 3: Precision of the developed model
Figure 4: Recall graph of the developed model
Figure 5: mAP graph of the developed model


ABSTRACT

COVID-19 is a major threat to mankind, and the whole world is struggling to reduce the spread of the virus. Wearing masks is a good practice that helps control COVID-19 effectively. The results from China and South Korea make it clear that wearing face masks reduces the spread of the virus; both countries have since returned to normal life. However, ensuring that all people wear face masks is not easy. This paper attempts to develop a simple and effective model for real-time monitoring. The proposed model successfully recognizes whether an individual is wearing a face mask or not.

1. INTRODUCTION

1.1 SCOPE AND OBJECTIVES

The main aim of this project is to provide a simple, real-time monitoring service that checks whether people in view of a camera are wearing face masks. The objectives are to collect and label a suitable image dataset, train a YOLOv5-based detection model on it, and evaluate how accurately the model classifies people with and without masks in images and live video.

1.2 INTRODUCTION TO SYSTEM

The coronavirus pandemic has produced an atmosphere of fear because the disease is transmitted through the respiratory system. The virus has killed more than a million people around the globe, and it is expected to kill close to 400,000 people in the US by February 1st, 2021. Currently, there is no specific medicine or vaccine available to fight the virus. Therefore, the only option left is to take the utmost care to stay away from the disease: maintain social distancing, wash your hands regularly, and wear a mask. To take part in the protection against the pandemic, my aim is to design a face mask detection program using deep learning. By deploying the trained model, this technique can find out who is not wearing a face mask. The WHO report points out that coronavirus spreads in two ways: through respiratory droplets and through physical contact.

The droplets are produced by the respiratory system when an infected person coughs or sneezes. If another person is closer than about 4 feet, there is a high chance that they will inhale these infection-carrying droplets. The droplets can also settle on surfaces, where the virus can survive for days, so an infected person's surroundings can become a major source of spread. To prevent the virus from spreading, medical masks are the best bet. In this research, medical masks mean surgical as well as procedure masks, which may be cup-shaped or folded and are attached to the head with cords. They are tested to ensure filtration, easy breathing, and sometimes water resistance. This research examines collections of video and images to find the persons who are wearing medical masks in accordance with government guidelines. In this way, it can greatly assist the government in taking action against people who are not wearing the right type of masks.

Wearing masks in public has been normal in China and other Asian countries since the very start of the pandemic. Currently, the USA is in the grip of a severe outbreak, and cases are increasing day by day along with confirmed deaths. The CDC (Centers for Disease Control and Prevention) has cautioned that people must wear protective equipment such as masks. Studies have revealed that many people, particularly young ones, carry the virus without any symptoms and can spread it to many other people unknowingly; the same is true for people who eventually develop symptoms but spread the disease before testing positive. Seeing this, the CDC has issued an advisory to wear masks in public gatherings where social distancing is impossible, in order to reduce community spread. This advisory is backed by various studies, including one published in the New England Journal of Medicine.

Wearing a mask while going outdoors during the pandemic has been a great help in controlling the spread of the coronavirus, and it is a mark of being a responsible citizen. Countries such as China and South Korea controlled COVID-19 in a short time largely because of the habit of using masks regularly. The recommendation was to use whatever type of mask is available; the mask acts as a physical barrier that prevents the entry of the virus. Many individuals have unknowingly infected other people, and the irresponsibility of a few has led to the deaths of many others. A mask is necessary for two reasons. First, it does not let the virus enter your mouth or nose directly from an infected person's sneeze or cough. Second, if you touch a virus-contaminated surface and then your mouth or nose, the mask stops the transmission of the virus. Many governments made masks compulsory, and people were compelled and monitored to act upon the order. This paper examines the use of deep learning for face mask detection, a learning technique that is widely used today.

Deep learning is a subfield of machine learning in which a hierarchy of features and functions is learned from input data. Researchers use this technique intensively in work related to image classification, speech recognition, signal processing, and natural language processing. Like other machine learning methods, deep learning builds a hierarchy of features from top to bottom; the systematic, organized features are easier to understand and to explain to others. This method can learn features at any level automatically, without the help of hand-crafted techniques. The key property of deep learning is that its models have deep architectures: in contrast to shallow architectures, which have only a few hidden layers, deep architectures have many layers. Regression, classification, dimensionality reduction, motion modelling, texture modelling, information retrieval, natural language processing, robotics, fault diagnosis, and road crack detection are some of the important fields that benefit from deep architecture techniques.

2. LITERATURE SURVEY

2.1 Image processing

The author also discusses traffic lights and street signs, which have a different appearance in different situations: the appearance of street signs is affected by partial occlusion, weather, and changes in brightness, so a variety of signboards with the best possible appearance should be provided. Many image processing algorithms and neural networks are combined to increase human efficiency in performing different tasks. Machine learning systems cannot directly handle images of different types and sizes drawn from a dataset; scaling the images to a fixed size is the standard approach, and deep learning techniques can be very helpful in this respect.

This works well when the aspect ratio is similar between the original and the target sizes; otherwise it discards information from larger pictures or introduces artifacts when very small images are deliberately enlarged. People can readily recognize signs at different sizes, even when viewed from sharp angles. In particular, the authors are working on a benchmark dataset for the detection of traffic signs in full camera images.

The author presented a method for analyzing encoded video sequences before decoding. The approach exploits the information contained in the DCT coefficients of MPEG- or JPEG-encoded video sequences. The system was tested successfully on various video sequences, including joint meetings, presentations, and one-on-one sessions.

The author has presented a paper on image retrieval using meta-feature design. Since image collections are growing rapidly (for example personal or group photos, medical images, and so on), efficient organization of these collections has become a major research problem in image retrieval. Image retrieval techniques have been developed specifically to meet practical demands, for example handling large-scale image collections. An effective image retrieval system can efficiently query image databases to retrieve images with high precision as well as recall; in other words, given a query, the purpose of an image retrieval system is to retrieve the many similar (or relevant) images that are available.

The author proposed a procedure for non-intrusive identity verification from images. The system incorporates pre-processing, image segmentation, feature extraction, and classification, and evaluates the classifiers on precision, efficiency, and elapsed time. The results reported for the framework are robust and accurate, and require little processing time.

The author examined a multi-strategy, expert, and AI system that uses knowledge about image processing steps and their requirements to assemble executable image processing scripts in support of high-level science requests. The article describes a general AI planning approach to automation and applies it to a specific area of image processing for planetary science applications, radiometric correction, and color triplets.

The author examined how models trained on ordinary image collections work effectively using deep learning frameworks. On the other hand, they note that image classification of defects is largely unexplored, because poor-quality images contain only very weak information about the objects and their categories. They also used GPUs to accelerate both the image processing that supports defect recognition and the machine learning itself. They proposed combining deep neural networks with random forest classifiers for classifying film defects, which performed better than using either of the two approaches alone.

They used a random forest as the classifier instead of a neural network, which provided a precision of at least 97.1%; the overall accuracy achieved was higher than that of other classifiers.

The same blending technique can be applied to various types of defect images. Because of the differing properties of each picture, it was not easy to place some defect images into a definite category.

They tried to increase the overall classification precision by applying different methods. The following three main ideas were applied to reach greater accuracy:

• Extension of the image data
• Modification of the neural network design
• Changes to the layer parameters

The author suggested a prototype model representing the connection between the modality values of two images, and gave an analysis based on a combined co-sparsity system. The overall objective of the co-sparse analysis model is minimized to obtain the coupled analysis operators, which is achieved with a more elaborate method based on the conjugate gradient technique.

The main features of the proposed model were examined in two different applications. In the first, the main goal was to solve inverse problems for a single image. In the second, it was used to address bi-modal image processing and registration using a different algorithm, which consisted of pairs of bi-modal operators for registering intensity, depth, and NIR images.

The author proposed a system based on machine learning in which object learning can be done without any dependency on the actual environment. The work was originally inspired by the early stages of the human visual system. It includes an algorithm designed mainly for easy and salient recognition of objects, which benefits from photometric invariants.

This specifically designed procedure produces relatively few errors, so it can be handled more efficiently in real time. Today's machine vision systems use the same kind of algorithm.

In the overall design, salient object recognition was used to build the second part of the total framework: a machine learning-based recognition and detection unit.

2.2 TensorFlow

TensorFlow is a great resource for machine learning. It is an open-source library platform with an all-in-one design offering machine learning models, training utilities, and efficient algorithms that assist with the following:

• Google Brain TensorFlow
• Processing data
• Testing models
• Getting precise results
• Refining the outcomes

This platform is designed to help developers in every way possible. It can generate a graph consisting of nodes, where each node represents a mathematical operation and every connection carries data. Developers do not have to worry about minor details and glitches; instead, they can remain focused on the overall logic and functionality of the full application.


We can say that this library is the most popular of machine learning software libraries. Google's AI research team specializing in deep learning, known as Google Brain, established the platform in 2015. It was developed with a Python front end running on an optimized C++ core, and its original purpose was Google's own internal use.

TensorFlow plays a major role in different categories of applications, including text-based applications, image recognition, image captioning, and many others, because it is an end-to-end open-source library for deep machine learning. The platform also has many community resources, useful tools, and comprehensive deployment techniques.

Voice assistants such as Apple's Siri are one example of the kind of voice recognition that this sort of deep learning powers; there are millions of applications, and many of the apps available from Google are built using TensorFlow. What is a tensor? Think of a tensor as a matrix generalized to n dimensions. Each tensor holds values of a single data type and has a certain shape, and through this shape we know the overall dimensionality of the data.

Each tensor has the following three main features:

1. A unique name (a label)
2. A specific dimension or shape
3. A data type

A tensor with one dimension is called a vector, and a tensor with two dimensions is a two-dimensional tensor, or matrix. A tensor with zero dimensions is referred to as a scalar. Nodes execute the numerical computation, and a tensor edge connects the output of one node to the input of another. So tensors are n-dimensional arrays that carry the input data; the input passes through various computing operations and, based on them, produces an output. Sometimes the shapes of tensors are not known in advance, and graphs can still be built containing all the operations needed to produce an output.
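
As a brief illustration of these ideas, the following minimal sketch (using the TensorFlow 2 Python API, not code from this project) creates tensors of different ranks and runs a simple operation on them:

```python
import tensorflow as tf

# A scalar (rank-0 tensor), a vector (rank-1 tensor), and a matrix (rank-2 tensor).
scalar = tf.constant(3.0, name="scalar")
vector = tf.constant([1.0, 2.0, 3.0], name="vector")
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="matrix")

print(scalar.shape, scalar.dtype)   # ()  float32
print(vector.shape)                 # (3,)
print(matrix.shape)                 # (2, 2)

# Each operation is a node; the tensors flowing between them are the edges.
result = tf.matmul(matrix, tf.reshape(vector[:2], (2, 1)))
print(result.numpy())
```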

As mentioned earlier, TensorFlow is a large open-source library mainly intended for machine learning. It mostly uses Python for the front-end framework and then runs the applications built with that framework in optimized C++.

TensorFlow is not limited to one domain; its applications span a huge range of technologies including word embeddings, recurrent neural networks, machine translation, and faster natural language processing. A particularly useful feature of TensorFlow is that it supports production-scale prediction using the same models used for training. It gives developers the opportunity to build efficient data flows: visual graphs that represent how data is processed through a sequence of computing nodes.

Python makes TensorFlow easy for developers to learn, since Python is one of the most popular and easiest programming languages, so it is comparatively easy to understand how different complex concepts are put together. The tensors and the nodes are exposed as objects in Python, and every TensorFlow application is a Python-based application.

However, the numerical computation itself is not done in Python. The mathematical operations are implemented as high-performance C++ binaries; in short, all the heavy transformation is done in C++, while Python connects the pieces and is responsible for hooking the complex notions together.

Applications built with TensorFlow can run on any compatible machine's GPU or CPU, on an Android device, and on iOS. An application can also run on any cloud system; if it runs on Google's cloud, a TPU will accelerate the whole workload. Final prototypes can be deployed on any of the devices mentioned above.

TensorFlow version 2.0 came out in October 2019. This version has an easier framework design shaped by user feedback, and it also supports TensorFlow Lite, which makes it possible to run models on more platforms.

However, code written for the previous version has to be rewritten to make use of TensorFlow 2.0; sometimes only slight changes are required, and sometimes a complete rewrite. Abstraction for machine learning development is the main advantage of using TensorFlow: it has become easy for the developer to focus on the main logic of the application rather than worrying about the minor details of handling input and output data, since TensorFlow takes responsibility for all these minor actions behind the scenes. At its core, it remains the same system.

Developers can also build a single independent graph for each operation and modify them separately; they do not have to put all the data into one graph and process it all together. The TensorBoard visualization suite provides an efficient, interactive, web-based dashboard that lets developers inspect their graphs. TensorFlow is also backed by a marketable suite of A-list offerings from Google: Google has invested heavily and created commercial offerings around the TensorFlow platform that make deployment easy, for example TPU silicon for faster processing on Google's cloud.

2.3 Object Detection

Object detection is the process of detecting or recognizing instances of objects belonging to various classes in a video or an image; a class can be of any type, including humans, animals, and so on. Object detection frameworks involve generating candidate windows, which are then classified based on convolutional neural network (CNN) features. For example, one method employs selective search to generate object proposals, produces a CNN feature vector for each proposal, and feeds the CNN features to an SVM classifier.

There is still a large number of approaches that try to improve the performance of CNN-based region features. Some of these methods reach high accuracy, yet they are still unable to localize objects precisely, because they mostly follow a segmentation-style approach attached to object detection. Most deep learning work on object detection uses some variation of a CNN.

Other deep models have been used for object detection only to a limited extent. For instance, one coarse object-localization technique uses saliency mechanisms together with a Deep Belief Network (DBN) to identify objects in remote-sensing images; another introduces a DBN to aid the recognition of 3D objects, where the top-level model is a third-order Boltzmann machine trained with a hybrid algorithm that integrates discriminative and generative gradients; another uses a fused deep-learning method to examine a deep model's representation abilities in a semi-supervised setting. Finally, stacked autoencoders have been employed to detect several organs in medical images, while saliency-guided stacked autoencoders have been used for video-based object detection.

One of the most popular computer vision applications of business interest today is face recognition. Several face recognition systems based on manually engineered features have been proposed; in these, a feature extractor draws features from a well-aligned face, producing a lower-dimensional representation, and a classifier generates predictions. CNNs brought a significant change to facial recognition thanks to feature learning and transformation-invariance properties. Recently, the VGG (Very Deep Convolutional Networks for Large-Scale Image Recognition) Face Descriptor and light CNNs have been among the most advanced and widely recognized approaches, and one study showed a convolutional DBN performing significantly well in face verification.

CNNs have been used by both Facebook's DeepFace [29] and Google's FaceNet [28]. DeepFace models faces in 3D and warps them to a frontal pose. The face is then passed to a single convolution-pooling-convolution filter, followed by three locally connected layers and two fully connected layers used to produce the final representation. Although DeepFace achieves excellent recognition performance, its representations are difficult to interpret, because faces belonging to the same person are not clustered during training. In contrast, FaceNet clusters the representations of each person during training through a triplet loss function on the images. Additionally, CNNs are at the heart of OpenFace, an open-source face recognition tool whose accuracy, although a bit lower, is sufficient for mobile computing, with fast run time and a small model size.

2.4 Face Mask Recognition

Taking advantage of a webcam, the writer used OpenCV (Open Source Computer Vision Library) to perform face detection in real time from a live stream. Videos are composed of frames, which are still images, so face detection is carried out on every frame of the video; there is no significant difference between face detection in still images and in real-time video streams. For face detection we employ the YOLOv5 algorithm, an important machine learning algorithm for object detection whose purpose is to distinguish objects in a video or image. With a trained model in hand, it is now possible to extend the first section's code so that it detects faces and identifies whether a subject is wearing a mask or not.

The mask detector model requires images of faces as input. We identify the faces in each frame using the approach described in the first segment, then preprocess the face crops and pass them to the model. But first, the necessary libraries have to be imported.

Because the faces variable holds the top-left corner coordinates together with the width and height of each rectangle enclosing a face, it can be used to crop a face frame, which is then preprocessed so it can be fed to the model for prediction. The preprocessing procedure is the same as the one used in the second segment when training the model. What follows is drawing a rectangle around the face and adding a label according to the prediction. This concludes the overview: we have seen how to develop a model that can detect masked faces and how to identify faces in real time, and with this model a face detector can be turned into a mask detector.
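
As an illustration of this step, here is a minimal hedged sketch (not the project's actual code): it assumes a faces list of (x, y, w, h) rectangles from some face detector, a Keras-style classifier mask_model trained on 224×224 face crops, and hypothetical class names; only the general OpenCV/NumPy pattern is intended to be accurate.

```python
import cv2
import numpy as np

LABELS = ["with_mask", "without_mask"]  # assumed class order for illustration

def annotate_faces(frame, faces, mask_model):
    """Classify each detected face crop and draw a labelled rectangle."""
    for (x, y, w, h) in faces:
        face = frame[y:y + h, x:x + w]                 # crop the face region
        face = cv2.resize(face, (224, 224))            # match training input size
        face = face.astype("float32") / 255.0          # same scaling as training
        face = np.expand_dims(face, axis=0)            # add batch dimension

        probs = mask_model.predict(face)[0]
        label = LABELS[int(np.argmax(probs))]
        color = (0, 255, 0) if label == "with_mask" else (0, 0, 255)

        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return frame
```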

Face detection is a crucial task in object detection and is the first step in identity authentication and pattern recognition. Deep learning-based algorithms for object detection have evolved rapidly in recent years. They can be divided into two general groups: one-stage detectors such as YOLO and two-stage detectors such as Faster R-CNN (Faster Region-Based Convolutional Neural Networks). Although YOLO and its variants are not as accurate as two-stage detectors, they outperform their counterparts in speed by a large margin. YOLO performs well on standard-sized objects, but it struggles to detect small objects, and when faces exhibit large scale changes its accuracy drops significantly. We propose a face detector called YOLO-face, based on the Ultralytics open-source object detection method YOLOv5. It can handle the difficulty of detecting faces at varying scales and hence improves the performance of face detection. The technique involves anchor boxes that are more suitable for face detection and a more accurate regression loss function, and the enhanced detector improves accuracy significantly while maintaining a fast detection speed.

3. METHODOLOGY AND RESULTS

This section gives a brief overview of the methodology used in this project. The methodology includes four major stages: data collection, model development, model training, and testing of the developed model. Along with the methodology, a brief description of the different libraries and tools used in this project is given; the most important of these are NumPy, TensorFlow, and OpenCV.

3.1 Libraries used

3.1.1 NumPy

NumPy provides matrix and multi-dimensional array data structures. Mathematical operations on arrays, such as statistical, algebraic, and trigonometric routines, can be performed with the help of NumPy. It offers a high-performance multidimensional array along with the essential tools for computing with and manipulating these arrays; SciPy is built on top of it and provides further functionality operating on NumPy arrays that is essential for many engineering and scientific applications. In the preprocessing stage, each image is resized to 224×224 pixels and converted into a NumPy array, after which the correct labels are attached to the dataset images.
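
A minimal sketch of this preprocessing step is shown below; the directory layout and label names are assumptions for illustration, and only the resize and array-conversion pattern reflects the description above.

```python
import os
import cv2
import numpy as np

# Assumed layout: dataset/<label>/<image files>, with hypothetical label folders.
DATASET_DIR = "dataset"
LABELS = ["with_mask", "without_mask"]

def load_dataset(dataset_dir=DATASET_DIR, labels=LABELS, size=(224, 224)):
    images, targets = [], []
    for idx, label in enumerate(labels):
        folder = os.path.join(dataset_dir, label)
        for name in os.listdir(folder):
            img = cv2.imread(os.path.join(folder, name))
            if img is None:                      # skip unreadable files
                continue
            img = cv2.resize(img, size)          # resize to 224x224 pixels
            images.append(img.astype("float32") / 255.0)
            targets.append(idx)                  # attach the label
    return np.array(images), np.array(targets)
```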

3.1.2 TensorFlow

TensorFlow, released by Google, is used for fast numerical computation. It is a Python library: deep learning models can be created directly with this foundation library, or through wrapper libraries built on top of TensorFlow. It is both a math library and a framework for machine learning applications such as neural networks.


Different types of deep learning models are available, and TensorFlow can be installed using the pip command. TensorFlow helps with data augmentation before model training begins; it is also used to improve the prediction accuracy of the algorithms, for example by downloading pre-trained ImageNet weights.

Using the web camera of a PC, this TensorFlow-based detector easily identifies whether a person is wearing a mask or not; it can also be applied to a mobile phone camera.

TensorFlow advantages:

• Data augmentation
• Loading the classifier
• Building a completely new fully connected head
• Pre-processing
• Loading all the image data
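
As a hedged illustration of the data augmentation and pre-trained ImageNet weights mentioned above, the following sketch uses the Keras API bundled with TensorFlow; the MobileNetV2 backbone, layer sizes, and augmentation parameters are assumptions for illustration, not the project's actual training code.

```python
import tensorflow as tf

# Augment the training images with simple random transformations.
# augmenter.flow(X_train, y_train) would feed augmented batches to model.fit(...).
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    zoom_range=0.1,
    horizontal_flip=True,
    rescale=1.0 / 255,
)

# Load a classifier backbone with pre-trained ImageNet weights and add
# a completely new fully connected head for the mask / no-mask classes.
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # with_mask / without_mask
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```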

3.1.3 OpenCV

OpenCV, also known as the Open Source Computer Vision library, is used in computer vision and deep learning programs. It provides various modules that can be used for object detection and face mask detection with computer vision and deep learning algorithms. In this model, the library is used in the detection process.
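
For reference, the basic OpenCV pattern for grabbing frames from a webcam looks like the generic sketch below; the per-frame detect_and_annotate callback is a hypothetical placeholder for the detection step described later, not part of the project's code.

```python
import cv2

def run_camera(detect_and_annotate):
    """Read frames from the default webcam and show annotated output."""
    cap = cv2.VideoCapture(0)                    # 0 = default camera
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = detect_and_annotate(frame)       # hypothetical detection callback
        cv2.imshow("Face mask detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):    # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```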

3.2 Data collection

Data collection is the first step in developing the face mask identification model, and the quality of the training data directly affects the final accuracy of the model. In our case, the model has to be trained to determine whether a person is wearing a face mask or not, so we downloaded a large volume of images of people wearing face masks, people not wearing face masks, and people wearing face masks incorrectly. The system should be able to classify whether a person is wearing a face mask or not.


The training images cannot be fed to the model directly; they first need to be labelled, which is one of the important steps in data collection. In this project, we used an image labelling tool (LabelImg), which allows the user to draw labels on the images and saves the data for the training process. The annotations can be saved in XML format using this tool.
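
YOLOv5 expects one text file per image with normalized box coordinates, so XML annotations of the kind such labelling tools produce (assumed here to be Pascal VOC style) are typically converted first. The following is a hedged sketch of that conversion; the class names and file paths are assumptions.

```python
import xml.etree.ElementTree as ET

CLASSES = ["with_mask", "without_mask", "mask_weared_incorrect"]  # assumed names

def voc_xml_to_yolo_txt(xml_path, txt_path):
    """Convert one Pascal VOC XML annotation into YOLO's normalized format."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)

    lines = []
    for obj in root.findall("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class x_center y_center width height, all in [0, 1].
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")

    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```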

3.3 Model Development

TensorFlow is a well-known and commonly used open-source library developed by the Google Brain team. It is one of the best image processing libraries, and it is used in this project for developing the model. In our tasks, it makes the entire process simpler and easier to implement.


Figure 1 TensorFlow sample graph

The scalability of this tool is the major reason for selecting it for data processing. Model creation started with the installation of TensorFlow in Python; the TensorFlow Python API is used here, and additional libraries were installed in the system. Data flow graphs are the central element of TensorFlow: the data flow is represented as a graph in which each node represents an instance of a mathematical operation and each edge is a tensor. In general, a tensor is a multidimensional data set, and all operations are performed on tensors. In this project, TensorFlow is used in the object detection process.

YOLO (You Only Look Once) is a family of models used for real-time object detection. In our application, this model is used to identify masks in video. In this project, YOLOv5s as well as YOLOv5x are used, and their performance is compared.
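
For orientation, a YOLOv5 model can be loaded and run from Python roughly as follows; this is a generic sketch using the Ultralytics torch.hub interface with an assumed image path, not the project's exact code.

```python
import torch

# Load a small pre-trained YOLOv5 model from the Ultralytics repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference on an (assumed) image path and inspect the detections.
results = model("test_images/street.jpg")
results.print()                 # summary of detected classes and confidences
detections = results.xyxy[0]    # tensor: [xmin, ymin, xmax, ymax, conf, class]
print(detections)
```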

3.5 Model Training

After setting up the model, the next step is to train it. Training is a time- and resource-consuming process in deep learning, and the overall result mainly depends on the quality of the training process. In this project, we developed a training dataset that contains examples of persons who wear face masks correctly, persons who do not wear a face mask, and persons who wear a face mask only partially. All of the data were labelled with the help of the labelling tool described above.
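
Training with the Ultralytics YOLOv5 repository is normally driven by a dataset configuration file and the repository's train.py script. The sketch below shows what that setup typically looks like; the paths, class names, and hyperparameters (50 epochs, batch size 16) are illustrative assumptions, not the values used in this project.

```python
import subprocess

# Write a minimal YOLOv5 dataset configuration (paths and names are assumptions).
data_yaml = """\
train: data/facemask/images/train
val: data/facemask/images/val
nc: 3
names: ['with_mask', 'without_mask', 'mask_weared_incorrect']
"""
with open("facemask.yaml", "w") as f:
    f.write(data_yaml)

# Launch training via the repository's standard command-line interface,
# assuming this script is run from inside a clone of ultralytics/yolov5.
subprocess.run([
    "python", "train.py",
    "--img", "640",
    "--batch", "16",
    "--epochs", "50",
    "--data", "facemask.yaml",
    "--weights", "yolov5s.pt",
], check=True)
```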


3.6 Testing the Model

Testing the developed model is the final part of the process. In this stage, the developed model is evaluated using a test dataset. The system processes the test data in the same way as the training data, then calculates a coefficient (confidence) value and compares it with the value learned during training; based on that, it classifies the object (the face mask). The model was trained to find persons with and without a face mask. OpenCV is used in this process, as it allows the model to load the images for testing.
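
A hedged sketch of this testing step is shown below, assuming custom trained weights saved as best.pt and a folder of test images; both names, and the 0.5 confidence threshold, are placeholders for illustration.

```python
import glob
import cv2
import torch

# Load the custom-trained weights (path is an assumption for illustration).
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")
model.conf = 0.5   # confidence threshold for keeping detections

for image_path in glob.glob("test_images/*.jpg"):
    img = cv2.imread(image_path)          # OpenCV loads the test image (BGR)
    results = model(img[:, :, ::-1])      # YOLOv5 expects RGB channel order
    print(image_path)
    print(results.pandas().xyxy[0][["name", "confidence"]])
```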

3.7 Result and Analysis

In this section, the results of the developed model are discussed. The developed model performs fairly well on the artificially developed test data: it accurately classifies persons who wear a mask as well as persons who do not. Consider the figure given below, which shows three people; two of them are not wearing face masks and one lady is wearing one. The developed model classified the people with and without masks accurately.

Figure 2 Detected results (without_mask & with_mask)


Now consider the statistics of the results, starting with the precision of the developed model. Precision is the ratio of correct positive results to the total number of positive results predicted by the classifier. From the precision graph it can be seen that the model is performing well, as the precision value increases over time during training.

Figure 3 Precision of the developed model

Figure 4 Recall graph of the developed model


The graph given above shows the recall value of the developed model. Recall is the ratio of correct positive results to all samples that are actually positive.
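
Concretely, with TP, FP, and FN denoting true positives, false positives, and false negatives, these metrics can be computed as in the short sketch below; it is a generic illustration with made-up counts, not tied to the project's evaluation code.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Example with made-up counts: 90 correct detections, 10 false alarms, 5 misses.
print(precision(90, 10))  # 0.9
print(recall(90, 5))      # ~0.947
```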

Figure 5 mAP (Mean Average Precision) graph for the developed model

The model was tested using both YOLOv5s and YOLOv5x. Training was carried out on Google Colab, a virtual machine environment provided by Google that offers the large RAM and GPU memory needed for executing machine learning programs; using Google Colab was a necessity for me because my personal laptop is very old and does not have enough RAM or GPU memory to run this model. YOLOv5s turned out to be better than YOLOv5x in terms of performance and speed: the mAP is quite similar for both, but when processing speed is considered, YOLOv5s is somewhat superior to YOLOv5x.

3.8 Recommendations

This section discusses recommendations for improving the understanding of deep learning projects; I developed these recommendations purely from the knowledge gathered during this project. In a deep learning project, the input data matters most: the larger the amount of training data, the higher the accuracy of the results, and covering all the different possible variations in the training dataset produces a better trained model. The selection of the algorithm also plays a crucial role in the accuracy achieved and the time required for processing.


4. CONCLUSION AND FUTURE ENHANCEMENTS

I have developed a system that can monitor an area through a real-time camera without any additional devices. The proposed system is a simple real-time video analyzer with the potential to check whether people are wearing masks or not. It can be installed in supermarkets and other public places, which helps to defeat the widespread transmission of the COVID-19 virus, because wearing masks reduces community spread. It can also be used for many other purposes, such as checking and verifying that all customers are wearing face masks: the system checks every person who enters through the main gate, and the recorded video can be processed to find whether a person is wearing a face mask or not. If the person wears a face mask, the door opens; otherwise the system can issue a message such as "please wear your face mask".

The developed model uses YOLOv5 and TensorFlow technologies for processing images and real-time video. The results show that the developed model is able to detect whether an individual is wearing a face mask or not. The model learns the parameters quickly; it collects video from the camera, processes the video, identifies the objects, and determines whether a person is wearing a mask or not.

The system has some limitations. For example, it detects accurately whether a person has worn a mask only when the person is directly facing the camera, which still makes it quite useful in places such as supermarkets and airports. One open problem is to improve the system so that it can detect the faces of people who are not directly facing the camera.


REFERENCES

[1] A. Patel, D. R. Kasat, S. Jain and V. M. Thakare, "Performance Analysis of Various Feature Detector and Descriptor for Real-Time Video based Face Tracking", International Journal of Computer Applications, vol. 93, no. 1, pp. 37-41, 2014. Available: 10.5120/16183-5415.

[2] A. Mikolajczyk and M. Grochowski, "Data augmentation for improving deep learning in image classification problem", 2018 International Interdisciplinary PhD Workshop (IIPhDW), 2018. Available: 10.1109/iiphdw.2018.8388338 [Accessed 28 August 2020].

[3] A. Seghouane, N. Shokouhi and I. Koch, "Sparse Principal Component Analysis With Preserved Sparsity Pattern", IEEE Transactions on Image Processing, vol. 28, no. 7, pp. 3274-3285, 2019. Available: 10.1109/tip.2019.2895464.

[4] B. Gupta, A. Chaube, A. Negi and U. Goel, "Study on Object Detection using Open CV - Python", International Journal of Computer Applications, vol. 162, no. 8, pp. 17-21, 2017. Available: 10.5120/ijca2017913391.

[5] C. Gershenson and D. Rosenblueth, "Self-organizing traffic lights at multiple-street intersections", Complexity, vol. 17, no. 4, pp. 23-39, 2011. Available: 10.1002/cplx.20392.

[6] C. Popa, "Extended and constrained diagonal weighting algorithm with application to inverse problems in image reconstruction", Inverse Problems, vol. 26, no. 6, p. 065004, 2010. Available: 10.1088/0266-5611/26/6/065004.

[7] "Face Mask Detection", Kaggle.com, 2020. [Online]. Available: https://www.kaggle.com/andrewmvd/face-mask-detection. [Accessed: 05- Sep- 2020].

[8] G. Mangmang, "Face Mask Usage Detection Using Inception Network", Journal of Advanced Research in Dynamical and Control Systems, vol. 12, no. 7, pp. 1660-1667, 2020. Available: 10.5373/jardcs/v12sp7/20202272.

[9] I. Riadi and A. Wirawan, "Network Packet Classification using Neural Network based on Training Function and Hidden Layer Neuron Number Variation", International Journal of Advanced Computer Science and Applications, vol. 8, no. 6, 2017. Available: 10.14569/ijacsa.2017.080631.

[10] J. Bamber and T. Christmas, "Covid-19: Each discarded face mask is a potential biohazard", BMJ, p. m2012, 2020. Available: 10.1136/bmj.m2012.

[11] J. Cao, C. Song, S. Peng, F. Xiao and S. Song, "Improved Traffic Sign Detection and Recognition Algorithm for Intelligent Vehicles", Sensors, vol. 19, no. 18, p. 4021, 2019. Available: 10.3390/s19184021 [Accessed 28 August 2020].

[12] J. Johnson and T. Khoshgoftaar, "Survey on deep learning with class imbalance", Journal of Big Data, vol. 6, no. 1, 2019. Available: 10.1186/s40537-019-0192-5 [Accessed 28 August 2020].


[13] J. Zhu, W. Zheng, J. Lai and S. Li, "Matching NIR Face to VIS Face Using Transduction", IEEE Transactions on Information Forensics and Security, vol. 9, no. 3, pp. 501-514, 2014. Available: 10.1109/tifs.2014.2299977 [Accessed 28 August 2020].

[14] L. Rampasek and A. Goldenberg, "TensorFlow: Biology’s Gateway to Deep Learning?", Cell Systems, vol. 2, no. 1, pp. 12-14, 2016. Available: 10.1016/j.cels.2016.01.009.

[15] M. BANSAL, "FACE RECOGNITION IMPLEMENTATION ON RASPBERRYPI USING OPENCV AND PYTHON", INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY, vol. 10, no. 3, 2019. Available: 10.34218/ijcet.10.3.2019.016.

[16] M. Chanu, "A Deep Learning Approach for Object Detection and Instance Segmentation using Mask RCNN", Journal of Advanced Research in Dynamical and Control Systems, vol. 12, no. 3, pp. 95-104, 2020. Available: 10.5373/jardcs/v12sp3/20201242.

[17] M. Inamdar and N. Mehendale, "Real-Time Face Mask Identification Using Facemasknet Deep Learning Network", SSRN Electronic Journal, 2020. Available: 10.2139/ssrn.3663305.

[18] M. Kiechle, T. Habigt, S. Hawe and M. Kleinsteuber, "A Bimodal Co-sparse Analysis Model for Image Processing", International Journal of Computer Vision, vol. 114, no. 2-3, pp. 233-247, 2014. Available: 10.1007/s11263-014-0786-5 [Accessed 28 August 2020].

[19] M. Loey, G. Manogaran, M. Taha and N. Khalifa, "A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic", Measurement, vol. 167, p. 108288, 2020. Available: 10.1016/j.measurement.2020.108288.

[20] Ming-Hsuan Yang, D. Kriegman and N. Ahuja, "Detecting faces in images: a survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 34-58, 2002. Available: 10.1109/34.982883 [Accessed 28 August 2020].

[21] P. Jhinkwan, V. Ingale and S. Chaturvedi, "Object Detection Using Convolution Neural Networks", SSRN Electronic Journal, 2019. Available: 10.2139/ssrn.3422311.

[22] R. Cárdenas, C. Beltrán and J. Gutiérrez, "Small Face Detection Using Deep Learning on Surveillance Videos", International Journal of Machine Learning and Computing, vol. 9, no. 2, pp. 189-194, 2019. Available: 10.18178/ijmlc.2019.9.2.785.

[23] "Real-time Object Detection and Recognition Using Deep Learning with YOLO Algorithm for Visually Impaired People", Journal of Xidian University, vol. 14, no. 4, 2020. Available: 10.37896/jxu14.4/261.

[24] S. Khan, A. Akram and N. Usman, "Real Time Automatic Attendance System for Face Recognition Using Face API and OpenCV", Wireless Personal Communications, vol. 113, no. 1, pp. 469-480, 2020. Available: 10.1007/s11277-020-07224-2.


[25] S. Khan and Z. Farooqui, "Face Recognition in Cross-spectral Environment using Deep Learning", International Journal of Computer Applications, vol. 177, no. 19, pp. 21-25, 2019. Available: 10.5120/ijca2019919626.

[26] S. Sumit, J. Watada, A. Roy and D. Rambli, "In object detection deep learning methods, YOLO shows supremum to Mask R-CNN", Journal of Physics: Conference Series, vol. 1529, p. 042086, 2020. Available: 10.1088/1742-6596/1529/4/042086.

[27] S. Yadav, "Deep Learning based Safe Social Distancing and Face Mask Detection in Public Areas for COVID-19 Safety Guidelines Adherence", International Journal for Research in Applied Science and Engineering Technology, vol. 8, no. 7, pp. 1368-1375, 2020. Available: 10.22214/ijraset.2020.30560.

[28] F. Schroff, D. Kalenichenko and J. Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. Available: https://arxiv.org/abs/1503.03832 [Accessed 22 November 2020].

[29] Y. Taigman, M. Yang, M. Ranzato and L. Wolf, "DeepFace: Closing the Gap to Human-Level Performance in Face Verification", 2014. Available: https://research.fb.com/publications/deepface-closing-the-gap-to-human-level-performance-in-face-verification/ [Accessed 22 November 2020].

[30] V. Dhar, "The Scope and Challenges for Deep Learning", Big Data, vol. 3, no. 3, pp. 127-129, 2015. Available: 10.1089/big.2015.29000.vdb.

[31] V. Gunjan, R. Pathak and O. Singh, "Understanding Image Classification Using TensorFlow Deep Learning - Convolution Neural Network", International Journal of Hyperconnectivity and the Internet of Things, vol. 3, no. 2, pp. 19-37, 2019. Available: 10.4018/ijhiot.2019070103.

[32] V. S.V, M. Katti, A. Khatawkar and P. Kulkarni, "Face Detection and Tracking using OpenCV", The SIJ Transactions on Computer Networks & Communication Engineering, vol. 04, no. 03, pp. 01-06, 2016. Available: 10.9756/sijcnce/v4i3/0103540102.

[33] V. KumarB.V.P, N. S. Murthy Sharma and K. Lal Kishore, "A Technique to Reduce Glitch Power during Physical Design Stage for Low Power and Less IR Drop", International Journal of Computer Applications, vol. 39, no. 18, pp. 62-67, 2012. Available: 10.5120/5086-7450 [Accessed 28 August 2020].

[34] X. Sun, P. Wu and S. Hoi, "Face detection using deep learning: An improved faster RCNN approach", Neurocomputing, vol. 299, pp. 42-50, 2018. Available: 10.1016/j.neucom.2018.03.030.