

COVID-19 Cough Classification using Machine Learning and Global Smartphone Recordings

Madhurananda Pahar, Marisa Klopper, Robin Warren, and Thomas Niesler

Abstract—We present a machine learning based COVID-19 cough classifier which can discriminate COVID-19 positive coughs from both COVID-19 negative and healthy coughs recorded on a smartphone. This type of screening is non-contact, easy to apply, and can reduce the workload in testing centres as well as limit transmission by recommending early self-isolation to those who have a cough suggestive of COVID-19. The datasets used in this study include subjects from all six continents and contain both forced and natural coughs, indicating that the approach is widely applicable. The publicly available Coswara dataset contains 92 COVID-19 positive and 1079 healthy subjects, while the second smaller dataset was collected mostly in South Africa and contains 18 COVID-19 positive and 26 COVID-19 negative subjects who have undergone a SARS-CoV laboratory test. Both datasets indicate that COVID-19 positive coughs are 15%-20% shorter than non-COVID coughs. Dataset skew was addressed by applying the synthetic minority oversampling technique (SMOTE). A leave-p-out cross-validation scheme was used to train and evaluate seven machine learning classifiers: logistic regression (LR), k-nearest neighbour (KNN), support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM) and a residual-based neural network architecture (Resnet50). Our results show that although all classifiers were able to identify COVID-19 coughs, the best performance was exhibited by the Resnet50 classifier, which was best able to discriminate between the COVID-19 positive and the healthy coughs with an area under the ROC curve (AUC) of 0.98. An LSTM classifier was best able to discriminate between the COVID-19 positive and COVID-19 negative coughs, with an AUC of 0.94 after selecting the best 13 features using sequential forward selection (SFS). Since this type of cough audio classification is cost-effective and easy to deploy, it is potentially a useful and viable means of non-contact COVID-19 screening.

Index Terms—COVID-19, cough classification, machine learning, logistic regression (LR), k-nearest neighbour (KNN), support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), Resnet50


1 INTRODUCTION

COVID-19 (COrona VIrus Disease of 2019), caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus, was declared a global pandemic on March 11, 2020 by the World Health Organisation (WHO). It is a new coronavirus but similar to other coronaviruses, including SARS-CoV (severe acute respiratory syndrome coronavirus) and MERS-CoV (Middle East respiratory syndrome coronavirus), which caused disease outbreaks in 2002 and 2012, respectively [1, 2].

The most common symptoms of COVID-19 are fever, fatigue and dry coughs [3]. Other symptoms include shortness of breath, joint pain, muscle pain, gastrointestinal symptoms and loss of smell or taste [4]. At the time of writing, there were 142.1 million active cases of COVID-19 globally, and there had been 3 million deaths, with the USA reporting the highest number of cases (31.7 million) and deaths (567,729) [5]. The scale of the pandemic has caused some health systems to be overrun by the need for testing and the management of cases.

Several attempts have been made to identify early symptoms of COVID-19 through the use of artificial intelligence

• Madhurananda Pahar and Thomas Niesler work at the Department of Electrical and Electronic Engineering, University of Stellenbosch, Stellenbosch, South Africa - 7600. E-mail: [email protected], [email protected]

• Marisa Klopper and Robin Warren work at the SAMRC Centre for Tuberculosis Research, University of Stellenbosch, Cape Town, South Africa - 7505. E-mail: [email protected], [email protected]

applied to images. The 50-layer residual neural network (Resnet50) architecture has been shown to perform better than other pre-trained models such as AlexNet, GoogLeNet and VGG16 in these tasks. For example, it has been demonstrated that COVID-19 can be detected from computed tomography (CT) images with an accuracy of 96.23% by using a Resnet50 architecture [6]. The same architecture was shown to detect pneumonia due to COVID-19 with an accuracy of 96.7% [7] and to detect COVID-19 from x-ray images with an accuracy of 96.30% [8].

Coughing is one of the predominant symptoms of COVID-19 [9] and also a symptom of more than 100 other diseases, and its effect on the respiratory system is known to vary [10]. For example, lung diseases can cause the airway to be either restricted or obstructed and this can influence the acoustics of the cough [11]. It has also been postulated that the glottis behaves differently under different pathological conditions [12, 13] and that this makes it possible to distinguish between coughs due to TB [14, 15], asthma [16], bronchitis and pertussis (whooping cough) [17, 18, 19, 20].

Respiratory data such as breathing, sneezing, speech, eating behaviour and coughing can be processed by machine learning algorithms to diagnose respiratory illness [21, 22, 23]. Simple machine learning tools, like binary classifiers, are able to distinguish COVID-19 respiratory sounds from healthy counterparts with an area under the ROC curve (AUC) exceeding 0.80 [24]. Detecting COVID-19 by analysing only the cough sounds is also possible. AI4COVID-19 is a mobile app that records 3 seconds of cough audio which is analysed automatically to provide

arXiv:2012.01926v2 [cs.SD] 14 Jun 2021


[Overview diagram: the COVID-19 cough classifier. Both the Coswara and Sarcos datasets pass through pre-processing and feature extraction. Trained and evaluated on the Coswara dataset, the best classifier (Resnet50) achieves AUC = 0.98; trained on the Coswara dataset and evaluated on the Sarcos dataset, the best classifier (LSTM) achieves AUC = 0.94 when using the best 13 features.]

Fig. 1. Location of participants in the Coswara and the Sarcos datasets: Participants in the Coswara dataset were located on five different continents, excluding Africa. The majority (91%) of participants in the Coswara dataset are from Asia, as indicated in Figure 2. Sarcos participants who supplied geographical information are mostly (75%) from South Africa, as shown in Figure 3.

an indication of COVID-19 status within 2 minutes [25]. A deep neural network (DNN) was shown to distinguish between COVID-19 and other coughs with an accuracy of 96.83% on a dataset containing 328 coughs from 150 patients of four different classes: COVID-19, asthma, bronchitis and healthy [26]. There appear to be unique patterns in COVID-19 coughs that allow a pre-trained Resnet18 classifier to identify COVID-19 coughs with an AUC of 0.72. In this case, cough samples were collected over the phone from 3621 individuals with confirmed COVID-19 [27]. COVID-19 coughs were classified with a higher AUC of 0.97 (sensitivity = 98.5% and specificity = 94.2%) by a Resnet50 architecture, trained on coughs from 4256 subjects and evaluated on 1064 subjects that included both COVID-19 positive and COVID-19 negative subjects by implementing four biomarkers [28]. A high AUC exceeding 0.98 was also achieved in [29] when discriminating COVID-19 positive coughs from COVID-19 negative coughs on a clinically validated dataset consisting of 2339 COVID-19 positive and 6041 COVID-19 negative subjects using DNN based classifiers.

Data collection from COVID-19 patients is challenging and the datasets are often not publicly available. Nevertheless, efforts have been made to compile such datasets. For example, a dataset consisting of coughing sounds recorded during or after the acute phase of COVID-19 from patients via public media interviews has been developed in [30]. The Coswara dataset is publicly available and collected in a more controlled and targeted manner [31]. At the time of writing, this dataset included usable 'deep cough' (i.e. loud cough) recordings from 92 COVID-19 positive and 1079 healthy subjects. We have also begun to compile our own dataset by collecting recordings from subjects who have undergone a SARS-CoV laboratory test in South Africa. This Sarcos (SARS COVID-19 South Africa) dataset is currently still small and includes only 44 subjects (18 COVID-19 positive and 26 COVID-19 negative).

Both the Coswara and Sarcos datasets are imbalanced since COVID-19 positive subjects are outnumbered by non-COVID-19 subjects. Nevertheless, collectively these two datasets contain recordings from all six continents, as shown in Figure 1. To improve machine learning classification performance, we have applied the synthetic minority over-sampling technique (SMOTE) to balance our datasets. Furthermore, we have found that the COVID-19 positive coughs are 15%-20% shorter than non-COVID coughs. Hence, feature extraction is designed to preserve the time-domain patterns over an entire cough. Classifier hyperparameters were optimised by using the leave-p-out

cross-validation, followed by training and evaluation of machine learning approaches, namely logistic regression (LR), k-nearest neighbour (KNN), support vector machine (SVM), multilayer perceptron (MLP) and deep neural networks (DNN) such as a convolutional neural network (CNN), long short-term memory (LSTM) and Resnet50. The Resnet50 produced the highest AUC of 0.976 ≈ 0.98 when trained and evaluated on the Coswara dataset, outperforming the baseline results presented in [32]. No classifier has been trained on the Sarcos dataset due to its small size. It can also not be combined with Coswara as it contains slightly different classes. Instead, this dataset has been used for an independent validation of the best-performing DNN classifiers developed on the Coswara dataset. In these validation experiments, it was found that the highest AUC of 0.938 ≈ 0.94 is achieved when using the best 13 features identified using the greedy sequential forward selection (SFS) algorithm and an LSTM classifier. We conclude that it is possible to identify COVID-19 on the basis of cough audio recorded using a smartphone. Furthermore, this discrimination between COVID-19 positive and both COVID-19 negative and healthy coughs is possible for audio samples collected from subjects located all over the world. Additional validation is however still required to obtain approval from regulatory bodies for use as a diagnostic tool.

2 DATA

We have used two datasets in our experimental evaluation: the Coswara dataset and the Sarcos dataset.

2.1 The Coswara Dataset

The Coswara project is aimed at developing a diagnostic tool for COVID-19 based on respiratory, cough and speech sounds [31]. Public participants were asked to contribute cough recordings via a web-based data collection platform using their smartphones (https://coswara.iisc.ac.in). The collected audio data includes fast and slow breathing, deep and shallow coughing, phonation of sustained vowels and spoken digits. Age, gender, geographical location, current health status and pre-existing medical conditions are also recorded. Health status includes 'healthy', 'exposed', 'cured' or 'infected'. Audio recordings were sampled at 44.1 kHz and subjects were from all continents except Africa, as shown in Figure 2. In this study, we have made use of the raw audio recordings and applied pre-processing as described in Section 2.3.



Fig. 2. Coswara dataset at the time of experimentation: There are 1079 healthy and 92 COVID-19 positive subjects in the pre-processed dataset, used for feature extraction and classifier training. Most of the subjects are aged between 20 and 50. There are 282 female and 889 male subjects and most of them are from Asia. Subjects are from five continents: Asia (Bahrain, Bangladesh, China, India, Indonesia, Iran, Japan, Malaysia, Oman, Philippines, Qatar, Saudi Arabia, Singapore, Sri Lanka, United Arab Emirates), Australia, Europe (Belgium, Finland, France, Germany, Ireland, Netherlands, Norway, Romania, Spain, Sweden, Switzerland, Ukraine, United Kingdom), North America (Canada, United States), and South America (Argentina, Mexico).

2.2 The Sarcos Dataset

A similar initiative in South Africa encouraged participants to voluntarily record their coughs using an online platform (https://coughtest.online) under the research project name 'COVID-19 screening by cough sound analysis'. This dataset will be referred to as 'Sarcos' (SARS COVID-19 South Africa). Only coughs were collected as audio samples, and only subjects who had recently undergone a SARS-CoV laboratory test were asked to participate. The sampling rate for the audio recordings was 44.1 kHz. In addition to the cough audio recordings, subjects were presented with a voluntary and anonymous questionnaire, providing informed consent. The questionnaire prompted for the following information.

• Age and gender.
• Whether tested by an authorised COVID-19 testing centre.
• Days since the test was performed.
• Lab result (COVID-19 positive or negative).
• Country of residence.
• Known contact with COVID-19 positive patient.
• Known lung disease.
• Symptoms and temperature.
• Whether they are a regular smoker.
• Whether they have a current cough and for how many days.

Among the 44 participants, 33 (75%) subjects asserted that they are South African residents and therefore represent the African continent, as shown in Figure 3. As there were no subjects from Africa in the Coswara dataset, together the Coswara and Sarcos datasets include subjects from all six continents.


Fig. 3. Sarcos dataset at the time of experimentation: There are 26 COVID-19 negative and 18 COVID-19 positive subjects in the processed dataset. Unlike the Coswara dataset, there are more female than male subjects. Most of the subjects had their lab test performed within two weeks of participation. Only 19 of the subjects reported coughing as a symptom, and for these the reported duration of coughing symptoms was variable. There were 33 subjects from Africa (South Africa), 1 from South America (Brazil), 1 from Asia (India) and the rest declined to specify their geographic location.

2.3 Data Pre-processing

The raw cough audio recordings from both datasets have a sampling rate (µ) of 44.1 kHz and are subjected to some simple pre-processing steps, described below. We set the time-window length (λ) to 0.05 seconds and the amplitude threshold value (Φ) to 0.005, where both of these values were determined manually and interactively, as the silence removal was validated by visual inspection in all cases.

The original cough audio c_i(t) is normalised by following Equation 1:

c_i(t) = (0.9 × c_i(t)) / |max(c_i(t))|    (1)

The processed final cough audio, shown in Figure 4, is noted as C(t). Here, I denotes the time-window and we define:

C_I(t) = C_{jµλ}(t) ⋯ C_{(j+1)µλ}(t)    (2)

For example, when j = 0, C_I will be the portion of signal C_0 ⋯ C_2205, as µ = 44100 Hz and λ = 0.05 seconds. Here 0 ≤ j ≤ ⌊Ξ/(µλ)⌋, where Ξ is the length of the signal c_i(t). C(t) is calculated by following Equation 3:

C(t) = C(t) ⊕ C_I(t)  if  C_I(t) ≥ Φ    (3)

where ⊕ denotes concatenation, and C_I(t) ≥ Φ holds if C_{I_i}(t) ≥ Φ for all i ∈ I.

Thus, the amplitudes of the raw audio data in the Coswara and the Sarcos dataset were normalised, after which periods of silence were removed from the signal


to within a 50 ms margin using a simple energy detector. Figure 4 shows an example of the original raw audio, as well as the pre-processed audio.
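The pre-processing above (Equation 1 normalisation followed by the Equations 2-3 silence removal) can be sketched in Python. The function name and the use of the per-window peak amplitude as the silence criterion are our assumptions, so treat this as a minimal illustration rather than the authors' exact implementation:

```python
import numpy as np

def preprocess_cough(c, mu=44100, lam=0.05, phi=0.005):
    """Normalise a cough signal and drop silent 50 ms windows.

    c   : 1-D array of audio samples
    mu  : sampling rate in Hz
    lam : time-window length in seconds (0.05 s = 50 ms)
    phi : amplitude threshold below which a window is treated as silence
    """
    # Equation 1: peak-normalise the amplitude to 0.9
    c = 0.9 * c / np.abs(c).max()

    # Equations 2-3: keep only windows whose amplitude reaches phi
    # (judged here by the window's peak amplitude -- an assumption)
    win = int(mu * lam)  # samples per window: 2205 at 44.1 kHz
    kept = [c[j * win:(j + 1) * win]
            for j in range(int(np.ceil(len(c) / win)))
            if np.abs(c[j * win:(j + 1) * win]).max() >= phi]
    return np.concatenate(kept) if kept else np.zeros(0)
```

Applied to a recording, this removes silent stretches to within the 50 ms margin mentioned above while leaving the audible cough bursts intact.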

Fig. 4. A processed COVID-19 cough audio signal, which is shorter than the original recording but retains the full spectral resolution. Amplitudes are normalised and extended silences are removed in the pre-processing.

After pre-processing, the Coswara dataset contains 92 COVID-19 positive and 1079 healthy subjects and the Sarcos dataset contains 18 COVID-19 positive and 26 COVID-19 negative subjects, as summarised in Table 1. In both datasets, COVID-19 positive coughs are 15%-20% shorter than non-COVID coughs.

2.4 Dataset Balancing

Table 1 shows that COVID-19 positive subjects are under-represented in both datasets. To compensate for this imbalance, which can detrimentally affect machine learning [33, 34], we have applied SMOTE data balancing to create an equal number of COVID-19 positive coughs during training [35, 36]. This technique has previously been successfully applied to cough detection and classification based on audio recordings [15, 18, 37].

SMOTE oversamples the minority class by generating synthetic examples, instead of, for example, random oversampling. In our dataset, for each COVID-19 positive cough, 5 other COVID-19 positive coughs were randomly chosen and the closest in terms of the Euclidean distance was identified as x_NN. Then the synthetic COVID-19 positive samples were created using Equation 4.

x_SMOTE = x + u · (x_NN − x)    (4)

The multiplicative factor u is uniformly distributed between 0 and 1 [38].

We have also implemented other extensions of SMOTE such as borderline-SMOTE [39, 40] and adaptive synthetic sampling [41]. However, the best results were obtained by using SMOTE without any modification.
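Equation 4, together with the neighbour-selection step described above (5 random candidate coughs, of which the Euclidean-nearest becomes x_NN), can be sketched as follows. This is an illustrative reimplementation with assumed names, not the exact SMOTE routine used for the experiments:

```python
import numpy as np

def smote_sample(X_minority, k=5, rng=None):
    """Create one synthetic minority-class sample via Equation 4."""
    if rng is None:
        rng = np.random.default_rng()
    i = rng.integers(len(X_minority))
    x = X_minority[i]
    # choose k other minority samples at random, keep the Euclidean-nearest
    others = np.delete(np.arange(len(X_minority)), i)
    cand = X_minority[rng.choice(others, size=k, replace=False)]
    x_nn = cand[np.argmin(np.linalg.norm(cand - x, axis=1))]
    # Equation 4: interpolate between x and its neighbour
    u = rng.uniform()  # u ~ Uniform(0, 1)
    return x + u * (x_nn - x)
```

Because the synthetic point lies on the line segment between two real minority samples, it enlarges the minority class without simply duplicating existing coughs.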

3 FEATURE EXTRACTION

The feature extraction process is illustrated in Figure 5. Features such as mel-frequency cepstral coefficients (MFCCs), log frame energies, zero crossing rate (ZCR) and kurtosis are extracted. MFCCs have been used very successfully as features in audio analysis and especially in automatic speech recognition [42, 43]. They have also been found to


Fig. 5. Feature Extraction: Pre-processed cough audio recordings, shown in Figure 4, are split into individual segments after which features such as MFCCs, MFCC velocity (∆), MFCC acceleration (∆∆), log frame energies, ZCR and kurtosis are extracted. So, for M number of MFCCs and S number of segments, the final feature matrix has (3M + 3, S) dimensions.

be useful in differentiating dry coughs from wet coughs [44] and classifying tuberculosis coughs [45]. We have used the traditional MFCC extraction method, considering higher-resolution MFCCs along with the velocity (first-order difference, ∆) and acceleration (second-order difference, ∆∆), as adding these has shown classifier improvement in the past [46]. Log frame energies can improve the performance in audio classification tasks [47]. The ZCR [48] is the number of times a signal changes sign within a frame, indicating the variability present in the signal. The kurtosis [49] indicates the tailedness of a probability density. For the samples of an audio signal, it indicates the prevalence of higher amplitudes. These features have been extracted by using the hyperparameters described in Table 2 for all cough recordings.
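Of the features listed above, the ZCR and kurtosis are simple enough to define directly; a small numpy sketch follows (MFCCs and log frame energies would normally come from an audio library, and these helper names are our own):

```python
import numpy as np

def frame_zcr(frame):
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign."""
    return float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))

def frame_kurtosis(frame):
    """Sample kurtosis: fourth central moment over the squared variance.
    Values well above 3 indicate a heavy-tailed (spiky) amplitude distribution."""
    m = frame.mean()
    s2 = np.mean((frame - m) ** 2)
    return float(np.mean((frame - m) ** 4) / s2 ** 2)
```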

We have extracted features in a way that preserves the information regarding the beginning and the end of a cough event, to allow time-domain patterns in the recordings to be discovered while maintaining the fixed input dimensionality expected by, for example, a CNN. From every recording, we extract a fixed number of features S by distributing the fixed-length analysis frames uniformly over the time-interval of the cough. The input feature matrix for the classifiers then always has the dimension (3M + 3, S) for M number of MFCCs along with M number of velocity (∆) and M number of acceleration (∆∆) coefficients, as illustrated in Figure 5. If Λ is the number of samples in the cough audio, we can calculate the number of samples between consecutive frames δ using Equation 5.

δ = ⌈Λ / S⌉    (5)

So, for example, a 2.2 second long cough audio event contains 97020 samples, as the sampling rate is 44.1 kHz. If the frame length is 1024 samples and the number of segments is 100, then the frame skip is δ = ⌈97020 / 100⌉ = 971 samples.

In contrast with the more conventionally applied fixed frame rates, this way of extracting features ensures that the entire recording is captured within a fixed number of frames, allowing especially the CNN classifiers to discover more useful temporal patterns and provide better classification performance. This particular method of feature extraction has also shown promising results in classifying COVID-19 breath and speech [37].
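Equation 5 and the worked example above can be checked with a short sketch (the function name is our assumption):

```python
import math

def frame_starts(num_samples, num_segments=100):
    """Starting sample of each of S analysis frames placed uniformly over
    a cough of Lambda samples, using the frame skip of Equation 5:
    delta = ceil(Lambda / S). Frames that overrun the end of the signal
    can simply be zero-padded."""
    delta = math.ceil(num_samples / num_segments)
    return [j * delta for j in range(num_segments)]
```

For the 2.2 s example (97020 samples, S = 100) this yields a frame skip of 971 samples, matching the text, and always returns exactly S frame positions regardless of the recording's length.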


TABLE 1
Summary of the Coswara and Sarcos Datasets: In the Coswara dataset, there were 1171 subjects with usable 'deep cough' recordings, 92 of whom were COVID-19 positive while 1079 were healthy. This amounts to a total of 1.05 hours of cough audio recordings (after pre-processing) that will be used for experimentation. The Sarcos dataset contains data from a total of 44 subjects, 18 of whom are COVID-19 positive and 26 who are not. This amounts to a total of 2.45 minutes of cough audio recordings (after pre-processing) that has been used for experimentation. COVID-19 positive coughs are 15%-20% shorter than non-COVID coughs.

Dataset   Label               Subjects   Total audio   Average per subject   Standard deviation
Coswara   COVID-19 Positive   92         4.24 mins     2.77 sec              1.62 sec
Coswara   Healthy             1079       0.98 hours    3.26 sec              1.66 sec
Coswara   Total               1171       1.05 hours    3.22 sec              1.67 sec
Sarcos    COVID-19 Positive   18         0.87 mins     2.91 sec              2.23 sec
Sarcos    COVID-19 Negative   26         1.57 mins     3.63 sec              2.75 sec
Sarcos    Total               44         2.45 mins     3.34 sec              2.53 sec

4 CLASSIFIER ARCHITECTURES

We have trained and evaluated seven machine learning classifiers in total. LR models have been found to outperform other more complex classifiers such as classification trees, random forests and SVMs in some clinical prediction tasks [14, 50, 51]. We have used gradient descent weight regularisation as well as lasso (l1 penalty) and ridge (l2 penalty) estimators during training [52, 53]. This LR classifier is intended primarily as a baseline against which any improvements offered by the more complex architectures can be measured. A KNN classifier bases its decision on the class labels of the k nearest neighbours in the training set and in the past has been able to both detect [54, 55, 56] and classify [17, 45, 57] sounds such as coughs and snores successfully. SVM classifiers have also performed well in both detecting [58, 59] and classifying [60] cough events. The independent term in the kernel function is chosen as a hyperparameter while optimising the SVM classifier. An MLP, a neural network with multiple layers of neurons separating the input and output [61], is capable of learning non-linear relationships and has for example been shown to be effective when discriminating influenza coughs from other coughs [62]. MLPs have also been applied to classify tuberculosis coughs [45, 59] and to detect coughs in general [63, 64]. The penalty ratios, along with the number of neurons, are the hyperparameters which were optimised using the leave-p-out cross-validation process (Figure 8 and Section 5.2).
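As a concrete illustration of the KNN decision rule described above, a minimal sketch follows (names are our own, and the actual experiments additionally tune k by cross-validation):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k Euclidean-nearest
    training samples."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest_labels = y_train[np.argsort(dist)[:k]]
    values, counts = np.unique(nearest_labels, return_counts=True)
    return values[np.argmax(counts)]
```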

A CNN is a popular deep neural network architecture, primarily used in image classification [65]. For example, in the past two decades CNNs were applied successfully to complex tasks such as face recognition [66]. It has also performed well in classifying COVID-19 breath and speech [37]. A CNN architecture [67, 68] along with the optimised hyperparameters (Table 3) is shown in Figure 6. An LSTM model is a type of recurrent neural network whose architecture allows it to remember previously-seen inputs when making its classification decision [69]. It has been successfully used in automatic cough detection [15, 70], and also in other types of acoustic event detection [71, 72]. The hyperparameters optimised for the LSTM classifier [73] are mentioned in Table 3 and visually explained in Figure 7. The 50-layer deep residual learning (Resnet50) neural network [74] is a very deep architecture that contains skip layers, and has been found to outperform other very deep architectures


Fig. 6. CNN Classifier: Our CNN classifier uses α1 two-dimensional convolutional layers with kernel size α2, rectified linear units as activation functions and a dropout rate of α3. After max-pooling, two dense layers with α4 and 8 units respectively and rectified linear activation functions follow. The network is terminated by a two-dimensional softmax where one output (1) represents the COVID-19 positive class and the other (0) the healthy or COVID-19 negative class. During training, features are presented to the neural network in batches of size β3 for β4 epochs.


Fig. 7. LSTM classifier: Our LSTM classifier has β1 LSTM units, each with rectified linear activation functions and a dropout rate of α3. This is followed by two dense layers with α4 and 8 units respectively and rectified linear activation functions. The network is terminated by a two-dimensional softmax where one output (1) represents the COVID-19 positive class and the other (0) the healthy or COVID-19 negative class. During training, features are presented to the neural network in batches of size β3 for β4 epochs.

such as VGGNet. It performs particularly well in image classification tasks on datasets such as ILSVRC, the CIFAR-10 dataset and the COCO object detection dataset [75]. Resnet50 has already been used successfully to detect COVID-19 from CT images [6], coughs [28], breath and speech [37], as well as Alzheimer's disease [76]. Due to the extreme computational load, we have used the default Resnet50 structure given in Table 1 of [74].


5 CLASSIFICATION PROCESS

5.1 Hyperparameter Optimisation

Both feature extraction and the classifier architectures have a number of hyperparameters. These are listed in Tables 2 and 3 and were optimised using a leave-p-out cross-validation scheme.

TABLE 2
Feature extraction hyperparameters, optimised using the leave-p-out cross-validation described in Section 5.2

Hyperparameter   Description                                  Range
MFCC (M)         Number of lower-order MFCCs to keep          13 × k1, where k1 = 1, 2, 3, 4, 5
Frame (F)        Frame size into which audio is segmented     2^k2 samples, where k2 = 8, ..., 12
Seg (S)          Number of frames extracted from the audio    10 × k3, where k3 = 5, 7, 10, 12, 15

As the sampling rate is 44.1 kHz in both the Coswara and the Sarcos datasets, varying the frame length from 2^8 to 2^12 samples (i.e. 256 to 4096 samples) means that features are extracted from frames whose duration varies between approximately 5 and 100 ms. Different phases of a cough carry important features [44], and thus each cough has been divided into between 50 and 150 segments in steps of 20 to 30, as shown in Figure 5. By varying the number of lower-order MFCCs to keep (from 13 to 65, in steps of 13), the spectral resolution of the features was varied.
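As a quick illustrative check (a Python sketch, not the authors' code; all variable names are ours), the grid above can be enumerated to confirm the quoted frame durations:

```python
# Sketch of the Table 2 feature extraction grid; names are illustrative.
SAMPLE_RATE = 44100  # Hz, common to the Coswara and Sarcos recordings

mfcc_counts = [13 * k1 for k1 in (1, 2, 3, 4, 5)]        # M: 13 to 65
frame_sizes = [2 ** k2 for k2 in range(8, 13)]           # F: 256 to 4096 samples
segment_counts = [10 * k3 for k3 in (5, 7, 10, 12, 15)]  # S: 50 to 150 segments

# Frame duration in milliseconds for each candidate frame size.
frame_ms = [1000 * f / SAMPLE_RATE for f in frame_sizes]
```

At 44.1 kHz this spans roughly 5.8 ms (256 samples) to 92.9 ms (4096 samples), consistent with the approximate 5 to 100 ms range quoted above.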

5.2 Cross-validation

All our classifiers have been trained and evaluated using a nested leave-p-out cross-validation scheme, as shown in Figure 8 [77]. Since only the Coswara dataset was used for training and parameter optimisation, N = 1171 in Figure 8. We set the train-test split to 4:1, as this ratio has been used effectively in medical classification tasks [78]. Thus, J = 234 and K = 187 in our experiments.

The figure shows that, in an outer loop, J subjects are removed from the complete set of N subjects to be used for later independent testing. Then, a further K subjects are removed from the remaining N − J subjects to serve as a development set for optimising the hyperparameters listed in Table 3. The inner loop considers all such sets of K subjects, and the optimal hyperparameters are chosen on the basis of all these partitions. The resulting optimal hyperparameters are used to train a final system on all N − J subjects, which is evaluated on the test set consisting of the J held-out subjects. If the N − J subjects in the training portion contain C1 COVID-19 positive and C2 COVID-19 negative coughs, then (C2 − C1) synthetic COVID-19 positive coughs are created using SMOTE. AUC was always the optimisation criterion in this cross-validation. The entire procedure is repeated for all possible non-overlapping test sets in the outer loop. The final performance is obtained by calculating and averaging the AUC over these outer loops.
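The interpolation rule behind SMOTE can be sketched as follows. This is a minimal illustration of the technique of Chawla et al. [35], not the paper's implementation (in practice one would typically use the imbalanced-learn toolbox [36]); the function name and parameters are our own.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesise n_new minority-class samples by
    interpolating between a random minority sample and one of its k
    nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest per sample
    out = []
    for _ in range(n_new):
        i = rng.integers(n)                     # a random minority sample
        j = rng.choice(neighbours[i])           # one of its k neighbours
        u = rng.random()                        # interpolation factor in [0, 1)
        out.append(X_min[i] + u * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because each synthetic sample lies on a line segment between two real minority samples, the oversampled data stay inside the minority class's convex hull rather than being simple duplicates.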

This cross-validation procedure makes the best use of our small dataset by allowing all subjects to be used for both training and testing purposes, while ensuring unbiased hyperparameter optimisation and a strict per-subject separation between cross-validation folds.
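The nested scheme can be sketched as follows (illustrative only: the function and argument names are ours, bare indices stand in for subjects, and the real inner loop retrains a classifier and scores AUC on each development set):

```python
import numpy as np

def nested_cv_splits(n_subjects, j, k, rng=0):
    """Sketch of the nested leave-p-out scheme of Figure 8: the outer loop
    holds out J subjects for testing; the inner loop carves K development
    subjects out of the remaining N - J for hyperparameter selection."""
    idx = np.random.default_rng(rng).permutation(n_subjects)
    for t0 in range(0, n_subjects - j + 1, j):        # non-overlapping test sets
        test = idx[t0:t0 + j]
        rest = np.concatenate([idx[:t0], idx[t0 + j:]])
        inner = []
        for d0 in range(0, len(rest) - k + 1, k):     # non-overlapping dev sets
            dev = rest[d0:d0 + k]
            train = np.concatenate([rest[:d0], rest[d0 + k:]])
            inner.append((train, dev))
        # Optimise hyperparameters on `inner`, retrain on `rest`, test on `test`.
        yield inner, rest, test
```

Every subject appears in exactly one outer test set, which is what enforces the strict per-subject separation described above.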


Fig. 8. Leave-p-out cross-validation, used to train and evaluate the classifiers. The development set (DEV), consisting of K subjects, is used to optimise the hyperparameters while training on the TRAIN set, which consists of N − J − K subjects. The final evaluation of the classifiers in terms of the AUC occurs on the TEST set, consisting of J subjects.

5.3 Classifier Evaluation

Receiver operating characteristic (ROC) curves were calculated within the inner and outer loops shown in Figure 8. The area under the ROC curve (AUC) indicates how well the classifier has performed over a range of decision thresholds [79]. From these ROC curves, the decision threshold that achieves an equal error rate (γEE) was computed. This is the threshold at which the classifier's false positive rate (FPR) equals its false negative rate (1 − TPR, where TPR is the true positive rate).
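Locating the equal-error-rate threshold amounts to a sweep over candidate thresholds; the following is an illustrative sketch (our own code and names, not the authors'):

```python
import numpy as np

def eer_threshold(scores, labels):
    """Sketch: sweep decision thresholds over the observed scores and
    return the one where the false positive rate and the false negative
    rate (1 - TPR) are closest, i.e. the equal-error-rate point."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    best_t, best_gap = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        fpr = (pred & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        gap = abs(fpr - (1 - tpr))   # distance from the EER condition FPR = FNR
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```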

We denote the mean per-frame probability that a cough is from a COVID-19 positive subject by $\hat{P}$:

$$\hat{P} = \frac{1}{K}\sum_{i=1}^{K} P(Y = 1 \mid X_i, \theta) \qquad (6)$$

where $K$ indicates the number of frames in the cough and $P(Y = 1 \mid X_i, \theta)$ is the output of the classifier for feature vector $X_i$ and parameters $\theta$ for the $i$-th frame. We then define the indicator variable $C$ as:

$$C = \begin{cases} 1 & \text{if } \hat{P} \geq \gamma_{EE} \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

We then define two COVID-19 index scores ($COVID\_I_1$ and $COVID\_I_2$) in Equations 8 and 9 respectively.

$$COVID\_I_1 = \frac{1}{N_1}\sum_{i=1}^{N_1} C_i \qquad (8)$$


TABLE 3
Classifier hyperparameters, optimised using the leave-p-out cross-validation described in Section 5.2

Hyperparameter   Description                   Classifier   Range
ν1               Regularisation strength       LR           10^i1, where i1 = −7, −6, ..., 6, 7
ν2               l1 penalty                    LR           0 to 1 in steps of 0.05
ν3               l2 penalty                    LR           0 to 1 in steps of 0.05
ξ1               Number of neighbours          KNN          10 to 100 in steps of 10
ξ2               Leaf size                     KNN          5 to 30 in steps of 5
ζ1               Regularisation strength       SVM          10^i3, where i3 = −7, −6, ..., 6, 7
ζ2               Kernel coefficient            SVM          10^i4, where i4 = −7, −6, ..., 6, 7
η1               No. of neurons                MLP          10 to 100 in steps of 10
η2               l2 penalty                    MLP          10^i2, where i2 = −7, −6, ..., 6, 7
η3               Stochastic gradient descent   MLP          0 to 1 in steps of 0.05
α1               No. of convolutional filters  CNN          3 × 2^k4, where k4 = 3, 4, 5
α2               Kernel size                   CNN          2 and 3
α3               Dropout rate                  CNN, LSTM    0.1 to 0.5 in steps of 0.2
α4               Dense layer size              CNN, LSTM    2^k5, where k5 = 4, 5
β1               LSTM units                    LSTM         2^k6, where k6 = 6, 7, 8
β2               Learning rate                 LSTM         10^k7, where k7 = −2, −3, −4
β3               Batch size                    CNN, LSTM    2^k8, where k8 = 6, 7, 8
β4               No. of epochs                 CNN, LSTM    10 to 250 in steps of 20

$$COVID\_I_2 = \frac{1}{N_2}\sum_{i=1}^{N_2} P(Y = 1 \mid X_i) \qquad (9)$$

In Equation 8, $N_1$ is the number of coughs from the subject in the recording, while in Equation 9, $N_2$ indicates the total number of frames of cough audio gathered from the subject. Hence Equation 8 computes a per-cough average probability while Equation 9 computes a per-frame average probability. For the Coswara dataset, $N_1 = 1$.
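Putting Equations 6 to 9 together, the two index scores can be sketched as follows (illustrative; the function name is ours):

```python
import numpy as np

def covid_indices(frame_probs_per_cough, gamma_ee):
    """Sketch of Equations 6-9: frame_probs_per_cough holds one array of
    per-frame probabilities P(Y=1|X_i, theta) per cough of the subject.
    Returns (COVID_I1, COVID_I2)."""
    # Equations 6 and 7: per-cough mean probability, thresholded at gamma_EE.
    C = [float(np.mean(p) >= gamma_ee) for p in frame_probs_per_cough]
    covid_i1 = float(np.mean(C))                       # Equation 8: per-cough average
    all_frames = np.concatenate(frame_probs_per_cough)
    covid_i2 = float(np.mean(all_frames))              # Equation 9: per-frame average
    return covid_i1, covid_i2
```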

The COVID-19 index scores, given by Equations 8 and 9, can both be used to make classification decisions. We have found that for some classifier architectures one leads to better performance than the other. Therefore, we have made the choice of scoring function an additional hyperparameter to be optimised during cross-validation.

We calculated the specificity and sensitivity from these predicted values, compared them with the ground truth, and finally computed the AUC as our evaluation measure. The mean specificity, sensitivity, accuracy and AUC, along with the optimal hyperparameters for each classifier, are shown in Tables 4 and 5.
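For reference, the AUC can be computed directly from scores and labels through its rank-statistic (Mann-Whitney) interpretation; the following is a minimal sketch, not the paper's evaluation code:

```python
import numpy as np

def auc_score(scores, labels):
    """Sketch: AUC as the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative one, with ties
    counted as one half (the Mann-Whitney U statistic, normalised)."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()   # fraction of pos > neg pairs
    ties = (pos[:, None] == neg[None, :]).mean()     # fraction of tied pairs
    return greater + 0.5 * ties
```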

6 RESULTS

6.1 Coswara dataset

Classification performance for the Coswara dataset is shown in Table 4. The Coswara results are the average specificity, sensitivity, accuracy and AUC, along with the standard deviation of the AUC, calculated over the outer-loop test sets during cross-validation. The table also shows the values of the hyperparameters which produced the highest AUC during cross-validation.

Table 4 shows that all seven classifiers can classify COVID-19 coughs, and that the Resnet50 classifier exhibits the best performance, with an AUC of 0.976 when using a 120-dimensional feature matrix consisting of 39 MFCCs with appended velocity and acceleration, extracted from frames that are 1024 samples long, and when grouping the coughs

Fig. 9. Mean ROC curves for the classifiers trained and evaluated on the Coswara dataset: the highest AUC of 0.98 was achieved by the Resnet50, while the LR classifier had the lowest AUC of 0.74.

into 50 segments. The corresponding accuracy is 95.3%, with a sensitivity of 93% and a specificity of 98%. The CNN and LSTM classifiers also exhibited good performance, with AUCs of 0.953 and 0.942 respectively, thus comfortably outperforming the MLP, which achieved an AUC of 0.897. The optimised LR and SVM classifiers showed substantially weaker performance, with AUCs of 0.736 and 0.815 respectively. Table 4 also shows that the DNN classifiers exhibit a lower standard deviation across the folds than the other classifiers. This suggests that the DNN classifiers are also more likely to perform well on new datasets without further hyperparameter optimisation.

The mean ROC curves for the optimised classifier of each architecture are shown in Figure 9. We see that the LSTM, CNN and Resnet50 classifiers achieve better performance than the remaining architectures at most operating points. Furthermore, the figure confirms that the Resnet50 architecture in most cases also achieved better classification performance than the CNN and LSTM. There appears to be a small region of the curve where the CNN outperforms the Resnet50 classifier, but this will need to be verified by further experimentation with a larger dataset.


TABLE 4
Classifier performance when training and evaluating on the Coswara dataset. For each classifier, the two best configurations are shown, along with their feature extraction and optimal classifier hyperparameters. The area under the ROC curve (AUC) was the optimisation criterion during cross-validation. The mean specificity (Spec), sensitivity (Sens), accuracy (ACC) and standard deviation of the AUC (σAUC) are also shown. The best performance is achieved by the Resnet50.

Classifier   Best Feature Hyperparameters   Optimal Classifier Hyperparameters (optimised inside nested cross-validation)   Spec   Sens   ACC      AUC     σAUC
LR           M=13, F=1024, S=120            ν1 = 10^−4, ν2 = 0.25, ν3 = 0.75                                                57%    94%    75.70%   0.736   0.057
LR           M=26, F=1024, S=70             ν1 = 10^−2, ν2 = 0.45, ν3 = 0.55                                                59%    74%    66.32%   0.729   0.049
KNN          M=26, F=2048, S=100            ξ1 = 70, ξ2 = 20                                                                65%    83%    74.70%   0.781   0.041
KNN          M=26, F=1024, S=70             ξ1 = 60, ξ2 = 25                                                                64%    81%    73.80%   0.776   0.039
SVM          M=39, F=2048, S=100            ζ1 = 10^−2, ζ2 = 10^−3                                                          74%    71%    72.28%   0.815   0.046
SVM          M=26, F=1024, S=50             ζ1 = 10^−4, ζ2 = 10^2                                                           74%    74%    73.91%   0.804   0.051
MLP          M=26, F=2048, S=100            η1 = 40, η2 = 10^−3, η3 = 0.4                                                   87%    88%    87.51%   0.897   0.033
MLP          M=13, F=1024, S=100            η1 = 60, η2 = 10^−1, η3 = 0.55                                                  84%    68%    76.02%   0.833   0.041
CNN          M=26, F=1024, S=70             α1=48, α2=2, α3=0.3, α4=16, β3=128, β4=130                                      99%    90%    94.57%   0.953   0.039
CNN          M=39, F=1024, S=50             α1=96, α2=2, α3=0.1, α4=16, β3=256, β4=170                                      98%    90%    94.35%   0.950   0.039
LSTM         M=13, F=2048, S=70             β1=128, β2=10^−3, α3=0.3, α4=32, β3=256, β4=150                                 97%    91%    94.02%   0.942   0.043
LSTM         M=26, F=2048, S=100            β1=256, β2=10^−2, α3=0.3, α4=16, β3=256, β4=110                                 97%    90%    93.65%   0.931   0.041
Resnet50     M=39, F=1024, S=50             Default Resnet50 (Table 1 in [74])                                              98%    93%    95.33%   0.976   0.018
Resnet50     M=26, F=1024, S=70             Default Resnet50 (Table 1 in [74])                                              98%    93%    95.01%   0.963   0.011

TABLE 5
Classifier performance when training on the Coswara dataset and evaluating on the Sarcos dataset. The best performance was achieved by the LSTM classifier, and further improvements were achieved by applying SFS.

Classifier   Best Feature Hyperparameters   Optimal Classifier Hyperparameters (trained on the Coswara dataset, Table 4)   Spec   Sens   ACC      AUC
CNN          M=26, F=1024, S=70             α1=48, α2=2, α3=0.3, α4=16, β3=128, β4=130                                     61%    85%    73.02%   0.755
LSTM         M=13, F=2048, S=70             β1=128, β2=10^−3, α3=0.3, α4=32, β3=256, β4=150                                73%    75%    73.78%   0.779
Resnet50     M=39, F=1024, S=50             Default Resnet50 (Table 1 in [74])                                             57%    93%    74.58%   0.742
LSTM + SFS   M=13, F=2048, S=70             β1=128, β2=10^−3, α3=0.3, α4=32, β3=256, β4=150                                96%    91%    92.91%   0.938

We also see from Table 4 that using a larger number of MFCCs consistently leads to improved performance. Since the spectral resolution used to compute the 39-dimensional MFCCs surpasses that of the human auditory system, we conclude that the classifiers use information not generally perceivable to human listeners. We came to similar conclusions in previous work considering the classification of coughing sounds due to tuberculosis [14].

6.2 Sarcos dataset

Classification performance for the Sarcos dataset is shown in Table 5. Here the CNN, LSTM and Resnet50 classifiers trained on the Coswara dataset (as shown in Table 4) were tested on the 44 subjects in the Sarcos dataset. No further hyperparameter optimisation was performed, and hence Table 5 simply repeats the hyperparameters presented in Table 4. We see that performance has in all cases deteriorated relative to the better-matched Coswara dataset. The best performance was achieved by the LSTM classifier, with an AUC of 0.779. In the next section, we improve this classifier by applying feature selection.

6.2.1 Feature Selection

As an additional experiment, SFS was applied to the best-performing system in Table 5, the LSTM. SFS is a greedy selection method that identifies the individual feature dimensions contributing most to classifier performance [80].

Fig. 10. Sequential forward selection, when applied to a feature matrix composed of 13 MFCCs with appended velocity (∆) and acceleration (∆∆), log frame energies, ZCR and kurtosis (Equation 5). Peak performance is observed after selecting the best 13 features.

The feature extraction hyperparameters in these experiments were 13 MFCCs extracted from frames 2048 samples (i.e. approximately 46 ms) long, while coughs were grouped into 70 segments. Thus, SFS could select from a total of 42 features: the 13 MFCCs along with their velocity (∆) and acceleration (∆∆), log frame energy, ZCR and kurtosis (Equation 5). After applying SFS to the LSTM classifier, a peak AUC of 0.938 was observed on the Sarcos dataset when using the best 13 of those 42 features, as shown in Figure 10 and Table 5. These 13 selected features improved the AUC from 0.779 to 0.938 (Figure 11), and they include


Fig. 11. Mean ROC curve for the best-performing LSTM classifier trained on the Coswara dataset and evaluated on the Sarcos dataset: an AUC of 0.78 was achieved when using all 42 features. After applying SFS and selecting the best 13 features, the AUC improved to 0.94.

MFCCs ranging from 3 to 12, along with their velocity (∆) and acceleration (∆∆), suggesting that all dimensions of the feature matrix carry equally important COVID-19 signatures.
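The greedy SFS procedure can be sketched as follows; here `score_fn` stands in for retraining the LSTM and measuring AUC on the development set, and all names are our own illustrative choices:

```python
def sequential_forward_selection(n_features, score_fn, max_features=None):
    """Sketch of SFS: starting from an empty set, greedily add the single
    feature whose inclusion most improves score_fn(subset), until
    max_features are selected (or all features are used)."""
    selected, remaining = [], list(range(n_features))
    history = []  # (subset, score) after each greedy addition
    while remaining and (max_features is None or len(selected) < max_features):
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score_fn(selected)))
    return history
```

In the experiment above, the subset size at which the development AUC peaks (13 of the 42 features) is then fixed for the final evaluation.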

7 CONCLUSION AND FUTURE WORK

We have developed COVID-19 cough classifiers using smartphone audio recordings and seven machine learning architectures. To train and evaluate these classifiers, we used two datasets. The first, larger dataset is publicly available and contains data from 1171 subjects (92 COVID-19 positive and 1079 healthy) residing on five continents (all except Africa). The smaller second dataset contains recordings from 18 COVID-19 positive and 26 COVID-19 negative subjects, 75% of whom reside in South Africa. Together, the two datasets therefore include data from subjects residing on all six continents. After pre-processing the cough audio recordings, we found that COVID-19 positive coughs are 15%-20% shorter than non-COVID coughs. We then extracted MFCC, log frame energy, ZCR and kurtosis features from the cough audio using a feature extraction technique that preserves the time-domain patterns, and trained and evaluated the seven classifiers using nested leave-p-out cross-validation. Our best-performing classifier, based on the Resnet50 architecture, is able to discriminate between COVID-19 coughs and healthy coughs with an AUC of 0.98 on the Coswara dataset. These results outperform the baseline AUC of 0.7 reported in [32]. When testing on the Sarcos dataset, the LSTM model trained on the Coswara dataset exhibited the best performance, discriminating COVID-19 positive coughs from COVID-19 negative coughs with an AUC of 0.94 when using the best 13 features determined by sequential forward selection (SFS). Furthermore, since better performance is achieved using a larger number of MFCCs than is required to mimic the human auditory system, we also conclude that at least some of the information used by the classifiers to discriminate COVID-19 coughs from non-COVID coughs may not be perceivable to the human ear.

Although the systems we describe require more stringent validation on a larger dataset, the results we have presented are very promising and indicate that COVID-19 screening based on the automatic classification of coughing sounds is viable. Since the data has been captured on smartphones, and since the classifier can in principle also be implemented on such devices, this form of cough classification is cost-efficient and easy to apply and deploy. Furthermore, it could be applied remotely, thus avoiding contact with medical personnel.

In ongoing work, we are continuing to enlarge our dataset and to apply transfer learning in order to take advantage of other, larger datasets. We are also beginning to consider the best means of implementing the classifier on a readily-available consumer smartphone.

ACKNOWLEDGEMENTS

This project was funded by the South African Medical Research Council (SAMRC) through its Division of Research Capacity Development under the SAMRC Intramural Postdoctoral programme, the South African National Treasury, as well as an EDCTP2 programme supported by the European Union (TMA2017CDF-1885). We would like to thank the South African Centre for High Performance Computing (CHPC) for providing computational resources on their Lengau cluster for this research, and gratefully acknowledge the support of Telkom South Africa. We would also like to thank Jordan Govendar and Rafeeq du Toit for their invaluable assistance in the Sarcos data collection.

The content and findings reported are the sole deduction,view and responsibility of the researchers and do not reflectthe official position and sentiments of the SAMRC, EDCTP2,European Union or the funders.

REFERENCES

[1] WHO et al., “Summary of probable SARS cases with onset of illness from 1 November 2002 to 31 July 2003,” http://www.who.int/csr/sars/country/table2004_04_21/en/index.html, 2003.

[2] R. Miyata, N. Tanuma, M. Hayashi, T. Imamura, J.-i. Takanashi, R. Nagata, A. Okumura, H. Kashii, S. Tomita, S. Kumada et al., “Oxidative stress in patients with clinically mild encephalitis/encephalopathy with a reversible splenial lesion (MERS),” Brain and Development, vol. 34, no. 2, pp. 124–127, 2012.

[3] D. Wang, B. Hu, C. Hu, F. Zhu, X. Liu, J. Zhang, B. Wang, H. Xiang, Z. Cheng, Y. Xiong et al., “Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China,” JAMA, vol. 323, no. 11, pp. 1061–1069, 2020.

[4] A. Carfì, R. Bernabei, F. Landi et al., “Persistent symptoms in patients after acute COVID-19,” JAMA, vol. 324, no. 6, pp. 603–605, 2020.

[5] Johns Hopkins University. (2020, Nov.) COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE). [Online]. Available: https://coronavirus.jhu.edu

[6] S. Walvekar, D. Shinde et al., “Detection of COVID-19 from CT images using Resnet50,” in 2nd International Conference on Communication & Information Processing (ICCIP) 2020, May 2020, available at SSRN: https://ssrn.com/abstract=3648863 or http://dx.doi.org/10.2139/ssrn.3648863.


[7] H. Sotoudeh, M. Tabatabaei, B. Tasorian, K. Tavakol, E. Sotoudeh, and A. L. Moini, “Artificial Intelligence Empowers Radiologists to Differentiate Pneumonia Induced by COVID-19 versus Influenza Viruses,” Acta Informatica Medica, vol. 28, no. 3, p. 190, 2020.

[8] M. Yildirim and A. Cinar, “A Deep Learning Based Hybrid Approach for COVID-19 Disease Detections,” Traitement du Signal, vol. 37, no. 3, pp. 461–468, 2020.

[9] A. Chang, G. Redding, and M. Everard, “Chronic wet cough: protracted bronchitis, chronic suppurative lung disease and bronchiectasis,” Pediatric Pulmonology, vol. 43, no. 6, pp. 519–531, 2008.

[10] T. Higenbottam, “Chronic cough and the cough reflex in common lung diseases,” Pulmonary Pharmacology & Therapeutics, vol. 15, no. 3, pp. 241–247, 2002.

[11] K. F. Chung and I. D. Pavord, “Prevalence, pathogenesis, and causes of chronic cough,” The Lancet, vol. 371, no. 9621, pp. 1364–1374, 2008.

[12] J. Korpas, J. Sadlonova, and M. Vrabec, “Analysis of the cough sound: an overview,” Pulmonary Pharmacology, vol. 9, no. 5-6, pp. 261–268, 1996.

[13] J. Knocikova, J. Korpas, M. Vrabec, and M. Javorka, “Wavelet analysis of voluntary cough sound in patients with respiratory diseases,” Journal of Physiology and Pharmacology, vol. 59, no. Suppl 6, pp. 331–340, 2008.

[14] G. Botha, G. Theron, R. Warren, M. Klopper, K. Dheda, P. Van Helden, and T. Niesler, “Detection of tuberculosis by automatic cough sound analysis,” Physiological Measurement, vol. 39, no. 4, p. 045005, 2018.

[15] M. Pahar, I. Miranda, A. Diacon, and T. Niesler, “Deep Neural Network based Cough Detection using Bed-mounted Accelerometer Measurements,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 8002–8006.

[16] M. Al-khassaweneh and R. Bani Abdelrahman, “A signal processing approach for the diagnosis of asthma from cough sounds,” Journal of Medical Engineering & Technology, vol. 37, no. 3, pp. 165–171, 2013.

[17] R. X. A. Pramono, S. A. Imtiaz, and E. Rodriguez-Villegas, “A cough-based algorithm for automatic diagnosis of pertussis,” PLoS One, vol. 11, no. 9, p. e0162128, 2016.

[18] A. Windmon, M. Minakshi, P. Bharti, S. Chellappan, M. Johansson, B. A. Jenkins, and P. R. Athilingam, “TussisWatch: A smart-phone system to identify cough episodes as early symptoms of chronic obstructive pulmonary disease and congestive heart failure,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1566–1573, 2018.

[19] R. V. Sharan, U. R. Abeyratne, V. R. Swarnkar, and P. Porter, “Automatic croup diagnosis using cough sound recognition,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 2, pp. 485–495, 2018.

[20] G. Rudraraju, S. Palreddy, B. Mamidgi, N. R. Sripada, Y. P. Sai, N. K. Vodnala, and S. P. Haranath, “Cough sound analysis and objective correlation with spirometry and clinical diagnosis,” Informatics in Medicine Unlocked, p. 100319, 2020.

[21] G. Deshpande and B. Schuller, “An Overview on Audio, Signal, Speech, & Language Processing for COVID-19,” arXiv preprint arXiv:2005.08579, 2020.

[22] A. N. Belkacem, S. Ouhbi, A. Lakas, E. Benkhelifa, and C. Chen, “End-to-End AI-based Point-of-Care Diagnosis System for Classifying Respiratory Illnesses and Early Detection of COVID-19: A Theoretical Framework,” Frontiers in Medicine, vol. 8, p. 372, 2021.

[23] B. W. Schuller, D. M. Schuller, K. Qian, J. Liu, H. Zheng, and X. Li, “COVID-19 and computer audition: An overview on what speech & sound analysis could contribute in the SARS-CoV-2 Corona crisis,” arXiv preprint arXiv:2003.11117, 2020.

[24] C. Brown, J. Chauhan, A. Grammenos, J. Han, A. Hasthanasombat, D. Spathis, T. Xia, P. Cicuta, and C. Mascolo, “Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 3474–3484.

[25] A. Imran, I. Posokhova, H. N. Qureshi, U. Masood, S. Riaz, K. Ali, C. N. John, and M. Nabeel, “AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app,” Informatics in Medicine Unlocked, vol. 20, p. 100378, 2020.

[26] A. Pal and M. Sankarasubbu, “Pay attention to the cough: Early diagnosis of COVID-19 using interpretable symptoms embeddings with cough sound signal processing,” in Proceedings of the 36th Annual ACM Symposium on Applied Computing, 2021, pp. 620–628.

[27] P. Bagad, A. Dalmia, J. Doshi, A. Nagrani, P. Bhamare, A. Mahale, S. Rane, N. Agarwal, and R. Panicker, “Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds,” arXiv preprint arXiv:2009.08790, 2020.

[28] J. Laguarta, F. Hueto, and B. Subirana, “COVID-19 Artificial Intelligence Diagnosis using only Cough Recordings,” IEEE Open Journal of Engineering in Medicine and Biology, 2020.

[29] J. Andreu-Perez, H. Perez-Espinosa, E. Timonet, M. Kiani, M. I. Giron-Perez, A. B. Benitez-Trinidad, D. Jarchi, A. Rosales, N. Gkatzoulis, O. F. Reyes-Galaviz et al., “A Generic Deep Learning Based Cough Analysis System from Clinically Validated Samples for Point-of-Need Covid-19 Test and Severity Levels,” IEEE Transactions on Services Computing, no. 01, pp. 1–1, 2021.

[30] M. Cohen-McFarlane, R. Goubran, and F. Knoefel, “Novel Coronavirus Cough Database: NoCoCoDa,” IEEE Access, vol. 8, pp. 154087–154094, 2020.

[31] N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, P. K. Ghosh, S. Ganapathy et al., “Coswara – A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis,” arXiv preprint arXiv:2005.10548, 2020.

[32] A. Muguli, L. Pinto, N. Sharma, P. Krishnan, P. K. Ghosh, R. Kumar, S. Ramoji, S. Bhat, S. R. Chetupalli, S. Ganapathy et al., “DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics,” arXiv preprint arXiv:2103.09148, 2021.

[33] J. Van Hulse, T. M. Khoshgoftaar, and A. Napolitano, “Experimental perspectives on learning from imbalanced data,” in Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 935–942.


[34] B. Krawczyk, “Learning from imbalanced data: open challenges and future directions,” Progress in Artificial Intelligence, vol. 5, no. 4, pp. 221–232, 2016.

[35] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[36] G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 559–563, 2017.

[37] M. Pahar and T. Niesler, “Machine Learning based COVID-19 Detection from Smartphone Recordings: Cough, Breath and Speech,” arXiv preprint arXiv:2104.02477, 2021.

[38] R. Blagus and L. Lusa, “SMOTE for high-dimensional class-imbalanced data,” BMC Bioinformatics, vol. 14, p. 106, 2013.

[39] H. Han, W.-Y. Wang, and B.-H. Mao, “Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,” in International Conference on Intelligent Computing. Springer, 2005, pp. 878–887.

[40] H. M. Nguyen, E. W. Cooper, and K. Kamei, “Borderline over-sampling for imbalanced data classification,” International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 3, no. 1, pp. 4–21, 2011.

[41] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, 2008, pp. 1322–1328.

[42] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy, and Kong-Pang Pun, “An efficient MFCC extraction method in speech recognition,” in IEEE International Symposium on Circuits and Systems, 2006.

[43] M. Pahar and L. S. Smith, “Coding and Decoding Speech using a Biologically Inspired Coding System,” in 2020 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2020, pp. 3025–3032.

[44] H. Chatrzarrin, A. Arcelus, R. Goubran, and F. Knoefel, “Feature extraction for the differentiation of dry and wet cough sounds,” in IEEE International Symposium on Medical Measurements and Applications. IEEE, 2011.

[45] M. Pahar, M. Klopper, B. Reeve, G. Theron, R. Warren, and T. Niesler, “Automatic Cough Classification for Tuberculosis Screening in a Real-World Environment,” arXiv preprint arXiv:2103.13300, 2021.

[46] M. M. Azmy, “Feature extraction of heart sounds using velocity and acceleration of MFCCs based on support vector machines,” in 2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), 2017, pp. 1–4.

[47] S. Aydın, H. M. Saraoglu, and S. Kara, “Log energy entropy-based EEG classification with multilayer neural networks in seizure,” Annals of Biomedical Engineering, vol. 37, no. 12, p. 2626, 2009.

[48] R. Bachu, S. Kopparthi, B. Adapa, and B. D. Barkana, “Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy,” in Advanced Techniques in Computing Sciences and Software Engineering. Springer, 2010, pp. 279–282.

[49] L. T. DeCarlo, “On the meaning and use of kurtosis,” Psychological Methods, vol. 2, no. 3, p. 292, 1997.

[50] E. Christodoulou, J. Ma, G. S. Collins, E. W. Steyerberg, J. Y. Verbakel, and B. Van Calster, “A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models,” Journal of Clinical Epidemiology, vol. 110, pp. 12–22, 2019.

[51] S. Le Cessie and J. C. Van Houwelingen, “Ridge estimators in logistic regression,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 41, no. 1, pp. 191–201, 1992.

[52] Y. Tsuruoka, J. Tsujii, and S. Ananiadou, “Stochastic gradient descent training for l1-regularized log-linear models with cumulative penalty,” in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009, pp. 477–485.

[53] H. Yamashita and H. Yabe, “An interior point method with a primal-dual quadratic barrier penalty function for nonlinear optimization,” SIAM Journal on Optimization, vol. 14, no. 2, pp. 479–499, 2003.

[54] J. Monge-Alvarez, C. Hoyos-Barcelo, P. Lesso, and P. Casaseca-de-la-Higuera, “Robust Detection of Audio-Cough Events Using Local Hu Moments,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, pp. 184–196, 2018.

[55] R. X. A. Pramono, S. A. Imtiaz, and E. Rodriguez-Villegas, “Automatic cough detection in acoustic signal using spectral features,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2019, pp. 7153–7156.

[56] S. Vhaduri, T. Van Kessel, B. Ko, D. Wood, S. Wang, and T. Brunschwiler, “Nocturnal cough and snore detection in noisy environments using smartphone microphones,” in 2019 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2019, pp. 1–7.

[57] J.-C. Wang, J.-F. Wang, K. W. He, and C.-S. Hsu, “Environmental sound classification using hybrid SVM/KNN classifier and MPEG-7 audio low-level descriptor,” in The 2006 IEEE International Joint Conference on Neural Network Proceedings. IEEE, 2006, pp. 1731–1735.

[58] V. Bhateja, A. Taquee, and D. K. Sharma, “Pre-Processing and Classification of Cough Sounds in Noisy Environment using SVM,” in 2019 4th International Conference on Information Systems and Computer Networks (ISCON). IEEE, 2019, pp. 822–826.

[59] B. H. Tracey, G. Comina, S. Larson, M. Bravard, J. W. Lopez, and R. H. Gilman, “Cough detection algorithm for monitoring patient recovery from pulmonary tuberculosis,” in 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2011, pp. 6017–6020.

[60] R. V. Sharan, U. R. Abeyratne, V. R. Swarnkar, and P. Porter, “Cough sound analysis for diagnosing croup in pediatric patients using biologically inspired features,” in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2017, pp. 4578–4581.

[61] H. Taud and J. Mas, “Multilayer perceptron (MLP),” in Geomatic Approaches for Modeling Land Change Scenarios. Springer, 2018, pp. 451–455.

[62] L. Sarangi, M. N. Mohanty, and S. Pattanayak, “Design of MLP Based Model for Analysis of Patient Suffering from Influenza,” Procedia Computer Science, vol. 92, pp. 396–403, 2016.

[63] J.-M. Liu, M. You, Z. Wang, G.-Z. Li, X. Xu, and Z. Qiu, “Cough detection using deep neural networks,” in 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2014, pp. 560–563.

[64] J. Amoh and K. Odame, “DeepCough: A deep convolutional neural network in a wearable cough detection system,” in 2015 IEEE Biomedical Circuits and Systems Conference (BioCAS). IEEE, 2015, pp. 1–4.

[65] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.

[66] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: A convolutional neural-network approach,” IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98–113, 1997.

[67] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” in 2017 International Conference on Engineering and Technology (ICET). IEEE, 2017, pp. 1–6.

[68] X. Qi, T. Wang, and J. Liu, “Comparison of support vector machine and softmax classifiers in computer vision,” in 2017 Second International Conference on Mechanical, Control and Computer Engineering (ICMCCE). IEEE, 2017, pp. 151–155.

[69] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[70] I. D. Miranda, A. H. Diacon, and T. R. Niesler, “A comparative study of features for acoustic cough detection using deep architectures,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2019, pp. 2601–2605.

[71] E. Marchi, F. Vesperini, F. Weninger, F. Eyben, S. Squartini, and B. Schuller, “Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection,” in 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, 2015, pp. 1–7.

[72] J. Amoh and K. Odame, “Deep neural networks for identifying cough sounds,” IEEE Transactions on Biomedical Circuits and Systems, vol. 10, no. 5, pp. 1003–1011, 2016.

[73] A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,” Physica D: Nonlinear Phenomena, vol. 404, p. 132306, 2020.

[74] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[75] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision. Springer, 2014, pp. 740–755.

[76] J. Laguarta, F. Hueto, P. Rajasekaran, S. Sarma, and B. Subirana, “Longitudinal Speech Biomarkers for Automated Alzheimer’s Detection,” 2020.

[77] S. Liu, “Leave-p-out Cross-Validation Test for Uncertain Verhulst-Pearl Model With Imprecise Observations,” IEEE Access, vol. 7, pp. 131705–131709, 2019.

[78] A. Racz, D. Bajusz, and K. Heberger, “Effect of Dataset Size and Train/Test Split Ratios in QSAR/QSPR Multiclass Classification,” Molecules, vol. 26, no. 4, p. 1111, 2021.

[79] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.

[80] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Prentice Hall, 1982.

Madhurananda Pahar received his BSc in Mathematics from the University of Calcutta, India, and his MSc in Computing for Financial Markets followed by his PhD in Computational Neuroscience from the University of Stirling, Scotland. He is currently a post-doctoral fellow at the University of Stellenbosch, South Africa. His research interests are in machine learning and signal processing for audio signals and smart sensors in biomedicine, such as the detection and classification of TB and COVID-19 coughs in real-world environments.

Marisa Klopper is a researcher at the Division of Molecular Biology and Human Genetics of Stellenbosch University, South Africa. She holds a PhD in Molecular Biology from Stellenbosch University, and her research interests are in TB and drug-resistant TB diagnosis, epidemiology and physiology. She has been involved in cough classification for the last six years, with application to TB and, more recently, COVID-19.

Robin Warren is the Unit Director of the South African Medical Research Council’s Centre for Tuberculosis Research and a Distinguished Professor at Stellenbosch University. He has a B2 rating from the National Research Foundation (NRF), is a core member of the DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, and heads the TB Genomics research thrust. He has published over 320 papers in the field of TB and has an average H-index (Scopus, Web of Science and Google Scholar) of 65.


Thomas Niesler obtained the B.Eng (1991) and M.Eng (1993) degrees in Electronic Engineering from the University of Stellenbosch, South Africa, and a Ph.D. from the University of Cambridge, England, in 1998. He joined the Department of Engineering, University of Cambridge, as a lecturer in 1998 and subsequently the Department of Electrical and Electronic Engineering, University of Stellenbosch, in 2000, where he has been a Professor since 2012. His research interests lie in the areas of signal processing, pattern recognition and machine learning.