
Multi-Modal Heart-Beat Estimation On an iPhone

by

Narges Norouzi

A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering
University of Toronto

Copyright © 2014 by Narges Norouzi

Abstract

    Multi-Modal Heart-Beat Estimation On an iPhone

    Narges Norouzi

    Master of Applied Science

    Graduate Department of Electrical and Computer Engineering

    University of Toronto

    2014

Current-generation smartphone video cameras and microphones enable photoplethysmography (PPG) and phonocardiography (PCG) acquisition. In this thesis, the iPhone's microphone and camera are used to measure heart rate. We developed a heart rate measurement system with three sensing mechanisms (finger color changes, face color changes, and heart sound measurement), all on the iPhone. The three proposed measurement systems each provide an independent heart rate estimate, as well as a combined estimate based on the fusion of the individual sensors.

The proposed algorithm estimates the heart rate by (1) analyzing the heart pulse to compute the user's heart rate using our version of the EMD algorithm, which is widely used in advanced biomedical signal processing, (2) assessing the quality of the PPG and PCG waveforms using a Support Vector Machine (SVM) classifier, and (3) combining heart rate information from the three different modalities based on the assessed quality of the waveforms.


I dedicate my MASc thesis to my dear parents, Malek Norouzi and Fakhrolsadat Nabavi, for the advice, guidance, and opportunity they have provided me throughout my personal and professional life.

Acknowledgements

I would like to say special thanks to Prof. Parham Aarabi, who has supported me throughout my Master's: keeping me going when times were tough, asking insightful questions, and offering invaluable advice.

I would also like to thank my colleagues Mike and Gary for helping with data collection, iPhone application development, and implementation.


Contents

1 Introduction

2 Background
  2.1 Cardiovascular System
  2.2 Prior Work on Heart Rate Monitoring
    2.2.1 Photoplethysmography
      2.2.1.1 Photoplethysmography by Smartphone's Video Camera
    2.2.2 Phonocardiography
    2.2.3 Heart Rate Monitoring on Smartphones
    2.2.4 Signal Processing Algorithms
      2.2.4.1 Empirical Mode Decomposition
        2.2.4.1.1 Applications of EMD Algorithm
      2.2.4.2 Ensemble Empirical Mode Decomposition
      2.2.4.3 The Fourier Transform and STFT
      2.2.4.4 Wavelet Transform (WT)

3 Application Architecture
  3.1 Fingertip Processing Unit
  3.2 Face Processing Unit
  3.3 Audio Processing Unit
  3.4 Heartbeat Estimation Algorithm
  3.5 Signal Filtering Algorithms
    3.5.1 FIR Filtering
    3.5.2 EMD Algorithm
      3.5.2.1 Decomposition
      3.5.2.2 Reconstruction
    3.5.3 Wavelet Transform
  3.6 Peak Detection Algorithm
  3.7 Learning System for Heart Rate Estimation based on Support Vector Machines (SVMs)
    3.7.1 SVM Classifier
    3.7.2 SVM Classifier Implementation
  3.8 Multi-Channel Heart Rate Estimation

4 Experimental Results
  4.1 Heartbeat Detection Accuracy without the Use of the SVM Classifier
  4.2 SVM Classifier Sensitivity
  4.3 Heartbeat Detection Accuracy Using the SVM Classifier

5 Conclusion

Bibliography

Appendices

A Acronyms

List of Tables

2.1 Heart rate for different ages

4.1 Root mean square error between heart rate measured by the pulse oximeter and heart rate estimation using FIR, DWT, and EMD filtering for all 70 subjects

4.2 Average percentage error between heart rate measured by the pulse oximeter and heart rate estimation using FIR, DWT, and EMD filtering for all 70 subjects

4.3 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the fingertip

4.4 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the face

4.5 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PCG signal

4.6 Root mean square error between the heart rate measured by the pulse oximeter and heart rate estimation using the SVM classifier and FIR, DWT, and EMD filtering for all 70 subjects

4.7 Average percentage error between heart rate measured by the pulse oximeter and heart rate estimation using the SVM classifier and FIR, DWT, and EMD filtering for all 70 subjects

List of Figures

2.1 A typical ECG signal
2.2 General scheme to record video for PPG acquisition
2.3 Phonocardiogram copied from [19]
2.4 Auscultation areas on the chest
2.5 Sifting process and envelopes
2.6 Decomposition of a sample ECG signal into its first 12 IMFs
2.7 Discrete Wavelet Transform decomposition

3.1 Block diagram of the application architecture
3.2 Video recording from the fingertip using the back camera
3.3 Region of interest in each frame
3.4 Example of fingertip data obtained by the described capture method
3.5 Video recording from the face using the front-facing camera
3.6 Example of PPG signal from the face obtained by the described capture method
3.7 Example of PCG recorded by the microphone and identifiable heart sounds
3.8 Elements of the proposed algorithm
3.9 FIR filtering module
3.10 Sample data captured from the device, already filtered and down-sampled as specified
3.11 Sample PPG recording with baseline wander, baseline, and clean PPG after removing the baseline
3.12 Original PPG recorded from the fingertip and clean PPG after applying FIR filtering
3.13 Original PPG recorded from the face and clean PPG after applying FIR filtering
3.14 Original PCG recorded from the chest of the user and clean PCG after applying FIR filtering
3.15 Original PPG recorded from the fingertip and the decomposed IMFs using the EMD algorithm
3.16 Decomposition of the PPG signal into IMFs using the EMD algorithm
3.17 Power Spectral Density of the decomposed IMFs
3.18 Original PPG recorded from the fingertip and the clean signal after applying the EMD algorithm and reconstructing the signal based on the Power Spectral Density of the IMFs
3.19 Block diagram of the wavelet decomposition and reconstruction
3.20 An illustration of the peak detection algorithm
3.21 Histogram of peak-to-peak distance from the fingertip recording
3.22 Histogram of peak-to-peak distance from the face recording
3.23 Histogram of peak-to-peak distance from the audio recording
3.24 Histogram of peak-to-peak distance from the three modalities

4.1 Comparison between the accuracy of the proposed heartbeat rate estimation system with and without the use of the SVM classifier in terms of RMSE
4.2 Comparison between the accuracy of the proposed heartbeat rate estimation system with and without the use of the SVM classifier in terms of average percentage error

Chapter 1

    Introduction

Advancements in sensor technology allow for new models of automated healthcare monitoring. Currently, specialized devices such as electrocardiographs, pulse oximeters, and phonocardiographs are used to measure heart rates. Furthermore, wireless heart rate monitors are widely available and provide users with realtime estimates of their HR at rest, during, and following physical activities [1]. However, wireless HR monitors often require wearing a strap around the chest or arm. As heart rate monitors gain wide distribution as low-cost physiological measurement solutions, the alternative idea of using smartphones as heart rate monitors has emerged. With heart rate monitor applications on smartphones, people no longer need to carry dedicated heart rate monitors, which is much more convenient.

In recent years, automated health monitoring with mobile smartphones has become the subject of great interest [2]. In particular, there has been significant interest in accurately estimating heart beat frequency using the smartphone's built-in camera, accelerometer, gyroscope, and microphone. This is a relatively easy way to measure user heart rates since it does not require any special skills or special devices; all that is needed is a smartphone with on-board sensors.


Monitoring heart rate using a smartphone is important as a non-invasive, remote health care monitoring option. Monitoring heart rate is important both during exercise and at rest. Resting heart rate is a good indicator of aerobic fitness, and a lower resting heart rate is associated with a reduced risk of heart attack. On the other hand, measuring heart rate before, during, and after exercise improves the quality of exercise and also ensures the safety of the fitness program.

Currently, there are many applications in the App Store that measure the user's heart rate using either auscultation or pulse oximetry. Auscultation is done in "Heart Monitor for iPhone" [21] and "Heart Record" [22]. Pulse oximetry is done both through the finger, in applications such as "Instant Heart Rate" [24] and "Heart Beat Rate" [25], and through the face, in "Cardiio" [28] and "Touchless Pulse Monitor" [29].

In this thesis, an iPhone application is developed and tested in order to demonstrate the potential of the iPhone for measuring heartbeat rates in realtime. This application makes use of the iPhone's front-facing and back cameras and its microphone for PPG and PCG acquisition in order to provide an estimate of the user's heart rate. The goal of this project is to make the application fully functional, providing people with precise and useful wellness information based on their pulse, using more advanced signal processing techniques like EMD and machine learning algorithms.

The proposed heartbeat estimation system in this thesis provides an estimate of the user's HR by (1) computing the HR of the input PPG and PCG signals using our version of the Empirical Mode Decomposition (EMD) algorithm, (2) assessing the quality of the input signals in order to distinguish between good and bad waveform segments using the SVM classifier, and (3) combining heart rate information from the three different modalities based on the assessed quality of the signals.

Finally, a quantitative comparison between EMD-based filtering and other filtering algorithms, namely the Discrete Wavelet Transform (DWT) and FIR filtering, is conducted.

Chapter 2

    Background

    2.1 Cardiovascular System

The circulatory system is responsible for transporting blood, delivering to the body's cells their nutrition, water, and oxygen, and carrying away waste such as the carbon dioxide that cells produce. The heart serves as a pump to deliver blood to the body's tissues. It does so by undergoing a cycle of contraction and relaxation called the cardiac cycle. The heart comprises atria and ventricles that pump blood in each cardiac cycle.

The heart rate or pulse rate is typically expressed as the number of beats per minute (bpm). The pulse rate varies according to the body's physical and psychological needs and with factors such as age, physical exercise, anxiety, stress level, and drugs. Although a high pulse rate indicates abnormal heart activity and can help identify various problems within the body, it cannot be used on its own to diagnose an abnormality. Table 2.1 shows the heart rate range for different ages.

Table 2.1: Heart rate for different ages

Age            Heart rate (bpm)
Newborn        100-160
0-5 months     90-150
6-12 months    80-140
1-3 years      80-130
3-5 years      80-120
6-10 years     70-110
11-14 years    60-105
14+ years      60-100

2.2 Prior Work on Heart Rate Monitoring

Heart rate is the rate at which the heart beats, given in beats per minute and commonly measured at the wrist or neck. The pulse can be felt directly on the wrist or neck by pressing it with the index and middle fingers. A more precise method of determining heart rate involves the use of an electrocardiograph (ECG) or photoplethysmography (PPG). The ECG monitors the electrical changes occurring during the cardiac cycle from the surface of the body. A normal ECG recording associated with a single cardiac cycle contains three waveforms (Figure 2.1).

The P wave shows the sequential activation of the right and left atria. The QRS complex (which consists of the Q, R, and S waves) represents the simultaneous activation of the right and left ventricles. The last waveform, the T wave, is triggered by the repolarization of the ventricles.


    Figure 2.1: A typical ECG signal

In the rest of this section, two other non-invasive methods for measuring heart rate, photoplethysmography and phonocardiography, are described in detail.

2.2.1 Photoplethysmography

Non-invasive measurement of temporal variation in blood volume by pulse oximetry is acknowledged to be one of the most important technological advances in monitoring a patient's heart rate in clinical settings [3]. The photoplethysmograph, first introduced by Hertzman [4], is composed of a light source and a photo-detector. In photoplethysmography, a sensor is placed on a thin part of the patient's body, like the ear lobes, fingertips, or toes, where a high degree of superficial vasculature exists.

The photoplethysmogram (PPG) waveform is formed by measuring the amount of light passing through the skin and represents the changes in the shape of the pulse. This phenomenon is caused by the absorption of light by the capillaries, which become full of blood in each heartbeat cycle so that less light can pass through them. The PPG obtained from pulse oximetry has also been shown to be effective for estimating other important physiological features such as blood oxygen saturation and breathing rate [5].

2.2.1.1 Photoplethysmography by Smartphone's Video Camera

Two popular portable devices for measuring heart rate are pulse oximeters (which attach to one of the fingers) and heart rate monitors (which use a belt to electronically detect the heart rate and relay that information to a specially designed watch). These types of "standalone" products carry a significant cost and require the user to purchase a device designed for a single purpose, which is inconvenient.

A more convenient alternative is an application on a smartphone that uses the hardware of the smartphone to capture the heart rate. Most recent smartphones are equipped with high-resolution cameras and LEDs, which is very similar to the construction of pulse oximeters. Users place their finger on the smartphone's camera, covering both the camera and the LED. A schematic picture of video recording for PPG acquisition on a smartphone is shown in Figure 2.2.

Figure 2.2: General scheme to record video for PPG acquisition

[6] proposed using the smartphone's camera for PPG acquisition. The waveform acquisition was done on a Nokia E63, and it is reported that the green channel signal is more informative than the red channel signal. However, [7] showed that the distribution of the pixels in the green channel is not uniform across different smartphones such as the HTC HD2, iPhone 4, Nokia, or Samsung, whereas the red channel characteristics are similar for different smartphones.

The PPG signal acquisition on smartphones utilizes the same image acquisition concept that is used in pulse oximeters. In order to distinguish oxygenated and deoxygenated blood based on the blood opacity, the average of the red channel intensities in each frame is calculated, and the plot of the average red channel intensities over time is an indication of the PPG.

To estimate the heart rate reliably from the PPG signal recorded by a smartphone, the effect of finger pressure on the camera lens, finger movement during recording, and the illumination level of the environment must be taken into account. Several methods have been proposed in the PPG processing literature to address these factors.

[8] introduced the idea of removing motion artifacts from the PPG signal for the accurate measurement of arterial oxygen saturation during movement, using a combination of Independent Component Analysis (ICA) and block interleaving with low-pass filtering. Enriquez et al. [9] studied the plethysmographic signal using Principal Component Analysis (PCA) and claimed that clinically relevant parameters can be obtained from the PPG when PCA is used. Furthermore, [10] presented a realtime de-noising algorithm for PPG and ECG signals for measuring pulse rate and blood pressure using the Discrete Wavelet Transform (DWT). Additionally, [11] reduced the influence of force variation on the estimation of the heart rate by means of the Continuous Wavelet Transform (CWT); in their study, the experiment was conducted under three different force conditions: low, medium, and high.

In another category of PPG processing algorithms, the idea of using Intrinsic Mode Functions (IMFs) obtained by Empirical Mode Decomposition (EMD) is introduced [12].

Finally, several data-driven decision-support systems have been developed in order to produce meaningful results from physiological data, mainly PPG signals. [13] and [14] used a Support Vector Machine (SVM) and a Neural Network (NN), respectively, to assess the PPG signal and extract heart rate information.

2.2.2 Phonocardiography

Heart sound is an essential tool in the clinical setting and provides clinicians with valuable diagnostic information on heart diseases. However, the phonocardiogram (PCG) is a complex signal to analyze visually, heart auscultation can take several years to learn, and it has a high degree of subjectivity. Nevertheless, the low cost of phonocardiography keeps it among the most desirable clinical techniques.

Phonocardiography breaks the heartbeat into four distinct sections. The first sound ("S1") occurs at the onset of systole and is produced by the closing of the atrioventricular valves, audibly heard as the "lub" in the popular "lub-dub" description of the heart [15]. The second sound ("S2") occurs at the onset of diastole and is produced by the closing of the semilunar valves; this is the "dub". These are considered normal heart sounds [16]. S1 and S2 can be clearly heard while listening to a patient's heart with a stethoscope.

The next two possible sounds (S3 and S4) are generally abnormal in adults and produce a distinct "galloping" heartbeat [17]. Finally, there is a class of sounds called "murmurs" that can occur during any of the four phases and are caused by various abnormalities in the heart valves. Detection and analysis of these murmurs is often critical in the diagnosis of heart problems [18]. Figure 2.3 illustrates normal and abnormal PCG signals and their corresponding ECG signal.

Figure 2.3: Phonocardiogram copied from [19]

There are four main areas on the patient's chest (Figure 2.4) that are optimal sites for auscultation. At these sites the intensity of the heart sound is highest because the sound is transmitted through solid tissue or through a minimal thickness of inflated lung.


    Figure 2.4: Auscultation areas on chest

2.2.3 Heart Rate Monitoring on Smartphones

For many years, people measured heart rate by listening to the heart's sound through the patient's chest. At the start of the 20th century, Einthoven developed the electrocardiograph (ECG). With an ECG, it is possible to record the electrical changes during each heartbeat cycle and make a graphic recording of this activity.

In the 1980s, the first wireless Heart Rate Monitor (HRM), consisting of a transmitter and a receiver, was developed. The transmitter could be attached to the chest using either disposable electrodes or an elastic electrode belt. The receiver was a watch-like monitor worn on the wrist [20]. The development of this relatively small wireless monitor resulted in increased utilization of HRMs by athletes. As a consequence, the objective measure of HR replaced the more subjective perceived exertion as an indicator of exercise intensity. Another relatively recent development in HR monitoring is the measurement of Heart Rate Variability (HRV), which may have various applications. These features and their reliability and validity will be discussed in the following sections.

The latest category of devices for monitoring heart rate is smartphones. Currently, the App Store contains applications that measure the user's heart rate using either auscultation or pulse oximetry. Auscultation is done with "Heart Monitor for iPhone" [21] and "Heart Record" [22]. Pulse oximetry is done through the finger in "Heart Rate - Free" [23], "Instant Heart Rate" [24], "Heart Beat Rate" [25], "Runtastic" [26], and "HeartTracker" [27].

Pulse oximetry can also be done through the face, as in "Cardiio" [28], "Touchless Pulse Monitor" [29], and "What's My Heart Rate" [30]. Users need to hold the iPhone roughly six inches (15 cm) in front of them and line up their face inside a guiding box. Cardiio uses only the front-facing camera for video recording and claims an accuracy within 3 beats per minute of a clinical pulse oximeter [31]. It can estimate heart rate through both the face and the fingertip separately. In the "What's My Heart Rate" application, users can switch between the front-facing and back cameras in order to measure the heart rate of others, and its premium version also measures breathing rate.

Another category of mobile applications uses external heart rate monitors, such as a heart rate monitoring strap, to measure the heartbeat rate; the data is then transferred to the smartphone and recorded in a history. This type of application is mostly used for tracking workouts and monitoring the heart rate before, during, and after each workout. The "Digifit iCardio Multi-Sport Heart Rate Monitor Training" application is an example of this type; it uses a heart rate monitor strap to monitor the heart pulse during workouts [32]. Another example is "Fitbeat Heart Rate Monitor", which works with a 5.3 kHz un-coded heart rate belt or Bluetooth smart devices.

In addition to the above heart rate monitors, there is currently a paid app developed by Azumio, "Stress Check Pro", that uses a pulse oximetry technique to estimate the user's stress level [33].

It should be noted that most of these applications do not try to infer any information from the heartbeat other than the heart rate, and that they all use only one method to obtain that information. It should also be pointed out that most of the developers of these mobile applications state that their apps are intended for "informational and entertainment purposes only" and should not be used in place of professional medical equipment.

Different studies have been conducted to explore the potential of the smartphone to estimate heart rate. [34] used video recorded from the user's face with the front-facing camera of an iPhone 4 as an indication of the PPG signal. They then detected the facial region in each frame and extracted the cardiac pulse signal using frequency analysis of both the raw trace signal and the signal obtained from ICA.

Laure et al. analyzed the PPG recorded from the user's fingertip and introduced two different peak detection algorithms for HR estimation [35], [36]. They applied the two proposed peak detection algorithms on a set of 50 test measurements. In 20% of the calculations using the peak detection algorithm in [35], the estimated values differed from the real heart rate by more than 5%, while the peak detection algorithm proposed in [36] yielded 8% incorrect calculations on the same data.

2.2.4 Signal Processing Algorithms

Several signal processing algorithms can be used to remove the noise added to the original PPG and PCG signals recorded by a smartphone. The noise in the PPG signal normally corresponds to varying illumination levels during the video recording, motion artifacts added to the signal by face or finger movement, different levels of finger pressure on the camera, and objects covering part of the face. The noise present in the recorded PCG corresponds to background noise in the environment and movement of the phone on the chest during the recording.

In this section, we provide an overview of the different signal processing algorithms used in the literature for processing PPG and PCG signals.

2.2.4.1 Empirical Mode Decomposition

Empirical Mode Decomposition (EMD) [37] is a method of breaking down a signal without leaving the time domain. EMD is a recursive method introduced to analyze non-linear and non-stationary signals such as biomedical signals [39]. The algorithm is based on the decomposition of the original signal into a collection of Intrinsic Mode Functions (IMFs) using a numerical sifting process.

An IMF must fulfill two conditions: (i) the number of extrema and the number of zero crossings must be equal or differ at most by one; and (ii) the mean of the upper and lower envelopes is zero everywhere.

The sifting process can be separated into the following steps:

• For a signal x(t), let m_1 be the mean of its upper and lower envelopes, as determined from a cubic-spline interpolation of the local maxima and minima. The locality is determined by an arbitrary parameter; the calculation time and the effectiveness of the EMD depend greatly on this parameter.

• The first component h_1 is computed as

  h_1 = x(t) - m_1.   (2.1)

• In the second sifting process, h_1 is treated as the data, and m_{11} is the mean of the upper and lower envelopes of h_1:

  h_{11} = h_1 - m_{11}.   (2.2)

• This sifting procedure is repeated k times, until h_{1k} is an IMF, that is,

  h_{1(k-1)} - m_{1k} = h_{1k}.   (2.3)

• Then it is designated as c_1 = h_{1k}, the first IMF component of the data, which contains the shortest-period component of the signal. We separate it from the rest of the data:

  x(t) - c_1 = r_1.   (2.4)

• The procedure is repeated on the residues r_j: r_1 - c_2 = r_2, ..., r_{n-1} - c_n = r_n.

• Thus the original signal x(t) can be expressed as

  x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t),   (2.5)

  where c_j(t) is the j-th IMF and r_n(t) is the residue.

The sifting process, with the maximum, minimum, and mean envelopes, is shown in Figure 2.5.
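As a concrete illustration of the sifting procedure above, the following Python sketch decomposes a signal into IMFs using cubic-spline envelopes and the SD stoppage criterion given in Equation 2.6 below. It is a minimal sketch of the generic EMD algorithm under these assumptions, not the exact implementation used in this thesis; the names emd, sd_threshold, and max_imfs are chosen here purely for illustration.

    import numpy as np
    from scipy.signal import argrelextrema
    from scipy.interpolate import CubicSpline

    def sift_once(h):
        """One sifting step: subtract the mean of the upper/lower cubic-spline envelopes."""
        t = np.arange(len(h))
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            return None  # too few extrema to build envelopes; treat h as the residue
        upper = CubicSpline(maxima, h[maxima])(t)
        lower = CubicSpline(minima, h[minima])(t)
        m = (upper + lower) / 2.0
        return h - m

    def emd(x, sd_threshold=0.3, max_imfs=10):
        """Decompose x into IMFs plus a residue (Eq. 2.5): x = sum(IMFs) + residue."""
        residue = np.asarray(x, dtype=float).copy()
        imfs = []
        for _ in range(max_imfs):
            h = residue.copy()
            for _ in range(200):              # bound the number of sifting iterations
                h_new = sift_once(h)
                if h_new is None:
                    return imfs, residue      # residue is monotonic or trivial
                # SD stoppage criterion (Eq. 2.6)
                sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
                h = h_new
                if sd < sd_threshold:
                    break
            imfs.append(h)
            residue = residue - h
        return imfs, residue

For a 10-second PPG segment sampled at the camera frame rate, emd(ppg) would typically return a handful of IMFs ordered from the highest-frequency component to the lowest.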

Figure 2.5: Sifting process and envelopes

The stoppage criterion determines the number of sifting steps needed to produce an IMF. Two different stoppage criteria have been used traditionally:

1. The first criterion, proposed by Huang et al. [38], is defined as a normalized sum of squared differences,

   SD_k = \frac{\sum_{t=0}^{T} |h_{k-1}(t) - h_k(t)|^2}{\sum_{t=0}^{T} h_{k-1}^2(t)}.   (2.6)

   The sifting process stops when SD falls below a pre-given value.

2. The second criterion is based on the so-called S-number, defined as the number of consecutive sifting steps for which the numbers of zero-crossings and extrema are equal or differ at most by one. Specifically, an S-number is pre-selected, and the sifting process stops only if, for S consecutive sifting steps, the numbers of zero-crossings and extrema stay the same and are equal or differ at most by one.

The EMD decomposes non-stationary signals into narrow-band components with decreasing frequency. The decomposition is complete, almost orthogonal, local, and adaptive. All IMFs form a complete and "nearly" orthogonal basis for the original signal. The basis comes directly from the signal, which preserves the inherent characteristics of the signal and avoids the diffusion and leakage of signal energy. The sifting process eliminates riding waves, so each IMF is more symmetrical and is effectively a zero-mean AM-FM component. An example of the decomposition of an ECG signal into its first 12 IMFs is shown in Figure 2.6.

    Figure 2.6: Decomposition of sample ECG signal into its first 12 IMFs.

Mode mixing appears to be the most significant drawback of the EMD algorithm. It implies either a single IMF consisting of signals of dramatically disparate scales, or a signal of the same scale appearing in different IMF components, and it is usually caused by intermittency in the analyzed signal.

2.2.4.1.1 Applications of EMD Algorithm

Nimunkar in [39] implemented the EMD algorithm for filtering noisy ECG signals and compared the result of the EMD algorithm with a traditional low-pass filtering approach. Tong et al. in [40] used empirical mode decomposition for filtering power-line noise in the electrocardiogram signal. They added pseudo-noise at a frequency higher than the highest frequency of the signal so that the power-line noise would be filtered out in the first IMF, and they compared the results with traditional IIR-based bandstop filtering. This technique can also be used for filtering power-line noise during the enhancement of stress ECG signals. Furthermore, [41] used EMD and PCA algorithms to obtain cardiovascular signals from sensing hardware embedded in a chair.

In another study, [42] showed that their proposed EMD-based method provides better noise reduction performance than wavelet-thresholding de-noising methods, in terms of preserving the geometrical characteristics of the ECG signal and the signal-to-noise ratio (SNR).

The steps proposed by [42] for de-noising the ECG signal using the EMD are:

• Transform the noisy ECG signal s(k) by EMD; c_i denotes the IMF at scale i, where i = 1, 2, ..., n.

• Calculate the mean square value \delta_i at scale i; the threshold t_i can then be determined by the 3\delta rule.

• Apply the hard-thresholding method to obtain the estimated IMFs \tilde{c}_i as follows:

  \tilde{c}_i(k) = \begin{cases} c_i(k) & \text{if } |c_i(k)| \ge t_i \\ 0 & \text{if } |c_i(k)| < t_i \end{cases}   (2.7)

• Reconstruct the de-noised ECG signal from the thresholded IMFs \tilde{c}_i(k).
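A minimal sketch of this hard-thresholding de-noising scheme, assuming an emd decomposition function like the one sketched earlier and a simple 3-sigma-style threshold per IMF (the exact threshold rule in [42] may differ):

    import numpy as np

    def emd_hard_threshold_denoise(x, emd_func):
        """De-noise a signal by hard-thresholding each IMF (Eq. 2.7)."""
        imfs, residue = emd_func(x)
        clean = np.zeros(len(x))
        for c in imfs:
            t = 3.0 * np.sqrt(np.mean(c ** 2))      # threshold t_i from a 3*delta-style rule
            c_thr = np.where(np.abs(c) >= t, c, 0)  # keep samples above threshold, zero the rest
            clean += c_thr
        return clean + residue                      # reconstruct from thresholded IMFs + residue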

2.2.4.2 Ensemble Empirical Mode Decomposition

Ensemble EMD (EEMD) was introduced to remove the mode-mixing effect. The EEMD largely overcomes the mode-mixing problem of the original EMD by repeatedly adding white noise to the targeted signal, and it provides physically unique decompositions when applied to data with mixed and intermittent scales.

The EEMD decomposition process can be separated into the following steps (a short code sketch follows the list):

• Add a white noise series w(t) to the targeted data x(t); the noise must have zero mean and constant variance, so that X(t) = x(t) + w(t).

• Decompose the data with added white noise into Intrinsic Mode Functions (IMFs) and a residue r_n,

  X(t) = \sum_{j=1}^{n-1} c_j + r_n.   (2.8)

• Repeat steps 1 and 2 N times, each time with a different white noise series w_i(t), so that

  X_i(t) = \sum_{j=1}^{n-1} c_{ij} + r_{in}.   (2.9)

• Obtain the ensemble means of the corresponding IMFs of the decompositions as the final result, where each IMF is obtained by decomposing a noise-added version of the target signal:

  c_j = \frac{1}{N} \sum_{i=1}^{N} c_{ij}.   (2.10)
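A minimal sketch of these EEMD steps, assuming an emd function like the one sketched above; the noise amplitude noise_std and ensemble size n_ensemble are illustrative parameters, not values used in this thesis. Because different trials can yield different numbers of IMFs, the sketch simply truncates each decomposition to a fixed count before averaging.

    import numpy as np

    def eemd(x, emd_func, n_ensemble=50, noise_std=0.2, n_imfs=6, seed=0):
        """Ensemble EMD: average IMFs over many noise-added decompositions (Eqs. 2.8-2.10)."""
        rng = np.random.default_rng(seed)
        scale = noise_std * np.std(x)
        acc = np.zeros((n_imfs, len(x)))
        for _ in range(n_ensemble):
            noisy = x + rng.normal(0.0, scale, size=len(x))   # step 1: add white noise
            imfs, residue = emd_func(noisy)                    # step 2: decompose
            comps = (imfs + [residue])[:n_imfs]                # truncate to a fixed count
            for j, c in enumerate(comps):
                acc[j] += c                                    # trials with fewer IMFs add zeros to deeper rows
        return acc / n_ensemble                                # step 4: ensemble means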

This approach takes full advantage of the statistical characteristics of white noise, whose energy is uniformly distributed across frequencies, to improve the EMD method. By adding white noise to the targeted signal, all scales are populated and the mode-mixing phenomenon is avoided. Comparing IMF components at the same level, EEMD yields more concentrated and band-limited components.

2.2.4.3 The Fourier Transform and STFT

The Fourier Transform (FT), X(\omega), of a signal x(t) is defined as

  X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt,   (2.11)

where t and \omega are the time and frequency variables, respectively. It defines the spectrum of x(t), which consists of components at all frequencies over the range for which it is nonzero.

Historically, Fourier spectrum analysis has provided a general method for examining the global energy-frequency distribution, and it came to dominate data analysis efforts soon after its introduction because of its power and simplicity. The Fourier transform belongs to the class of orthogonal transformations that use fixed harmonic basis functions. Its result can be seen as a decomposition of the initial signal into harmonic functions with fixed frequencies and amplitudes.

For many signals, Fourier analysis is useful because the signal's frequency content is important. But Fourier analysis has a serious drawback: temporal information is lost when transforming the signal to the frequency domain. Moreover, although the transform itself is valid under extremely general conditions, there are crucial restrictions: the system must be linear, and the data must be strictly periodic or stationary; otherwise the resulting spectrum will make little physical sense.

In 1946, Dennis Gabor adapted the Fourier transform to analyze only a small section of the signal at a time; the result is called the Short-Time Fourier Transform (STFT). The STFT is obtained from the usual FT by multiplying the time-domain signal x(t) by an appropriate sliding time window w(t). Thus, instead of the usual FT expression, one gets a time-frequency expression of the form

  X(\tau, \omega) = \int_{-\infty}^{\infty} x(t) w(t - \tau) e^{-j\omega t} dt,   (2.12)

where w(t) is the time window applied to the signal. The information the STFT provides has limited precision, determined by the size of the window.
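To connect these transforms to heart rate estimation, the following SciPy sketch estimates the dominant frequency of a PPG-like segment with a Welch PSD estimate and an STFT; both trade time resolution for frequency resolution through the window length. The 30 Hz sampling rate matches the camera frame rate used later in this thesis, while the synthetic 1.2 Hz signal and the window sizes are illustrative assumptions.

    import numpy as np
    from scipy.signal import welch, stft

    fs = 30.0                                 # camera frame rate (samples per second)
    t = np.arange(0, 10, 1 / fs)
    ppg = np.sin(2 * np.pi * 1.2 * t)         # synthetic 1.2 Hz pulse (72 bpm) stand-in

    # Welch PSD: dominant frequency over the whole 10-second window
    f, pxx = welch(ppg, fs=fs, nperseg=128)
    f_peak = f[np.argmax(pxx)]
    print("dominant frequency: %.2f Hz -> %.0f bpm" % (f_peak, 60 * f_peak))

    # STFT: how the spectrum evolves over time, using 4-second windows
    f_s, t_s, zxx = stft(ppg, fs=fs, nperseg=120)
    print("STFT grid (frequencies, time frames):", zxx.shape)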

2.2.4.4 Wavelet Transform (WT)

The Wavelet Transform (WT) is used to analyze a signal in the time and frequency domains. The WT describes the properties of a waveform that change over time, with the waveform divided into segments of scale. It involves representing a time function in terms of simple, fixed building blocks, termed wavelets. These building blocks are a family of functions derived from a single generating function, called the mother wavelet, by translation and dilation operations.

The WT can be categorized into two types: continuous and discrete. The Continuous Wavelet Transform (CWT) is used to divide a continuous-time function into wavelets. The CWT of a continuous, square-integrable function x(t) at a scale a > 0 and translational value b \in R is defined by

  W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t) \, g^*\!\left(\frac{t - b}{a}\right) dt,   (2.13)

where * denotes the complex conjugate and g(t), the so-called analyzing or mother wavelet, is a continuous function in both the time and frequency domains.

To recover the original signal x(t), the inverse continuous wavelet transform can be used:

  x(t) = \int_{0}^{+\infty} \int_{-\infty}^{+\infty} \frac{1}{a^2} \, W_x(a, b) \, \frac{1}{\sqrt{|a|}} \, \tilde{g}\!\left(\frac{t - b}{a}\right) db \, da,   (2.14)

where \tilde{g}(t) is the dual function of g(t).

The analyzing wavelet g(t) should satisfy a number of properties. The most important are integrability and square integrability. The wavelet also has to be as concentrated in time and frequency as possible.

However, calculating wavelet coefficients for every possible scale requires considerable effort and results in a vast amount of data. Therefore, the Discrete Wavelet Transform (DWT) is often used. The WT can be thought of as an extension of the classic Fourier transform, except that, instead of working on a single scale (time or frequency), it works on a multi-scale basis. This multi-scale feature of the WT allows the decomposition of a signal into a number of scales, each scale representing a particular coarseness of the signal under study [43].

The DWT of a signal x[n] is calculated by passing it through a series of filters. Each stage consists of two digital filters followed by down-samplers by 2, as shown in Figure 2.7. g[n] is the discrete mother wavelet and acts as a high-pass filter, while h[n] is its mirror version and is low-pass in nature.

The outputs giving the detail coefficients (from the high-pass filter) and the approximation coefficients (from the low-pass filter) are computed as follows:

  y_{low}[n] = \sum_{k=-\infty}^{+\infty} x[k] \, h[2n - k],
  y_{high}[n] = \sum_{k=-\infty}^{+\infty} x[k] \, g[2n - k].   (2.15)

Figure 2.7: Discrete Wavelet Transform decomposition
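As an illustration of this cascaded filter-bank decomposition, the following sketch uses the PyWavelets package as a stand-in for the filter bank of Figure 2.7; the 'db4' wavelet, the 3-level depth, and the synthetic signal are illustrative assumptions, not settings taken from this thesis.

    import numpy as np
    import pywt  # PyWavelets

    fs = 30.0
    t = np.arange(0, 10, 1 / fs)
    ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(len(t))  # noisy stand-in signal

    # Three-level DWT: coeffs = [approximation_3, detail_3, detail_2, detail_1]
    coeffs = pywt.wavedec(ppg, 'db4', level=3)

    # Simple de-noising: zero the finest detail coefficients, then reconstruct
    coeffs[-1] = np.zeros_like(coeffs[-1])
    clean = pywt.waverec(coeffs, 'db4')
    print(len(ppg), len(clean))  # lengths match up to boundary padding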

The wavelet transform is often compared with the Fourier transform. The Fourier transform is a powerful tool for processing stationary signals (signals whose properties do not change over time). To avoid the constraints associated with non-stationary signals, the wavelet transform was introduced. Like the Fourier transform, it performs a decomposition in a fixed basis of functions; however, unlike the FT, it expands the signal in terms of wavelet functions that are localized in both time and frequency [44].

Chapter 3

    Application Architecture

We developed and tested an Apple iOS application to demonstrate the iOS device's potential for measuring user heart rates in realtime. This application makes use of the iPhone's front-facing and back cameras and its microphone for PPG and PCG acquisition, in order to provide an estimate of the user's heart rate. Once the measurements are obtained, the app analyzes the signals to compute the user's heart rate.

At a high level, the core algorithm can be represented by the block diagram in Figure 3.1. Testing the iDevice sensors' capability for retrieving heart pulse information is performed in four steps: the video and audio processing units take in their inputs in the first three steps, and in the last step, signal processing and machine learning algorithms are used to estimate the heart rate.

Figure 3.1: Block diagram of the application architecture

3.1 Fingertip Processing Unit

The application records video from the fingertip of the user for 10 seconds, using the back camera. The user needs to gently press the camera lens and its LED with their index finger, as previously shown in Figure 2.2. When the user presses the camera lens of the device and its LED simultaneously, the ambient light travels through the finger and reaches the camera sensor. A sample frame of the video recorded from the index fingertip is shown in Figure 3.2.

Our application utilizes the same image acquisition concept that is used in pulse oximeters. In order to distinguish oxygenated and deoxygenated blood based on the blood opacity, we measure the brightness of the skin over time. To compute the brightness variation of the skin, we calculate the average red channel intensity of the pixels in the region of interest in each frame. We divide each frame into 9 cells, and PPG waveform extraction considers only the central cell, as shown in Figure 3.3.


    Figure 3.2: Video recording from the fingertip using back camera

    Figure 3.3: Region of interest in each frame

The average red channel intensity is calculated using Equation 3.1 to determine the PPG signal:

  PPG_1(t) = \frac{\sum_{x,y} R(x, y, t)}{WH},   (3.1)

where R(x, y, t) is the red channel intensity of the frame at time t at pixel (x, y), and hence 0 \le R(x, y, t) \le 255. WH is the number of pixels in the region of interest, which is 192 x 144 in our application.
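A minimal Python sketch of Equation 3.1, assuming the frames are already available as an array of RGB images (in the app itself this would be done per frame in the camera capture callback); the names frames and roi are illustrative.

    import numpy as np

    def ppg_from_frames(frames):
        """Compute PPG(t) as the mean red-channel intensity over the central ROI of each frame.

        frames: array of shape (n_frames, height, width, 3) with 8-bit RGB values.
        """
        n, h, w, _ = frames.shape
        # central cell of a 3x3 grid, analogous to the 192 x 144 region of interest
        roi = frames[:, h // 3: 2 * h // 3, w // 3: 2 * w // 3, 0]   # red channel only
        return roi.reshape(n, -1).mean(axis=1)                       # one PPG sample per frame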

A sample PPG signal from the video recorded from the index fingertip is shown in Figure 3.4. The data is quite clean, and peak-to-peak distances are visually identifiable.

    Figure 3.4: Example of fingertip data obtained by the described capture method

3.2 Face Processing Unit

Another mechanism for sensing the color changes of the skin during a cardiac cycle is recording video from the face. The application records video from the face for 10 seconds. To record properly, the face should be placed in front of the front-facing camera in a well-lit environment. In this application, the user should place his or her forehead in a pre-determined area that is displayed on the screen (Figure 3.5).

Figure 3.5: Video recording from the face using the front-facing camera

In order to capture the PPG signal from the face, we again apply Equation 3.2 to the region of interest in each frame of the recorded video. The PPG signal computed from the face might contain additional noise corresponding to illumination levels in the room, objects covering the forehead, or possible movement of the device during recording:

  PPG_2(t) = \frac{\sum_{x,y} R(x, y, t)}{WH},   (3.2)

where, again, R(x, y, t) is the red channel intensity of the frame at time t at pixel (x, y), and hence 0 \le R(x, y, t) \le 255. WH is the number of pixels in the region of interest, which is 192 x 144 in our application.

A sample PPG signal from the video recorded from the face is shown in Figure 3.6. The data is still clean, but it contains additional noise compared to the PPG signal recorded from the fingertip.

Figure 3.6: Example of PPG signal from the face obtained by the described capture method

For both fingertip and face recording, the exposure settings on the device are locked in order to eliminate the effect of auto-exposure on the captured results. For example, during the fingertip test, the finger completely covers the camera, causing the iOS device to over-expose the capture because it assumes a low-light condition; this tends to drop the frame rate and saturate the red intensity of the captured data.

3.3 Audio Processing Unit

In the final step, the user places the microphone directly on his or her chest, preferably on one of the auscultation sites shown in Figure 2.4. The audio is recorded for 10 seconds using the primary microphone of the device at a sampling frequency of 44.1 kHz. The recorded audio is an indication of the PCG signal, and the two main sounds of the PCG, S1 and S2, are identifiable from it. A sample PCG signal recorded using the primary microphone of the iDevice is shown in Figure 3.7.

Figure 3.7: Example of PCG recorded by the microphone and identifiable heart sounds

In the PCG signal shown in Figure 3.7, the two heart sounds, S1 and S2, are quite identifiable; but, like any signal, the audio recorded by the microphone might contain noise corresponding to movement of the phone on the chest during recording or background noise in the room.

In the next section, our proposed method for filtering the three captured signals and estimating the heart rate of the user is presented.

3.4 Heartbeat Estimation Algorithm

The proposed heartbeat estimation algorithm in this thesis provides an estimate of the user's HR by (1) computing the HR from the input PPG and PCG signals using our version of the Empirical Mode Decomposition (EMD) algorithm, (2) assessing the quality of the input signals in order to distinguish between good and bad waveform segments using the SVM classifier, and (3) combining heart rate information from the three different modalities based on the quality of the signals.

Figure 3.8 illustrates the components of this approach. In the first component, we use a signal filtering algorithm (either EMD, DWT, or FIR filtering) to remove the noise artifacts in each waveform. Next, we apply the peak detection algorithm to compute the heart rate in each segment independently. Third, we separately qualify PPG and PCG waveform segments (each segment is 5 seconds long) as either good or bad through the use of a machine learning algorithm in the form of SVMs. In the fourth and final component, a decision-logic algorithm combines the results of the three previous components to provide the final heart rate estimate.

3.5 Signal Filtering Algorithms

In this section we discuss the three signal filtering algorithms applied to our dataset. Our main contribution on the signal processing side is a version of the EMD algorithm for reducing noise in the PPG and PCG signals. We also applied the Wavelet Transform and FIR filtering to our dataset in order to compare their results with those of our proposed EMD algorithm.


    Figure 3.8: Elements of the proposed algorithm

3.5.1 FIR Filtering

An overview of the designed FIR filtering algorithm is shown in Figure 3.9. We first need to down-sample the audio before getting it off the device; in the next phase we remove baseline wander and filter out the noise outside the heartbeat range of interest. Finally, we apply a moving average so that peaks can be detected efficiently.

In the first step, we filter and down-sample the audio so that the recorded data can be taken off the device in a reasonable amount of time (with a reasonable size). The raw data is sampled at 44.1 kHz for 10 seconds. The data is then filtered with a 6th-order low-pass Butterworth filter and down-sampled, while ensuring that the Nyquist requirement is still met. An example of the data at this stage is illustrated in Figure 3.10.

Figure 3.9: FIR filtering module

Figure 3.10: Sample data captured from the device; it has already been filtered and down-sampled as specified.

After the signal acquisition, a band-pass filter attenuates frequencies outside the band of interest. This reduces the noise in later processing steps and makes the resulting heart rate signal smoother. We first remove the baseline wander to provide signals with zero mean: each of the three vital signals is divided into 10 intervals, because the average of the input signals can shift over time due to sensor drift, and a linear trend is subtracted from each interval. A sample of the original PPG signal, the baseline wander of the PPG, and the clean signal after removing the baseline is shown in Figure 3.11.

Secondly, we apply a fourth-order Butterworth band-pass filter to each input signal with cutoff frequencies of 0.8 Hz and 3.0 Hz, to reject noise outside the heart rate range of 48 to 180 beats per minute.

Figure 3.11: Sample PPG recording with baseline wander, baseline, and clean PPG after removing the baseline

Finally, a moving average is applied to the filtered data:

  y[n] = \frac{1}{2L + 1} \sum_{m=n-L}^{n+L} |x[m]|,   (3.3)

where L determines the length of the window used for averaging. The study in [45] suggests that the shorter heart sound is approximately 67-87 ms in length, and so we applied a window of 63 ms.
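A minimal sketch of this filtering pipeline (per-interval baseline removal, 0.8-3.0 Hz band-pass, and rectified moving average) using SciPy; the filter order, number of intervals, and band limits follow the text above, but the sampling rate and window handling are illustrative, and this is a reconstruction rather than the exact app code.

    import numpy as np
    from scipy.signal import butter, filtfilt, detrend

    def fir_style_filter(sig, fs=30.0, n_intervals=10, win_ms=63):
        """Baseline removal, 0.8-3.0 Hz band-pass, and moving average (Eq. 3.3)."""
        sig = np.asarray(sig, dtype=float)

        # 1. Remove baseline wander: subtract a linear trend from each of the intervals
        edges = np.linspace(0, len(sig), n_intervals + 1, dtype=int)
        sig = np.concatenate([detrend(sig[a:b]) for a, b in zip(edges[:-1], edges[1:])])

        # 2. Fourth-order Butterworth band-pass, 0.8-3.0 Hz (48-180 bpm)
        b, a = butter(4, [0.8, 3.0], btype='bandpass', fs=fs)
        sig = filtfilt(b, a, sig)

        # 3. Rectified moving average with a window of roughly win_ms milliseconds (Eq. 3.3)
        L = max(1, int(round(win_ms / 1000 * fs / 2)))
        kernel = np.ones(2 * L + 1) / (2 * L + 1)
        return np.convolve(np.abs(sig), kernel, mode='same')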

Figures 3.12 and 3.13 illustrate the result of applying the FIR filtering module described above to the PPG recorded from the fingertip and the PPG recorded from the face, respectively.

Figure 3.12: Original PPG recorded from the fingertip and clean PPG after applying FIR filtering

Figure 3.13: Original PPG recorded from the face and clean PPG after applying FIR filtering

Finally, since we need to correlate the three inputs to estimate the heart rate of the user, the sampling rate of the audio recording from the microphone must be brought in line with the sampling rate of our camera, which is 30 fps. We therefore down-sample the audio, using a Butterworth low-pass filter with appropriate cutoff frequencies to avoid aliasing. Figure 3.14 illustrates the PCG recorded from the chest of the user and its down-sampled, filtered, and smoothed result after applying the algorithm.
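A minimal sketch of this anti-aliased down-sampling step using SciPy's polyphase resampler, assuming the 44.1 kHz audio is to be brought to a rate commensurate with the 30 fps camera stream; the intermediate rate of 300 Hz is an illustrative choice, not a value stated in the thesis.

    import numpy as np
    from scipy.signal import resample_poly

    fs_audio = 44100      # microphone sampling rate
    fs_target = 300       # illustrative intermediate rate, an integer multiple of 30 fps

    audio = np.random.randn(fs_audio * 10)            # 10-second stand-in PCG recording
    # resample_poly applies an anti-aliasing FIR filter before decimation
    pcg_low = resample_poly(audio, up=fs_target, down=fs_audio)
    print(len(pcg_low))                               # ~3000 samples for 10 seconds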

Figure 3.14: Original PCG recorded from the chest of the user and clean PCG after applying FIR filtering

3.5.2 EMD Algorithm

We now discuss the EMD algorithm proposed for filtering noise out of the two PPG signals and the PCG signal, which consists of two parts: decomposition and reconstruction.

3.5.2.1 Decomposition

The Empirical Mode Decomposition algorithm was introduced to analyze non-linear and non-stationary signals such as biomedical signals, and it is based on decomposing the signal into a collection of IMFs. These IMFs should fulfill the two conditions discussed previously.

  • Chapter 3. Application Architecture 37

In the first step of signal filtering using EMD, we decompose the original PPGs and the PCG using the EMD method as follows (a minimal sketch of this sifting loop is given after the list):

1. Initialize h1(t) with the original signal.

2. Identify the extrema of the signal hi(t).

3. Generate the upper and lower envelopes by interpolating the maxima and minima points found in the previous step.

4. Calculate the mean of the two envelopes to determine the local mean value, m(t).

5. Calculate d(t) = hi(t) − m(t).

6. Test whether d(t) is a zero-mean signal; if so, d(t) is taken as the next IMF, hi+1(t) = d(t). Otherwise, replace hi(t) with d(t) and repeat from step (2).

7. Update the residue series as r = r − hi(t) and set i = i + 1. Repeat steps (2) to (6) by sifting the residual signal. The process stops when the final residual signal is a monotonic function.
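The sketch below implements this sifting loop in Python; the zero-mean tolerance, the cap on the number of IMFs, and the cubic-spline envelopes are simplifying assumptions of the sketch rather than the exact criteria used in our implementation.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_imf(h, t, max_iter=100, tol=0.05):
    # Steps (2)-(6): subtract the envelope mean until d(t) is roughly zero-mean.
    d = h.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(d, np.greater)[0]
        minima = argrelextrema(d, np.less)[0]
        if len(maxima) < 3 or len(minima) < 3:
            return None                               # too few extrema: residual is (nearly) monotonic
        upper = CubicSpline(t[maxima], d[maxima])(t)  # upper envelope
        lower = CubicSpline(t[minima], d[minima])(t)  # lower envelope
        d_new = d - (upper + lower) / 2.0             # remove the local mean m(t)
        if abs(np.mean(d_new)) < tol * np.std(d_new):
            return d_new                              # accepted as the next IMF
        d = d_new
    return d

def emd(x, fs, max_imfs=6):
    # Step (7): peel off IMFs one by one until the residual is monotonic.
    t = np.arange(len(x)) / fs
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        imf = sift_imf(residual, t)
        if imf is None:
            break
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual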

    A sample decomposition of the PPG signal recorded from the fingertip into its IMFs

    is illustrated in Figure 3.15.

    3.5.2.2 Reconstruction

After applying the EMD algorithm to the input signal, the signal is decomposed into a residue and a collection of IMFs. Hence it can be expressed as:

x(t) = \sum_{i=1}^{n} h_i(t) + r    (3.4)

    where n is the number of IMFs.


Figure 3.15: Original PPG recorded from the fingertip and the decomposed IMFs using the EMD algorithm.

We know from the literature on EMD applications that the last IMFs capture the baseline wander, while the high-frequency noise components lie in the first IMFs. In order to reconstruct the clean signal from the decomposed IMFs, we need to determine the noise level in the signal. To determine the noise level and recover the heartbeat signal, the IMFs corresponding to the heartbeat are identified according to their peak frequencies. So, we compute the Power Spectral Density of each IMF, which reveals the dominant frequency in the IMF. In our algorithm, the IMFs with a peak frequency, Fi, in the range of 0.8 Hz - 3.0 Hz are classified as components of the heartbeat signal. [46] tested these cutoff limits on the output of sensors in a designed “HeartPhone”. Therefore, we can reconstruct the heartbeat signal as:

H_{clean}(t) = \sum_{i \,:\, F_i \in [0.8,\,3.0]\ \text{Hz}} h_i(t)    (3.5)
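A minimal sketch of this reconstruction step follows, using the Welch estimate of the Power Spectral Density to find the dominant frequency of each IMF; the nperseg choice is an assumption of the sketch.

import numpy as np
from scipy.signal import welch

def reconstruct_heartbeat(imfs, fs, f_lo=0.8, f_hi=3.0):
    # Keep only the IMFs whose PSD peak lies in the heartbeat band (Equation 3.5).
    clean = np.zeros_like(imfs[0])
    for imf in imfs:
        f, pxx = welch(imf, fs=fs, nperseg=min(256, len(imf)))
        if f_lo <= f[np.argmax(pxx)] <= f_hi:         # dominant frequency F_i of this IMF
            clean += imf
    return clean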

    Figures 3.16 and 3.17 illustrate the decomposition of the PPG signal recorded from


    the fingertip into five IMFs and corresponding power spectral densities.

Figure 3.16: Decomposition of the PPG signal into IMFs using the EMD algorithm.

Figure 3.17: Power Spectral Density of the decomposed IMFs (Welch estimate).


A comparison between the original PPG signal recorded from the fingertip and the signal reconstructed using the EMD algorithm is shown in Figure 3.18. The reconstruction was based on the IMFs whose dominant frequency components are in the range of 0.8 Hz - 3.0 Hz. According to this frequency range, we used the second and third IMFs for partial reconstruction of the signal.

Figure 3.18: Original PPG recorded from the fingertip and the clean signal after applying the EMD algorithm and reconstructing the signal based on the Power Spectral Density of the IMFs.

    3.5.3 Wavelet Transform

    Wavelet transform can be used for data decomposition and reconstruction. By decom-

    posing the original signal, we can eliminate the wavelets corresponding to the noise and

    reconstruct a clean signal. In order to implement the WT to filter the recorded data, we


    use Multi-Resolution Analysis (MRA).

In Wavelet Transform analysis, the approximations are the high-scale, low-frequency components and the details are the low-scale, high-frequency components of the signal. Given a chosen level of decomposition, a threshold is needed to determine which components should be eliminated.

    The selection of an appropriate wavelet and number of decomposition levels is very

    important in the analysis of signals using the WT. The number of decomposition levels is

    chosen based on the dominant frequency component of the signal. The levels are chosen

    so that those parts of the signal that correlate with the frequencies required for classifi-

    cation of the signal are retained in the wavelet coefficients. In our algorithm, the level

of decomposition was chosen to be 4 [43]. Thus the PPG and PCG signals were decomposed into the details D1 - D4 and one final approximation, A4. A4 contains the dominant frequency content in the [0, 3.75] Hz band, which corresponds to the heart pulse.

Usually tests are performed with different types of wavelets and the one that gives the maximum efficiency is selected for the particular application. [43] suggests using the Daubechies wavelet of order 2 for PPG, ECG, and EEG signals, so we also performed our analysis with the db2 wavelet at level 4. The block diagram of the wavelet transform algorithm is illustrated below.

    Figure 3.19: Block diagram of the wavelet decomposition and reconstruction.
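As a rough illustration of this decomposition and reconstruction, the sketch below uses the PyWavelets package to keep only the level-4 approximation A4 of a db2 decomposition; zeroing all detail bands is a simplification of the thresholding step described above, not the exact procedure of our implementation.

import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db2", level=4):
    # Decompose into [A4, D4, D3, D2, D1], keep the approximation, discard the details.
    coeffs = pywt.wavedec(x, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]    # trim possible padding sample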


    3.6 Peak Detection Algorithm

The peak detection algorithm used in our system is a version of the Adaptive Peak Identification Technique (ADAPIT) introduced by [13]. This algorithm detects peaks and computes the heart rate for each waveform segment. The main steps of the peak detection algorithm are as follows (a sketch of the threshold-based candidate-peak step is given after the list):

1. In order to detect peaks precisely, we first remove the baseline of the signal, which was already done in the previous section.

2. In this step a first estimate of the actual peaks is produced:

• Two thresholds, T1 and T2, are computed. T1 is set to 2σ1, where σ1 denotes the standard deviation of all the data points of the waveform and defines the waveform's baseline range [-T1, T1]. T2 is set to 3σ2, σ2 being the standard deviation of the baseline. The peaks greater than T2 are taken as the first estimate of the actual peaks.

• The lower bound on the amplitude of a peak is set to one half of the median amplitude of all the peaks identified in the previous step.

3. To confirm the actual peaks retained from the previous step, strings of markers with period P are iteratively generated and moved along the timeline to align with the retained peaks. Through this iterative process, P is varied over a range corresponding to heart rates between 48 and 180 bpm. The largest P aligned with the largest number of peaks is selected.

4. Each unaligned marker of the selected P is allowed to move back and forth along the timeline by as much as one half of P, in an attempt to line up with any unaligned peak.
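The sketch below covers only the thresholding of step 2; the marker-alignment of steps 3 and 4 is omitted, and the helper name is illustrative rather than part of ADAPIT itself.

import numpy as np
from scipy.signal import find_peaks

def candidate_peaks(x):
    # Step 2: first estimate of the pulse peaks from the two thresholds T1 and T2.
    t1 = 2.0 * np.std(x)                              # baseline range is [-T1, T1]
    baseline = x[np.abs(x) <= t1]
    t2 = 3.0 * np.std(baseline)                       # peak threshold
    peaks, _ = find_peaks(x, height=t2)
    if peaks.size == 0:
        return peaks
    keep = x[peaks] >= 0.5 * np.median(x[peaks])      # lower bound on peak amplitude
    return peaks[keep]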

Figure 3.20 illustrates the peak detection algorithm. Panels (a) and (b) show the original signal and its baseline wander, and in (c) a clean signal without baseline wander is shown, together with the thresholds T1 and T2. In panel (d) the primary peaks detected by T3 are shown. Panels (e) and (f) show the peak-to-peak intervals and the detected peaks, respectively.

    Figure 3.20: An illustration of the peak detection algorithm

3.7 Learning System for Heart Rate Estimation based on Support Vector Machines (SVMs)

    3.7.1 SVM Classifier

SVM is a commonly used method for statistical pattern recognition. Consider the problem of separating input vectors belonging to two separate categories, V = {(x_1, y_1), ..., (x_m, y_m)}, x_i ∈ R^n, y_i ∈ {±1}, with a hyper-plane w^T x + b = 0, where x_i ∈ R^n are the patterns to be classified, y_i ∈ {±1} are their categories, w is a normal vector, and b is a bias term.

  • Chapter 3. Application Architecture 44

The goal of the SVM classifier is to find the optimal separating hyper-plane, i.e. the one that minimizes the classification error and maximizes the distance between the hyper-plane and the vectors closest to it. Training the classifier involves the minimization of the error function:

\frac{1}{2} w^T w = \frac{1}{2} \|w\|^2    (3.6)

    subject to constraints:

y_i(w^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, m    (3.7)

From the constraint in Equation 3.7 it follows that w^T x_i + b \leq -1 for y_i = -1, while w^T x_i + b \geq 1 for y_i = 1.

    The optimization problem can be formulated as follows:

\min J(w, \zeta) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i    (3.8)

    such that

y_i(w^T \phi(x_i) + b) \geq 1 - \zeta_i    (3.9)

\zeta_i \geq 0, \quad i = 1, \ldots, N    (3.10)

where C is a positive regularization constant chosen empirically, w is the weight vector to be learned, \zeta_i is a non-negative slack variable indicating the distance of x_i from the decision boundary, and \phi is a nonlinear mapping function used to map the input data point x_i into a higher dimensional space.

The SVM can be written in dual form using Lagrange multipliers \alpha_i \geq 0. The solution for the Lagrange multipliers is obtained by solving a quadratic programming problem. The SVM decision function can be expressed as:

g(x) = \sum_{x_i \in SV} \alpha_i y_i K(x, x_i) + b    (3.11)

where K(x, x_i) is the kernel function and is defined as:

K(x, x_i) = \phi(x)^T \phi(x_i)    (3.12)

In this work the linear kernel function is used, defined as K(x, x_i) = x^T x_i.
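For illustration, a linear-kernel SVM of this form can be trained with an off-the-shelf library such as scikit-learn; the two-dimensional feature values and labels below are placeholders, not measurements from our recordings.

import numpy as np
from sklearn.svm import SVC

# Placeholder two-dimensional feature vectors with labels +1 (good) and -1 (bad).
X_train = np.array([[0.90, 0.05], [0.85, 0.08], [0.40, 0.35], [0.30, 0.40]])
y_train = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0)       # linear kernel K(x, x_i) = x^T x_i
clf.fit(X_train, y_train)
print(clf.predict([[0.80, 0.10]]))      # classify a new feature vector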


    3.7.2 SVM Classifier Implementation

The SVM classifier is used as a post-processing step in our heart rate estimation system. The presented heartbeat estimation system is based on the filtering and peak detection algorithms discussed in the previous sections, and the SVM classifier is used to distinguish between good and bad recordings. The results from the filtering and peak detection modules are then used to provide an estimate of the heart rate based on the waveforms classified as good by the SVM classifier.

In our proposed method, we first apply our version of the EMD algorithm. Then, we apply the peak detection module to the clean signal to detect the peaks, from which the features of the classifier are computed. We then apply the SVM classifier, a supervised machine learning algorithm, to distinguish between good and bad waveforms. [48] has shown that the SVM classifier is effective in a wide variety of applications, including the characterization of PPG and PCG signals.

This component of the heartbeat detection system implements our premise that the reliability of the heart rate estimation is highly dependent on the quality of the underlying waveforms from which it is derived. A machine learning classifier, implemented with an SVM, automates the categorization of the waveforms by attempting to mimic the performance of a human relying on visual inspection. The classifier learns its rules by finding coefficients that optimize the correlation between a set of waveform-extracted features and the waveform quality obtained from manually categorized waveform samples.

There are five steps in the development of a Support Vector Machine learning classifier:

1. Manually classify and categorize sample waveform segments as good or bad.

2. Define candidate waveform features that distinguish good/bad waveforms.

3. Select the most informative features.

4. Train the classifier.

5. Test the classifier.

Since the SVM is a supervised learning algorithm, its development requires a set of input/output learning samples, where the input consists of a list of discriminatory features and the output consists of labeled binary classes.

To manually categorize waveforms, each recording is divided into two 5-second segments. Each segment is visually examined by a person. We examined 56 five-second segments for each of the PPG and PCG recordings. A segment is ranked as bad if more than 2 expected peaks are not observed or if more than 2 expected peaks in the segment cannot be distinguished. Otherwise, it is ranked as good.

The success of an SVM classifier is highly dependent on good feature selection. For the feature selection part of the algorithm, we therefore used the two features validated by [13] for heart pulse signals: the Fraction of Aligned Waves (FW) and the Pulse Wave Variability (PV). Both are time-domain features. FW provides a measure of the temporal regularity of the potential heartbeat signal and PV provides a measure of the variability of the time interval between two adjacent pulse waves.

The training part of the SVM classifier uses the FW and PV features of a set of waveforms. The performance of the SVM classifier can also vary depending on the number of waveforms used in the training phase and on the quality distribution of those waveforms. Hence, for any new data collected from the iDevice, we first filter the signals and run the peak detection algorithm, and then use the learned SVM to assess the quality of the 6 waveform segments. After assessing the quality of the waveforms, the ones classified as good contribute to the final heartbeat estimate of our algorithm, which is computed using the equation:


H_r = \frac{\sum_{S \in W_{good}} \theta(S)}{\sum_{S \in W_{good}} T(S)} \times 60    (3.13)

    where Wgood is the class of waveforms that are classified as a good pulse signal, θ(S) is

    the number of peaks in the waveform S, and T (S) is the duration of the waveform S in

    seconds.
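Equation 3.13 amounts to dividing the total number of detected peaks by the total duration of the good segments; a minimal sketch with illustrative segment counts:

def heart_rate_from_segments(good_segments):
    # Equation 3.13: (total peaks / total duration in seconds) * 60, over good segments only.
    total_peaks = sum(theta for theta, _ in good_segments)
    total_seconds = sum(duration for _, duration in good_segments)
    return 60.0 * total_peaks / total_seconds

# e.g. three 5-second segments classified as good, with 6, 5, and 6 detected peaks:
print(heart_rate_from_segments([(6, 5.0), (5, 5.0), (6, 5.0)]))   # 68.0 bpm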

3.8 Multi-Channel Heart Rate Estimation

The multi-channel heart rate estimation module uses the locations of the peaks found by the peak detection module. The intervals between successive detected peaks are calculated for each of the waveforms, and the mean value of the histogram of peak-to-peak distances is used as an estimate of the user's heart rate. Figures 3.21 to 3.23 show the histograms of peak-to-peak distances for the fingertip, face, and audio recordings, respectively.

The main idea behind the use of 3 different modalities for estimating the heart rate is to make the final estimate more reliable. To achieve this goal, we assume that the heart rate of the user changes negligibly during the test, so that the 3 modalities can be fused into a single estimate. Therefore, we combine the three histograms to create a single histogram of the fused data. Figure 3.24 illustrates the combination of the three histograms. The final heart rate estimate of the system has a peak-to-peak distance of 0.93 s, which corresponds to approximately 64 bpm.
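A minimal sketch of this fusion step, assuming the peak-to-peak intervals from each modality have already been extracted (the interval values below are placeholders, not measured data):

import numpy as np

def fused_heart_rate(intervals_per_modality):
    # Pool the peak-to-peak intervals from all modalities and convert their mean to bpm.
    pooled = np.concatenate(intervals_per_modality)
    return 60.0 / np.mean(pooled)

# Placeholder intervals (seconds) from the fingertip, face, and audio recordings.
finger = np.array([0.82, 0.84, 0.86, 0.83])
face = np.array([0.88, 0.90, 0.85, 0.87])
audio = np.array([0.93, 0.94, 0.86, 0.92])
print(round(fused_heart_rate([finger, face, audio]), 1))   # 68.6 bpm for these values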


Figure 3.21: Histogram of peak-to-peak distances from the fingertip recording (annotated interval: 0.8358 s).

Figure 3.22: Histogram of peak-to-peak distances from the face recording (annotated interval: 0.8870 s).


Figure 3.23: Histogram of peak-to-peak distances from the audio recording (annotated interval: 0.9364 s).

Figure 3.24: Histogram of peak-to-peak distances from the 3 fused modalities (annotated interval: 0.8747 s).

Chapter 4

Experimental Results

Our iPhone application was developed with data collected from 70 adults, aged 19-62, all without any known history of cardiovascular abnormalities. The diverse sample of participants consisted of 37 females and 33 males. Furthermore, 11 of the participants had dark skin.

The experiments were conducted in a quiet and well-lit environment. Throughout the experiment, subjects were comfortably seated, holding an iPod in their right hand with a CMS 50-E pulse oximeter connected to the index finger of their left hand.

We recorded each user's heart rate simultaneously using a pulse oximeter in order to assess the accuracy of our proposed algorithm. We used a pulse oximeter because it is the easiest non-invasive way to measure the heart rate of users, with a known error rate that does not exceed 2% [49]. The heart rate measured by the pulse oximeter during the experiment was recorded and used as a reference against which the results of our proposed algorithm were compared.



4.1 Heartbeat Detection Accuracy without the Use of the SVM Classifier

In this section the accuracy of FIR filtering, the Discrete Wavelet Transform, and our proposed EMD algorithm is presented and compared. To evaluate the accuracy

    of each method, we apply each of the signal processing algorithms to remove the high

    frequency noise and baseline wander. Then, we apply the peak detection algorithm to

    the de-noised signal and fuse the histograms of peak-to-peak distances to retrieve the

    corresponding heart rate.

To compute the similarity between the actual heart rate measured by the pulse oximeter and the heart rate estimated by each of the signal processing algorithms, the Root Mean Square Error (RMSE) is computed. The RMSE between the heart rate measured by the pulse oximeter and the heart rate estimated using the FIR, DWT, and EMD filtering algorithms for all 70 subjects is shown in Table 4.1. The RMSE is computed for each of the three modalities and also for the estimate from the fusion of the modalities.
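The RMSE is the standard root-mean-square deviation over the subjects; a minimal sketch follows, where the heart rate values in the example are placeholders, not data from the study.

import numpy as np

def rmse(estimated_bpm, reference_bpm):
    # Root Mean Square Error between estimated and pulse-oximeter heart rates (bpm).
    e = np.asarray(estimated_bpm, dtype=float)
    r = np.asarray(reference_bpm, dtype=float)
    return np.sqrt(np.mean((e - r) ** 2))

print(round(rmse([72, 65, 80], [70, 66, 84]), 1))   # 2.6 bpm for these placeholder values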

From Table 4.1 we can see that, although each of the modalities contains valuable information about the heart rate, the fused data provides a more accurate estimate of the user's heart rate. Moreover, the PPG signals are more informative than the PCG signal due to the unavoidable background sounds in the experimental environment. Furthermore, the video recorded from the fingertip is more accurate than that recorded from the face, since the illumination level of the environment and objects covering the face have less effect on it.


    Table 4.1: Root Mean Square Error between heart rate measured by the pulse oximeter

    and heart rate estimation using FIR, DWT, and EMD filtering for all 70 subjects

                   Fingertip          Face               Audio              Fused
                   Recording (bpm)    Recording (bpm)    Recording (bpm)    Estimation (bpm)
FIR Filtering      4.8                5.2                6.1                4.7
DWT Filtering      4.7                4.8                5.1                4.5
EMD Filtering      4.1                4.3                5.2                3.8

Table 4.2 also presents the average percentage error between the actual heart rate measured by the pulse oximeter and the heart rate estimated using the FIR, DWT, and EMD filtering algorithms.

    Table 4.2: Average percentage error between heart rate measured by pulse oximeter and

    heart rate estimation using FIR, DWT, and EMD filtering for all 70 subjects

                   Fingertip      Face           Audio          Fused
                   Recording      Recording      Recording      Estimation
FIR Filtering      4.4%           4.8%           5.6%           4.3%
DWT Filtering      4.2%           4.5%           4.8%           3.8%
EMD Filtering      3.7%           3.8%           4.7%           3.6%


As the results show, the heart rate estimation using our EMD algorithm is more accurate than that using FIR filtering or the DWT algorithm. It should be noted that the EMD algorithm works well for all three modalities and that the heart rate estimation using EMD on the fused data is the most accurate heartbeat estimate. The RMSE between the heart rate estimate using EMD on the fused data from the three modalities and the heart rate recorded by the pulse oximeter is 3.8 bpm.

    4.2 SVM Classifier Sensitivity

The level of noise in the PPGs recorded from the face and fingertip and in the PCG could lead to a distortion of the heart pulse signal. In order to determine the sensitivity of the SVM classifier discussed in Chapter 3, we tested it through 20 cross-validation procedures. For each cross-validation procedure, we assess the quality of the recordings visually and categorize them accordingly. In each of the 20 cross-validation repetitions, 40% of the waveforms were used for training and the other 60% were used for testing the classifier. The training sample waveforms are chosen randomly in all 20 cross-validation procedures. For all the simulations, we used the same SVM model with a linear kernel function, and at the end of the 20 simulations the average performance measures, sensitivity (Se) and specificity (Sp), were computed, with the human classification used as the ground truth. Classifier sensitivity measures the fraction of good waveform segments that are correctly classified as good, whereas classifier specificity measures the fraction of bad segments that are correctly classified as bad (i.e. it penalizes false hits, where bad segments are classified as good).

    The sensitivity (Se) and specificity (Sp) are defined as:

Se = \frac{TP}{TP + FN}    (4.1)

Sp = \frac{TN}{TN + FP}    (4.2)


where TP is the number of good waveform segments correctly identified as good, TN is the number of bad waveform segments correctly classified as bad, FP is the number of bad waveform segments incorrectly classified as good, and FN is the number of good waveform segments incorrectly classified as bad.
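A minimal sketch of Equations 4.1 and 4.2 follows; the confusion-matrix counts in the example are placeholders chosen only to sum to 84 test segments.

def sensitivity_specificity(tp, fn, tn, fp):
    # Equations 4.1 and 4.2 from the confusion-matrix counts of the quality classifier.
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return se, sp

# Placeholder counts: 50 good and 34 bad segments in the ground truth (84 in total).
print(sensitivity_specificity(tp=45, fn=5, tn=31, fp=3))   # (0.9, ~0.91)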

For classification purposes, as described earlier, each waveform is divided into two segments of 5 seconds, and hence we have 56 waveform segments for each of the modalities for training the classifier. Tables 4.3, 4.4, and 4.5 present the results of testing the SVM classifier on the remaining 84 waveform segments of the PPG signal recorded from the fingertip, the PPG signal recorded from the face, and the PCG signal, respectively.

For each set of training-testing waveforms, we run the simulation three times using the signals de-noised by FIR filtering, by the DWT de-noising algorithm, and by our proposed EMD algorithm. Averaged over the 20 cross-validations, the results in Table 4.3 show the sensitivity and specificity of the SVM classifier on the de-noised signals for each of these three methods.

Table 4.3: Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the fingertip

                 Average Sensitivity    Average Specificity
FIR Filtering    85%                     86%
DWT De-noising   82%                     87%
EMD Algorithm    89%                     95%

From Table 4.3 we can see that the SVM classifier works best on the waveform segments de-noised by the EMD algorithm, with sensitivity and specificity of 89% and 95% respectively. Also, the sensitivity of the SVM classifier is higher on signals de-noised using FIR filtering than on signals de-noised using the DWT algorithm, whereas the specificity of the SVM classifier on the latter is better than on the former.

The second observation from Table 4.3 is that the average specificity is higher than the average sensitivity of the SVM classifier for all three types of de-noised waveform segments. This shows that the SVM classifier identifies bad waveforms correctly with higher probability than good waveforms. This happens because the peak detection algorithm cannot detect peaks as well as the human eye. Hence more good waveform segments are classified as bad waveforms, which lowers the sensitivity of the SVM classifier.

Table 4.4 shows the sensitivity and specificity of the SVM classifier on the three types of clean PPG signals recorded from the face, de-noised by the FIR filtering, DWT, and EMD algorithms, using the same cross-validation procedure as described above. Here again, the sensitivity and specificity of the SVM classifier on the waveform segments de-noised using our EMD algorithm are the highest of all. Also, the specificity percentage for all three types of clean PPG signals is better than the corresponding sensitivity percentage of the SVM classifier.

Table 4.4: Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the face

                 Average Sensitivity    Average Specificity
FIR Filtering    72%                     81%
DWT De-noising   69%                     73%
EMD Algorithm    74%                     92%


Furthermore, the performance of the SVM classifier on PPG signals recorded from the fingertip is better than its performance on PPG signals recorded from the face. This, as we expected, shows that when the PPG signal recorded from the face is not clean and noise has contaminated the heart pulse, the noise is intense enough to distort the pulse signal. The higher level of noise in the PPG recorded from the face might be due to varying illumination levels in the environment during the experiment and also to movement of the device during recording.

The sensitivity and specificity of the SVM classifier for the de-noised PCG signals, using the previously described cross-validation procedure, are shown in Table 4.5. The specificity of the SVM classifier is highest on the PCG de-noised using the EMD algorithm, at 91%. The highest sensitivity of the SVM classifier is obtained on the clean PCG from the DWT (86%), which is slightly higher than that of the EMD algorithm (84%). The performance of the SVM classifier on PCG waveform segments is better than its performance on the PPG signal recorded from the face, but still worse than its performance on the PPG signal recorded from the fingertip. Again, we can explain this result by the fact that with no background noise in the environment the device can record the heart sound well, but if there is background noise, the heart pulse signal may be distorted, leading to poor heart rate estimation.

Table 4.5: Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PCG signal

                 Average Sensitivity    Average Specificity
FIR Filtering    78%                     82%
DWT De-noising   86%                     88%
EMD Algorithm    84%                     91%


4.3 Heartbeat Detection Accuracy Using the SVM Classifier

In this section we explore the accuracy of the heartbeat detection algorithms when the SVM classifier is used as a post-processing step. We first filtered the signal using one of the FIR, DWT, or EMD filtering algorithms. Then, we ran the peak detection algorithm to detect the heart pulse peaks in the recorded signals. We used the peak locations and the heart rate estimated from each of the modalities to extract the classification features for the SVM classifier. As described in the previous chapter, we used the fraction of aligned waves and the pulse variability as the classification features. Next, we applied the SVM classifier with these two features and a linear kernel to classify the waveform segments as good or bad. Finally, we computed the heart rate from the waveform segments classified as good.

To measure the performance of the proposed heartbeat detection system using the SVM classifier, we tested the classifier through 20 cross-validation procedures employing the manually categorized waveform samples; in each of the 20 cross-validation repetitions, 40% of the samples were used for training and the other 60% were used for testing the classifier. The training sample waveforms were chosen randomly in all 20 cross-validation procedures. For each training-testing set of waveforms, we ran the simulation three times using the signals de-noised by FIR filtering, by the DWT de-noising algorithm, and by our proposed EMD algorithm.

The results of the heartbeat detection algorithm using the SVM classifier with each of the signal filtering methods are shown in Table 4.6. As the results show, adding the SVM classifier to the heartbeat estimation system provides better performance in terms of the RMSE between the estimated heart rate and the heart rate measured using the pulse oximeter. The major effect of adding the SVM classifier as a post-processing analysis
