Registration number 100164855
2019
Using Time Series Classification in the Gym with Smart Watches to Gamify and
Encourage Exercise
Supervised by Dr Jason Lines
University of East Anglia
Faculty of Science
School of Computing Sciences
Abstract
This paper assesses the performance of currently available classification algorithms,
such as Rotation Forest and Dynamic Time Warping, in the context of human activity
recognition for common bodyweight exercises. It compares generic machine learning
algorithms with the more specialised algorithms designed for use with time series data,
such as the data collected for this paper. It uses data collected from a three-axis
accelerometer in a modern smartphone for four different exercises: the pull-up, push-up,
sit-up and squat. It compares numerous variations of this data, such as combining axes
and/or downsampling, in order to see what effect they have on the performance of the
classification algorithms tested. This was all done with the aim of producing a model
that can correctly classify as many exercises as possible while maintaining the best
performance possible.
The results of this paper show that the performance of the classification algorithms
and data variations used varies enormously, with accuracies ranging from ~28% up
to ~94% depending on the combination of classification algorithm and data variation
used. This highlights the importance of taking the time to ensure that the optimal
techniques and data are being used for the purpose and application for which
the results are intended.
Acknowledgements
I would like to thank my supervisor and lecturer, Dr Jason Lines, for all the support,
advice and encouragement he has provided throughout the duration of this project;
without him, this project would not have been possible.
I would also like to thank Dr Pierre Chardaire for all the time he has invested in
the organisation of the third year projects module, providing all students with useful
information regarding the flow of the project and the appropriate use of LaTeX.
CMP-6013Y
Contents
1 Introduction 8
2 Background 9
2.1 Health and Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Human Activity Recognition . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Gamifying Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Time Series Classification . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Design & Planning 13
3.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.3 Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.4 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Prototype Results 16
5 Implementation 17
5.1 Pseudocode & File Structure . . . . . . . . . . . . . . . . . . . . . . . 17
5.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.3 Data Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.4 Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.4.1 ZeroR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.4.2 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.3 Rotation Forest . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.4 C4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.5 Sequential Minimal Optimisation with SVM . . . . . . . . . . 23
5.4.6 XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.4.7 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.4.8 Multilayer Perceptron . . . . . . . . . . . . . . . . . . . . . . 24
5.4.9 Time Series Forest . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4.10 BOSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4.11 HIVE-COTE . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6 Analysis of Results 25
6.1 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.2 BOSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3 C4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.4 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.4.1 1NN & 30% Warping . . . . . . . . . . . . . . . . . . . . . . . 27
6.4.2 1NN & 60% Warping . . . . . . . . . . . . . . . . . . . . . . . 28
6.4.3 3NN & 30% Warping . . . . . . . . . . . . . . . . . . . . . . . 28
6.4.4 3NN & 60% Warping . . . . . . . . . . . . . . . . . . . . . . . 28
6.5 Euclidean Distance 1NN . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.6 Multilayer Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.7 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.8 Rotation Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.9 Sequential Minimal Optimisation with SVM . . . . . . . . . . . . . . . 31
6.10 Time Series Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.11 XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.12 ZeroR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.13 EE, Flat-COTE & HIVE-COTE . . . . . . . . . . . . . . . . . . . . 33
6.14 Results Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7 Project Evaluation 36
7.1 Problems Encountered . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.2 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.3 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
8 Conclusion 39
9 Appendix 41
References 52
List of Figures
1 ED v DTW when calculating the distance between two series . . . . . . 12
2 The file structure used in this project . . . . . . . . . . . . . . . . . . . 18
3 Example showing the arff files for the data variations . . . . . . . . .
4 Critical difference diagram using accuracy for the classifiers . . . . . . 35
5 Critical difference diagram using accuracy for the data-sets . . . . . . . 35
6 Prototype pseudocode showing: data being loaded, the classifier being
created and trained and then classifying the test data . . . . . . . . . . . 41
7 Pseudocode showing: looping through the classifiers, looping through
the data-sets, building the classifier, classifying test data, outputting results 42
8 Original project Gantt chart . . . . . . . . . . . . . . . . . . . . . . . 50
9 Revised project Gantt chart . . . . . . . . . . . . . . . . . . . . . . . . 51
List of Tables
1 MoSCoW analysis for the project . . . . . . . . . . . . . . . . . . . . . 15
2 Overview of the number of instances for each class . . . . . . . . . . . 19
3 Results for the AdaBoost classifier . . . . . . . . . . . . . . . . . . . . 43
4 Results for the BOSS classifier . . . . . . . . . . . . . . . . . . . . . . 44
5 Results for the C4.5 classifier . . . . . . . . . . . . . . . . . . . . . . . 44
6 Results for the DTW1NN @ 30% classifier . . . . . . . . . . . . . . . 45
7 Results for the DTW3NN @ 30% classifier . . . . . . . . . . . . . . . 45
8 Results for the ED1NN classifier . . . . . . . . . . . . . . . . . . . . . 46
9 Results for the MLP classifier . . . . . . . . . . . . . . . . . . . . . . . 46
10 Results for the Random Forest classifier . . . . . . . . . . . . . . . . . 47
11 Results for the Rotation Forest classifier . . . . . . . . . . . . . . . . . 47
12 Results for the SMO classifier . . . . . . . . . . . . . . . . . . . . . . 48
13 Results for the TSF classifier . . . . . . . . . . . . . . . . . . . . . . . 48
14 Results for the XGBoost classifier . . . . . . . . . . . . . . . . . . . . 49
15 Results for the ZeroR classifier . . . . . . . . . . . . . . . . . . . . . . 49
1 Introduction
This project focuses on the area of human activity recognition using time series
classification, specifically on finer-detail 'gym' movements such as push-ups, sit-ups,
etc. The project will be conducted by recording accelerometer data from a cell phone
and/or smart watch and attempting to classify the data as one of the aforementioned
exercises. Different variations of the recorded data will be used, not only to find the
best results but also to use data that closely resembles how it would be in the real
world. When classifying the data, multiple different algorithms will be used, as there
is no single algorithm that best suits all data sets; it must therefore be worked out which
algorithm gives the best results for the data sets collected.
The purpose of this project is, if these exercises can be classified accurately, to use
this capability to encourage exercise in the real world through gamification. It is
widely known that fitness is an important part of leading a healthy life in more
ways than one, and encouraging exercise, not necessarily in the ways covered in this
paper, has the potential to positively impact a large number of people's lives.
This project's goal is quite large, but it can be split into different sections, as it
includes multiple areas of interest: the human activity recognition problem, the time
series classification (TSC) problem and the gamification problem. The overarching
aim of this project is to critically assess whether finer-detail exercises can be
identified using accelerometer data from a cell phone and/or smart watch. Examples of
these finer-detail exercises include, but are not limited to, push-ups, sit-ups,
pull-ups, the bench press and the deadlift. The other aim of this project is to analyse
the effect that gamification has on exercise. These two aims can be further broken down
into smaller objectives: assessing whether the exercises can be identified using the
sensors in either the cell phone or the smart watch alone, or whether both are required;
assessing whether the exercises can not only be identified but also counted and
compared; assessing whether the identification of the exercises is person dependent or
independent; and analysing how effective gamification is at encouraging exercise.
2 Background
2.1 Health and Fitness
A 2015 report found that 58% of women and 68% of men were either overweight or
obese, with the obesity rate of the population rising from 15% in 1993 to 27% in 2015.[1]
The same report a year later found that in 2016 there were 617,000 hospital admissions
where obesity was a factor, an increase of 18% on 2015.[2] In 2008 an investigation
was conducted into the effects of aerobic endurance training and strength training
on cardiovascular health. The subjects were split into three groups, each designated a
specific type of training, and put on a twelve-week programme consisting of
exercise three times per week. From this it was found that 'both aerobic exercise training
at either high or moderate intensities and high-intensity strength training improve en-
dothelial function and decrease the cardiovascular risk profile in obese adults. However,
high-intensity aerobic interval training results in a greater improvement in endothelial
function and a decrease in the cardiovascular risk profile'. For the moderate-intensity
and high-intensity aerobic training, body weight decreased by 3% and 2% respectively,
and body fat decreased by 2.5% and 2.2% respectively. While the strength training
did not show any decrease in either body weight or body fat, it did show an increase of
10% in VO2 max as well as an increase in strength of 25% (Schjerve et al., 2008).
The modern world is becoming more and more work oriented, with greater pressure
on people to devote more time to their careers. This leaves people with less recreational
time and therefore less time for physical well-being, as for the majority of people
exercise is not how they want to spend their limited recreational time. The motivation
behind this project stems from the fact that exercise, as outlined above, is an important
contributor to physical and psychological well-being, coming a close second after diet.
Regular exercise is well known to reduce the risk of many chronic diseases, such as
cardiovascular disease, obesity and diabetes.
[1] National Health Service (2017). Statistics on obesity, physical activity and diet.
[2] National Health Service (2018). Statistics on obesity, physical activity and diet.
2.2 Human Activity Recognition
Activity recognition is an ever-growing field, especially with the expanding wearable
technology industry, which includes smart watches and even so-called smart glasses.
Within this field, however, the majority of research focuses on exercises that would be
considered whole-body movements, such as walking, running, cycling and swimming.
In 2007 a project was carried out on the recognition of free-weight exercises such as
the bench press, deadlift and overhead press. It used two different classification models,
Naive Bayes and Hidden Markov, both of which correctly identified the type of exercise
with around 90% accuracy. In that project, instead of a smart watch, a three-axis
accelerometer was incorporated into a workout glove, with a second accelerometer on
the user's waist to track body posture (Chang et al., 2007). Four years later a similar
experiment was carried out, although only on whole-body exercises, using the
accelerometers in the Android-based cell phones of twenty-nine volunteers; it also found
that most activities were recognised correctly 90% of the time (Kwapisz et al., 2010).
This may indicate that the best results can be achieved with a combination of data from
sensors in smartphones and wearable sensors.
2.3 Gamifying Exercise
In 2015 a study was conducted into the effects of gamification on exercise. Prior to the
study, subjects were asked to 'complete a questionnaire survey eliciting their exercise
habits in terms of the type of exercise and frequency they engaged in them, their atti-
tudes and their level of enjoyment of the exercises they performed'. They were then
introduced to the gamification service and asked to use it as part of their exercise
routine for a month. After this they completed a questionnaire survey similar to the
first one. From this investigation it was found that their attitude toward exercise had
improved significantly, as had their perception of how enjoyable exercise is. On average,
the participants also improved their exercise habits by increasing the amount of time
spent exercising. However, this was not the case for all participants, as 'there were also
those who felt that the gamification features overemphasised the competitive aspect.
This caused them to feel demotivated by the scores they had achieved'.
This study is similar to the latter stages of this project and could be used as an indicator
of the type of results that may be achieved (Goh and Razikin, 2015).
2.4 Time Series Classification
A time series is a series of data points indexed in time order; usually these data points
are taken at equally spaced intervals. Time series classification (TSC) is the task of
assigning a time series pattern to a specific category. When using TSC we require a set
of n time series:

T = {T_1, T_2, ..., T_n}

where each time series is made up of m real-valued, ordered observations:

T_i = <t_{i,1}, t_{i,2}, ..., t_{i,m}>

and a class label c_i. Given T, the TSC problem is to find a function that maps from the
space of possible time series to the space of possible class values. For simplicity, in our
definition we assume that all series of T are the same length (Lines, 2015).
For this project we can use TSC to assign a time series pattern to a particular exercise
or movement, and then use this to recognise when it is carried out again based on the
time series pattern. Hills et al. (2014) note that TSC problems pose a specific
challenge: how to measure the similarity between series. This means finding the
best-performing algorithm for matching a time series pattern to the correct category,
and as TSC problems arise across a diverse range of domains, there is no single
approach that can be singled out as the best.
For a long time the general consensus was that the gold standard in TSC was dynamic
time warping (DTW), a nearest neighbour (NN) classifier. However, recent work has
demonstrated that many newer approaches significantly outperform this standard, with
the best being the collective of transformation-based ensembles (COTE). NN
classification works by calculating the distance (or similarity) between two time series.
Euclidean distance (ED) and DTW are both examples of distance measures used in NN
classification. The former is, simply put, the point-wise distance between two series,
whereas DTW allows for 'warping' between the two series to accommodate small
misalignments in the data (Lines, 2018b).
Figure 1: ED v DTW when calculating the distance between two series
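To make the distinction in Figure 1 concrete, the following is an illustrative Python sketch of the two distance measures. This is not the Weka/UEA repository code used in this project, and the function names are my own:

```python
import numpy as np

def euclidean_distance(a, b):
    # Point-wise distance between two equal-length series
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw_distance(a, b, window=None):
    # Classic dynamic-programming DTW; `window` limits how far the
    # alignment may warp (None means a full warping window)
    n, m = len(a), len(b)
    w = max(n, m) if window is None else window
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))
```

For two series where one is simply a slightly shifted copy of the other, ED accumulates error at every misaligned point, while DTW can warp the alignment and report a much smaller distance.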
The elastic ensemble (EE) is a combination of eleven NN classifiers, including ED and
full-window DTW, that use whole-series elastic distance measures in the time domain
and with first-order derivatives. This algorithm, using a voting scheme that weights
according to cross-validation training set accuracy, has been shown to be significantly
more accurate than any of its single components (Bagnall et al., 2017).
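The accuracy-weighted voting scheme can be sketched as follows; this is an illustrative example of the general idea, not the actual EE implementation, and the names are my own:

```python
from collections import defaultdict

def weighted_vote(predictions, cv_accuracies):
    # predictions: one predicted class label per component classifier
    # cv_accuracies: each component's cross-validation training set
    # accuracy, used as its voting weight
    scores = defaultdict(float)
    for label, weight in zip(predictions, cv_accuracies):
        scores[label] += weight
    # Return the class with the highest total weighted vote
    return max(scores, key=scores.get)
```

A highly accurate component can thus outvote several weaker ones, rather than each classifier contributing equally.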
Bagnall et al. (2017) state that COTE is the only classifier they were aware of that
explicitly ensembles over different representations. This paper uses the classifier
structure called flat-COTE, which pools 35 classifiers into a single ensemble with votes
weighted by training set cross-validation accuracy. There is, however, one difference:
the Shapelet Transform (ST) described in Bostrom and Bagnall (2015) is used rather
than the version in Hills et al. (2014).
A range of other classifiers are also covered in this paper; see Section 5.4 for the full
list.
3 Design & Planning
3.1 Design
The title of this project, Using Time Series Classification in the Gym with Smart
Watches to Gamify and Encourage Exercise, identifies three main parts: time series
classification, human activity recognition in the gym, and using gamification to
promote exercise. The project description states that it could be broken down into
five main steps, as follows: 'Formalising the problem, Interacting with the
software of a wearable device, Collecting data for experimentation, Implementation
and testing of algorithms to predict behaviour, Extensions, such as gamification, further
trials, unsupervised detection of exercise, etc' (Lines, 2018a). I decided early on that I
did not want to restrict this project to a wearable device like a smart watch, but to also
include cell phones: restricting the project would not only limit the results we may see,
but the large majority of people do not own or use a smart watch on a regular basis,
which would further limit the real-world applications of this project.
3.1.1 Methodology
For this project I felt that following an agile methodology was most appropriate,
mainly due to the time constraints placed upon the project. This sort of methodology
allows me to remain somewhat flexible about what needs to be done by when, as the
project can always undergo minor changes to allow for this. For example, if a stage were
to finish ahead of or behind schedule, the work plan could be adjusted by allocating
more or less time to other stages without causing too many problems, meaning the
project can continue.
3.1.2 Literature Review
Conducting a literature review allowed me to expand my knowledge in each of the
topic areas covered in this project; it also allowed me to identify gaps within the topic
areas and provided ideas that could be used to improve this project. From the
literature review I concluded that there is a lack of knowledge in the area of
free-weight and bodyweight exercises, as the majority of human activity recognition
studies related to exercise focus on whole-body movements such as running. This led
me to the decision that this project should focus on the areas where there is a lack of
knowledge. By doing this, the project will be more original, not just a copy of another,
and will be more likely to have real-world applications.
3.1.3 Extension
The gamification extension of this project could be carried out in a number of different
ways, the most obvious being a mobile or web-based app that gets updated when
recognised exercise has been completed. The implementation of the app could also
take a number of forms, ranging from a video game to a competition with friends with
a prize pool for the winner. However, what they would have in common is some sort
of in-app reward based upon the exercise completed; the reward could vary depending
on the amount of exercise completed and/or, more optimistically, the intensity with
which the exercise was performed, e.g. 10 push-ups in a shorter time period would be
worth more, and harder exercises would also be worth more.
3.1.4 Constraints
With any project there will always be constraints, things that limit a project's desired
outcome, and this project is no exception. One of the main constraints of this project is
time. This project has a specified deadline by which everything must be completed,
which limits the amount of time that can be spent on each phase of the project, as
outlined in the Gantt charts at the end of the Appendix. This could mean, for example,
that in the coding phase a non-optimal algorithm is used because there is not enough
time to develop a more efficient version. To combat this, efficient time management is
an absolute must to ensure that no time is wasted, as is starting with existing algorithms
in the code base rather than producing one from scratch. Another constraint is cost;
while not the biggest constraint on this project, due to it being a student project, it is
most definitely still a factor, as it could affect the quantity or even the quality of the
smart watches being used, which could in turn affect any results obtained. Short of
using personal finances, there is nothing that can really be done about this. A
project-specific constraint could be the number of different exercises that I or any
volunteer is able to perform, depending on the availability of equipment at the chosen
facility. This could limit the results of the project to a smaller number of exercises,
which could in turn affect the gamification stage, as some exercises may not be
rewarded properly due to not being recognised.
Must Have:
• TSC of exercises using recorded data.
• Include two basic bodyweight exercises.
• Evaluation of algorithms used.

Should Have:
• Include additional bodyweight exercises.
• Use at least five different algorithms.

Could Have:
• Evaluation of data variations used.
• Include non-bodyweight exercises.
• Include a gamification component.
• Use data from both a cell phone and/or a smart watch.

Won't Have:
• Automatic data recording based on location.
• Suggest exercise form improvements using recorded data.

Table 1: MoSCoW analysis for the project
3.2 Planning
After designing the project, the next stage is planning. This is where a work plan
should be set out covering all parts of the project and how long each part should take;
by doing this you give yourself more specific targets on what needs to be accomplished
and by when. You could take this further by comparing it to how the project would go
in an ideal world, which could be used as a measure of how successful the project was.
An excellent way to do all this is through the use of a Gantt chart. For the project's
original and revised Gantt charts, please see the end of the Appendix.
Comparing the two Gantt charts, the first noticeable difference is the more
project-specific headings in place of the previous generic headings, allowing me to
better follow which part of the project currently needs work. The other changes are the
addition of a Gantt bar for the research and implementation of the optional project
extensions, a milestone signifying the end of the project, and the shortening of some
Gantt bars, including 'Prototype Design & Implementation' (previously called design),
'Introduce more exercises', 'Experimenting with data set variations' and 'Final report
writing'. I feel the revised Gantt chart gives a much clearer and more detailed view of
the project, which will allow me, and has allowed me, to keep much better track of the
progress being made and to try to ensure that things are completed on time.
4 Prototype Results
For the prototype, data was collected for only two exercises: push-ups and sit-ups.
For each exercise, a relatively small amount of data, approximately 40 instances, was
collected, equivalent to roughly two minutes. This data was then split as evenly as
possible between two groups, train and test, and split further so that each axis had its
own arff file.

Within the prototype I decided to use only Euclidean distance (ED) and dynamic
time warping (DTW) nearest neighbour classifiers on the data. On the first run I used
the ED classifier on each of the x, y and z axes and then also on t, achieving accuracies
of 75.00%, 67.50%, 95.00% and 85.00% respectively; for details on the axes, see
Section 5.3. After this I moved on to using a DTW classifier (bear in mind that 0%
warping is equivalent to ED). I first ran it with 10% warping and achieved 100%
accuracy for both the x and t axes, while y and z gave results of 92.50% and 97.50%
respectively. The results for DTW, at the tested warping allowances of 10% through
100% at 10% increments, are consistently better than their ED counterparts. The t data
set gives an accuracy of 100% for all DTW tests, whereas the x axis only gives 100%
accuracy with 10% warping and otherwise gives 97.50%. The y axis hovers around
the 92.50%-95.00% mark depending on the warping, and the z axis gives an accuracy
of 97.50% on all DTW tests. See the Appendix for pseudocode relating to how the
results were obtained.

From the results outlined above we can see that DTW is the better choice of the two
tested classifiers, as it easily outperforms ED at all levels of warping. It also appears
that the t data set is the best choice, as it gives 100% accuracy across the board for the
DTW tests.
5 Implementation
5.1 Pseudocode & File Structure
Figure 7 in the Appendix shows the pseudocode for the code that was run in order to
obtain all the necessary results. The implementation was done in Java (Oracle
Corporation, 1995), using the University of East Anglia's time series classification
repository (Bagnall et al., 2016), which is built on top of the Weka framework
(University of Waikato, 1997).
Figure 2, below, shows the file structure that was used in this project. This is relevant
to the implementation of the pseudocode, as it allowed for efficient looping through
directories instead of having to load all the data into a data structure at the start of each
run. The data collected from the smartphone app was given an appropriate name and a
folder where it was stored. The Python script (see Section 5.3) output its arff files
into the 'Formatted Data' folder, where they were checked, given a suitable name (see
Figure 3) and moved to the correct subfolder in 'Usable Data' ready for use.
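The experiment loop described in the pseudocode can be sketched as follows. The real implementation is in Java with Weka, so this Python version, with a minimal stand-in 1NN classifier, is purely illustrative and the names are my own:

```python
import numpy as np

class OneNN:
    """Minimal 1-nearest-neighbour (Euclidean) classifier as a stand-in."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Distance from this test instance to every training instance
            dists = np.sqrt(((self.X - np.asarray(row, float)) ** 2).sum(axis=1))
            preds.append(self.y[int(np.argmin(dists))])
        return preds

def run_experiments(datasets, classifiers):
    # datasets: {name: (X_train, y_train, X_test, y_test)}
    # classifiers: {name: zero-argument factory returning a fresh classifier}
    results = {}
    for clf_name, make_clf in classifiers.items():
        for data_name, (X_tr, y_tr, X_te, y_te) in datasets.items():
            clf = make_clf().fit(X_tr, y_tr)      # build the classifier
            preds = clf.predict(X_te)             # classify the test data
            acc = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
            results[(clf_name, data_name)] = acc  # record the result
    return results
```

A fresh classifier is built for each classifier/data-set pair so that no state leaks between runs, mirroring the looping structure of the pseudocode.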
Figure 2: The file structure used in this project
5.2 Data Collection
As with the prototype, the same smartphone was used and all the data collected is my
own. The app used records values for each of the x, y and z accelerometer axes
separately and has an additional value t for total acceleration. If the phone is lying
face up on a flat surface in front of you with the camera end of the phone away from
you, then the x axis typically refers to sideways motion (left is negative, right is
positive), the y axis to forward or backward motion (away from you is positive,
towards you is negative) and the z axis to upward or downward motion (upwards is
positive, downwards is negative). The app also allows you to choose the collection
rate at which it records the data between two preset options, 'Slowest' (~15Hz) and
'Fastest' (~200Hz), which correspond to the slowest and fastest rates at which the
sensors on the specific device are able to collect data; the values above are for my
device, a OnePlus 3. The app offers a further option where you can set the collection
rate to a custom value between the 'Slowest' and 'Fastest' presets; however, this
option is only available in the premium version of the app. All data was recorded
using the 'Fastest' preset in order to gather as much data as possible.
I decided on four different exercises, the push-up, the sit-up, the pull-up and the
squat, which are the four main bodyweight exercises. They are all exercises that the
majority of people can do, which again avoids limiting this project to an unnecessarily
small subset of the population. The process of collecting the data was simple: the data
recording was started and the device placed in an available pocket, the chosen
movement was then carried out for as many repetitions as possible in one go, and the
data recording was stopped. This was repeated for each exercise until approximately
six minutes' worth of data had been recorded, which equated to approximately 140
instances for each class, as opposed to the two minutes used for the prototype. A
breakdown of this can be seen in Table 2 below.
Class    No. Train Instances    No. Test Instances    Instance Distribution
pushup   63                     59                    0.5164, 0.4836
situp    76                     66                    0.5352, 0.4648
squats   84                     76                    0.5250, 0.4750
pullup   69                     64                    0.5188, 0.4812
Total    292                    265                   0.5242, 0.4758

Table 2: Overview of the number of instances for each class
5.3 Data Variations
After the data had been collected it had to be split up and formatted correctly in order
to allow for the most experiments to be run, with the hope of achieving better results.
Unlike in the prototype, where this was done by hand as there was relatively little data
in comparison, it was carried out by a custom script written in Python (Rossum, 1990).
This Python script took the data, which had been exported from the app in CSV format,
and created the ar f f files which can be understood by Weka. There were a number of
different ar f f files produced this way. Each of the x, y, z and t axes had their own ar f f
file, there was also an ar f f file for each axis when data had been downsampled by a
factor 2 and by a factor of 4 as well as an ar f f file for the concatenated data.
There were only four arff files for the standard data, one for each axis. The standard
data was recorded at 200Hz and split in such a way that each instance was given 500
attributes, equivalent to 2.5 seconds, which I chose as it is similar to how long a single
repetition may take to complete. The instances for each exercise were added in their
groups; however, the order is irrelevant as each instance is treated separately.
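The splitting into fixed-length instances can be sketched as follows (an illustrative example; `window_instances` is a made-up name, not from the project code):

```python
import numpy as np

def window_instances(samples, window_len=500):
    # Split a 1-D stream of accelerometer samples into fixed-length
    # instances; any incomplete trailing window is discarded
    samples = np.asarray(samples)
    n = len(samples) // window_len
    return samples[: n * window_len].reshape(n, window_len)
```

At the 200Hz collection rate, a 500-sample window corresponds to 2.5 seconds of movement.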
The arff file for the combined data was formatted such that all the x, y, z, and t values were concatenated together for each instance, giving 2000 attributes.
The data was also downsampled from 200Hz by two different factors, 2 and 4. This meant that the data downsampled by a factor of 2 was equivalent to 100Hz and had 250 attributes per instance, and when downsampled by a factor of 4 was equivalent to 50Hz with 125 attributes per instance. The concatenated data was also downsampled using the same factors, giving arff files with 1000 and 500 attributes respectively. Two different methods of downsampling were used: decimation and average decimation. In the former, if you are downsampling by a factor of 4 you simply take every 4th value and discard the rest. In the latter, again downsampling by a factor of 4, you instead take the mean of each group of 4 values as the new value.
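Both methods are simple enough to sketch directly (illustrative code, not the actual script used for the project):

```python
# Sketch of the two downsampling methods described above, applied to a
# plain list of accelerometer readings for one axis.
def decimate(samples, factor):
    """Plain decimation ('Skip'): keep every factor-th value."""
    return samples[::factor]

def average_decimate(samples, factor):
    """Average decimation ('Mean'): replace each block of `factor`
    consecutive values with the mean of the block."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(decimate(data, 4))          # -> [1, 5]
print(average_decimate(data, 4))  # -> [2.5, 6.5]
```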
In Figure 3 below the names take the format ‘axis_collectionrate_{exercises}_usage’, where ‘Skip’ or ‘Mean’ is appended to the collection rate depending on which type of downsampling, if any, is used.
Figure 3: Example showing the arff files for the data variations
5.4 Classifiers
This section gives an informal overview of each of the classification algorithms that I chose to use. I chose these classifiers as they cover a range of different types, including generic ones and time series specific ones, with the aim of providing a better overview of the best classifier, and type of classifier, for the data gathered for this project. All classifiers, other than Dynamic Time Warping, were used ‘as is’, with no optional configuration or optimisation taking place.
The algorithms ED, Dynamic Time Warping, Elastic Ensemble and Flat-COTE were also tested; however, they have already been given an overview in Section 2.4 and therefore I saw no reason to include them again here.
5.4.1 ZeroR
ZeroR is one of the most primitive classifiers available. It works by predicting the majority class from the training data, but it is still sometimes used as a baseline classifier (Witten et al., 1999). For data with categorical class values it uses the mode, and for numeric class values it uses the mean. In Krishna et al. (2013), averaged over 3 data sets relating to cancer classification, it achieved an average accuracy of 60%.
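For categorical classes the whole classifier fits in a few lines; a minimal sketch (illustrative only, not Weka's implementation):

```python
from collections import Counter

# Minimal sketch of ZeroR for categorical classes: always predict the
# majority class seen in training, ignoring all attributes.
class ZeroR:
    def fit(self, y_train):
        self.majority = Counter(y_train).most_common(1)[0][0]
        return self

    def predict(self, n):
        # Attributes are ignored entirely; only the count of test
        # instances matters.
        return [self.majority] * n

model = ZeroR().fit(['squat', 'squat', 'situp', 'pushup'])
print(model.predict(3))  # -> ['squat', 'squat', 'squat']
```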
5.4.2 Random Forest
Random Forest is an ensemble of decision trees, generally trained using the bagging
method (Breiman, 2001). It adds randomness to these decision trees by using a random
subset of the available attributes and/or training data for each decision tree and not all
of them, thereby increasing the diversity of the model. Random Forest is currently one of the most popular classifiers, and is often used as a benchmark, due to it performing well for any given data set as well as being inexpensive to train and quick to run. Two examples of this are Ghimire et al. (2012), where it achieved an accuracy of 92% in land-cover classification, and Pal (2005), where it achieved an average accuracy of 88.37% for remote sensing classification.
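The two randomisation steps described above can be illustrated with a short sketch (the function and data are hypothetical, and Weka's actual implementation differs in detail):

```python
import random

# Sketch of the two randomisation steps Random Forest applies when
# growing each tree: a bootstrap sample of the training instances
# (bagging) and a random subset of the attributes (random subspace).
def bootstrap_and_subspace(instances, n_attributes, subset_size, seed=0):
    rng = random.Random(seed)
    sample = [rng.choice(instances) for _ in instances]   # sampled with replacement
    attrs = rng.sample(range(n_attributes), subset_size)  # attribute indices to use
    return sample, sorted(attrs)

data = [[0.1] * 8, [0.2] * 8, [0.3] * 8]
sample, attrs = bootstrap_and_subspace(data, n_attributes=8, subset_size=3)
print(len(sample), len(attrs))  # -> 3 3
```

Each tree in the ensemble would be trained on a different `(sample, attrs)` pair, which is what gives the forest its diversity.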
5.4.3 Rotation Forest
Rotation Forest is a method for generating classifier ensembles based on feature ex-
traction. The training data for an individual classifier is created by randomly splitting
the attribute set into k subsets. k axis rotations take place to form the new attributes for
an individual classifier, with the idea of using rotations to encourage individual classi-
fier accuracy and diversity in the ensemble. Rotation Forest has been compared against
bagging, AdaBoost, and Random Forest using 33 random data sets from the UCI repos-
itory, UC Irvine (1987), and found that ‘Rotation Forest outperformed all three methods
by a large margin’ (Rodriguez et al., 2006).
5.4.4 C4.5
C4.5 is a classifier that generates a decision tree which uses the theory of informa-
tion gain and entropy (Quinlan, 1993). In its simplest form it says that the amount
of information gained is inversely proportional to the probability of an event happen-
ing. Therefore the attribute with the highest information gain will be split first, and is
continued recursively until all data is classified. It became very popular in 2008 after
being ranked number 1 in ‘Top 10 Algorithms in Data Mining’, Wu et al. (2008), and 3
years later the authors of Weka described it as a landmark decision tree program that is
‘probably the most widely used in practice to date’ (Witten et al., 2011).
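The entropy and information-gain calculation C4.5 performs at each split can be sketched as follows (illustrative code, not Quinlan's implementation):

```python
from math import log2
from collections import Counter

# Sketch of the entropy and information-gain calculation C4.5 uses to
# pick the attribute to split on.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Gain of splitting `labels` into the given `partitions`."""
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - remainder

y = ['pushup'] * 4 + ['squat'] * 4
# A perfect split removes all uncertainty: the gain equals the full 1 bit.
print(information_gain(y, [y[:4], y[4:]]))  # -> 1.0
```

C4.5 would evaluate this gain (in its gain-ratio form) for every candidate attribute and split on the best one, recursing on each partition.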
5.4.5 Sequential Minimal Optimisation with SVM
SMO is a support vector machine (SVM) in which the training problem is solved
using John Platt’s sequential minimal optimisation (SMO) algorithm (Platt, 1998). A
support vector machine is an algorithm that tries to find a hyperplane (or line) that
separates the data into its classes (Awad and Khanna, 2015). While support vector
machines were originally designed to support only two-class problems there are ways
of extending them to multiclass problems, like the one presented in this paper. The two
most common methods of doing this are ‘one-against-one’ and ‘one-against-all’ (Hsu
et al., 2002). In the former a binary classifier is trained for each pair of classes and the
outputs are combined whereas in the latter for k classes, k binary classifiers are trained
and each determines whether the test instance is the same class as itself or one of the
other classes and the classifier with the largest output is taken as the actual class of the
test case.
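The difference in how the two schemes decompose this project's four-class problem can be shown with a quick count (class names taken from Table 2; ‘rest’ is just an illustrative placeholder for the pooled negative class):

```python
from itertools import combinations

# Sketch of how the two decomposition schemes break the 4-class
# exercise problem into binary problems.
classes = ['pullup', 'pushup', 'situp', 'squats']

one_vs_one = list(combinations(classes, 2))   # k(k-1)/2 = 6 binary classifiers
one_vs_all = [(c, 'rest') for c in classes]   # k = 4 binary classifiers

print(len(one_vs_one), len(one_vs_all))  # -> 6 4
```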
5.4.6 XGBoost
XGBoost is an open-source implementation of gradient boosting that uses classification and regression trees (CART) as base models, designed to be ‘highly efficient, flexible and portable’ (The XGBoost Contributors, 2014). It implements an additional
regularisation term which penalises the complexity of the model and helps to reduce
over-fitting (Chen and Guestrin, 2016). Using data provided by Microsoft for a mal-
ware classification challenge hosted on Kaggle, an accuracy of 99.77% was achieved on
a combination of all categories while employing bagging and parameter optimisation
(Ahmadi et al., 2016).
5.4.7 AdaBoost
AdaBoost, or Adaptive Boosting, is a boosting algorithm where the output of the
weak classifiers is combined into a weighted sum that represents the final output of
the classifier (Schapire, 2013). It works by initialising the weights equally and getting predictions for the training data. It then computes an overall weight for the classifier and modifies each instance weight based on whether it was correctly classified or not, before
normalising so that the weights add up to one.
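One round of this weight update can be sketched as follows (a simplified binary-case round; the variable names are my own and this is not the exact Weka formulation):

```python
from math import exp, log

# Sketch of one AdaBoost round: the classifier weight (alpha) is derived
# from the weighted error, then each instance weight is increased if it
# was misclassified and decreased otherwise, before renormalising.
def adaboost_round(weights, correct, eps=1e-10):
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * log((1 - err + eps) / (err + eps))  # classifier weight
    updated = [w * exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]         # weights sum to one

# Four equally weighted instances; the last one was misclassified.
alpha, new_w = adaboost_round([0.25] * 4, [True, True, True, False])
print(round(sum(new_w), 6))  # -> 1.0
```

After the update the misclassified instance carries more weight, so the next weak classifier focuses on it.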
5.4.8 Multilayer Perceptron
Multilayer Perceptron (MLP) is a deep, artificial neural network that uses one or more
hidden layers and a nonlinear activation function on its nodes. The learning occurs
by changing connection weights, using backpropagation, after each piece of data is
processed in order to minimise error. They generally take longer to train than other
types of classifier because of this (Gardner and Dorling, 1998).
5.4.9 Time Series Forest
Time Series Forest (TSF) is a tree based ensemble method that was designed for time
series classification. It uses a combination of entropy gain and distance to evaluate the
possible splits, and randomly samples attributes at each node in the tree. The overall
prediction of a given instance is done using majority voting of all trees in the ensemble
(Deng et al., 2013).
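The per-interval summary features TSF extracts can be sketched as follows (an illustration of the mean/standard deviation/slope features only, not the authors' code; interval bounds are fixed here for clarity rather than randomly sampled):

```python
# Sketch of the summary features Time Series Forest computes for one
# interval of a series: the mean, the standard deviation and the
# least-squares slope over that interval.
def interval_features(series, start, end):
    xs = list(range(start, end))
    ys = series[start:end]
    n = len(ys)
    mean = sum(ys) / n
    std = (sum((y - mean) ** 2 for y in ys) / n) ** 0.5
    x_mean = sum(xs) / n
    denom = sum((x - x_mean) ** 2 for x in xs) or 1.0  # guard 1-point intervals
    slope = sum((x - x_mean) * (y - mean) for x, y in zip(xs, ys)) / denom
    return mean, std, slope

mean, std, slope = interval_features([0.0, 1.0, 2.0, 3.0, 4.0], 0, 5)
print(round(mean, 2), round(slope, 2))  # -> 2.0 1.0
```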
5.4.10 BOSS
Bag of SFA symbols (BOSS) is a dictionary based ensemble classifier. It works by
extracting patterns from a time series and carries out ‘low pass filtering and quantisation’
to reduce the noise in these patterns. The classification is then done by comparing the
new noise-reduced patterns, and it can achieve up to 10% higher accuracy than its rivals (e.g. Bag of Patterns (BOP), Lin et al. (2012)) as well as running up to 13 times faster (Schäfer, 2015).
5.4.11 HIVE-COTE
Hierarchical Vote Collective of Transformation-Based Ensembles (HIVE-COTE) is,
put simply, an improved version of Flat-COTE. It has been improved by defining a new
hierarchical probabilistic voting structure, defining a new spectral ensemble classifier,
and the addition of a dictionary-based & interval-based classifier. This new classifier
is significantly more accurate than Flat-COTE, achieving a significantly better average
rank of 1.6353 compared to 2.8588 respectively, when tested against 85 UCR data-sets
(Lines et al., 2016).
6 Analysis of Results
This section of the paper covers the results that were obtained after running all the experiments described in Sections 5.3 and 5.4. The results will first be described per classifier, with a table showing the results for common metrics, rounded to 4 decimal places, for the data-sets, followed by a summary comparing all the classifiers and data-sets to each other. The table for Section 6.1 will show the results for all data-sets; however, subsequent tables will show slightly fewer, as I feel it is redundant to show so much information each time.
Each table contains the accuracy, as this should always be the first place to start when evaluating the results of a classifier, as well as the balanced accuracy; there is no harm in including it as it tends to be more interpretable, although it tends to be most useful for problems with a large number of classes or a class imbalance. Sensitivity and specificity, also known as the true positive rate and true negative rate respectively, are normally used for binary classification problems but can be applied to multiclass problems, like this one, through the use of averaging. The precision, F1 score and the Area Under the Receiver Operating Characteristic (AUROC) curve are also included as further ways to assess any differences between the data-sets; however, these too are preferred for binary classification problems.
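Since these two accuracy measures recur in every table below, a minimal sketch (my own illustration, not the evaluation code used for the experiments) shows how balanced accuracy, i.e. the macro-averaged per-class recall, differs from plain accuracy on an imbalanced problem:

```python
# Sketch of plain vs balanced accuracy for a multiclass problem.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = ['squat'] * 3 + ['situp']
y_pred = ['squat'] * 4                 # always predict the majority class
print(accuracy(y_true, y_pred))        # -> 0.75
print(balanced_accuracy(y_true, y_pred))  # -> 0.5
```

On this project's nearly balanced four-class data the two measures stay close, which is exactly what the results below show.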
6.1 AdaBoost
Looking at Table 3 in the Appendix, the first thing that becomes apparent is that AdaBoost performs poorly across the board, with the accuracy ranging from 30% up to 47%. The t and concat axes perform significantly better on accuracy than the other 3 axes, with all data-sets having an accuracy in the mid 40s compared to the low 30s respectively. The data downsampled to 50Hz appears to perform better on average than the 100Hz data, which was similar to the original 200Hz data. The ‘Mean’ data and the ‘Skip’ data performed similarly, possibly with ‘Mean’ edging ahead.
The balanced accuracy tells the same story, but with values ranging from 28% up to 43%.
The sensitivity for the t and concat axes are once again significantly higher than the
rest, with a few exceptions this time being ‘x_100HzSkip’ etc. Also, in the t and concat
axes, the trend that 50Hz is better than 100Hz remains true, while for the other axes
it varies. The precision, like the accuracy, is low across the board with t and concat
axes instead coming in the middle of the pack. The specificity has a wide range of
values, from 12% up to 72%, with an almost random looking distribution if not for
worse values for the first 2 axes. The F1 scores are also low throughout, with the higher
values appearing for the first 2 axes. The AUROC values are relatively stable, hovering
around the mid 50s to mid 60s mark, and once again t and concat come in first place.
6.2 BOSS
The results for the BOSS classifier (Table 4 in the Appendix) have had all data-sets from the y and z axes, apart from 1, removed as they all performed incredibly similarly. The table shows a wide range of accuracy values, from 59% up to 94% depending on the data-set. The concat axis performs the best, with all accuracies being 90%+, followed by
the y and z axes, which had very similar results for all data-sets (86% - 88%). The t
axis performed the worst with accuracies averaged in the mid 60s. For all data-sets the
balanced accuracy remains incredibly close to the normal accuracy meaning all classes
were predicted with similar success. Both 50Hz and 100Hz performed similarly on all
data-sets bar concat and t where 100Hz was better; 200Hz on the other hand varies
between the data-sets from worst in concat to best in t.
The sensitivity follows the same trend as the accuracy; however, for ‘x_100HzMean’ and below it is lower than the accuracy. The precision appears high over all data-sets, including t, which had the worst accuracy. The specificity comes in very high throughout,
with all values being above 90% apart from ‘x_100HzSkip’ at a high 89%, and having
no discernible pattern. Like with precision, the AUROC values are all quite high, but
with all values being above 81% instead of 89%, and the clear loser being the t axis.
6.3 C4.5
Table 5 in the Appendix, for the C4.5 classifier, has had 1 data-set from each axis removed, apart from z which has 1 remaining. We can see that the concat and y axes perform the best in both accuracy and balanced accuracy, while the t and x axes trail by approximately 10%, around the 50% mark. It appears as though 50Hz performs significantly better, on average, than both 100Hz and 200Hz, while ‘Mean’ tends to perform slightly better than ‘Skip’.
The sensitivity displays a large range of values, from 30% for ‘t_200Hz’ to 75%
for ‘y_50HzSkip’, and is usually approximately 10-15% lower than the accuracy. The
precision also has varying results between axes, with concat and y appearing the best,
however, even within each axis there is a fair amount of variation e.g. the y axis ranges
from 48-64%. Over all the data-sets the specificity tends to be higher than sensitivity by
about 25%, and with no obvious pattern. The AUROC values all appear to group around
the mid 60s to mid 70s area, with concat and y looking the best, but not by a significant
amount.
6.4 Dynamic Time Warping
6.4.1 1NN & 30% Warping
The results for DTW1NN @ 30%, shown in Table 6 in the Appendix, have had 7 data-sets removed, 1 from each axis, apart from t and z which had 2 removed. We can see immediately that concat performs the best according to the accuracy and balanced accuracy, followed by the y axis, with accuracies of about 88% and 85% respectively. Both t and x perform very similarly on these metrics, at around 70%. The difference between 50Hz and 100Hz seems to vary between axes, with some showing that 50Hz is better, others showing 100Hz as better, and one displaying no real difference. The difference between ‘Mean’ and ‘Skip’ shows the same trend.
The sensitivity shows itself as being very high for the concat, y and z axes, with
concat even achieving 100% on 2 different data-sets. The remaining 4 metrics display a
similar trend to that of accuracy, with concat performing the best and with y being next
etc. However, the trend is less obvious for the AUROC values as, while the t and x axes
are lower than the other axes, they are not lower to the same extent.
6.4.2 1NN & 60% Warping
The differences in the results for this variation compared to the last are very small for all of the data-sets and metrics, with the average changes being about 1%, and therefore I see no reason to include a table for it. The most notable change was to the accuracy of the ‘y_200Hz’ data-set, which increased by around 3.5%.
6.4.3 3NN & 30% Warping
Table 7 in the Appendix, for the DTW3NN @ 30% classifier, has had the 200Hz data-set for each axis removed as each performed almost identically to its corresponding ‘100HzSkip’. The table shows us that the concat and y axes perform equally well, with accuracy and balanced accuracy values of around 86%, which is significantly different to the next best axis, z, with an average accuracy of approximately 79%. The table also shows that for the t axis 50Hz performs better than 100Hz, the opposite is true for the z axis, and there is no significant difference for the other 3 axes. There is also no obvious difference between the results of ‘Mean’ and ‘Skip’ that holds for all data-sets, as the best performing one varies between the data-sets and the level of downsampling used.
Like the results in Section 6.4.1, the sensitivity shows itself as being very high for the concat, y and z axes, with concat edging ahead at around 98% and again achieving 100% on 2 data-sets, while y and z are identical for almost every data-set. Both the precision and the specificity display the same trend, with concat performing equally as well as z, followed closely by y, and both t and x performing significantly worse than the rest. The F1 and AUROC trend differs in that it shows concat and y performing equally well, followed closely by z.
6.4.4 3NN & 60% Warping
As with the 1NN variation of DTW, the difference in the results for this variation
compared to the last are very small for all of the data-sets and metrics, with the aver-
Reg: 100164855 28
CMP-6013Y
ages changes again being of about 1%, so I decided to leave the table out again. The
most notable change this time was to the accuracy of the t_100HzMean’ data-set which
increased by around 4%.
6.5 Euclidean Distance 1NN
Table 8 in the Appendix, for ED1NN, has had the 200Hz data-set for each axis removed as each performed almost identically to its corresponding ‘50HzSkip’, as well as ‘concat_100HzSkip’, which performed similarly to ‘concat_100HzMean’.
We can see from the table that the concat axis performed significantly better than the other axes, with an average accuracy and balanced accuracy of approximately 80%, compared to the next best, the z axis, at 67%. The x axis performed the worst by a significant margin, with accuracies around 40%. It appears as if there is no significant difference between the 50Hz and 100Hz results for any of the axes; however, there does appear to be a slight performance increase for ‘Mean’ in comparison to ‘Skip’ for all of the axes.
For the most part the sensitivity results follow a pattern in which they remain somewhat close to the corresponding accuracy value, anywhere from a drop of 5% to an increase of 10%. The exceptions to this, however, are the x and y axes, which show a significant drop in the value compared to their accuracy, with the y axis dropping around 15% and the x axis dropping around 30%. The precision stays consistently high
through all data-sets, apart from the t axis, with its lowest value being 83%, and 2
data-sets managing to score 100%. However, the data-sets where 100% precision was
achieved, there was an extremely low sensitivity value, of approximately 7.5%, meaning
there was a low false positive rate, but a high false negative rate. The specificity was
also high across all data-sets, which supports a high precision. The F1 and AUROC
values don’t really contain any information of note as they just follow the same trend as
the accuracy.
6.6 Multilayer Perceptron
Table 9 in the Appendix, for the MLP classifier, has had 4 data-sets removed; ‘concat_50HzSkip’, ‘t_100HzSkip’, ‘t_200Hz’ and ‘x_100HzMean’, as each of them performs similarly to another data-set in the same axis. We can see from the accuracy and balanced accuracy that once again the concat axis performs the best by a significant margin, with the y axis being the next best, with average accuracies of around 84% and 71% respectively. The x axis performed the worst with an average of 56%. The results do not show any significant differences between the 50Hz and 100Hz data-sets or between the ‘Mean’ and ‘Skip’ data-sets.
The sensitivity, as with the previous classifier, had a wide range of values that largely followed the same trend as the accuracy, with the exception of the y axis. The y axis had significantly lower accuracies than concat but achieved very similar sensitivity
scores. The AUROC values were generally high for all the data-sets, only dropping below 80% for the x axis, which is unsurprising as it also achieved the lowest accuracies. The precision, specificity and F1 show nothing out of the ordinary, other than that the z axis achieved higher precision values than the y axis, which is where it regains the ground it lost to y in sensitivity.
6.7 Random Forest
Table 10 in the Appendix, for the Random Forest classifier, has only had 2 data-sets
removed which are ‘concat_200Hz’ and ‘x_200Hz’ as they both achieved very similar
results to that of their corresponding ‘50HzSkip’ data-set. The accuracy and balanced
accuracy seems to be more closely grouped over all data-sets than in previous classifiers,
the range of values only going from 53% up to 76%. However, even given this we can
still see that concat and y are the best performing axes, with accuracies around 70%,
and that x is the worst performing axis at around 56%. Also, there doesn’t seem to be
any significant difference between the 50Hz and 100Hz data-sets, except for the y axis
where 50Hz performs better by about 6%. The difference between ‘Mean’ and ‘Skip’ is
small, with the majority of data-sets showing a slight improvement for ‘Mean’, usually
around 2%.
The sensitivity for each of the data-sets remains relatively true to the same trend as the accuracy, usually only deviating by about 5% each time. The specificity, like
the accuracy, is just as closely grouped, ranging from 65% to 88%, however, it instead
shows the z axis as performing the best, with concat and y performing similarly. The
precision, F1 and AUROC all show the same trends as the accuracy, with neither any
extreme high values nor extreme low values.
6.8 Rotation Forest
Table 11 in the Appendix, for the Rotation Forest classifier, has had 4 data-sets removed; the 200Hz data-sets from concat and x, which are similar to their ‘100HzSkip’ counterparts, and the ‘50HzMean’ and ‘100HzSkip’ data-sets from y, which are both similar to ‘100HzMean’. The values for the accuracies and balanced accuracies, unlike Random Forest, are not as closely grouped, with values ranging from 56%
to 84%. It seems to show the concat and y axes as the best, performing equally well
with accuracies around 81%, and the x axis once again performs the worst at around
59%. From this table, it also appears there is no significant difference between 50Hz
and 100Hz data-sets, with the better one varying between axes. However, it does seem
to show that ‘Mean’ performs better than ‘Skip’ for most of the data-sets.
The sensitivity instead shows the y axis as the best performing axis for this classifier, followed by z and then by concat. The AUROC values were generally high for all the data-sets, with the lowest value being just over 80% for the
‘x_50HzMean’ data-set. The precision, specificity and F1 all show the same trend as
accuracy does, just with a little bit more variation in the values for data-sets in the same
axis.
6.9 Sequential Minimal Optimisation with SVM
Table 12 in the Appendix, for the SVM classifier, has had 6 data-sets removed; 3 from concat, which were similar to both remaining data-sets, the 200Hz data-set from x, which was similar to ‘x_100HzMean’, the 200Hz data-set from z, which was similar to ‘z_50HzMean’, as well as ‘t_100HzSkip’, which was similar to its ‘Mean’ counterpart.
The accuracy and balanced accuracy values for this classifier have a large range over all the data-sets, from the lowest of 44% (‘x_100HzMean’) up to the highest of 83% (‘concat_100HzSkip’). There does not seem to be any significant difference between the 50Hz and 100Hz data-sets, nor any obvious pattern showing either ‘Mean’ or ‘Skip’ to be the superior variation.
The sensitivity appears to follow the same pattern as the accuracy and balanced accuracy but with more extreme values, with the exception of the y axis. According to the sensitivity, the y axis performs the best, with values around the 90% mark, which is better than the concat axis by approximately 3%. The precision, F1 and AUROC all show the same pattern as the accuracy, with the performance values being equally closely grouped, just larger (for AUROC). The specificity, on the other hand, does not tell the same story. Using these values the z axis performs equally as well as concat, even though it has an average accuracy that is 33% lower, while the t axis performs the worst, as opposed to being in the middle of the pack for the accuracy.
6.10 Time Series Forest
Table 13 in the Appendix, for Time Series Forest, has had 6 data-sets removed; ‘t_200Hz’, which was similar to ‘t_100HzMean’, the 200Hz and 50HzMean data-sets for x, which were similar to 100HzSkip and 50HzMean, and the 200Hz and 100HzSkip data-sets from y, which were similar to 100HzMean. The accuracies and balanced accuracies place the y axis as the best performer by a significant margin, at an average of 92%, followed by the z axis at around 84%, while the t axis comes in last place with accuracy values around 70%. The downsampled data-sets, 50Hz and 100Hz, do not show any significant differences that are consistent across all data-sets, with 50Hz appearing slightly better for x and z but the same for the other data-sets.
The remaining 5 metrics in this table all follow the same trend as the accuracy, not
really providing any extra information, with the AUROC values being very closely
grouped for all data-sets, with its lowest value being just above 90%.
6.11 XGBoost
Table 14 in the Appendix, showing the results for the XGBoost classifier, has had 5 data-sets removed; 100HzSkip from the concat axis, which was similar to 50HzMean, 100HzSkip from x, which was similar to 100HzMean, 200Hz from y, which was similar to 100HzMean, and both the 50Hz ‘Mean’ and ‘Skip’ data-sets from z, which were similar to the 200Hz and 100HzMean data-sets respectively. The table clearly shows that the concat axis performs the best for this classifier by a significant margin, with the next best being the y axis, with an average accuracy, and balanced accuracy, around 7% lower. The ‘Mean’ data-sets appear to have slightly outperformed ‘Skip’, and 100Hz outperformed 50Hz for the majority of the data-sets; however, the difference between them does not look significant.
As with the results for the previous classifier, the remaining 5 metrics do not reveal
any additional information on top of the accuracy and balanced accuracy, with the only
difference being that the AUROC values are less tightly grouped and drop as low as
84%.
6.12 ZeroR
As we can see from Table 15 in the Appendix, the results for the ZeroR classifier have been condensed into a single row. Because of the way it works, see Section 5.4.1, it will always achieve exactly the same results on the data-sets used in this project. The data-sets used, while slightly varied, all contained the same classes and therefore the classifier will always predict the same class. We can see in Table 2 that the most common class is ‘squat’, with 76 instances out of 265, which is equivalent to the accuracy shown in the table of ~28.68%.
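As a quick sanity check of this figure, the baseline accuracy follows directly from the test-instance counts in Table 2:

```python
# Sanity check of the ZeroR baseline quoted above: predicting the
# majority test class ('squat', 76 of 265 instances) every time.
test_counts = {'pushup': 59, 'situp': 66, 'squats': 76, 'pullup': 64}
total = sum(test_counts.values())
print(total)                                              # -> 265
print(round(max(test_counts.values()) / total * 100, 2))  # -> 28.68
```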
6.13 EE, Flat-COTE & HIVE-COTE
Although I had planned to also test these classifiers, they were taking too long to warrant gathering the results. This was an error on my part, as there were measures I could have taken to reduce the runtime of each of these; see Section 7.1. The runtime
complexity of Elastic Ensemble is O(n²m²), and the runtime complexity of both Flat-COTE and HIVE-COTE is O(n²m⁴), compared to the O(nm(n−w)) complexity of the BOSS classifier, which took the longest of the classifiers completed; where m is the length of the series, n is the number of series and w is the number of subseries.
It is unfortunate that I was not able to collect any results for these classifiers, as they would likely have performed very well, given they have been demonstrated to be capable, albeit time-consuming, classifiers on other time series data-sets (Bagnall et al. (2017); Lines et al. (2016)).
6.14 Results Summary
For the purposes of this project I decided on using ED1NN as the base classifier; this is because it is the simplest form of classifier designed for time series classification. I also decided not to include critical difference diagrams using balanced accuracy in this paper, as they were incredibly similar to the accuracy-based ones shown in Figures 4 and 5. This is because the balanced accuracy achieved on all the data-sets and all the classifiers was very similar to the standard accuracy, which in turn is likely because there is not a very large number of classes (4) or a significant class imbalance, with the largest class accounting for 28% of the data.
We can see in Figure 4 that the BOSS classifier is rated the best, with an average rank of 2.84, and that the ZeroR classifier is rated the worst, being ranked 15th for all data-sets. However, while BOSS is the highest rated, the diagram does not show a significant difference between BOSS and the 8th best classifier, XGBoost, with an average rank of 6.66. Six of the top eight classifiers were designed for time series classification (TSC) problems, the exceptions being XGBoost and Rotation Forest, while only one of the bottom eight is a TSC classifier (ED1NN). This strongly suggests the general rule that classifiers designed with TSC problems in mind significantly outperform more ‘general use’ classifiers.
Figure 4: Critical difference diagram using accuracy for the classifiers
Figure 5: Critical difference diagram using accuracy for the data-sets
Using Figure 5, we first see that there is a huge amount of overlap between the majority of the data-sets, from the large cliques formed. We also see that the data-set rankings leave the data-sets grouped by their axis, with concat being the best option and x being the worst. For each of the axes it also appears that the downsampled data performs better, on average, than the standard 200Hz data. However, there does not seem to be any consistent pattern showing either 50Hz or 100Hz to be better across all the axes, although there may be a very slight performance increase for the data downsampled using the average decimation method, or ‘Mean’, in comparison to normal decimation, or ‘Skip’. Figure 5 shows the best data-set to be ‘concat_50Hz’ with an average rank of 4.3667; however, it also shows that there is no significant difference between it and all the other data-sets from the concat, y, and z axes. It shows that the t and x axes perform significantly worse than the top two axes, concat and y.
The data-set and classifier combination that achieved the highest accuracy was ‘concat_100HzMean’ with the BOSS classifier, at ~94.7%, while combining the best classifier and best data-set according to the figures gives an accuracy of ~91.7%. The critical difference diagrams are very good at conveying averages; however, if this project were used in the real world you would likely only want the classifier and data-set that provide the actual best performance. In the real world, the time performance of each of the classifiers would also need to be taken into account; depending on the application it may not be feasible to use a specific classifier if it takes too long. For example, when carrying out these experiments there were three classifiers for which results were not obtained due to their runtime, and the BOSS classifier, while giving better results, took significantly longer to run than the next best classifier, DTW1NN @ 30%, which ran approximately 15 times faster.
7 Project Evaluation
7.1 Problems Encountered
The first problem I encountered came as part of the literature review. This project was
relatively broad in terms of its different components, so there is a large amount of litera-
ture available; while this is for the most part a good thing, it became difficult
Reg: 100164855 36
CMP-6013Y
to find literature that was not just related but actually similar to this project. It
was mentioned previously that human activity recognition and exercise are both large
fields and a lot of studies have been done in these areas; however, the majority of them
were done on whole-body movements such as running. While this did allow me
to find a gap in current knowledge, it also meant that there was little information for
me to use as a guide and/or benchmark, e.g. what my results should look like or which
algorithm should have the highest accuracy. There was little that could be done to overcome
this problem other than making sure that the research was carried out thoroughly, using ap-
propriate websites such as sciencedirect.com and link.springer.com in conjunction with
optimal search terms for the topic.
When I went to begin collecting the data that would be used to train the classifiers,
and then to test the performance of each classifier, I ran into another problem. As I
had decided to focus only on data recorded using a smartphone, in order not to limit the
applications of the project, a decision had to be made regarding how the data would
be recorded and harvested from the phone. The most sensible option to accomplish
this was to use an existing app that could access sensor data. After researching
and testing various apps able to do this, it was clear that the best option was an
app called ‘Physics Toolbox Sensor Suite’, Vieyra Software (2014), which gave me the
ability to record and export this data in the form of a comma-separated values (CSV)
file and to choose between different collection rates, which is discussed in more detail
in Section 5.2.
Another problem I encountered, directly after I finished recording the data
for the prototype, was formatting the data. For TSC to be carried out, the data
has to be correctly formatted into the attribute-relation file format (ARFF). This file
format requires that every instance in the data set has the same number of attributes.
In the case of the data I was collecting for the prototype, each
instance was a single exercise (either a push-up or a sit-up) and each instance had to have
the same number of attributes, each of which is a single value in the data. For example,
2.5 seconds of data at a collection rate of 200Hz means the instance has 500 attributes.
Every instance needs to be like this, which can be achieved by trimming the
data to the same length; this introduces the question of when to trim the data or what
number of attributes provides the best results. For the prototype I trimmed the data to
151 attributes plus the class attribute, which was made easier by following a set of rules
when recording the data. The problem I found when formatting this data was the sheer
amount of time it took, making it a completely impractical method when working with
large amounts of data or data that needs to be processed automatically. After the prototype
I decided that the best solution was to create a bespoke Python script that takes the
data from the CSV files and outputs all the necessary ARFF files, already in the correct
format; see Section 5.3 for more detail.
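The kind of conversion the script performs can be sketched as follows. This is a minimal illustration only: the function name, the assumed CSV layout (header row, sensor reading in the second column), and the 500-attribute trim length (2.5 s at 200Hz) are assumptions for the example, not the actual script's interface.

```python
import csv

def csv_to_arff(csv_path, arff_path, class_label, n_attributes=500):
    """Trim one recorded exercise to a fixed attribute count and write it as an ARFF file.

    Assumes the CSV has a header row and the accelerometer reading in its
    second column, as exported by a generic sensor-logging app.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                                # skip the header row
        values = [float(row[1]) for row in reader]  # one axis of accelerometer data
    values = values[:n_attributes]                  # trim every instance to the same length
    with open(arff_path, "w") as out:
        out.write("@RELATION exercises\n")
        for i in range(n_attributes):
            out.write(f"@ATTRIBUTE att{i} NUMERIC\n")
        out.write("@ATTRIBUTE class {pullup,pushup,situp,squat}\n@DATA\n")
        out.write(",".join(str(v) for v in values) + f",{class_label}\n")
```

A real version would batch over every CSV recording and append one @DATA row per instance, but the fixed-length trimming shown here is the key step that makes the instances ARFF-compatible.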
A problem that only made itself known to me after the prototype was that the runtime of
some algorithms was much longer than expected, which was definitely not helped
by running them on my personal machine. This led me to the decision that certain clas-
sifiers, while they may yield good results, were too time consuming to warrant running,
as discussed in Section 6.13. A possible, and in hindsight obvious, solution would
have been to make use of the UEA high performance computing cluster instead of my
personal machine (University of East Anglia, 2019).
7.2 Outcome
As mentioned previously in this paper, there were many aspects that contributed to
this project, which left room for one specific area to be focused on more heavily. Early
on in the project I decided to focus on the human activity recognition aspect, as not
only was it the area that I found most interesting, but it is also quite a broad topic
in itself. This means that the outcomes of this project are not limited to one specific
field, but could be applied to many, one being the extension suggested for this project:
gamification. I am happy with the choice I made to focus the project on a
specific area, considering the results that were achieved and the information yielded.
When evaluating a project it is essential to compare the quality, and amount, of work
done against what could have been done in an ideal world. A common way of doing this
is to compare it against the requirements outlined at the beginning. To do this we can
look at the MoSCoW table in Section 3.1.4. From this we can see that all of the points in
both the ‘Must Have’ and ‘Should Have’ categories have been met, as has the first
‘Could Have’ point. Overall, based on this, and the previous paragraph, I
would say that this project has been a success, as the majority of the requirements were
satisfied and useful information was gained from the results. The project was
also a success on paper, as it was finished and delivered on time, including all supporting
material, which was helped hugely by constant referral to the revised Gantt
chart shown in the Appendix. Please see the attached notebook for a record of supervisor
meetings and milestones reached.
7.3 Further Research
The field of human activity recognition in the context of exercise has a
huge amount of potential, both in terms of things to investigate and in possible real world
applications. An example of further work, and a real world application, that has been men-
tioned in this project but not carried out, is the use of exercise recognition in an effort
to gamify and encourage exercise, with the aim of improving people's
general fitness. Another thing that could be researched further, and would likely
need to be for the aforementioned application, is the inclusion of a much larger database of
recognisable exercises, so as not to limit people to a small number of exercises
that they can use to earn rewards etc. The recognition of exercises could also be applied
to personal training, or even injury prevention and recovery. If the quality of
exercise recognition could reach a high enough level, it could be used to check the ‘form’
in which an exercise is carried out, which would have applications in the areas just men-
tioned, as it could inform the user of what they may be doing wrong and how to improve,
with the aim of maximising muscle usage or preventing injury.
8 Conclusion
The aim of this project was to evaluate the performance of existing classification
methods for time series data of non whole-body exercises, in order to see whether it is
feasible for them to have applications in the real world. This project also had the extension
goal of seeing how possible, if at all, it is to gamify exercise; however, this aspect was
dropped in favour of a more detailed look at human activity recognition.
The results that were found support the view that the field is currently at a
reasonable level at which to start thinking about and working on real world applications.
We found, for the exercises in this paper, that we could achieve up to ~94.7% accuracy
when using a favourable combination of classification algorithm and data variation. There
are no doubt better classification algorithms and/or data variations that could be em-
ployed, some of which were mentioned in Section 6.13, in order to get better
results.
In the future I have no doubt that this field will continue to grow as more people realise
the huge potential it has to positively impact people's lives as well as, more appealing
to some, to make money. The development of increasingly powerful computer hardware,
and more importantly more intelligent and efficient algorithms for classifiers, will open up
a new level of performance, in terms of both accuracy and time efficiency.
9 Appendix
Algorithm 1 Classifying Test Data
Input: trainData.arff and testData.arff
Output: The number of instances classified correctly and the number of instances tried
to classify
1: trainData ← trainData.arff
2: classifier ← define classifier type here e.g. ED1NN
3: build the classifier and train it with trainData
4: testData ← testData.arff
5: correctCount ← 0
6: for each instance in testData do
7:     classifiedInstance ← prediction of the class of instance
8:     actualClass ← the actual class of instance
9:     if classifiedInstance = actualClass then
10:        correctCount ← correctCount + 1
11:    end if
12: end for
13: return correctCount and number of instances contained in testData
Figure 6: Prototype pseudocode showing: data being loaded, the classifier being
created and trained and then classifying the test data
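As a concrete illustration, Algorithm 1 can be realised with a 1-nearest-neighbour classifier under Euclidean distance (ED1NN). This is a self-contained sketch, not the project's WEKA-based implementation; the in-memory lists of (series, label) pairs stand in for the trainData.arff and testData.arff files.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ed1nn_predict(train_data, instance):
    """Predict the class of `instance` as the class of its nearest training series.

    `train_data` is a list of (series, class_label) pairs.
    """
    return min(train_data, key=lambda pair: euclidean(pair[0], instance))[1]

def evaluate(train_data, test_data):
    """Return (correctCount, total) as in Algorithm 1."""
    correct = sum(1 for series, actual in test_data
                  if ed1nn_predict(train_data, series) == actual)
    return correct, len(test_data)
```

For example, with train_data = [([0, 0, 0], 'situp'), ([5, 5, 5], 'pushup')] and test_data = [([0, 1, 0], 'situp')], evaluate returns (1, 1).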
Algorithm 2 Classifying all data-sets with all classifiers
Input: None
Output: Results for each classifier and data-set written to a file
1: Let C be a list of classifiers
2: Let data be a list of training arff files and their corresponding test arff files
3: for each classifier in C do
4:     for each dataPair in data do
5:         trainingData ← dataPair.train
6:         testData ← dataPair.test
7:         build classifier with trainingData
8:         results ← object storing all results information for current dataPair
9:         for each instance in testData do
10:            dist ← distribution of predicted classes for instance using classifier
11:            prediction ← class with highest value in dist
12:            actual ← the actual class of instance
13:            results.add(actual, dist, prediction)
14:        end for
15:        resultMetrics ← calculate all result metrics using results
16:        write resultMetrics to appropriate file
17:    end for
18: end for
Figure 7: Pseudocode showing: looping through the classifiers, looping through the
data-sets, building the classifier, classifying test data, outputting results
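The resultMetrics step in Algorithm 2 produces the columns reported in Tables 3-15. A minimal sketch of how some of those metrics can be derived from the stored predictions is shown below; it treats one class as 'positive' in a one-vs-rest view, which is an illustrative simplification of the four-class problem, and the function name is an assumption.

```python
def binary_metrics(actuals, predictions, positive):
    """Compute accuracy, sensitivity, precision, and specificity for one class
    treated as 'positive' (a one-vs-rest view of a multi-class problem)."""
    tp = sum(1 for a, p in zip(actuals, predictions) if a == positive and p == positive)
    tn = sum(1 for a, p in zip(actuals, predictions) if a != positive and p != positive)
    fp = sum(1 for a, p in zip(actuals, predictions) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actuals, predictions) if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(actuals),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on the positive class
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Balanced accuracy, F1, and AUROC follow from the same confusion-matrix counts (and, for AUROC, the stored class distributions).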
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.4453 0.4141 0.8684 0.3646 0.3114 0.3017 0.652
concat_100HzSkip 0.4415 0.4113 0.8421 0.3616 0.3193 0.2986 0.652
concat_200Hz 0.4642 0.432 0.8947 0.3716 0.3235 0.3171 0.6695
concat_50HzMean 0.4717 0.4396 0.8947 0.3757 0.3353 0.3223 0.6771
concat_50HzSkip 0.4415 0.4093 0.8947 0.3676 0.2952 0.2981 0.6419
t_100HzMean 0.4453 0.4141 0.8684 0.3646 0.3114 0.3017 0.652
t_100HzSkip 0.4415 0.4113 0.8421 0.3616 0.3193 0.2986 0.652
t_200Hz 0.4642 0.432 0.8947 0.3716 0.3235 0.3171 0.6695
t_50HzMean 0.4717 0.4396 0.8947 0.3757 0.3353 0.3223 0.6771
t_50HzSkip 0.4415 0.4093 0.8947 0.3676 0.2952 0.2981 0.6419
x_100HzMean 0.3208 0.3248 0.375 0.4706 0.6932 0.2133 0.5682
x_100HzSkip 0.3358 0.3484 0.9697 0.3404 0.1678 0.2179 0.582
x_200Hz 0.317 0.321 0.375 0.4615 0.6818 0.211 0.5656
x_50HzMean 0.3019 0.3048 0.2344 0.375 0.7222 0.1838 0.5325
x_50HzSkip 0.3358 0.3484 0.9697 0.3404 0.1678 0.2179 0.582
y_100HzMean 0.3094 0.2827 0.8026 0.2891 0.1228 0.1953 0.5528
y_100HzSkip 0.3094 0.2827 0.8026 0.2877 0.1221 0.1956 0.55
y_200Hz 0.3094 0.2827 0.8026 0.2877 0.1221 0.1956 0.55
y_50HzMean 0.3283 0.3116 0.4737 0.3871 0.4722 0.2137 0.5455
y_50HzSkip 0.3283 0.3156 0.3684 0.459 0.6413 0.2114 0.5493
z_100HzMean 0.3208 0.3095 0.3289 0.3571 0.5714 0.2006 0.5518
z_100HzSkip 0.3321 0.3209 0.3289 0.4464 0.6702 0.2092 0.5475
z_200Hz 0.3321 0.3194 0.3684 0.4 0.5882 0.2108 0.5372
z_50HzMean 0.3396 0.327 0.3684 0.4179 0.6139 0.2153 0.5399
z_50HzSkip 0.3208 0.31 0.3158 0.3692 0.598 0.1998 0.5505
Table 3: Results for the AdaBoost classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.9472 0.9447 0.9868 0.9615 0.9832 0.9458 0.9915
concat_100HzSkip 0.9245 0.9206 0.9737 0.881 0.9448 0.924 0.9908
concat_200Hz 0.9019 0.8963 0.9737 0.8605 0.9322 0.9003 0.9936
concat_50HzMean 0.917 0.9131 0.9737 0.9024 0.9548 0.9152 0.9705
concat_50HzSkip 0.9245 0.9226 0.9605 0.9481 0.9773 0.9228 0.9798
t_100HzMean 0.6491 0.6492 0.7105 0.8438 0.9219 0.6492 0.8759
t_100HzSkip 0.6453 0.6473 0.6974 0.8548 0.9291 0.6428 0.8689
t_200Hz 0.6868 0.6835 0.8026 0.8592 0.9237 0.6821 0.9037
t_50HzMean 0.5962 0.5994 0.6053 0.8364 0.9256 0.5988 0.8139
t_50HzSkip 0.6189 0.6192 0.7105 0.8438 0.9167 0.6142 0.8528
x_100HzMean 0.7245 0.7297 0.6316 0.8276 0.9351 0.7166 0.8903
x_100HzSkip 0.7019 0.7074 0.5921 0.7377 0.8981 0.6968 0.8958
x_200Hz 0.717 0.721 0.6316 0.7619 0.9045 0.7121 0.9027
x_50HzMean 0.7472 0.7546 0.6053 0.8846 0.962 0.7448 0.9086
x_50HzSkip 0.7208 0.7268 0.5789 0.8302 0.9423 0.7176 0.9012
y_100HzMean 0.8755 0.8724 0.9605 0.9241 0.9636 0.8722 0.9686
Table 4: Results for the BOSS classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.5887 0.5871 0.5132 0.5493 0.7852 0.5839 0.7326
concat_100HzSkip 0.6189 0.6164 0.5658 0.589 0.8013 0.6157 0.7348
concat_200Hz 0.5925 0.5933 0.4868 0.5211 0.7792 0.5914 0.7468
concat_50HzMean 0.6151 0.6168 0.5132 0.6724 0.8671 0.6172 0.7446
t_100HzMean 0.4755 0.4761 0.3816 0.3333 0.6258 0.472 0.685
t_100HzSkip 0.4566 0.4599 0.3158 0.3158 0.651 0.4565 0.6697
t_200Hz 0.4981 0.5049 0.3026 0.3151 0.6855 0.4991 0.6906
t_50HzSkip 0.5509 0.557 0.3947 0.4615 0.7682 0.5526 0.7017
x_100HzMean 0.5358 0.5397 0.3947 0.4545 0.7568 0.536 0.6739
x_200Hz 0.5283 0.5317 0.4211 0.4051 0.6968 0.5312 0.7369
x_50HzMean 0.4377 0.4386 0.3684 0.3294 0.6069 0.4486 0.6486
x_50HzSkip 0.4528 0.4497 0.4474 0.3434 0.5695 0.4534 0.6206
y_100HzMean 0.5887 0.5781 0.7368 0.56 0.6944 0.5821 0.7355
y_200Hz 0.5472 0.5445 0.4868 0.4805 0.7297 0.5502 0.6646
y_50HzMean 0.6 0.5931 0.6842 0.6047 0.7589 0.5975 0.7588
y_50HzSkip 0.6415 0.6311 0.75 0.6404 0.7793 0.6349 0.7419
z_100HzMean 0.5245 0.5196 0.5263 0.597 0.7857 0.5169 0.7112
Table 5: Results for the C4.5 classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8943 0.8889 1.0 0.9268 0.9641 0.8883 0.9552
concat_100HzSkip 0.883 0.8775 0.9868 0.9259 0.9636 0.8768 0.9484
concat_200Hz 0.8717 0.8679 0.9474 0.9114 0.9578 0.8669 0.9358
concat_50HzMean 0.9094 0.9047 1.0 0.9383 0.9706 0.9045 0.961
t_100HzSkip 0.7057 0.7076 0.6579 0.6173 0.8155 0.703 0.813
t_200Hz 0.6868 0.6867 0.6711 0.593 0.7892 0.6794 0.8085
t_50HzMean 0.717 0.716 0.7368 0.6747 0.8323 0.7058 0.8365
x_100HzMean 0.7132 0.7083 0.7763 0.6146 0.7784 0.6989 0.8337
x_100HzSkip 0.7208 0.7129 0.8158 0.6739 0.8113 0.7024 0.8445
x_200Hz 0.7094 0.7011 0.8289 0.63 0.7716 0.6907 0.8364
x_50HzSkip 0.7019 0.6981 0.75 0.6628 0.8165 0.6795 0.8314
y_100HzMean 0.8566 0.8477 0.9605 0.8295 0.9112 0.8492 0.9164
y_200Hz 0.8604 0.8532 0.9474 0.8471 0.9231 0.8546 0.9198
y_50HzMean 0.8566 0.8481 0.9474 0.878 0.9394 0.8488 0.9171
y_50HzSkip 0.8642 0.858 0.9342 0.8256 0.9133 0.8605 0.9161
z_100HzSkip 0.834 0.826 0.9474 0.9351 0.9675 0.8233 0.923
z_200Hz 0.8038 0.7945 0.9474 0.8372 0.9097 0.7897 0.9052
z_50HzSkip 0.8 0.7902 0.9474 0.9231 0.9589 0.7836 0.909
Table 6: Results for the DTW1NN @ 30% classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8717 0.8664 0.9737 0.9024 0.9515 0.8657 0.9699
concat_100HzSkip 0.8642 0.8579 0.9737 0.881 0.9394 0.8582 0.9635
concat_50HzMean 0.8792 0.8728 1.0 0.8941 0.9458 0.8717 0.9721
concat_50HzSkip 0.8679 0.8611 1.0 0.8539 0.9222 0.8596 0.9693
t_100HzMean 0.717 0.7176 0.7105 0.6067 0.7953 0.7109 0.895
t_100HzSkip 0.7283 0.73 0.7237 0.6044 0.7931 0.7212 0.8862
t_50HzMean 0.7321 0.7325 0.7368 0.6154 0.7977 0.725 0.9002
t_50HzSkip 0.7434 0.7447 0.7368 0.6437 0.8198 0.739 0.8893
x_100HzMean 0.7019 0.6904 0.8816 0.6262 0.7484 0.6703 0.8978
x_100HzSkip 0.6943 0.6821 0.8684 0.6286 0.7516 0.6617 0.8936
x_50HzMean 0.7132 0.7059 0.8289 0.6562 0.7925 0.6877 0.8849
x_50HzSkip 0.6868 0.678 0.8026 0.6421 0.7806 0.6586 0.8831
y_100HzMean 0.8717 0.8639 0.9605 0.8391 0.9186 0.866 0.9663
y_100HzSkip 0.8642 0.857 0.9474 0.8571 0.929 0.8581 0.9604
y_50HzMean 0.8717 0.8648 0.9474 0.9 0.9521 0.866 0.9616
y_50HzSkip 0.8642 0.8564 0.9474 0.8571 0.929 0.8579 0.964
z_100HzMean 0.8226 0.8129 0.9474 0.8889 0.9419 0.8104 0.9409
z_100HzSkip 0.8189 0.8097 0.9474 0.8889 0.9416 0.8065 0.9393
z_50HzMean 0.7887 0.7778 0.9474 0.9114 0.9514 0.7707 0.9313
z_50HzSkip 0.7925 0.7812 0.9342 0.8765 0.9329 0.7759 0.9416
Table 7: Results for the DTW3NN @ 30% classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8 0.7939 0.8684 0.9429 0.9733 0.7925 0.8891
concat_50HzMean 0.8226 0.8152 0.9211 0.9459 0.9737 0.8138 0.9084
concat_50HzSkip 0.7962 0.7915 0.8421 0.9552 0.98 0.7901 0.8842
t_100HzMean 0.683 0.6819 0.6579 0.641 0.8239 0.6792 0.8033
t_100HzSkip 0.6755 0.6737 0.6579 0.6329 0.8165 0.6711 0.8
t_50HzMean 0.6868 0.6846 0.6842 0.6341 0.8125 0.6819 0.8086
t_50HzSkip 0.6679 0.668 0.6184 0.6351 0.828 0.6631 0.7934
x_100HzMean 0.4038 0.4151 0.0921 1.0 1.0 0.3504 0.5907
x_100HzSkip 0.3887 0.4004 0.0658 1.0 1.0 0.3271 0.5795
x_50HzMean 0.4302 0.44 0.1579 0.9231 0.9903 0.3839 0.6158
x_50HzSkip 0.3962 0.4089 0.0658 0.8333 0.9901 0.3343 0.585
y_100HzMean 0.5547 0.5497 0.3816 0.9355 0.9833 0.5102 0.6661
y_100HzSkip 0.5472 0.5426 0.3684 0.9032 0.975 0.5026 0.6618
y_50HzMean 0.5547 0.5494 0.3816 0.9355 0.9833 0.5071 0.6631
y_50HzSkip 0.5585 0.553 0.3947 0.9091 0.9752 0.5137 0.6695
z_100HzMean 0.6755 0.6704 0.75 0.9661 0.9839 0.6486 0.8194
z_100HzSkip 0.6717 0.6671 0.7368 0.9655 0.9839 0.645 0.8153
z_50HzMean 0.6906 0.6845 0.7895 0.9524 0.9762 0.6609 0.8354
z_50HzSkip 0.6679 0.662 0.7632 0.9667 0.9835 0.6346 0.8209
Table 8: Results for the ED1NN classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8491 0.8447 0.9079 0.8734 0.9398 0.8448 0.9464
concat_100HzSkip 0.8453 0.8392 0.9211 0.8642 0.9333 0.8396 0.9473
concat_200Hz 0.8528 0.8498 0.8816 0.859 0.9353 0.8501 0.9506
concat_50HzMean 0.834 0.8301 0.8684 0.8684 0.9394 0.8297 0.9439
t_100HzMean 0.6189 0.6214 0.5395 0.4659 0.7235 0.6189 0.8114
t_50HzMean 0.6189 0.6212 0.5263 0.4762 0.7381 0.6194 0.793
t_50HzSkip 0.6 0.6032 0.5132 0.4643 0.7273 0.6009 0.8022
x_100HzSkip 0.566 0.5748 0.3026 0.46 0.8247 0.5553 0.7646
x_200Hz 0.5849 0.5939 0.3289 0.5435 0.8609 0.5699 0.7942
x_50HzMean 0.5585 0.5693 0.2763 0.4375 0.8247 0.5477 0.7695
x_50HzSkip 0.5849 0.5908 0.3947 0.5263 0.8224 0.5723 0.7859
y_100HzMean 0.7019 0.6843 0.8947 0.7816 0.8613 0.6706 0.8781
y_100HzSkip 0.7283 0.7113 0.9342 0.7978 0.8714 0.7046 0.888
y_200Hz 0.7321 0.7136 0.9605 0.7935 0.8643 0.7048 0.896
y_50HzMean 0.7132 0.6961 0.9079 0.7931 0.8696 0.6881 0.8836
y_50HzSkip 0.7094 0.6923 0.8947 0.8293 0.8955 0.6836 0.883
z_100HzMean 0.6943 0.6883 0.7368 0.8358 0.9209 0.6804 0.8696
z_100HzSkip 0.6755 0.6687 0.6974 0.7465 0.875 0.6635 0.8554
z_200Hz 0.6868 0.6814 0.7105 0.7941 0.9014 0.6747 0.8652
z_50HzMean 0.7057 0.702 0.7105 0.8308 0.9236 0.694 0.8768
z_50HzSkip 0.6642 0.66 0.6842 0.7761 0.8921 0.6517 0.8436
Table 9: Results for the MLP classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.7623 0.7632 0.7105 0.72 0.8757 0.7602 0.9077
concat_100HzSkip 0.683 0.682 0.6711 0.6375 0.8176 0.6803 0.8826
concat_50HzMean 0.6943 0.6945 0.6316 0.6486 0.8395 0.6892 0.8795
concat_50HzSkip 0.7208 0.7223 0.6316 0.6154 0.8266 0.7231 0.8715
t_100HzMean 0.6189 0.6239 0.5132 0.4815 0.7485 0.6176 0.829
t_100HzSkip 0.5774 0.5872 0.3816 0.4531 0.7799 0.5746 0.8249
t_200Hz 0.5623 0.5683 0.4474 0.4096 0.7012 0.564 0.8251
t_50HzMean 0.6113 0.6174 0.4605 0.4861 0.7744 0.6158 0.8263
t_50HzSkip 0.5547 0.5642 0.3816 0.3718 0.7066 0.5532 0.8138
x_100HzMean 0.5962 0.6012 0.4211 0.4156 0.7368 0.5959 0.7833
x_100HzSkip 0.5811 0.5833 0.4868 0.5139 0.7697 0.5727 0.8138
x_50HzMean 0.5623 0.5678 0.3816 0.4531 0.7742 0.5483 0.7785
x_50HzSkip 0.5358 0.5329 0.4737 0.3913 0.6543 0.529 0.7561
y_100HzMean 0.6792 0.6727 0.6842 0.6667 0.8312 0.6703 0.8517
y_100HzSkip 0.6528 0.6451 0.6711 0.622 0.7974 0.6415 0.8637
y_200Hz 0.7434 0.7387 0.7237 0.7143 0.8659 0.7354 0.9
y_50HzMean 0.7208 0.719 0.6711 0.7286 0.8805 0.7184 0.8907
y_50HzSkip 0.7358 0.725 0.8158 0.7294 0.8526 0.7219 0.8992
z_100HzMean 0.6981 0.6936 0.7368 0.6829 0.8323 0.6922 0.8676
z_100HzSkip 0.6755 0.6701 0.75 0.7808 0.8841 0.6617 0.863
z_200Hz 0.6642 0.6546 0.75 0.7703 0.875 0.6505 0.8597
z_50HzMean 0.6566 0.6504 0.6842 0.6582 0.8188 0.6436 0.8603
z_50HzSkip 0.6906 0.6839 0.7632 0.7532 0.8681 0.6746 0.8661
Table 10: Results for the Random Forest classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.7925 0.7934 0.7105 0.7297 0.8864 0.7932 0.9429
concat_100HzSkip 0.8302 0.827 0.8158 0.8052 0.9133 0.8273 0.9465
concat_50HzMean 0.8075 0.8059 0.7763 0.8082 0.9172 0.8056 0.9433
concat_50HzSkip 0.8151 0.8174 0.7368 0.8116 0.9249 0.8157 0.9478
t_100HzMean 0.634 0.6414 0.4737 0.4932 0.7811 0.6375 0.8495
t_100HzSkip 0.6226 0.6337 0.4079 0.4844 0.8024 0.6247 0.8609
t_200Hz 0.6189 0.6243 0.5263 0.4598 0.7251 0.612 0.8559
t_50HzSkip 0.6038 0.6086 0.5132 0.4432 0.7118 0.595 0.8368
x_100HzMean 0.6151 0.6169 0.5395 0.494 0.7439 0.6123 0.8217
x_100HzSkip 0.5736 0.5712 0.5526 0.4286 0.6627 0.5747 0.8153
x_50HzMean 0.566 0.564 0.5263 0.4255 0.6707 0.562 0.8053
x_50HzSkip 0.6453 0.637 0.7105 0.5625 0.7358 0.6352 0.8455
y_100HzMean 0.8038 0.7956 0.8684 0.7857 0.8909 0.7947 0.9378
y_200Hz 0.8264 0.82 0.8816 0.8072 0.9048 0.8201 0.9444
y_50HzMean 0.8113 0.8006 0.9211 0.7527 0.8631 0.8009 0.9292
y_50HzSkip 0.8415 0.8356 0.8947 0.8608 0.9337 0.8352 0.9474
z_100HzMean 0.683 0.6701 0.8158 0.6596 0.7881 0.6684 0.8854
z_100HzSkip 0.7736 0.7665 0.8553 0.7471 0.8642 0.7656 0.9134
z_200Hz 0.7509 0.7397 0.8816 0.7204 0.8354 0.7398 0.9069
z_50HzMean 0.6981 0.6876 0.7895 0.6593 0.8013 0.6858 0.8845
z_50HzSkip 0.7245 0.7124 0.8816 0.7053 0.817 0.7129 0.9032
Table 11: Results for the Rotation Forest classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzSkip 0.8302 0.8267 0.8684 0.7857 0.8953 0.8283 0.9253
concat_200Hz 0.8264 0.8224 0.8684 0.7952 0.9 0.8238 0.9243
t_100HzMean 0.5774 0.5824 0.4605 0.4118 0.7024 0.5767 0.7731
t_200Hz 0.5698 0.5796 0.3684 0.3836 0.7321 0.573 0.7648
t_50HzMean 0.5925 0.5956 0.5263 0.4255 0.6842 0.5821 0.7898
t_50HzSkip 0.6038 0.6059 0.5658 0.43 0.6724 0.5929 0.8025
x_100HzMean 0.4491 0.4559 0.2368 0.36 0.7594 0.4404 0.6485
x_100HzSkip 0.4604 0.4663 0.2632 0.4 0.7727 0.4519 0.6457
x_50HzMean 0.4604 0.4632 0.2895 0.3929 0.7463 0.4473 0.6615
x_50HzSkip 0.4755 0.4784 0.2895 0.5 0.8254 0.4588 0.6525
y_100HzMean 0.6566 0.6353 0.9079 0.7841 0.8468 0.6209 0.8148
y_100HzSkip 0.6604 0.6395 0.9079 0.7841 0.848 0.6267 0.8179
y_200Hz 0.6415 0.6222 0.8947 0.7816 0.843 0.6173 0.8187
y_50HzMean 0.634 0.6095 0.9079 0.7753 0.8319 0.5792 0.8091
y_50HzSkip 0.6415 0.6179 0.8947 0.8095 0.8644 0.5909 0.8101
z_100HzMean 0.5019 0.5007 0.3947 0.7143 0.8957 0.4811 0.7068
z_100HzSkip 0.4868 0.4858 0.3816 0.6444 0.8621 0.47 0.6998
z_50HzMean 0.5094 0.5034 0.4868 0.7551 0.8909 0.4834 0.7012
z_50HzSkip 0.4906 0.4876 0.3947 0.6667 0.8696 0.4661 0.683
Table 12: Results for the SMO classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8038 0.8024 0.8158 0.7848 0.8988 0.8042 0.9492
concat_100HzSkip 0.7925 0.7897 0.8289 0.8077 0.9074 0.7902 0.9529
concat_200Hz 0.7811 0.7798 0.7895 0.7895 0.9018 0.7806 0.9488
concat_50HzMean 0.8113 0.8115 0.8026 0.7722 0.8953 0.8125 0.9501
concat_50HzSkip 0.8038 0.8024 0.8289 0.7975 0.9036 0.8028 0.952
t_100HzMean 0.6943 0.6967 0.6579 0.5882 0.7929 0.6896 0.9075
t_100HzSkip 0.7094 0.7105 0.6974 0.6163 0.8036 0.7039 0.9078
t_50HzMean 0.6868 0.6883 0.6579 0.5882 0.7904 0.6838 0.9064
t_50HzSkip 0.7094 0.7114 0.6842 0.6047 0.8 0.7057 0.9103
x_100HzMean 0.717 0.7175 0.6842 0.5909 0.7931 0.7195 0.9254
x_100HzSkip 0.7245 0.7262 0.6711 0.6145 0.815 0.7279 0.9268
x_50HzSkip 0.7434 0.7433 0.7368 0.6222 0.8057 0.7433 0.9285
y_100HzMean 0.9245 0.9221 0.9605 0.8795 0.9451 0.9252 0.9844
y_50HzSkip 0.917 0.9136 0.9605 0.8795 0.9444 0.9168 0.9872
z_100HzMean 0.8226 0.8138 0.9342 0.8353 0.913 0.8155 0.9596
z_100HzSkip 0.8415 0.8346 0.9342 0.8659 0.9325 0.836 0.9602
z_200Hz 0.834 0.8258 0.9342 0.8353 0.9146 0.8277 0.959
z_50HzMean 0.8566 0.848 0.9737 0.8506 0.9217 0.8494 0.9649
z_50HzSkip 0.8491 0.8416 0.9474 0.8372 0.9162 0.8442 0.9622
Table 13: Results for the TSF classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8943 0.8882 0.9605 0.8295 0.9162 0.8918 0.9755
concat_200Hz 0.8604 0.8536 0.9474 0.8 0.8966 0.857 0.973
concat_50HzMean 0.883 0.8777 0.9342 0.8068 0.9056 0.8815 0.9751
concat_50HzSkip 0.8755 0.8695 0.9474 0.8471 0.9249 0.8716 0.9716
t_100HzMean 0.6038 0.6113 0.4605 0.4375 0.7353 0.5988 0.8448
t_100HzSkip 0.6264 0.6286 0.5789 0.5 0.7349 0.6177 0.8457
t_200Hz 0.6151 0.6186 0.5526 0.4828 0.7289 0.6052 0.8517
t_50HzMean 0.634 0.6349 0.6184 0.5054 0.7246 0.6206 0.8544
t_50HzSkip 0.6453 0.6485 0.5789 0.4944 0.7384 0.6434 0.859
x_100HzMean 0.6453 0.647 0.5526 0.5 0.7544 0.6467 0.8671
x_200Hz 0.6528 0.6546 0.5395 0.494 0.7586 0.6548 0.8571
x_50HzMean 0.6755 0.6746 0.6053 0.5227 0.76 0.6765 0.8657
x_50HzSkip 0.6528 0.6528 0.5658 0.5059 0.7558 0.6553 0.8508
y_100HzMean 0.8264 0.8149 0.9474 0.7826 0.8802 0.8141 0.958
y_100HzSkip 0.8151 0.8017 0.9474 0.7912 0.8834 0.7969 0.9557
y_50HzMean 0.8189 0.8074 0.9474 0.8276 0.9062 0.8057 0.9547
y_50HzSkip 0.7925 0.7774 0.9474 0.7826 0.8734 0.7701 0.9532
z_100HzMean 0.7774 0.7711 0.8158 0.7045 0.8471 0.772 0.928
z_100HzSkip 0.7509 0.7397 0.8553 0.7065 0.8323 0.7386 0.9172
z_200Hz 0.766 0.7576 0.8289 0.7079 0.8434 0.7594 0.9218
Table 14: Results for the XGBoost classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
All 0.2868 0.25 1.0 0.2868 0.0 0.1114 0.4901
Table 15: Results for the ZeroR classifier
[Gantt chart: project schedule shown for semester week numbers (weeks 1-12, Christmas Break, weeks 1-10, Easter Break, weeks 11-14), covering: project proposal, literature review, literature review delivery, design, design completion, coding, testing, code delivery, final report writing, and inspection preparation.]
Figure 8: Original project Gantt chart
[Gantt chart: project schedule shown for semester week numbers (weeks 1-12, Christmas Break, weeks 1-10, Easter Break, weeks 11-14), covering: project proposal, literature review, literature review delivery, prototype design & implementation, prototype delivery, introducing more exercises, experimenting with data-set variations, project extensions, final implementation delivery, final report writing, inspection preparation, and project completion.]
Figure 9: Revised project Gantt chart
References
Ahmadi, M. et al. (2016). Novel feature extraction, selection and fusion for effective
malware family classification. In Proceedings of the Sixth ACM Conference on Data
and Application Security and Privacy, pages 183–194. ACM.
Awad, M. and Khanna, R. (2015). Support vector machines for classification. Efficient
Machine Learning, pages 39–66.
Bagnall, A. et al. (2016). UEA & UCR time series classification repository.
http://timeseriesclassification.com/.
Bagnall, A. et al. (2017). The great time series classification bake off: a review and
experimental evaluation of recent algorithmic advances. Data Mining and Knowledge
Discovery, 31(3):606–660.
Bostrom, A. and Bagnall, A. (2015). Binary shapelet transform for multiclass time
series classification.
Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.
Chang, K. et al. (2007). Tracking free-weight exercises. International Conference on
Ubiquitous Computing, 4717:19–37.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceed-
ings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pages 785–794. ACM.
Deng, H. et al. (2013). A time series forest for classification and feature extraction.
Information Sciences, 239:142–153.
Gardner, M. and Dorling, S. (1998). Artificial neural networks (the multilayer percep-
tron): a review of applications in the atmospheric sciences. Atmospheric Environ-
ment, 32:2627–2636.
Ghimire, B. et al. (2012). An assessment of the effectiveness of a random forest clas-
sifier for land-cover classification. ISPRS Journal of Photogrammetry and Remote
Sensing, 67:93–104.
Goh, D. and Razikin, K. (2015). Is gamification effective in motivating exercise? Inter-
national Conference on Human-Computer Interaction, 9170:608–617.
Hills, J. et al. (2014). Classification of time series by shapelet transformation. Data
Mining and Knowledge Discovery, 28(4):851–881.
Hsu, C.-W. et al. (2002). A comparison of methods for multiclass support vector ma-
chines. IEEE Transactions on Neural Networks, 13(2):415–425.
Krishna, G. et al. (2013). Performance analysis and evaluation of different data mining
algorithms used for cancer classification. International Journal of Advanced Research
in Artificial Intelligence, 2.
Kwapisz, J. et al. (2010). Activity recognition using cell phone accelerometers.
SIGKDD Explorations, 12:74–82.
Lin, J. et al. (2012). Rotation-invariant similarity in time series using bag-of-patterns
representation. Journal of Intelligent Information Systems, 39(2):287–315.
Lines, J. (2015). Time Series Classification through Transformation and Ensembles.
PhD thesis, Computing.
Lines, J. (2018a). Initial project title and description. https://3yp.cmp.uea.ac.uk/projects/851/.
Lines, J. (2018b). An introduction to time series classification and the UEA code reposi-
tory.
Lines, J. et al. (2016). HIVE-COTE: The hierarchical vote collective of transformation-
based ensembles for time series classification. 2016 IEEE 16th International Confer-
ence on Data Mining (ICDM).
Oracle Corporation (1995). Java programming language. https://www.java.com/en/.
Pal, M. (2005). Random forest classifier for remote sensing classification. International
Journal of Remote Sensing, 26(1):217–222.
Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support
vector machines.
Quinlan, J. R. (1993). C4.5: programs for machine learning. Morgan Kaufmann.
Rodriguez, J. et al. (2006). Rotation forest: A new classifier ensemble method. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 28:1619–1630.
Rossum, G. (1990). Python programming language. https://www.python.org/.
Schäfer, P. (2015). The BOSS is concerned with time series classification in the presence
of noise. Data Mining and Knowledge Discovery, 29(6):1505–1530.
Schapire, R. E. (2013). Explaining AdaBoost. Empirical Inference, pages 37–52.
Schjerve, I. et al. (2008). Both aerobic endurance and strength training programmes
improve cardiovascular health in obese adults. Clinical Science, 115(9):283–293.
The XGBoost Contributors (2014). XGBoost documentation.
https://xgboost.readthedocs.io/en/latest/. Accessed: 15/04/2019.
UC Irvine (1987). UC Irvine machine learning repository.
https://archive.ics.uci.edu/ml/datasets.php.
University of East Anglia (2019). Enable your research with high performance comput-
ing at UEA. https://rscs.uea.ac.uk/new-high-performance-computing-cluster.
University of Waikato (1997). Waikato environment for knowledge analysis.
https://www.cs.waikato.ac.nz/~ml/weka/.
Vieyra Software (2014). Physics toolbox sensor suite.
https://www.vieyrasoftware.net/physics-toolbox-sensor-suite.
Witten, I. H. et al. (1999). Weka: Practical machine learning tools and techniques with
Java implementations. https://researchcommons.waikato.ac.nz/bitstream/handle/10289/1040/uow-cs-wp-1999-11.pdf.
Witten, I. H. et al. (2011). Data mining: practical machine learning tools and techniques.
https://www.cs.waikato.ac.nz/~ml/weka/book.html.
Wu, X. et al. (2008). Top 10 algorithms in data mining. Knowledge and Information
Systems, 14:1–37.