Registration number 100164855
2019
Using Time Series Classification in the Gym with Smart Watches to Gamify and
Encourage Exercise
Supervised by Dr Jason Lines
University of East Anglia
Faculty of Science
School of Computing Sciences
Abstract
This paper assesses the performance of currently available classification algorithms,
such as Rotation Forest and Dynamic Time Warping, in the context of human activity
recognition for common bodyweight exercises. It compares generic machine learning
algorithms with the more specialised algorithms designed for use with time series data,
such as the data collected for this paper. It uses data collected from a three-axis
accelerometer in a modern smartphone for four different exercises: the pull-up, push-up,
sit-up and squat. It compares numerous variations of this data, such as combining axes
and/or downsampling, in order to see what effect they have on the performance of the
classification algorithms tested. This was all done with the aim of producing a model
that can correctly classify as many exercises as possible while maintaining the best
performance possible.
The results of this paper show that the performance of the classification algorithms
and data variations used varies enormously, with accuracies ranging from ~28% up
to ~94% depending on the combination of classification algorithm and data variation
used. This highlights the importance of taking the time to ensure that the optimal
techniques and data are being used for the purpose and application for which
the results are intended.
Acknowledgements
I would like to thank my supervisor and lecturer, Dr Jason Lines, for all the support,
advice and encouragement he has provided throughout the duration of this project;
without him, this project would not have been possible.
I would also like to thank Dr Pierre Chardaire for all the time he has invested in
the organisation of the third year projects module, providing all students with useful
information regarding the flow of the project and the appropriate use of LaTeX.
CMP-6013Y
Contents
1 Introduction 8
2 Background 9
2.1 Health and Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Human Activity Recognition . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Gamifying Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Time Series Classification . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Design & Planning 13
3.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.3 Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.4 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Prototype Results 16
5 Implementation 17
5.1 Pseudocode & File Structure . . . . . . . . . . . . . . . . . . . . . . . 17
5.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.3 Data Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.4 Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.4.1 ZeroR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.4.2 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.3 Rotation Forest . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.4 C4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4.5 Sequential Minimal Optimisation with SVM . . . . . . . . . . 23
5.4.6 XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.4.7 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.4.8 Multilayer Perceptron . . . . . . . . . . . . . . . . . . . . . . 24
5.4.9 Time Series Forest . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4.10 BOSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4.11 HIVE-COTE . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6 Analysis of Results 25
6.1 AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.2 BOSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3 C4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.4 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.4.1 1NN & 30% Warping . . . . . . . . . . . . . . . . . . . . . . . 27
6.4.2 1NN & 60% Warping . . . . . . . . . . . . . . . . . . . . . . . 28
6.4.3 3NN & 30% Warping . . . . . . . . . . . . . . . . . . . . . . . 28
6.4.4 3NN & 60% Warping . . . . . . . . . . . . . . . . . . . . . . . 28
6.5 Euclidean Distance 1NN . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.6 Multilayer Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.7 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.8 Rotation Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.9 Sequential Minimal Optimisation with SVM . . . . . . . . . . . . . . . 31
6.10 Time Series Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.11 XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.12 ZeroR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.13 EE, Flat-COTE & HIVE-COTE . . . . . . . . . . . . . . . . . . . . 33
6.14 Results Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7 Project Evaluation 36
7.1 Problems Encountered . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.2 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.3 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
8 Conclusion 39
9 Appendix 41
References 52
List of Figures
1 ED v DTW when calculating the distance between two series . . . . . . 12
2 The file structure used in this project . . . . . . . . . . . . . . . . . . . 18
3 Example showing the arff files for the data variations . . . . . . . . .
4 Critical difference diagram using accuracy for the classifiers . . . . . . 35
5 Critical difference diagram using accuracy for the data-sets . . . . . . . 35
6 Prototype pseudocode showing: data being loaded, the classifier being
created and trained and then classifying the test data . . . . . . . . . . . 41
7 Pseudocode showing: looping through the classifiers, looping through
the data-sets, building the classifier, classifying test data, outputting results 42
8 Original project Gantt chart . . . . . . . . . . . . . . . . . . . . . . . 50
9 Revised project Gantt chart . . . . . . . . . . . . . . . . . . . . . . . . 51
List of Tables
1 MoSCoW analysis for the project . . . . . . . . . . . . . . . . . . . . . 15
2 Overview of the number of instances for each class . . . . . . . . . . . 19
3 Results for the AdaBoost classifier . . . . . . . . . . . . . . . . . . . . 43
4 Results for the BOSS classifier . . . . . . . . . . . . . . . . . . . . . . 44
5 Results for the C4.5 classifier . . . . . . . . . . . . . . . . . . . . . . . 44
6 Results for the DTW1NN @ 30% classifier . . . . . . . . . . . . . . . 45
7 Results for the DTW3NN @ 30% classifier . . . . . . . . . . . . . . . 45
8 Results for the ED1NN classifier . . . . . . . . . . . . . . . . . . . . . 46
9 Results for the MLP classifier . . . . . . . . . . . . . . . . . . . . . . . 46
10 Results for the Random Forest classifier . . . . . . . . . . . . . . . . . 47
11 Results for the Rotation Forest classifier . . . . . . . . . . . . . . . . . 47
12 Results for the SMO classifier . . . . . . . . . . . . . . . . . . . . . . 48
13 Results for the TSF classifier . . . . . . . . . . . . . . . . . . . . . . . 48
14 Results for the XGBoost classifier . . . . . . . . . . . . . . . . . . . . 49
15 Results for the ZeroR classifier . . . . . . . . . . . . . . . . . . . . . . 49
1 Introduction
This project focuses on the area of human activity recognition using time series
classification, specifically on finer-detail 'gym' movements such as push-ups, sit-ups,
etc. The project will be conducted by recording accelerometer data from a cell phone
and/or smart watch and attempting to classify the data as one of the aforementioned
exercises. Different variations of the recorded data will be used, not only to find the
best results but also to use data that closely resembles how it would be in the real
world. When classifying the data, multiple different algorithms will be used, as there
is no single algorithm that best suits all data sets; it must therefore be worked out which
algorithm gives the best results for the data sets collected.
The purpose of this project is, if these exercises can be classified accurately, to use
this capability to encourage exercise in the real world through gamification. It is
widely known that fitness is an important part of leading a healthy life in more
ways than one, and encouraging exercise, not necessarily in the ways covered in this
paper, has the potential to positively impact a large number of people's lives.
This project's goal is quite large, but it can be split into different sections, as it
includes multiple areas of interest: the human activity recognition problem, the time
series classification (TSC) problem and the gamification problem. The overarching
aim of this project is to critically assess whether finer-detail exercises can be
identified using accelerometer data from a cell phone and/or smart watch. Examples of
these finer-detail exercises include, but are not limited to, push-ups, sit-ups,
pull-ups, the bench press and the deadlift. The other aim of this project is to analyse
the effect that gamification has on exercise. These two aims can be further broken down
into smaller objectives: assessing whether the exercises can be identified using the
sensors in either the cell phone or the smart watch alone, or whether both are required;
assessing whether the exercises can not only be identified but also counted and
compared; assessing whether the identification of the exercises is person dependent or
independent; and analysing how effective gamification is at encouraging exercise.
2 Background
2.1 Health and Fitness
A 2015 report found that 58% of women and 68% of men were either overweight or
obese, with the obesity rate of the population rising from 15% in 1993 to 27% in 2015.[1]
The same report a year later found that in 2016 there were 617,000 hospital admissions
where obesity was a factor, an increase of 18% on 2015.[2] In 2008 an investigation
was conducted into the effects of aerobic endurance training and strength training
on cardiovascular health. The subjects were split into three groups, each designated a
specific type of training, and put on a twelve-week programme consisting of
exercise three times per week. From this it was found that 'both aerobic exercise training
at either high or moderate intensities and high-intensity strength training improve en-
dothelial function and decrease the cardiovascular risk profile in obese adults. However,
high-intensity aerobic interval training results in a greater improvement in endothelial
function and a decrease in the cardiovascular risk profile'. For the moderate-intensity
and high-intensity aerobic training, body weight decreased by 3% and 2% respectively,
and body fat decreased by 2.5% and 2.2% respectively. While the strength training
did not show any decrease in either body weight or body fat, it did show an increase of
10% in VO2 max as well as an increase in strength of 25% (Schjerve et al., 2008).
The modern world is becoming more and more work oriented, with greater pressure
on people to devote more time to their careers. This leaves people with less recreational
time and therefore less time for physical well-being, as for the majority of people
exercise is not how they want to spend their limited recreational time. The motivation
behind this project stems from the fact that exercise, as outlined above, is an important
contributor to physical and psychological well-being, coming a close second after diet.
Regular exercise is well known to reduce the risk of many chronic diseases, such as
cardiovascular disease, obesity and diabetes.
[1] National Health Service (2017). Statistics on obesity, physical activity and diet.
[2] National Health Service (2018). Statistics on obesity, physical activity and diet.
2.2 Human Activity Recognition
Activity recognition is an ever-growing field, especially with the expanding wearable
technology industry, which includes smart watches and even so-called smart glasses.
Within this field, however, the majority of research focuses on exercises that would be
considered whole-body movements, such as walking, running, cycling and swimming.
In 2007 a project was carried out on the recognition of free-weight exercises such as
the bench press, deadlift and overhead press. It used two different classification models,
Naive Bayes and Hidden Markov, both of which correctly identified the type of exercise
with around 90% accuracy. In that project, instead of a smart watch, a three-axis
accelerometer was incorporated into a workout glove, with a second accelerometer on
the user's waist to track body posture (Chang et al., 2007). Four years later a similar
experiment was carried out, although only on whole-body exercises, using the
accelerometers in the Android-based cell phones of twenty-nine volunteers; it also found
that most activities were recognised correctly 90% of the time (Kwapisz et al., 2010).
This may indicate that the best results can be achieved with a combination of data from
sensors in smartphones and wearable sensors.
2.3 Gamifying Exercise
In 2015 a study was conducted into the effects of gamification on exercise. Prior to the
study, subjects were asked to 'complete a questionnaire survey eliciting their exercise
habits in terms of the type of exercise and frequency they engaged in them, their atti-
tudes and their level of enjoyment of the exercises they performed'. They were then
introduced to the gamification service and asked to use it as part of their exercise
routine for a month. After this they completed a questionnaire survey similar to the
first one. From this investigation it was found that their attitude toward exercise had
improved significantly, as had their perception of how enjoyable exercise is. On average,
the participants also improved their exercise habits by increasing the amount of time
spent exercising. However, this was not the case for all participants, as 'there were also
those who felt that the gamification features overemphasised the competitive aspect.
This caused them to feel demotivated by the scores they had achieved'.
This study is similar to the latter stages of this project and could be used as an indicator
of the type of results that may be achieved (Goh and Razikin, 2015).
2.4 Time Series Classification
A time series is a series of data points indexed in time order; usually these data points
are taken at equally spaced intervals. Time series classification (TSC) is the task of
assigning a time series pattern to a specific category. When using TSC we require a set
of n time series:

T = {T_1, T_2, ..., T_n}

where each time series is made up of m real-valued, ordered observations:

T_i = <t_{i,1}, t_{i,2}, ..., t_{i,m}>

and a class label c_i. Given T, the TSC problem is to find a function that maps from the
space of possible time series to the space of possible class values. For simplicity, in our
definition we assume that all series of T are the same length (Lines, 2015).
For this project we can use TSC to assign a time series pattern to a particular exercise
or movement, and then use this to recognise when it is carried out again based on the
time series pattern. Hills et al. (2014) note that TSC problems pose a specific
challenge: how to measure the similarity between series. This means finding the
best-performing algorithm for matching a time series pattern to the correct category,
and as TSC problems arise across a diverse range of domains, there is no single
approach that can be singled out as the best.
For a long time the general consensus was that the gold standard in TSC was dynamic
time warping (DTW), a nearest neighbour (NN) classifier. However, recent work has
demonstrated that many newer approaches significantly outperform this standard, with
the best being the collective of transformation-based ensembles (COTE). NN
classification works by calculating the distance (or similarity) between two time series.
Euclidean distance (ED) and DTW are both examples of distance measures used in NN
classification. The former is, simply put, the point-wise distance between two series,
whereas DTW allows for 'warping' between the two series to accommodate small
misalignments in the data (Lines, 2018b).
Figure 1: ED v DTW when calculating the distance between two series
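To make the distinction in Figure 1 concrete, the following is an illustrative Python sketch of the two distance measures. This is not the Weka/UEA repository code used in this project, and the function names are my own:

```python
import numpy as np

def euclidean_distance(a, b):
    # Point-wise distance between two equal-length series
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw_distance(a, b, window=None):
    # Classic dynamic-programming DTW; `window` limits how far the
    # alignment may warp (None means a full warping window)
    n, m = len(a), len(b)
    w = max(n, m) if window is None else window
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))
```

For two series where one is simply a slightly shifted copy of the other, ED accumulates error at every misaligned point, while DTW can warp the alignment and report a much smaller distance.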
The elastic ensemble (EE) is a combination of eleven NN classifiers, including ED and
full-window DTW, that use whole-series elastic distance measures in the time domain
and with first-order derivatives. This algorithm, using a voting scheme that weights
according to cross-validation training set accuracy, has been shown to be significantly
more accurate than any of its single components (Bagnall et al., 2017).
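The accuracy-weighted voting scheme can be sketched as follows; this is an illustrative example of the general idea, not the actual EE implementation, and the names are my own:

```python
from collections import defaultdict

def weighted_vote(predictions, cv_accuracies):
    # predictions: one predicted class label per component classifier
    # cv_accuracies: each component's cross-validation training set
    # accuracy, used as its voting weight
    scores = defaultdict(float)
    for label, weight in zip(predictions, cv_accuracies):
        scores[label] += weight
    # Return the class with the highest total weighted vote
    return max(scores, key=scores.get)
```

A highly accurate component can thus outvote several weaker ones, rather than each classifier contributing equally.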
Bagnall et al. (2017) state that COTE is the only classifier they were aware of that
explicitly ensembles over different representations. This paper uses the classifier
structure called flat-COTE, which pools 35 classifiers into a single ensemble with votes
weighted by training set cross-validation accuracy. There is, however, one difference:
the Shapelet Transform (ST) described in Bostrom and Bagnall (2015) is used rather
than the version in Hills et al. (2014).
A range of other classifiers are also covered in this paper; see Section 5.4 for the full
list.
3 Design & Planning
3.1 Design
The title of this project, Using Time Series Classification in the Gym with Smart
Watches to Gamify and Encourage Exercise, identifies three main parts: time series
classification, human activity recognition in the gym, and using gamification to
promote exercise. The project description states that it could be broken down into
five main steps, as follows: 'Formalising the problem, Interacting with the
software of a wearable device, Collecting data for experimentation, Implementation
and testing of algorithms to predict behaviour, Extensions, such as gamification, further
trials, unsupervised detection of exercise, etc' (Lines, 2018a). I decided early on that I
did not want to restrict this project to a wearable device like a smart watch, but to also
include cell phones: restricting the project would not only limit the results we may see,
but the large majority of people do not own or use a smart watch on a regular basis,
which would further limit the real-world applications of this project.
3.1.1 Methodology
For this project I felt that following an agile methodology was most appropriate,
mainly due to the time constraints placed upon the project. This sort of methodology
allows me to remain somewhat flexible about what needs to be done by when, as the
project can always undergo minor changes to allow for this. For example, if a stage were
to finish ahead of or behind schedule, the work plan could be adjusted by allocating
more or less time to other stages without causing too many problems, meaning the
project can continue.
3.1.2 Literature Review
Conducting a literature review allowed me to expand my knowledge in each of the
topic areas covered in this project; it also allowed me to identify gaps within the topic
areas and provided ideas that could be used to improve this project. From the
literature review I concluded that there is a lack of knowledge in the area of
free-weight and bodyweight exercises, as the majority of human activity recognition
studies related to exercise focus on whole-body movements such as running. This led
me to the decision that this project should focus on the areas where there is a lack of
knowledge. By doing this, the project will be more original, not just a copy of another,
and will be more likely to have real-world applications.
3.1.3 Extension
The gamification extension of this project could be carried out in a number of different
ways, the most obvious being a mobile or web-based app that gets updated when
recognised exercise has been completed. The implementation of the app could also
take a number of forms, ranging from a video game to a competition with friends with
a prize pool for the winner. However, what they would have in common is some sort
of in-app reward based upon the exercise completed; the reward could vary depending
on the amount of exercise completed and/or, more optimistically, the intensity with
which the exercise was performed, e.g. 10 push-ups in a shorter time period would be
worth more, and harder exercises would also be worth more.
3.1.4 Constraints
With any project there will always be constraints, things that limit a project's desired
outcome, and this project is no exception. One of the main constraints of this project is
time. This project has a specified deadline by which everything must be completed,
which limits the amount of time that can be spent on each phase of the project, as
outlined in the Gantt charts at the end of the Appendix. This could mean, for example,
that in the coding phase a non-optimal algorithm is used because there is not enough
time to develop a more efficient version. To combat this, efficient time management is
an absolute must to ensure that no time is wasted, as is starting with existing algorithms
in the code base rather than producing one from scratch. Another constraint is cost;
while not the biggest constraint on this project, due to it being a student project, it is
most definitely still a factor, as it could affect the quantity or even the quality of the
smart watches being used, which could in turn affect any results obtained. Short of
using personal finances, there is nothing that can really be done about this. A
project-specific constraint could be the number of different exercises that I or any
volunteer is able to perform, depending on the availability of equipment at the chosen
facility. This could limit the results of the project to a smaller number of exercises,
which could in turn affect the gamification stage, as some exercises may not be
rewarded properly due to not being recognised.
Must Have:
• TSC of exercises using recorded data.
• Include two basic bodyweight exercises.
• Evaluation of algorithms used.

Should Have:
• Include additional bodyweight exercises.
• Use at least five different algorithms.

Could Have:
• Evaluation of data variations used.
• Include non-bodyweight exercises.
• Include a gamification component.
• Use data from both a cell phone and/or a smart watch.

Won't Have:
• Automatic data recording based on location.
• Suggest exercise form improvements using recorded data.

Table 1: MoSCoW analysis for the project
3.2 Planning
After designing the project, the next stage is planning. This is where a work plan
should be set out covering all parts of the project and how long each part should take;
by doing this you give yourself more specific targets on what needs to be accomplished
and by when. You could take this further by comparing it to how the project would go
in an ideal world, which could be used as a measure of how successful the project was.
An excellent way to do all this is through the use of a Gantt chart. For the project's
original and revised Gantt charts, please see the end of the Appendix.
Comparing the two Gantt charts, the first noticeable difference is the more
project-specific headings in place of the previous generic headings, allowing me to
better follow which part of the project currently needs work. The other changes are the
addition of a Gantt bar for the research and implementation of the optional project
extensions, a milestone signifying the end of the project, and the shortening of some
Gantt bars, including 'Prototype Design & Implementation' (previously called design),
'Introduce more exercises', 'Experimenting with data set variations' and 'Final report
writing'. I feel the revised Gantt chart gives a much clearer and more detailed view of
the project, which will allow me, and has allowed me, to keep much better track of the
progress being made and to try to ensure that things are completed on time.
4 Prototype Results
For the prototype, data was collected for only two exercises: push-ups and sit-ups.
For each exercise, a relatively small amount of data, approximately 40 instances, was
collected, equivalent to roughly two minutes. This data was then split as evenly as
possible between two groups, train and test, and split further so that each axis had its
own arff file.

Within the prototype I decided to use only Euclidean distance (ED) and dynamic
time warping (DTW) nearest neighbour classifiers on the data. On the first run I used
the ED classifier on each of the x, y and z axes and then also on t, achieving accuracies
of 75.00%, 67.50%, 95.00% and 85.00% respectively; for details on the axes, see
Section 5.3. After this I moved on to using a DTW classifier (bear in mind that 0%
warping is equivalent to ED). I first ran it with 10% warping and achieved 100%
accuracy for both the x and t axes, while y and z gave results of 92.50% and 97.50%
respectively. The results for DTW, at the tested warping allowances of 10% through
100% at 10% increments, are consistently better than their ED counterparts. The t data
set gives an accuracy of 100% for all DTW tests, whereas the x axis only gives 100%
accuracy with 10% warping and otherwise gives 97.50%. The y axis hovers around
the 92.50%-95.00% mark depending on the warping, and the z axis gives an accuracy
of 97.50% on all DTW tests. See the Appendix for pseudocode relating to how the
results were obtained.

From the results outlined above we can see that DTW is the better choice of the two
tested classifiers, as it easily outperforms ED at all levels of warping. It also appears
that the t data set is the best choice, as it gives 100% accuracy across the board for the
DTW tests.
5 Implementation
5.1 Pseudocode & File Structure
Figure 7 in the Appendix shows the pseudocode for the code that was run in order to
obtain all the necessary results. The implementation was done in Java (Oracle
Corporation, 1995), using the University of East Anglia's time series classification
repository (Bagnall et al., 2016), which is built on top of the Weka framework
(University of Waikato, 1997).
Figure 2, below, shows the file structure that was used in this project. This is relevant
to the implementation of the pseudocode, as it allowed for efficient looping through
directories instead of having to load all the data into a data structure at the start of each
run. The data collected from the smartphone app was given an appropriate name and a
folder where it was stored. The Python script (see Section 5.3) output its arff files
into the 'Formatted Data' folder, where they were checked, given a suitable name (see
Figure 3) and moved to the correct subfolder in 'Usable Data' ready for use.
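The experiment loop described in the pseudocode can be sketched as follows. The real implementation is in Java with Weka, so this Python version, with a minimal stand-in 1NN classifier, is purely illustrative and the names are my own:

```python
import numpy as np

class OneNN:
    """Minimal 1-nearest-neighbour (Euclidean) classifier as a stand-in."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Distance from this test instance to every training instance
            dists = np.sqrt(((self.X - np.asarray(row, float)) ** 2).sum(axis=1))
            preds.append(self.y[int(np.argmin(dists))])
        return preds

def run_experiments(datasets, classifiers):
    # datasets: {name: (X_train, y_train, X_test, y_test)}
    # classifiers: {name: zero-argument factory returning a fresh classifier}
    results = {}
    for clf_name, make_clf in classifiers.items():
        for data_name, (X_tr, y_tr, X_te, y_te) in datasets.items():
            clf = make_clf().fit(X_tr, y_tr)      # build the classifier
            preds = clf.predict(X_te)             # classify the test data
            acc = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
            results[(clf_name, data_name)] = acc  # record the result
    return results
```

A fresh classifier is built for each classifier/data-set pair so that no state leaks between runs, mirroring the looping structure of the pseudocode.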
Figure 2: The file structure used in this project
5.2 Data Collection
As with the prototype, the same smartphone was used and all the data collected is my
own. The app used records values for each of the x, y and z accelerometer axes
separately and has an additional value t for total acceleration. If the phone is lying
face up on a flat surface in front of you with the camera end of the phone away from
you, then the x axis typically refers to sideways motion (left is negative, right is
positive), the y axis to forward or backward motion (away from you is positive,
towards you is negative) and the z axis to upward or downward motion (upwards is
positive, downwards is negative). The app also allows you to choose the collection
rate at which it records the data between two preset options, 'Slowest' (~15Hz) and
'Fastest' (~200Hz), which correspond to the slowest and fastest rates at which the
sensors on the specific device are able to collect data; the values above are for my
device, a OnePlus 3. The app offers a further option where you can set the collection
rate to a custom value between the 'Slowest' and 'Fastest' presets; however, this
option is only available in the premium version of the app. All data was recorded
using the 'Fastest' preset in order to gather as much data as possible.
I decided on four different exercises, the push-up, the sit-up, the pull-up and the
squat, which are the four main bodyweight exercises. They are all exercises that the
majority of people can do, which again avoids limiting this project to an unnecessarily
small subset of the population. The process of collecting the data was simple: the data
recording was started and the device placed in an available pocket, the chosen
movement was then carried out for as many repetitions as possible in one go, and the
data recording was stopped. This was repeated for each exercise until approximately
six minutes' worth of data had been recorded, which equated to approximately 140
instances for each class, as opposed to the two minutes used for the prototype. A
breakdown of this can be seen in Table 2 below.
Class    No. Train Instances    No. Test Instances    Instance Distribution
pushup   63                     59                    0.5164, 0.4836
situp    76                     66                    0.5352, 0.4648
squats   84                     76                    0.5250, 0.4750
pullup   69                     64                    0.5188, 0.4812
Total    292                    265                   0.5242, 0.4758

Table 2: Overview of the number of instances for each class
5.3 Data Variations
After the data had been collected it had to be split up and formatted correctly in order
to allow for the most experiments to be run, with the hope of achieving better results.
Unlike in the prototype, where this was done by hand as there was relatively little data
in comparison, it was carried out by a custom script written in Python (Rossum, 1990).
This Python script took the data, which had been exported from the app in CSV format,
and created the ar f f files which can be understood by Weka. There were a number of
different ar f f files produced this way. Each of the x, y, z and t axes had their own ar f f
file, there was also an ar f f file for each axis when data had been downsampled by a
factor 2 and by a factor of 4 as well as an ar f f file for the concatenated data.
There were only four arff files for the standard data, one for each axis. The standard
data was recorded at 200Hz and split in such a way that each instance was given 500
attributes, equivalent to 2.5 seconds, which I chose as it is similar to how long a single
repetition may take to complete. The instances for each exercise were added in their
groups; however, the order is irrelevant as each instance is treated separately.
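The splitting into fixed-length instances can be sketched as follows (an illustrative example; `window_instances` is a made-up name, not from the project code):

```python
import numpy as np

def window_instances(samples, window_len=500):
    # Split a 1-D stream of accelerometer samples into fixed-length
    # instances; any incomplete trailing window is discarded
    samples = np.asarray(samples)
    n = len(samples) // window_len
    return samples[: n * window_len].reshape(n, window_len)
```

At the 200Hz collection rate, a 500-sample window corresponds to 2.5 seconds of movement.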
The arff file for the combined data was formatted such that all the x, y, z, and t values were concatenated together for each instance, giving 2000 attributes.
The data was also downsampled from 200Hz by two different factors, 2 and 4. This meant that the data downsampled by a factor of 2 was equivalent to 100Hz and had 250 attributes per instance, and when downsampled by a factor of 4 was equivalent to 50Hz with 125 attributes per instance. The concatenated data was also downsampled using the same factors, giving arff files with 1000 and 500 attributes respectively. Two different methods of downsampling were used: decimation and average decimation. In the former, if you are downsampling by a factor of 4 you simply take every 4th value and discard the rest. In the latter, again downsampling by a factor of 4, you instead take the mean of each group of 4 values as the new value.
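Both methods are simple enough to sketch directly (illustrative code, not the actual script used for the project):

```python
# Sketch of the two downsampling methods described above, applied to a
# plain list of accelerometer readings for one axis.
def decimate(samples, factor):
    """Plain decimation ('Skip'): keep every factor-th value."""
    return samples[::factor]

def average_decimate(samples, factor):
    """Average decimation ('Mean'): replace each block of `factor`
    consecutive values with the mean of the block."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(decimate(data, 4))          # -> [1, 5]
print(average_decimate(data, 4))  # -> [2.5, 6.5]
```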
In Figure 3 below the names take the format ‘axis_collectionrate_{exercises}_usage’, where ‘Skip’ or ‘Mean’ is appended to the collection rate depending on which type of downsampling, if any, is used.
Figure 3: Example showing the arff files for the data variations
5.4 Classifiers
This section gives an informal overview of each of the classification algorithms that I chose to use. I chose these classifiers as they cover a range of different types, including generic ones and time series specific ones, with the aim of providing a better overview of the best classifier, and type of classifier, for the data gathered for this project. All classifiers, other than Dynamic Time Warping, were used ‘as is’, with no optional configuration or optimisation taking place.
The algorithms ED, Dynamic Time Warping, Elastic Ensemble and Flat-COTE were also tested; however, they have already been given an overview in Section 2.4 and therefore I saw no reason to include them again here.
5.4.1 ZeroR
ZeroR is one of the most primitive classifiers available. It works by predicting the majority class from the training data, but it is still sometimes used as a baseline classifier (Witten et al., 1999). For data with categorical class values it uses the mode, and for numeric class values it uses the mean. In Krishna et al. (2013), averaged over 3 data sets relating to cancer classification, it achieved an average accuracy of 60%.
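For categorical classes the whole classifier fits in a few lines; a minimal sketch (illustrative only, not Weka's implementation):

```python
from collections import Counter

# Minimal sketch of ZeroR for categorical classes: always predict the
# majority class seen in training, ignoring all attributes.
class ZeroR:
    def fit(self, y_train):
        self.majority = Counter(y_train).most_common(1)[0][0]
        return self

    def predict(self, n):
        # Attributes are ignored entirely; only the count of test
        # instances matters.
        return [self.majority] * n

model = ZeroR().fit(['squat', 'squat', 'situp', 'pushup'])
print(model.predict(3))  # -> ['squat', 'squat', 'squat']
```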
5.4.2 Random Forest
Random Forest is an ensemble of decision trees, generally trained using the bagging
method (Breiman, 2001). It adds randomness to these decision trees by using a random
subset of the available attributes and/or training data for each decision tree and not all
of them, thereby increasing the diversity of the model. Random Forest is currently one of the most popular classifiers, and is often used as a benchmark, due to it performing well for any given data set as well as being inexpensive to train and quick to run. Two examples of this are Ghimire et al. (2012), where it achieved an accuracy of 92% in land-cover classification, and Pal (2005), where it achieved an average accuracy of 88.37% for remote sensing classification.
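The two randomisation steps described above can be illustrated with a short sketch (the function and data are hypothetical, and Weka's actual implementation differs in detail):

```python
import random

# Sketch of the two randomisation steps Random Forest applies when
# growing each tree: a bootstrap sample of the training instances
# (bagging) and a random subset of the attributes (random subspace).
def bootstrap_and_subspace(instances, n_attributes, subset_size, seed=0):
    rng = random.Random(seed)
    sample = [rng.choice(instances) for _ in instances]   # sampled with replacement
    attrs = rng.sample(range(n_attributes), subset_size)  # attribute indices to use
    return sample, sorted(attrs)

data = [[0.1] * 8, [0.2] * 8, [0.3] * 8]
sample, attrs = bootstrap_and_subspace(data, n_attributes=8, subset_size=3)
print(len(sample), len(attrs))  # -> 3 3
```

Each tree in the ensemble would be trained on a different `(sample, attrs)` pair, which is what gives the forest its diversity.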
5.4.3 Rotation Forest
Rotation Forest is a method for generating classifier ensembles based on feature ex-
traction. The training data for an individual classifier is created by randomly splitting
the attribute set into k subsets. k axis rotations take place to form the new attributes for
an individual classifier, with the idea of using rotations to encourage individual classi-
fier accuracy and diversity in the ensemble. Rotation Forest has been compared against
bagging, AdaBoost, and Random Forest using 33 random data sets from the UCI repos-
itory, UC Irvine (1987), and found that ‘Rotation Forest outperformed all three methods
by a large margin’ (Rodriguez et al., 2006).
5.4.4 C4.5
C4.5 is a classifier that generates a decision tree which uses the theory of informa-
tion gain and entropy (Quinlan, 1993). In its simplest form it says that the amount
of information gained is inversely proportional to the probability of an event happen-
ing. Therefore the attribute with the highest information gain will be split first, and is
continued recursively until all data is classified. It became very popular in 2008 after
being ranked number 1 in ‘Top 10 Algorithms in Data Mining’, Wu et al. (2008), and 3
years later the authors of Weka described it as a landmark decision tree program that is
‘probably the most widely used in practice to date’ (Witten et al., 2011).
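The entropy and information-gain calculation C4.5 performs at each split can be sketched as follows (illustrative code, not Quinlan's implementation):

```python
from math import log2
from collections import Counter

# Sketch of the entropy and information-gain calculation C4.5 uses to
# pick the attribute to split on.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Gain of splitting `labels` into the given `partitions`."""
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - remainder

y = ['pushup'] * 4 + ['squat'] * 4
# A perfect split removes all uncertainty: the gain equals the full 1 bit.
print(information_gain(y, [y[:4], y[4:]]))  # -> 1.0
```

C4.5 would evaluate this gain (in its gain-ratio form) for every candidate attribute and split on the best one, recursing on each partition.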
5.4.5 Sequential Minimal Optimisation with SVM
SMO is a support vector machine (SVM) in which the training problem is solved
using John Platt’s sequential minimal optimisation (SMO) algorithm (Platt, 1998). A
support vector machine is an algorithm that tries to find a hyperplane (or line) that
separates the data into its classes (Awad and Khanna, 2015). While support vector
machines were originally designed to support only two-class problems there are ways
of extending them to multiclass problems, like the one presented in this paper. The two
most common methods of doing this are ‘one-against-one’ and ‘one-against-all’ (Hsu
et al., 2002). In the former a binary classifier is trained for each pair of classes and the
outputs are combined whereas in the latter for k classes, k binary classifiers are trained
and each determines whether the test instance is the same class as itself or one of the
other classes and the classifier with the largest output is taken as the actual class of the
test case.
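The difference in how the two schemes decompose this project's four-class problem can be shown with a quick count (class names taken from Table 2; ‘rest’ is just an illustrative placeholder for the pooled negative class):

```python
from itertools import combinations

# Sketch of how the two decomposition schemes break the 4-class
# exercise problem into binary problems.
classes = ['pullup', 'pushup', 'situp', 'squats']

one_vs_one = list(combinations(classes, 2))   # k(k-1)/2 = 6 binary classifiers
one_vs_all = [(c, 'rest') for c in classes]   # k = 4 binary classifiers

print(len(one_vs_one), len(one_vs_all))  # -> 6 4
```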
5.4.6 XGBoost
XGBoost is an open-source implementation of gradient boosting that uses classification and regression trees (CART) as base models, designed to be ‘highly efficient, flexible and portable’ (The XGBoost Contributors, 2014). It implements an additional
regularisation term which penalises the complexity of the model and helps to reduce
over-fitting (Chen and Guestrin, 2016). Using data provided by Microsoft for a mal-
ware classification challenge hosted on Kaggle, an accuracy of 99.77% was achieved on
a combination of all categories while employing bagging and parameter optimisation
(Ahmadi et al., 2016).
5.4.7 AdaBoost
AdaBoost, or Adaptive Boosting, is a boosting algorithm where the output of the
weak classifiers is combined into a weighted sum that represents the final output of
the classifier (Schapire, 2013). It works by initialising the weights equally and getting predictions for the training data. It then computes an overall weight for the classifier and modifies each instance weight based on whether it was correctly classified or not, before
normalising so that the weights add up to one.
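One round of this weight update can be sketched as follows (a simplified binary-case round; the variable names are my own and this is not the exact Weka formulation):

```python
from math import exp, log

# Sketch of one AdaBoost round: the classifier weight (alpha) is derived
# from the weighted error, then each instance weight is increased if it
# was misclassified and decreased otherwise, before renormalising.
def adaboost_round(weights, correct, eps=1e-10):
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * log((1 - err + eps) / (err + eps))  # classifier weight
    updated = [w * exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]         # weights sum to one

# Four equally weighted instances; the last one was misclassified.
alpha, new_w = adaboost_round([0.25] * 4, [True, True, True, False])
print(round(sum(new_w), 6))  # -> 1.0
```

After the update the misclassified instance carries more weight, so the next weak classifier focuses on it.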
5.4.8 Multilayer Perceptron
Multilayer Perceptron (MLP) is a deep, artificial neural network that uses one or more
hidden layers and a nonlinear activation function on its nodes. The learning occurs
by changing connection weights, using backpropagation, after each piece of data is
processed in order to minimise error. They generally take longer to train than other
types of classifier because of this (Gardner and Dorling, 1998).
5.4.9 Time Series Forest
Time Series Forest (TSF) is a tree based ensemble method that was designed for time
series classification. It uses a combination of entropy gain and distance to evaluate the
possible splits, and randomly samples attributes at each node in the tree. The overall
prediction of a given instance is done using majority voting of all trees in the ensemble
(Deng et al., 2013).
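The per-interval summary features TSF extracts can be sketched as follows (an illustration of the mean/standard deviation/slope features only, not the authors' code; interval bounds are fixed here for clarity rather than randomly sampled):

```python
# Sketch of the summary features Time Series Forest computes for one
# interval of a series: the mean, the standard deviation and the
# least-squares slope over that interval.
def interval_features(series, start, end):
    xs = list(range(start, end))
    ys = series[start:end]
    n = len(ys)
    mean = sum(ys) / n
    std = (sum((y - mean) ** 2 for y in ys) / n) ** 0.5
    x_mean = sum(xs) / n
    denom = sum((x - x_mean) ** 2 for x in xs) or 1.0  # guard 1-point intervals
    slope = sum((x - x_mean) * (y - mean) for x, y in zip(xs, ys)) / denom
    return mean, std, slope

mean, std, slope = interval_features([0.0, 1.0, 2.0, 3.0, 4.0], 0, 5)
print(round(mean, 2), round(slope, 2))  # -> 2.0 1.0
```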
5.4.10 BOSS
Bag of SFA symbols (BOSS) is a dictionary based ensemble classifier. It works by
extracting patterns from a time series and carries out ‘low pass filtering and quantisation’
to reduce the noise in these patterns. The classification is then done by comparing the
new noise-reduced patterns, and it can achieve up to 10% higher accuracy than its rivals (e.g. Bag of Patterns (BOP), Lin et al. (2012)) as well as running up to 13 times faster (Schäfer, 2015).
5.4.11 HIVE-COTE
Hierarchical Vote Collective of Transformation-Based Ensembles (HIVE-COTE) is,
put simply, an improved version of Flat-COTE. It has been improved by defining a new
hierarchical probabilistic voting structure, defining a new spectral ensemble classifier,
and the addition of a dictionary-based & interval-based classifier. This new classifier
is significantly more accurate than Flat-COTE, achieving a significantly better average
rank of 1.6353 compared to 2.8588 respectively, when tested against 85 UCR data-sets
(Lines et al., 2016).
6 Analysis of Results
This section of the paper covers the results that were obtained after running all the experiments described in Sections 5.3 and 5.4. The results will first be described per classifier, with a table showing the results for common metrics, rounded to 4 decimal places, for the data-sets, followed by a summary comparing all the classifiers and data-sets to each other. The table for Section 6.1 will show the results for all data-sets; however, subsequent tables will show slightly fewer, as I feel it is redundant to show so much information each time.
Each table contains the accuracy, as this should always be the first place to start when evaluating the results of a classifier, as well as the balanced accuracy; there is no harm in including it as it tends to be more interpretable, although it tends to be most useful for problems with a large number of classes or a class imbalance. Sensitivity and specificity, also known as the true positive rate and true negative rate respectively, are normally used for binary classification problems but can be applied to multiclass problems, like this one, through the use of averaging. The precision, F1 score and the Area Under the Receiver Operating Characteristic (AUROC) curve are also included as further ways to assess any differences between the data-sets; however, these too are preferred for binary classification problems.
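Since these two accuracy measures recur in every table below, a minimal sketch (my own illustration, not the evaluation code used for the experiments) shows how balanced accuracy, i.e. the macro-averaged per-class recall, differs from plain accuracy on an imbalanced problem:

```python
# Sketch of plain vs balanced accuracy for a multiclass problem.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = ['squat'] * 3 + ['situp']
y_pred = ['squat'] * 4                 # always predict the majority class
print(accuracy(y_true, y_pred))        # -> 0.75
print(balanced_accuracy(y_true, y_pred))  # -> 0.5
```

On this project's nearly balanced four-class data the two measures stay close, which is exactly what the results below show.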
6.1 AdaBoost
Looking at Table 3 in the Appendix, the first thing that becomes apparent is that AdaBoost performs poorly across the board, with the accuracy ranging from 30% up to 47%. The t and concat axes perform significantly better on accuracy than the other 3 axes, with all data-sets having an accuracy in the mid 40s compared to the low 30s respectively. The data downsampled to 50Hz appears to perform better on average than the 100Hz data, which was similar to the original 200Hz data. The ‘Mean’ data and the ‘Skip’ data performed similarly, possibly with ‘Mean’ edging ahead.
The balanced accuracy tells the same story, but with values ranging from 28% up to 43%.
The sensitivity for the t and concat axes are once again significantly higher than the
rest, with a few exceptions this time being ‘x_100HzSkip’ etc. Also, in the t and concat
axes, the trend that 50Hz is better than 100Hz remains true, while for the other axes
it varies. The precision, like the accuracy, is low across the board with t and concat
axes instead coming in the middle of the pack. The specificity has a wide range of
values, from 12% up to 72%, with an almost random looking distribution if not for
worse values for the first 2 axes. The F1 scores are also low throughout, with the higher
values appearing for the first 2 axes. The AUROC values are relatively stable, hovering
around the mid 50s to mid 60s mark, and once again t and concat come in first place.
6.2 BOSS
The results for the BOSS classifier (Table 4 in the Appendix) have had all data-sets from the y and z axes, apart from 1, removed as they all performed incredibly similarly. The table shows a wide range of accuracy values, from 59% up to 94% depending on the data-set. The concat axis performs the best, with all accuracies being 90%+, followed by
the y and z axes, which had very similar results for all data-sets (86% - 88%). The t
axis performed the worst with accuracies averaged in the mid 60s. For all data-sets the
balanced accuracy remains incredibly close to the normal accuracy meaning all classes
were predicted with similar success. Both 50Hz and 100Hz performed similarly on all
data-sets bar concat and t where 100Hz was better; 200Hz on the other hand varies
between the data-sets from worst in concat to best in t.
The sensitivity follows the same trend as the accuracy; however, for ‘x_100HzMean’ and below it is lower than the accuracy. The precision appears high over all data-sets, including t, which had the worst accuracy. The specificity comes in very high throughout,
with all values being above 90% apart from ‘x_100HzSkip’ at a high 89%, and having
no discernible pattern. Like with precision, the AUROC values are all quite high, but
with all values being above 81% instead of 89%, and the clear loser being the t axis.
6.3 C4.5
Table 5 in the Appendix, for the C4.5 classifier, has had 1 data-set from each axis removed, apart from z which has 1 remaining. We can see that the concat and y axes perform the best in both accuracy and balanced accuracy, while the t and x axes trail by approximately 10%, around the 50% mark. It appears as though 50Hz performs significantly better, on average, than both 100Hz and 200Hz, while ‘Mean’ tends to perform slightly better than ‘Skip’.
The sensitivity displays a large range of values, from 30% for ‘t_200Hz’ to 75%
for ‘y_50HzSkip’, and is usually approximately 10-15% lower than the accuracy. The
precision also has varying results between axes, with concat and y appearing the best,
however, even within each axis there is a fair amount of variation e.g. the y axis ranges
from 48-64%. Over all the data-sets the specificity tends to be higher than sensitivity by
about 25%, and with no obvious pattern. The AUROC values all appear to group around
the mid 60s to mid 70s area, with concat and y looking the best, but not by a significant
amount.
6.4 Dynamic Time Warping
6.4.1 1NN & 30% Warping
The results for DTW1NN @ 30%, shown in Table 6 in the Appendix, have had 7 data-sets removed, 1 from each axis, apart from t and z which had 2 removed. We can see immediately that concat performs the best according to the accuracy and balanced accuracy, followed by the y axis, with accuracies of about 88% and 85% respectively. Both t and x perform very similarly on these metrics, at around 70%. The difference between 50Hz and 100Hz seems to vary between axes, with some showing that 50Hz is better, others showing 100Hz as better, and one displaying no real difference. The difference between ‘Mean’ and ‘Skip’ shows the same trend.
The sensitivity shows itself as being very high for the concat, y and z axes, with
concat even achieving 100% on 2 different data-sets. The remaining 4 metrics display a
similar trend to that of accuracy, with concat performing the best and with y being next
etc. However, the trend is less obvious for the AUROC values as, while the t and x axes
are lower than the other axes, they are not lower to the same extent.
6.4.2 1NN & 60% Warping
The differences in the results for this variation compared to the last are very small for all of the data-sets and metrics, with the average changes being about 1%, and therefore I see no reason to include a table for it. The most notable change was to the accuracy of the ‘y_200Hz’ data-set, which increased by around 3.5%.
6.4.3 3NN & 30% Warping
Table 7 in the Appendix, for the DTW3NN @ 30% classifier, has had the 200Hz data-set for each axis removed as each performed almost identically to its corresponding ‘100HzSkip’. The table shows us that the concat and y axes perform equally well, with accuracy and balanced accuracy values of around 86%, which is significantly different to the next best axis, z, with an average accuracy of approximately 79%. The table also shows that for the t axis 50Hz performs better than 100Hz, the opposite is true for the z axis, and there is no significant difference for the other 3 axes. There is also no obvious difference between the results of ‘Mean’ and ‘Skip’ that holds for all data-sets, as the best performing one varies between the data-sets and the level of downsampling used.
Like the results in Section 6.4.1, the sensitivity shows itself as being very high for the concat, y and z axes, with concat edging ahead at around 98% and again achieving 100% on 2 data-sets, while y and z are identical for almost every data-set. Both the precision and the specificity display the same trend, with concat performing equally as well as z, followed closely by y, and both t and x performing significantly worse than the rest. The F1 and AUROC trend differs in that it shows concat and y performing equally well, followed closely by z.
6.4.4 3NN & 60% Warping
As with the 1NN variation of DTW, the difference in the results for this variation
compared to the last are very small for all of the data-sets and metrics, with the aver-
Reg: 100164855 28
CMP-6013Y
ages changes again being of about 1%, so I decided to leave the table out again. The
most notable change this time was to the accuracy of the t_100HzMean’ data-set which
increased by around 4%.
6.5 Euclidean Distance 1NN
Table 8 in the Appendix, for ED1NN, has had the 200Hz data-set for each axis removed as each performed almost identically to its corresponding ‘50HzSkip’, as well as ‘concat_100HzSkip’, which performed similarly to ‘concat_100HzMean’.
We can see from the table that the concat axis performed significantly better than the other axes, with an average accuracy and balanced accuracy of approximately 80%, compared to the next best, the z axis, at 67%. The x axis performed the worst by a significant margin, with accuracies around 40%. It appears as if there is no significant difference between the 50Hz and 100Hz results for any of the axes; however, there does appear to be a slight performance increase for ‘Mean’ in comparison to ‘Skip’ for all of the axes.
For the most part the sensitivity results follow a pattern in which they remain somewhat close to the corresponding accuracy value, anywhere from a drop of 5% to an increase of 10%. The exceptions to this, however, are the x and y axes, which show a significant drop in the value compared to their accuracy, with the y axis dropping around 15% and the x axis dropping around 30%. The precision stays consistently high
through all data-sets, apart from the t axis, with its lowest value being 83%, and 2
data-sets managing to score 100%. However, the data-sets where 100% precision was
achieved, there was an extremely low sensitivity value, of approximately 7.5%, meaning
there was a low false positive rate, but a high false negative rate. The specificity was
also high across all data-sets, which supports a high precision. The F1 and AUROC
values don’t really contain any information of note as they just follow the same trend as
the accuracy.
6.6 Multilayer Perceptron
Table 9 in the Appendix, for the MLP classifier, has had 4 data-sets removed; ‘concat_50HzSkip’, ‘t_100HzSkip’, ‘t_200Hz’ and ‘x_100HzMean’, as each of them performs similarly to another data-set in the same axis. We can see from the accuracy and balanced accuracy that once again the concat axis performs the best by a significant margin, with the y axis being the next best, with average accuracies of around 84% and 71% respectively. The x axis performed the worst with an average of 56%. The results do not show any significant differences between the 50Hz and 100Hz data-sets or between the ‘Mean’ and ‘Skip’ data-sets.
The sensitivity, as with the previous classifier, had a wide range of values that largely followed the same trend as the accuracy, with the exception of the y axis. The y axis had significantly lower accuracies than concat but achieved very similar sensitivity
scores. The AUROC values were generally high for all the data-sets, only dropping below 80% for the x axis, which is unsurprising as it also achieved the lowest accuracies. The precision, specificity and F1 show nothing out of the ordinary, other than that the z axis achieved higher precision values than the y axis, which is where it regains the ground it lost to y in sensitivity.
6.7 Random Forest
Table 10 in the Appendix, for the Random Forest classifier, has only had 2 data-sets
removed which are ‘concat_200Hz’ and ‘x_200Hz’ as they both achieved very similar
results to that of their corresponding ‘50HzSkip’ data-set. The accuracy and balanced
accuracy seems to be more closely grouped over all data-sets than in previous classifiers,
the range of values only going from 53% up to 76%. However, even given this we can
still see that concat and y are the best performing axes, with accuracies around 70%,
and that x is the worst performing axis at around 56%. Also, there doesn’t seem to be
any significant difference between the 50Hz and 100Hz data-sets, except for the y axis
where 50Hz performs better by about 6%. The difference between ‘Mean’ and ‘Skip’ is
small, with the majority of data-sets showing a slight improvement for ‘Mean’, usually
around 2%.
The sensitivity for each of the data-sets remains relatively true to the same trend as the accuracy, usually only deviating by about 5% each time. The specificity, like
the accuracy, is just as closely grouped, ranging from 65% to 88%, however, it instead
shows the z axis as performing the best, with concat and y performing similarly. The
precision, F1 and AUROC all show the same trends as the accuracy, with neither any
extreme high values nor extreme low values.
6.8 Rotation Forest
Table 11 in the Appendix, for the Rotation Forest classifier, has had 4 data-sets removed; the 200Hz data-sets from concat and x, which are similar to their ‘100HzSkip’ counterparts, and the ‘50HzMean’ and ‘100HzSkip’ data-sets from y, which are both similar to ‘100HzMean’. The values for the accuracies and balanced accuracies, unlike Random Forest, are not as closely grouped, with values ranging from 56%
to 84%. It seems to show the concat and y axes as the best, performing equally well
with accuracies around 81%, and the x axis once again performs the worst at around
59%. From this table, it also appears there is no significant difference between 50Hz
and 100Hz data-sets, with the better one varying between axes. However, it does seem
to show that ‘Mean’ performs better than ‘Skip’ for most of the data-sets.
The sensitivity instead shows the y axis as the best performing axis for this classifier, followed by z and then by concat. The AUROC values were generally high for all the data-sets, with the lowest value being just over 80% for the
‘x_50HzMean’ data-set. The precision, specificity and F1 all show the same trend as
accuracy does, just with a little bit more variation in the values for data-sets in the same
axis.
6.9 Sequential Minimal Optimisation with SVM
Table 12 in the Appendix, for the SVM classifier, has had 6 data-sets removed; 3 from concat, which were similar to both remaining data-sets, the 200Hz data-set from x, which was similar to ‘x_100HzMean’, the 200Hz data-set from z, which was similar to ‘z_50HzMean’, as well as ‘t_100HzSkip’, which was similar to its ‘Mean’ counterpart.
The accuracy and balanced accuracy values for this classifier have a large range over all the data-sets, from the lowest of 44% (‘x_100HzMean’) up to the highest of 83% (‘concat_100HzSkip’). There does not seem to be any significant difference between the 50Hz and 100Hz data-sets, nor any obvious pattern showing either ‘Mean’ or ‘Skip’ to be the superior variation.
The sensitivity appears to follow the same pattern as the accuracy and balanced accuracy but with more extreme values, with the exception of the y axis. According to the sensitivity, the y axis performs the best, with values around the 90% mark, which is better than the concat axis by approximately 3%. The precision, F1 and AUROC all show the same pattern as the accuracy, with the performance values being equally closely grouped, just larger (for AUROC). The specificity, on the other hand, does not tell the same story. Using these values the z axis performs equally as well as concat, even though it has an average accuracy that is 33% lower, while the t axis performs the worst, as opposed to being in the middle of the pack for the accuracy.
6.10 Time Series Forest
Table 13 in the Appendix, for Time Series Forest, has had 6 data-sets removed; ‘t_200Hz’, which was similar to ‘t_100HzMean’, the 200Hz and 50HzMean data-sets for x, which were similar to 100HzSkip and 50HzMean, and the 200Hz and 100HzSkip data-sets from y, which were similar to 100HzMean. The accuracies and balanced accuracies place the y axis as the best performer by a significant margin, at an average of 92%, followed by the z axis at around 84%, while the t axis comes in last place with accuracy values around 70%. The downsampled data-sets, 50Hz and 100Hz, do not show any significant differences that are consistent across all data-sets, with 50Hz appearing slightly better for x and z but the same for the other data-sets.
The remaining 5 metrics in this table all follow the same trend as the accuracy, not
really providing any extra information, with the AUROC values being very closely
grouped for all data-sets, with its lowest value being just above 90%.
6.11 XGBoost
Table 14 in the Appendix, showing the results for the XGBoost classifier, has had 5 data-sets removed; 100HzSkip from the concat axis, which was similar to 50HzMean, 100HzSkip from x, which was similar to 100HzMean, 200Hz from y, which was similar to 100HzMean, and both the 50Hz ‘Mean’ and ‘Skip’ data-sets from z, which were similar to the 200Hz and 100HzMean data-sets respectively. The table clearly shows that the concat axis performs the best for this classifier by a significant margin, with the next best being the y axis, with an average accuracy, and balanced accuracy, around 7% lower. The ‘Mean’ data-sets appear to have slightly outperformed ‘Skip’, and 100Hz outperformed 50Hz for the majority of the data-sets; however, the difference between them does not look significant.
As with the results for the previous classifier, the remaining 5 metrics do not reveal
any additional information on top of the accuracy and balanced accuracy, with the only
difference being that the AUROC values are less tightly grouped and drop as low as
84%.
6.12 ZeroR
As we can see from Table 15 in the Appendix, the results for the ZeroR classifier have been condensed into a single row. Because of the way it works, see Section 5.4.1, it will always achieve exactly the same results on the data-sets used in this project. The data-sets used, while slightly varied, all contained the same classes and therefore the classifier will always predict the same class. We can see in Table 2 that the most common class is ‘squat’, with 76 instances out of 265, which is equivalent to the accuracy shown in the table of ~28.68%.
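As a quick sanity check of this figure, the baseline accuracy follows directly from the test-instance counts in Table 2:

```python
# Sanity check of the ZeroR baseline quoted above: predicting the
# majority test class ('squat', 76 of 265 instances) every time.
test_counts = {'pushup': 59, 'situp': 66, 'squats': 76, 'pullup': 64}
total = sum(test_counts.values())
print(total)                                              # -> 265
print(round(max(test_counts.values()) / total * 100, 2))  # -> 28.68
```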
6.13 EE, Flat-COTE & HIVE-COTE
Although I had planned to also test these classifiers, they were taking too long to warrant gathering the results. This was an error on my part, as there were measures I could have taken to reduce the runtime of each of these; see Section 7.1. The runtime
complexity of Elastic Ensemble is O(n²m²), and the runtime complexity of both Flat-COTE and HIVE-COTE is O(n²m⁴), compared to the O(nm(n−w)) complexity of the BOSS classifier, which took the longest of the classifiers completed; where m is the length of the series, n is the number of series and w is the number of subseries.
It is unfortunate that I was not able to collect any results for these classifiers, as they would likely have performed very well, given they have been demonstrated to be capable, albeit time-consuming, classifiers on other time series data-sets (Bagnall et al. (2017); Lines et al. (2016)).
6.14 Results Summary
For the purposes of this project I decided on using ED1NN as the base classifier; this is because it is the simplest form of classifier designed for time series classification. I also decided not to include critical difference diagrams using balanced accuracy in this paper, as they were incredibly similar to the accuracy-based ones shown in Figures 4 and 5. This is because the balanced accuracy achieved on all the data-sets and all the classifiers was very similar to the standard accuracy, which in turn is likely because there is not a very large number of classes (4) or a significant class imbalance, with the largest class accounting for 28% of the data.
We can see in Figure 4 that the BOSS classifier is rated the best, with an average rank of 2.84, and that the ZeroR classifier is rated the worst, being ranked 15th for all data-sets. However, while BOSS is the highest rated, the diagram does not show a significant difference between BOSS and the 8th best classifier, XGBoost, with an average rank of 6.66. Six of the top eight classifiers were designed for time series classification (TSC) problems, the exceptions being XGBoost and Rotation Forest, while only one of the bottom eight is a TSC classifier (ED1NN). This strongly suggests the general rule that classifiers designed with TSC problems in mind significantly outperform more ‘general use’ classifiers.
Figure 4: Critical difference diagram using accuracy for the classifiers
Figure 5: Critical difference diagram using accuracy for the data-sets
Using Figure 5, we first see that there is a huge amount of overlap between the majority of the data-sets, from the large cliques formed. We also see that the data-set rankings leave the data-sets grouped by their axis, with concat being the best option and x being the worst. For each of the axes it also appears that the downsampled data performs better, on average, than the standard 200Hz data. However, there does not seem to be any consistent pattern showing either 50Hz or 100Hz to be better across all the axes, although there may be a very slight performance increase for the data downsampled using the average decimation method, or ‘Mean’, in comparison to normal decimation, or ‘Skip’. Figure 5 shows the best data-set to be ‘concat_50Hz’ with an average rank of 4.3667; however, it also shows that there is no significant difference between it and all the other data-sets from the concat, y, and z axes. It shows that the t and x axes perform significantly worse than the top two axes, concat and y.
The data-set and classifier combination that achieved the highest accuracy was ‘concat_100HzMean’ with the BOSS classifier, at ~94.7%, while combining the best classifier and best data-set according to the figures gives an accuracy of ~91.7%. The critical difference diagrams are very good at conveying averages; however, if this project were used in the real world you would likely only want the classifier and data-set that provide the actual best performance. In the real world, the time performance of each of the classifiers would also need to be taken into account; depending on the application it may not be feasible to use a specific classifier if it takes too long. For example, when carrying out these experiments there were three classifiers for which results were not obtained due to their runtime, and the BOSS classifier, while giving better results, took significantly longer to run than the next best classifier, DTW1NN @ 30%, which ran approximately 15 times faster.
7 Project Evaluation
7.1 Problems Encountered
The first problem I encountered came as part of the literature review. This project was
relatively broad in terms of its different components, so there is a large amount of litera-
ture available; while this is for the most part a good thing, it became difficult
Reg: 100164855 36
CMP-6013Y
to find literature that was not just related but actually similar to this project. It
was mentioned previously that human activity recognition and exercise are both large
fields and a lot of studies have been done in these areas; however, the majority of them
were done on whole-body movements such as running. While this did allow me
to find a gap in current knowledge, it also meant that there was little information for
me to use as a guide and/or benchmark, e.g. what my results should look like or which
algorithm should have the highest accuracy. There was little that could be done to overcome
this problem other than making sure that the research was carried out thoroughly, using ap-
propriate websites such as sciencedirect.com and link.springer.com in conjunction with
optimal search terms for the topic.
When I went to begin collecting the data that would be used to train the classifiers,
and then to test the performance of each classifier, I ran into another problem. As I
had decided to focus only on data recorded using a smartphone, in order not to limit the
applications of the project, a decision had to be made regarding how the data would
be recorded and harvested from the phone. The most sensible option to accomplish
this was to use an existing app that could access sensor data. After researching
and testing various apps able to do this, it was clear that the best option was an
app called ‘Physics Toolbox Sensor Suite’, Vieyra Software (2014), which gave me the
ability to record and export this data in the form of a comma-separated values (CSV)
file and to choose between different collection rates, which is discussed in more detail
in Section 5.2.
Another problem I encountered, directly after I finished recording the data
for the prototype, was formatting the data. For TSC to be carried out, the data
has to be correctly formatted into the attribute-relation file format (ARFF). This file
format requires that every instance in the data set has the same number of attributes.
In the case of the data I was collecting for the prototype, each
instance was a single exercise (either a push-up or a sit-up) and each instance had to have
the same number of attributes, each of which is a single value in the data. For example,
2.5 seconds of data at a collection rate of 200Hz means the instance has 500 attributes.
Every instance needs to be like this, which can be achieved by trimming the
data to the same length; this introduces the question of when to trim the data or what
number of attributes provides the best results. For the prototype I trimmed the data to
151 attributes plus the class attribute, which was made easier by following a set of rules
when recording the data. The problem I found when formatting this data was the sheer
amount of time it took, making it a completely impractical method when working with
large amounts of data or data that needs to be processed automatically. After the prototype
I decided that the best solution was to create a bespoke Python script that takes the
data from the CSV files and outputs all the necessary ARFF files, already in the correct
format; see Section 5.3 for more detail.
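The kind of conversion the script performs can be sketched as follows. This is a minimal illustration only: the function name, the assumed CSV layout (header row, sensor reading in the second column), and the 500-attribute trim length (2.5 s at 200Hz) are assumptions for the example, not the actual script's interface.

```python
import csv

def csv_to_arff(csv_path, arff_path, class_label, n_attributes=500):
    """Trim one recorded exercise to a fixed attribute count and write it as an ARFF file.

    Assumes the CSV has a header row and the accelerometer reading in its
    second column, as exported by a generic sensor-logging app.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                                # skip the header row
        values = [float(row[1]) for row in reader]  # one axis of accelerometer data
    values = values[:n_attributes]                  # trim every instance to the same length
    with open(arff_path, "w") as out:
        out.write("@RELATION exercises\n")
        for i in range(n_attributes):
            out.write(f"@ATTRIBUTE att{i} NUMERIC\n")
        out.write("@ATTRIBUTE class {pullup,pushup,situp,squat}\n@DATA\n")
        out.write(",".join(str(v) for v in values) + f",{class_label}\n")
```

A real version would batch over every CSV recording and append one @DATA row per instance, but the fixed-length trimming shown here is the key step that makes the instances ARFF-compatible.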
A problem that only made itself known to me after the prototype was that the runtime of
some algorithms was much longer than expected, which was definitely not helped
by running them on my personal machine. This led me to the decision that certain clas-
sifiers, while they may yield good results, were too time consuming to warrant running,
as discussed in Section 6.13. A possible, and in hindsight obvious, solution would
have been to make use of the UEA high performance computing cluster instead of my
personal machine (University of East Anglia, 2019).
7.2 Outcome
As mentioned previously in this paper, there were many aspects that contributed to
this project, which left room for one specific area to be focused on more heavily. Early
on in the project I decided to focus on the human activity recognition aspect, as not
only was it the area that I found most interesting, but it is also quite a broad topic
in itself. This means that the outcomes of this project are not limited to one specific
field, but could be applied to many, one being the extension suggested for this project:
gamification. I am happy with the choice I made to focus the project on a
specific area, considering the results that were achieved and the information yielded.
When evaluating a project it is essential to compare the quality, and amount, of work
done against what could have been done in an ideal world. A common way of doing this
is to compare it against the requirements outlined at the beginning. To do this we can
look at the MoSCoW table in Section 3.1.4. From this we can see that all of the points in
both the ‘Must Have’ and ‘Should Have’ categories have been met, as has the first
‘Could Have’ point. Overall, based on this, and the previous paragraph, I
would say that this project has been a success, as the majority of the requirements were
satisfied and useful information was gained from the results. The project was
also a success on paper, as it was finished and delivered on time, including all supporting
material, which was helped hugely by constant referral to the revised Gantt
chart shown in the Appendix. Please see the attached notebook for a record of supervisor
meetings and milestones reached.
7.3 Further Research
The field of human activity recognition in the context of exercise has a
huge amount of potential, both in terms of things to investigate and in possible real world
applications. An example of further work, and a real world application, that has been men-
tioned in this project but not carried out, is the use of exercise recognition in an effort
to gamify and encourage exercise, with the aim of improving people's
general fitness. Another thing that could be researched further, and would likely
need to be for the aforementioned application, is the inclusion of a much larger database of
recognisable exercises, so as not to limit people to a small number of exercises
that they can use to earn rewards etc. The recognition of exercises could also be applied
to personal training, or even injury prevention and recovery. If the quality of
exercise recognition could reach a high enough level, it could be used to check the ‘form’
in which an exercise is carried out, which would have applications in the areas just men-
tioned, as it could inform the user of what they may be doing wrong and how to improve,
with the aim of maximising muscle usage or preventing injury.
8 Conclusion
The aim of this project was to evaluate the performance of existing classification
methods for time series data of non whole-body exercises, in order to see whether it is
feasible for them to have applications in the real world. This project also had the extension
goal of seeing how possible, if at all, it is to gamify exercise; however, this aspect was
dropped in favour of a more detailed look at human activity recognition.
The results that were found support the view that the field is currently at a
reasonable level at which to start thinking about and working on real world applications.
We found, for the exercises in this paper, that we could achieve up to ~94.7% accuracy
when using a favourable combination of classification algorithm and data variation. There
are no doubt better classification algorithms and/or data variations that could be em-
ployed, some of which were mentioned in Section 6.13, in order to get better
results.
In the future I have no doubt that this field will continue to grow as more people realise
the huge potential it has to positively impact people's lives as well as, more appealing
to some, to make money. The development of increasingly powerful computer hardware,
and more importantly more intelligent and efficient algorithms for classifiers, will open up
a new level of performance, in terms of both accuracy and time efficiency.
9 Appendix
Algorithm 1 Classifying Test Data
Input: trainData.arff and testData.arff
Output: The number of instances classified correctly and the number of instances tried
to classify
1: trainData ← trainData.arff
2: classifier ← define classifier type here e.g. ED1NN
3: build the classifier and train it with trainData
4: testData ← testData.arff
5: correctCount ← 0
6: for each instance in testData do
7:     classifiedInstance ← prediction of the class of instance
8:     actualClass ← the actual class of instance
9:     if classifiedInstance = actualClass then
10:        correctCount ← correctCount + 1
11:    end if
12: end for
13: return correctCount and number of instances contained in testData
Figure 6: Prototype pseudocode showing: data being loaded, the classifier being
created and trained and then classifying the test data
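As a concrete illustration, Algorithm 1 can be realised with a 1-nearest-neighbour classifier under Euclidean distance (ED1NN). This is a self-contained sketch, not the project's WEKA-based implementation; the in-memory lists of (series, label) pairs stand in for the trainData.arff and testData.arff files.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ed1nn_predict(train_data, instance):
    """Predict the class of `instance` as the class of its nearest training series.

    `train_data` is a list of (series, class_label) pairs.
    """
    return min(train_data, key=lambda pair: euclidean(pair[0], instance))[1]

def evaluate(train_data, test_data):
    """Return (correctCount, total) as in Algorithm 1."""
    correct = sum(1 for series, actual in test_data
                  if ed1nn_predict(train_data, series) == actual)
    return correct, len(test_data)
```

For example, with train_data = [([0, 0, 0], 'situp'), ([5, 5, 5], 'pushup')] and test_data = [([0, 1, 0], 'situp')], evaluate returns (1, 1).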
Algorithm 2 Classifying all data-sets with all classifiers
Input: None
Output: Results for each classifier and data-set written to a file
1: Let C be a list of classifiers
2: Let data be a list of training arff files and their corresponding test arff files
3: for each classifier in C do
4:     for each dataPair in data do
5:         trainingData ← dataPair.train
6:         testData ← dataPair.test
7:         build classifier with trainingData
8:         results ← object storing all results information for current dataPair
9:         for each instance in testData do
10:            dist ← distribution of predicted classes for instance using classifier
11:            prediction ← class with highest value in dist
12:            actual ← the actual class of instance
13:            results.add(actual, dist, prediction)
14:        end for
15:        resultMetrics ← calculate all result metrics using results
16:        write resultMetrics to appropriate file
17:    end for
18: end for
Figure 7: Pseudocode showing: looping through the classifiers, looping through the
data-sets, building the classifier, classifying test data, outputting results
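The resultMetrics step in Algorithm 2 produces the columns reported in Tables 3-15. A minimal sketch of how some of those metrics can be derived from the stored predictions is shown below; it treats one class as 'positive' in a one-vs-rest view, which is an illustrative simplification of the four-class problem, and the function name is an assumption.

```python
def binary_metrics(actuals, predictions, positive):
    """Compute accuracy, sensitivity, precision, and specificity for one class
    treated as 'positive' (a one-vs-rest view of a multi-class problem)."""
    tp = sum(1 for a, p in zip(actuals, predictions) if a == positive and p == positive)
    tn = sum(1 for a, p in zip(actuals, predictions) if a != positive and p != positive)
    fp = sum(1 for a, p in zip(actuals, predictions) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actuals, predictions) if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(actuals),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on the positive class
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Balanced accuracy, F1, and AUROC follow from the same confusion-matrix counts (and, for AUROC, the stored class distributions).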
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.4453 0.4141 0.8684 0.3646 0.3114 0.3017 0.652
concat_100HzSkip 0.4415 0.4113 0.8421 0.3616 0.3193 0.2986 0.652
concat_200Hz 0.4642 0.432 0.8947 0.3716 0.3235 0.3171 0.6695
concat_50HzMean 0.4717 0.4396 0.8947 0.3757 0.3353 0.3223 0.6771
concat_50HzSkip 0.4415 0.4093 0.8947 0.3676 0.2952 0.2981 0.6419
t_100HzMean 0.4453 0.4141 0.8684 0.3646 0.3114 0.3017 0.652
t_100HzSkip 0.4415 0.4113 0.8421 0.3616 0.3193 0.2986 0.652
t_200Hz 0.4642 0.432 0.8947 0.3716 0.3235 0.3171 0.6695
t_50HzMean 0.4717 0.4396 0.8947 0.3757 0.3353 0.3223 0.6771
t_50HzSkip 0.4415 0.4093 0.8947 0.3676 0.2952 0.2981 0.6419
x_100HzMean 0.3208 0.3248 0.375 0.4706 0.6932 0.2133 0.5682
x_100HzSkip 0.3358 0.3484 0.9697 0.3404 0.1678 0.2179 0.582
x_200Hz 0.317 0.321 0.375 0.4615 0.6818 0.211 0.5656
x_50HzMean 0.3019 0.3048 0.2344 0.375 0.7222 0.1838 0.5325
x_50HzSkip 0.3358 0.3484 0.9697 0.3404 0.1678 0.2179 0.582
y_100HzMean 0.3094 0.2827 0.8026 0.2891 0.1228 0.1953 0.5528
y_100HzSkip 0.3094 0.2827 0.8026 0.2877 0.1221 0.1956 0.55
y_200Hz 0.3094 0.2827 0.8026 0.2877 0.1221 0.1956 0.55
y_50HzMean 0.3283 0.3116 0.4737 0.3871 0.4722 0.2137 0.5455
y_50HzSkip 0.3283 0.3156 0.3684 0.459 0.6413 0.2114 0.5493
z_100HzMean 0.3208 0.3095 0.3289 0.3571 0.5714 0.2006 0.5518
z_100HzSkip 0.3321 0.3209 0.3289 0.4464 0.6702 0.2092 0.5475
z_200Hz 0.3321 0.3194 0.3684 0.4 0.5882 0.2108 0.5372
z_50HzMean 0.3396 0.327 0.3684 0.4179 0.6139 0.2153 0.5399
z_50HzSkip 0.3208 0.31 0.3158 0.3692 0.598 0.1998 0.5505
Table 3: Results for the AdaBoost classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.9472 0.9447 0.9868 0.9615 0.9832 0.9458 0.9915
concat_100HzSkip 0.9245 0.9206 0.9737 0.881 0.9448 0.924 0.9908
concat_200Hz 0.9019 0.8963 0.9737 0.8605 0.9322 0.9003 0.9936
concat_50HzMean 0.917 0.9131 0.9737 0.9024 0.9548 0.9152 0.9705
concat_50HzSkip 0.9245 0.9226 0.9605 0.9481 0.9773 0.9228 0.9798
t_100HzMean 0.6491 0.6492 0.7105 0.8438 0.9219 0.6492 0.8759
t_100HzSkip 0.6453 0.6473 0.6974 0.8548 0.9291 0.6428 0.8689
t_200Hz 0.6868 0.6835 0.8026 0.8592 0.9237 0.6821 0.9037
t_50HzMean 0.5962 0.5994 0.6053 0.8364 0.9256 0.5988 0.8139
t_50HzSkip 0.6189 0.6192 0.7105 0.8438 0.9167 0.6142 0.8528
x_100HzMean 0.7245 0.7297 0.6316 0.8276 0.9351 0.7166 0.8903
x_100HzSkip 0.7019 0.7074 0.5921 0.7377 0.8981 0.6968 0.8958
x_200Hz 0.717 0.721 0.6316 0.7619 0.9045 0.7121 0.9027
x_50HzMean 0.7472 0.7546 0.6053 0.8846 0.962 0.7448 0.9086
x_50HzSkip 0.7208 0.7268 0.5789 0.8302 0.9423 0.7176 0.9012
y_100HzMean 0.8755 0.8724 0.9605 0.9241 0.9636 0.8722 0.9686
Table 4: Results for the BOSS classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.5887 0.5871 0.5132 0.5493 0.7852 0.5839 0.7326
concat_100HzSkip 0.6189 0.6164 0.5658 0.589 0.8013 0.6157 0.7348
concat_200Hz 0.5925 0.5933 0.4868 0.5211 0.7792 0.5914 0.7468
concat_50HzMean 0.6151 0.6168 0.5132 0.6724 0.8671 0.6172 0.7446
t_100HzMean 0.4755 0.4761 0.3816 0.3333 0.6258 0.472 0.685
t_100HzSkip 0.4566 0.4599 0.3158 0.3158 0.651 0.4565 0.6697
t_200Hz 0.4981 0.5049 0.3026 0.3151 0.6855 0.4991 0.6906
t_50HzSkip 0.5509 0.557 0.3947 0.4615 0.7682 0.5526 0.7017
x_100HzMean 0.5358 0.5397 0.3947 0.4545 0.7568 0.536 0.6739
x_200Hz 0.5283 0.5317 0.4211 0.4051 0.6968 0.5312 0.7369
x_50HzMean 0.4377 0.4386 0.3684 0.3294 0.6069 0.4486 0.6486
x_50HzSkip 0.4528 0.4497 0.4474 0.3434 0.5695 0.4534 0.6206
y_100HzMean 0.5887 0.5781 0.7368 0.56 0.6944 0.5821 0.7355
y_200Hz 0.5472 0.5445 0.4868 0.4805 0.7297 0.5502 0.6646
y_50HzMean 0.6 0.5931 0.6842 0.6047 0.7589 0.5975 0.7588
y_50HzSkip 0.6415 0.6311 0.75 0.6404 0.7793 0.6349 0.7419
z_100HzMean 0.5245 0.5196 0.5263 0.597 0.7857 0.5169 0.7112
Table 5: Results for the C4.5 classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8943 0.8889 1.0 0.9268 0.9641 0.8883 0.9552
concat_100HzSkip 0.883 0.8775 0.9868 0.9259 0.9636 0.8768 0.9484
concat_200Hz 0.8717 0.8679 0.9474 0.9114 0.9578 0.8669 0.9358
concat_50HzMean 0.9094 0.9047 1.0 0.9383 0.9706 0.9045 0.961
t_100HzSkip 0.7057 0.7076 0.6579 0.6173 0.8155 0.703 0.813
t_200Hz 0.6868 0.6867 0.6711 0.593 0.7892 0.6794 0.8085
t_50HzMean 0.717 0.716 0.7368 0.6747 0.8323 0.7058 0.8365
x_100HzMean 0.7132 0.7083 0.7763 0.6146 0.7784 0.6989 0.8337
x_100HzSkip 0.7208 0.7129 0.8158 0.6739 0.8113 0.7024 0.8445
x_200Hz 0.7094 0.7011 0.8289 0.63 0.7716 0.6907 0.8364
x_50HzSkip 0.7019 0.6981 0.75 0.6628 0.8165 0.6795 0.8314
y_100HzMean 0.8566 0.8477 0.9605 0.8295 0.9112 0.8492 0.9164
y_200Hz 0.8604 0.8532 0.9474 0.8471 0.9231 0.8546 0.9198
y_50HzMean 0.8566 0.8481 0.9474 0.878 0.9394 0.8488 0.9171
y_50HzSkip 0.8642 0.858 0.9342 0.8256 0.9133 0.8605 0.9161
z_100HzSkip 0.834 0.826 0.9474 0.9351 0.9675 0.8233 0.923
z_200Hz 0.8038 0.7945 0.9474 0.8372 0.9097 0.7897 0.9052
z_50HzSkip 0.8 0.7902 0.9474 0.9231 0.9589 0.7836 0.909
Table 6: Results for the DTW1NN @ 30% classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8717 0.8664 0.9737 0.9024 0.9515 0.8657 0.9699
concat_100HzSkip 0.8642 0.8579 0.9737 0.881 0.9394 0.8582 0.9635
concat_50HzMean 0.8792 0.8728 1.0 0.8941 0.9458 0.8717 0.9721
concat_50HzSkip 0.8679 0.8611 1.0 0.8539 0.9222 0.8596 0.9693
t_100HzMean 0.717 0.7176 0.7105 0.6067 0.7953 0.7109 0.895
t_100HzSkip 0.7283 0.73 0.7237 0.6044 0.7931 0.7212 0.8862
t_50HzMean 0.7321 0.7325 0.7368 0.6154 0.7977 0.725 0.9002
t_50HzSkip 0.7434 0.7447 0.7368 0.6437 0.8198 0.739 0.8893
x_100HzMean 0.7019 0.6904 0.8816 0.6262 0.7484 0.6703 0.8978
x_100HzSkip 0.6943 0.6821 0.8684 0.6286 0.7516 0.6617 0.8936
x_50HzMean 0.7132 0.7059 0.8289 0.6562 0.7925 0.6877 0.8849
x_50HzSkip 0.6868 0.678 0.8026 0.6421 0.7806 0.6586 0.8831
y_100HzMean 0.8717 0.8639 0.9605 0.8391 0.9186 0.866 0.9663
y_100HzSkip 0.8642 0.857 0.9474 0.8571 0.929 0.8581 0.9604
y_50HzMean 0.8717 0.8648 0.9474 0.9 0.9521 0.866 0.9616
y_50HzSkip 0.8642 0.8564 0.9474 0.8571 0.929 0.8579 0.964
z_100HzMean 0.8226 0.8129 0.9474 0.8889 0.9419 0.8104 0.9409
z_100HzSkip 0.8189 0.8097 0.9474 0.8889 0.9416 0.8065 0.9393
z_50HzMean 0.7887 0.7778 0.9474 0.9114 0.9514 0.7707 0.9313
z_50HzSkip 0.7925 0.7812 0.9342 0.8765 0.9329 0.7759 0.9416
Table 7: Results for the DTW3NN @ 30% classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8 0.7939 0.8684 0.9429 0.9733 0.7925 0.8891
concat_50HzMean 0.8226 0.8152 0.9211 0.9459 0.9737 0.8138 0.9084
concat_50HzSkip 0.7962 0.7915 0.8421 0.9552 0.98 0.7901 0.8842
t_100HzMean 0.683 0.6819 0.6579 0.641 0.8239 0.6792 0.8033
t_100HzSkip 0.6755 0.6737 0.6579 0.6329 0.8165 0.6711 0.8
t_50HzMean 0.6868 0.6846 0.6842 0.6341 0.8125 0.6819 0.8086
t_50HzSkip 0.6679 0.668 0.6184 0.6351 0.828 0.6631 0.7934
x_100HzMean 0.4038 0.4151 0.0921 1.0 1.0 0.3504 0.5907
x_100HzSkip 0.3887 0.4004 0.0658 1.0 1.0 0.3271 0.5795
x_50HzMean 0.4302 0.44 0.1579 0.9231 0.9903 0.3839 0.6158
x_50HzSkip 0.3962 0.4089 0.0658 0.8333 0.9901 0.3343 0.585
y_100HzMean 0.5547 0.5497 0.3816 0.9355 0.9833 0.5102 0.6661
y_100HzSkip 0.5472 0.5426 0.3684 0.9032 0.975 0.5026 0.6618
y_50HzMean 0.5547 0.5494 0.3816 0.9355 0.9833 0.5071 0.6631
y_50HzSkip 0.5585 0.553 0.3947 0.9091 0.9752 0.5137 0.6695
z_100HzMean 0.6755 0.6704 0.75 0.9661 0.9839 0.6486 0.8194
z_100HzSkip 0.6717 0.6671 0.7368 0.9655 0.9839 0.645 0.8153
z_50HzMean 0.6906 0.6845 0.7895 0.9524 0.9762 0.6609 0.8354
z_50HzSkip 0.6679 0.662 0.7632 0.9667 0.9835 0.6346 0.8209
Table 8: Results for the ED1NN classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8491 0.8447 0.9079 0.8734 0.9398 0.8448 0.9464
concat_100HzSkip 0.8453 0.8392 0.9211 0.8642 0.9333 0.8396 0.9473
concat_200Hz 0.8528 0.8498 0.8816 0.859 0.9353 0.8501 0.9506
concat_50HzMean 0.834 0.8301 0.8684 0.8684 0.9394 0.8297 0.9439
t_100HzMean 0.6189 0.6214 0.5395 0.4659 0.7235 0.6189 0.8114
t_50HzMean 0.6189 0.6212 0.5263 0.4762 0.7381 0.6194 0.793
t_50HzSkip 0.6 0.6032 0.5132 0.4643 0.7273 0.6009 0.8022
x_100HzSkip 0.566 0.5748 0.3026 0.46 0.8247 0.5553 0.7646
x_200Hz 0.5849 0.5939 0.3289 0.5435 0.8609 0.5699 0.7942
x_50HzMean 0.5585 0.5693 0.2763 0.4375 0.8247 0.5477 0.7695
x_50HzSkip 0.5849 0.5908 0.3947 0.5263 0.8224 0.5723 0.7859
y_100HzMean 0.7019 0.6843 0.8947 0.7816 0.8613 0.6706 0.8781
y_100HzSkip 0.7283 0.7113 0.9342 0.7978 0.8714 0.7046 0.888
y_200Hz 0.7321 0.7136 0.9605 0.7935 0.8643 0.7048 0.896
y_50HzMean 0.7132 0.6961 0.9079 0.7931 0.8696 0.6881 0.8836
y_50HzSkip 0.7094 0.6923 0.8947 0.8293 0.8955 0.6836 0.883
z_100HzMean 0.6943 0.6883 0.7368 0.8358 0.9209 0.6804 0.8696
z_100HzSkip 0.6755 0.6687 0.6974 0.7465 0.875 0.6635 0.8554
z_200Hz 0.6868 0.6814 0.7105 0.7941 0.9014 0.6747 0.8652
z_50HzMean 0.7057 0.702 0.7105 0.8308 0.9236 0.694 0.8768
z_50HzSkip 0.6642 0.66 0.6842 0.7761 0.8921 0.6517 0.8436
Table 9: Results for the MLP classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.7623 0.7632 0.7105 0.72 0.8757 0.7602 0.9077
concat_100HzSkip 0.683 0.682 0.6711 0.6375 0.8176 0.6803 0.8826
concat_50HzMean 0.6943 0.6945 0.6316 0.6486 0.8395 0.6892 0.8795
concat_50HzSkip 0.7208 0.7223 0.6316 0.6154 0.8266 0.7231 0.8715
t_100HzMean 0.6189 0.6239 0.5132 0.4815 0.7485 0.6176 0.829
t_100HzSkip 0.5774 0.5872 0.3816 0.4531 0.7799 0.5746 0.8249
t_200Hz 0.5623 0.5683 0.4474 0.4096 0.7012 0.564 0.8251
t_50HzMean 0.6113 0.6174 0.4605 0.4861 0.7744 0.6158 0.8263
t_50HzSkip 0.5547 0.5642 0.3816 0.3718 0.7066 0.5532 0.8138
x_100HzMean 0.5962 0.6012 0.4211 0.4156 0.7368 0.5959 0.7833
x_100HzSkip 0.5811 0.5833 0.4868 0.5139 0.7697 0.5727 0.8138
x_50HzMean 0.5623 0.5678 0.3816 0.4531 0.7742 0.5483 0.7785
x_50HzSkip 0.5358 0.5329 0.4737 0.3913 0.6543 0.529 0.7561
y_100HzMean 0.6792 0.6727 0.6842 0.6667 0.8312 0.6703 0.8517
y_100HzSkip 0.6528 0.6451 0.6711 0.622 0.7974 0.6415 0.8637
y_200Hz 0.7434 0.7387 0.7237 0.7143 0.8659 0.7354 0.9
y_50HzMean 0.7208 0.719 0.6711 0.7286 0.8805 0.7184 0.8907
y_50HzSkip 0.7358 0.725 0.8158 0.7294 0.8526 0.7219 0.8992
z_100HzMean 0.6981 0.6936 0.7368 0.6829 0.8323 0.6922 0.8676
z_100HzSkip 0.6755 0.6701 0.75 0.7808 0.8841 0.6617 0.863
z_200Hz 0.6642 0.6546 0.75 0.7703 0.875 0.6505 0.8597
z_50HzMean 0.6566 0.6504 0.6842 0.6582 0.8188 0.6436 0.8603
z_50HzSkip 0.6906 0.6839 0.7632 0.7532 0.8681 0.6746 0.8661
Table 10: Results for the Random Forest classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.7925 0.7934 0.7105 0.7297 0.8864 0.7932 0.9429
concat_100HzSkip 0.8302 0.827 0.8158 0.8052 0.9133 0.8273 0.9465
concat_50HzMean 0.8075 0.8059 0.7763 0.8082 0.9172 0.8056 0.9433
concat_50HzSkip 0.8151 0.8174 0.7368 0.8116 0.9249 0.8157 0.9478
t_100HzMean 0.634 0.6414 0.4737 0.4932 0.7811 0.6375 0.8495
t_100HzSkip 0.6226 0.6337 0.4079 0.4844 0.8024 0.6247 0.8609
t_200Hz 0.6189 0.6243 0.5263 0.4598 0.7251 0.612 0.8559
t_50HzSkip 0.6038 0.6086 0.5132 0.4432 0.7118 0.595 0.8368
x_100HzMean 0.6151 0.6169 0.5395 0.494 0.7439 0.6123 0.8217
x_100HzSkip 0.5736 0.5712 0.5526 0.4286 0.6627 0.5747 0.8153
x_50HzMean 0.566 0.564 0.5263 0.4255 0.6707 0.562 0.8053
x_50HzSkip 0.6453 0.637 0.7105 0.5625 0.7358 0.6352 0.8455
y_100HzMean 0.8038 0.7956 0.8684 0.7857 0.8909 0.7947 0.9378
y_200Hz 0.8264 0.82 0.8816 0.8072 0.9048 0.8201 0.9444
y_50HzMean 0.8113 0.8006 0.9211 0.7527 0.8631 0.8009 0.9292
y_50HzSkip 0.8415 0.8356 0.8947 0.8608 0.9337 0.8352 0.9474
z_100HzMean 0.683 0.6701 0.8158 0.6596 0.7881 0.6684 0.8854
z_100HzSkip 0.7736 0.7665 0.8553 0.7471 0.8642 0.7656 0.9134
z_200Hz 0.7509 0.7397 0.8816 0.7204 0.8354 0.7398 0.9069
z_50HzMean 0.6981 0.6876 0.7895 0.6593 0.8013 0.6858 0.8845
z_50HzSkip 0.7245 0.7124 0.8816 0.7053 0.817 0.7129 0.9032
Table 11: Results for the Rotation Forest classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzSkip 0.8302 0.8267 0.8684 0.7857 0.8953 0.8283 0.9253
concat_200Hz 0.8264 0.8224 0.8684 0.7952 0.9 0.8238 0.9243
t_100HzMean 0.5774 0.5824 0.4605 0.4118 0.7024 0.5767 0.7731
t_200Hz 0.5698 0.5796 0.3684 0.3836 0.7321 0.573 0.7648
t_50HzMean 0.5925 0.5956 0.5263 0.4255 0.6842 0.5821 0.7898
t_50HzSkip 0.6038 0.6059 0.5658 0.43 0.6724 0.5929 0.8025
x_100HzMean 0.4491 0.4559 0.2368 0.36 0.7594 0.4404 0.6485
x_100HzSkip 0.4604 0.4663 0.2632 0.4 0.7727 0.4519 0.6457
x_50HzMean 0.4604 0.4632 0.2895 0.3929 0.7463 0.4473 0.6615
x_50HzSkip 0.4755 0.4784 0.2895 0.5 0.8254 0.4588 0.6525
y_100HzMean 0.6566 0.6353 0.9079 0.7841 0.8468 0.6209 0.8148
y_100HzSkip 0.6604 0.6395 0.9079 0.7841 0.848 0.6267 0.8179
y_200Hz 0.6415 0.6222 0.8947 0.7816 0.843 0.6173 0.8187
y_50HzMean 0.634 0.6095 0.9079 0.7753 0.8319 0.5792 0.8091
y_50HzSkip 0.6415 0.6179 0.8947 0.8095 0.8644 0.5909 0.8101
z_100HzMean 0.5019 0.5007 0.3947 0.7143 0.8957 0.4811 0.7068
z_100HzSkip 0.4868 0.4858 0.3816 0.6444 0.8621 0.47 0.6998
z_50HzMean 0.5094 0.5034 0.4868 0.7551 0.8909 0.4834 0.7012
z_50HzSkip 0.4906 0.4876 0.3947 0.6667 0.8696 0.4661 0.683
Table 12: Results for the SMO classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8038 0.8024 0.8158 0.7848 0.8988 0.8042 0.9492
concat_100HzSkip 0.7925 0.7897 0.8289 0.8077 0.9074 0.7902 0.9529
concat_200Hz 0.7811 0.7798 0.7895 0.7895 0.9018 0.7806 0.9488
concat_50HzMean 0.8113 0.8115 0.8026 0.7722 0.8953 0.8125 0.9501
concat_50HzSkip 0.8038 0.8024 0.8289 0.7975 0.9036 0.8028 0.952
t_100HzMean 0.6943 0.6967 0.6579 0.5882 0.7929 0.6896 0.9075
t_100HzSkip 0.7094 0.7105 0.6974 0.6163 0.8036 0.7039 0.9078
t_50HzMean 0.6868 0.6883 0.6579 0.5882 0.7904 0.6838 0.9064
t_50HzSkip 0.7094 0.7114 0.6842 0.6047 0.8 0.7057 0.9103
x_100HzMean 0.717 0.7175 0.6842 0.5909 0.7931 0.7195 0.9254
x_100HzSkip 0.7245 0.7262 0.6711 0.6145 0.815 0.7279 0.9268
x_50HzSkip 0.7434 0.7433 0.7368 0.6222 0.8057 0.7433 0.9285
y_100HzMean 0.9245 0.9221 0.9605 0.8795 0.9451 0.9252 0.9844
y_50HzSkip 0.917 0.9136 0.9605 0.8795 0.9444 0.9168 0.9872
z_100HzMean 0.8226 0.8138 0.9342 0.8353 0.913 0.8155 0.9596
z_100HzSkip 0.8415 0.8346 0.9342 0.8659 0.9325 0.836 0.9602
z_200Hz 0.834 0.8258 0.9342 0.8353 0.9146 0.8277 0.959
z_50HzMean 0.8566 0.848 0.9737 0.8506 0.9217 0.8494 0.9649
z_50HzSkip 0.8491 0.8416 0.9474 0.8372 0.9162 0.8442 0.9622
Table 13: Results for the TSF classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
concat_100HzMean 0.8943 0.8882 0.9605 0.8295 0.9162 0.8918 0.9755
concat_200Hz 0.8604 0.8536 0.9474 0.8 0.8966 0.857 0.973
concat_50HzMean 0.883 0.8777 0.9342 0.8068 0.9056 0.8815 0.9751
concat_50HzSkip 0.8755 0.8695 0.9474 0.8471 0.9249 0.8716 0.9716
t_100HzMean 0.6038 0.6113 0.4605 0.4375 0.7353 0.5988 0.8448
t_100HzSkip 0.6264 0.6286 0.5789 0.5 0.7349 0.6177 0.8457
t_200Hz 0.6151 0.6186 0.5526 0.4828 0.7289 0.6052 0.8517
t_50HzMean 0.634 0.6349 0.6184 0.5054 0.7246 0.6206 0.8544
t_50HzSkip 0.6453 0.6485 0.5789 0.4944 0.7384 0.6434 0.859
x_100HzMean 0.6453 0.647 0.5526 0.5 0.7544 0.6467 0.8671
x_200Hz 0.6528 0.6546 0.5395 0.494 0.7586 0.6548 0.8571
x_50HzMean 0.6755 0.6746 0.6053 0.5227 0.76 0.6765 0.8657
x_50HzSkip 0.6528 0.6528 0.5658 0.5059 0.7558 0.6553 0.8508
y_100HzMean 0.8264 0.8149 0.9474 0.7826 0.8802 0.8141 0.958
y_100HzSkip 0.8151 0.8017 0.9474 0.7912 0.8834 0.7969 0.9557
y_50HzMean 0.8189 0.8074 0.9474 0.8276 0.9062 0.8057 0.9547
y_50HzSkip 0.7925 0.7774 0.9474 0.7826 0.8734 0.7701 0.9532
z_100HzMean 0.7774 0.7711 0.8158 0.7045 0.8471 0.772 0.928
z_100HzSkip 0.7509 0.7397 0.8553 0.7065 0.8323 0.7386 0.9172
z_200Hz 0.766 0.7576 0.8289 0.7079 0.8434 0.7594 0.9218
Table 14: Results for the XGBoost classifier
Data-Set Accuracy Bal Accuracy Sensitivity Precision Specificity F1 AUROC
All 0.2868 0.25 1.0 0.2868 0.0 0.1114 0.4901
Table 15: Results for the ZeroR classifier
[Gantt chart: project schedule shown for semester week numbers (weeks 1-12, Christmas Break, weeks 1-10, Easter Break, weeks 11-14), covering: project proposal, literature review, literature review delivery, design, design completion, coding, testing, code delivery, final report writing, and inspection preparation.]
Figure 8: Original project Gantt chart
[Gantt chart: project schedule shown for semester week numbers (weeks 1-12, Christmas Break, weeks 1-10, Easter Break, weeks 11-14), covering: project proposal, literature review, literature review delivery, prototype design & implementation, prototype delivery, introducing more exercises, experimenting with data-set variations, project extensions, final implementation delivery, final report writing, inspection preparation, and project completion.]
Figure 9: Revised project Gantt chart
References
Ahmadi, M. et al. (2016). Novel feature extraction, selection and fusion for effective
malware family classification. In Proceedings of the Sixth ACM Conference on Data
and Application Security and Privacy, pages 183–194. ACM.
Awad, M. and Khanna, R. (2015). Support vector machines for classification. Efficient
Machine Learning, pages 39–66.
Bagnall, A. et al. (2016). UEA & UCR time series classification repository.
http://timeseriesclassification.com/.
Bagnall, A. et al. (2017). The great time series classification bake off: a review and
experimental evaluation of recent algorithmic advances. Data Mining and Knowledge
Discovery, 31(3):606–660.
Bostrom, A. and Bagnall, A. (2015). Binary shapelet transform for multiclass time
series classification.
Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.
Chang, K. et al. (2007). Tracking free-weight exercises. International Conference on
Ubiquitous Computing, 4717:19–37.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceed-
ings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pages 785–794. ACM.
Deng, H. et al. (2013). A time series forest for classification and feature extraction.
Information Sciences, 239:142–153.
Gardner, M. and Dorling, S. (1998). Artificial neural networks (the multilayer percep-
tron): a review of applications in the atmospheric sciences. Atmospheric Environ-
ment, 32:2627–2636.
Ghimire, B. et al. (2012). An assessment of the effectiveness of a random forest clas-
sifier for land-cover classification. ISPRS Journal of Photogrammetry and Remote
Sensing, 67:93–104.
Goh, D. and Razikin, K. (2015). Is gamification effective in motivating exercise? Inter-
national Conference on Human-Computer Interaction, 9170:608–617.
Hills, J. et al. (2014). Classification of time series by shapelet transformation. Data
Mining and Knowledge Discovery, 28(4):851–881.
Hsu, C.-W. et al. (2002). A comparison of methods for multiclass support vector ma-
chines. IEEE Transactions on Neural Networks, 13(2):415–425.
Krishna, G. et al. (2013). Performance analysis and evaluation of different data mining
algorithms used for cancer classification. International Journal of Advanced Research
in Artificial Intelligence, 2.
Kwapisz, J. et al. (2010). Activity recognition using cell phone accelerometers.
SIGKDD Explorations, 12:74–82.
Lin, J. et al. (2012). Rotation-invariant similarity in time series using bag-of-patterns
representation. Journal of Intelligent Information Systems, 39(2):287–315.
Lines, J. (2015). Time Series Classification through Transformation and Ensembles.
PhD thesis, Computing.
Lines, J. (2018a). Initial project title and description. https://3yp.cmp.uea.ac.uk/projects/851/.
Lines, J. (2018b). An introduction to time series classification and the UEA code reposi-
tory.
Lines, J. et al. (2016). HIVE-COTE: The hierarchical vote collective of transformation-
based ensembles for time series classification. 2016 IEEE 16th International Confer-
ence on Data Mining (ICDM).
Oracle Corporation (1995). Java programming language. https://www.java.com/en/.
Pal, M. (2005). Random forest classifier for remote sensing classification. International
Journal of Remote Sensing, 26(1):217–222.
Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support
vector machines.
Quinlan, J. R. (1993). C4.5: programs for machine learning. Morgan Kaufmann.
Rodriguez, J. et al. (2006). Rotation forest: A new classifier ensemble method. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 28:1619–1630.
Rossum, G. (1990). Python programming language. https://www.python.org/.
Schäfer, P. (2015). The BOSS is concerned with time series classification in the presence
of noise. Data Mining and Knowledge Discovery, 29(6):1505–1530.
Schapire, R. E. (2013). Explaining AdaBoost. Empirical Inference, pages 37–52.
Schjerve, I. et al. (2008). Both aerobic endurance and strength training programmes
improve cardiovascular health in obese adults. Clinical Science, 115(9):283–293.
The XGBoost Contributors (2014). XGBoost documentation.
https://xgboost.readthedocs.io/en/latest/. Accessed: 15/04/2019.
UC Irvine (1987). UC Irvine machine learning repository.
https://archive.ics.uci.edu/ml/datasets.php.
University of East Anglia (2019). Enable your research with high performance comput-
ing at UEA. https://rscs.uea.ac.uk/new-high-performance-computing-cluster.
University of Waikato (1997). Waikato environment for knowledge analysis.
https://www.cs.waikato.ac.nz/~ml/weka/.
Vieyra Software (2014). Physics toolbox sensor suite.
https://www.vieyrasoftware.net/physics-toolbox-sensor-suite.
Witten, I. H. et al. (1999). Weka: Practical machine learning tools and techniques with
Java implementations. https://researchcommons.waikato.ac.nz/bitstream/handle/10289/1040/uow-cs-wp-1999-11.pdf.
Witten, I. H. et al. (2011). Data mining: practical machine learning tools and techniques.
https://www.cs.waikato.ac.nz/~ml/weka/book.html.
Wu, X. et al. (2008). Top 10 algorithms in data mining. Knowledge and Information
Systems, 14:1–37.