

Tailoring A Personal E-Bike Incentive Platform Using Machine Learning Techniques

Christian Gorenflo and Priyank Jaini

David R. Cheriton School of Computer Science

University of Waterloo

Abstract: Gamification and social norms have been shown to be important incentives in various application areas. For this project, an Android smart watch app has been designed and implemented based on the rich data set collected from the e-bike sensors. This data is used to classify the participants into different classes calculated by an online clustering algorithm. Riders are awarded points for their usage of the e-bike. The smart watch users can then compare their current performance to others via leader boards.

I. INTRODUCTION¹

Global warming will be a very real threat to modern society in the coming years. The damage done by severe weather conditions will raise government expenses dramatically. Additionally, the increase in the average global temperature will have dire consequences for harvesting crops all over the world.

In light of the recent agreement at the 2015 United Nations Climate Change Conference to limit the warming "to well below 2 °C" [1] and the current increase of already about 1 °C [2], it is paramount to take action. It will probably not be possible to reach the goal by improving technology alone; a change in consumer behaviour will also have to accompany it. Climate change is mainly driven by the excessive emission of greenhouse gases like CO (carbon monoxide), CO2 (carbon dioxide) and CH4 (methane). The transportation sector is a major producer of these gases, as most vehicles are fuel driven and emit (primarily) carbon oxides in the process. In this project, we focus on the sector of privately owned vehicles, mainly cars and bicycles.

¹ DISCLAIMER: One author of this report, Christian Gorenflo, uses the data of the WeBike project for another course project. Therefore, parts of the Introduction and the WeBike project section might resemble or equal these sections in the other report. For the other project, a trip prediction model based on the time of day, day of week, and month is developed, which has no intersection with the original research done for this project.

Nowadays, many people use their car even for short trips that could easily be done by bike or even on foot. As long as the price of gas is low enough that these trips are not noticeable financially, convenience and comfort will far outweigh any regard for the environment. However, raising gas prices to a point where short trips financially hurt people would elevate cars to a luxury item and in turn lead to huge ramifications for society. On the other hand, changing the majority of the population's perception of the dangers of climate change before severe consequences actually happen (at which point it would be too late) is unrealistic. Therefore, other methods to shape common behaviour have to be found.

In summary, we identify three possible avenues to decrease exhaust gas production:

• Technological improvements
• Monetary incentives
• Non-monetary incentives

Technological improvements can work, provided they don't suffer from disadvantages in comparison to current solutions. Namely, they need to be at least as comfortable and convenient as established technology and be available at a comparable price, unless they have some additional appeal (e.g., as a status symbol). However, if technology changes too drastically, consumers will mistrust it regardless of equal benefits to known solutions. For example, electric cars have a high buy-in cost and a shorter range than traditional cars. Even though lower energy prices might make up for the price difference in the long run, the perceived financial benefit and the greater convenience due to the wider range of gas-fueled cars hamper the adoption of this new technology.

As stated before, monetary incentives don't work well by penalizing existing solutions. Rather, new technology should come with a benefit to increase adoption. This can be achieved by decreased production costs for the producer, lower maintenance costs for the consumer, or state subsidization.

Lastly, change can be effected through non-monetary incentives. These rely mainly on social engineering: policies are implemented to either induce negative emotions for ecologically harmful behavioural patterns or positive emotions for environmentally friendly ones.

This paper is based on the work on the WeBike project [3] [4], which uses technological improvements to increase the convenience of a car alternative. We add non-monetary incentives based on gamification to study their impact on the participants of the study.

In section II we describe the set-up and current status of the WeBike project.

Then, in section III, we show the new hardware element we introduce into the system and how it connects to the existing architecture.

Following that, section IV explains potential applications that could be implemented with the new architecture.

We go into detail on the clustering algorithm (section V) underlying the chosen application for this paper, show the application implementation (section VI) and describe the problems we faced.

In section VII we talk about the lessons we learned while doing this project.

We draw our conclusion in section VIII and finally sketch out future work in section IX.

II. WEBIKE PROJECT¹

A. Electric bicycles

As stated before, many people shy away from using their bicycle for short trips because of a perceived inconvenience compared to using their car. This can be partially mitigated by making bicycles more appealing. One solution is to equip bicycles with an electric motor to support the efforts of the rider. Even so, as with electric cars, skepticism towards this technology born from range anxiety exists in the general population. To study the concerns about and the adoption behaviour of electric bicycles, a project called WeBike [3][4] was started by the ISS4E group at the University of Waterloo in mid 2014.

After a survey, a fleet of about 30 electric bicycles (e-bikes) was distributed among University faculty and students who participated in the study. These e-bikes are equipped with a battery that has the capacity to support the rider for about 40 km and a sensor kit that is attached to the battery. This sensor kit consists mainly of a Samsung Galaxy S3 smart phone with its built-in sensors (GPS, clock, gyroscope, accelerometer, magnetometer) and additional sensors for measuring ambient temperature and charge/discharge current and voltage. The sensor kit is automatically charged directly from the battery.

Fig. 1. Electric bicycle with battery and sensor kit

The battery plus sensor kit can be removed from the e-bike and carried by the rider in order to charge it from a power supply. In order to conserve energy and therefore the supported range of the e-bike, the sensors are currently only activated twice per minute for 2 seconds. This resolution is high enough to detect charging events and riding trips. From these data points, values for the intermediate time intervals can be interpolated.
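The interpolation of intermediate values between two sensor wake-ups could be as simple as linear interpolation over timestamps. The following is a minimal sketch only: the function name `interpolate`, the `(timestamp, value)` layout and the sample readings are hypothetical, and the WeBike pipeline may well use a different scheme.

```python
from bisect import bisect_right


def interpolate(samples, t):
    """Linearly interpolate a sensor value at time t (in seconds).

    samples: list of (timestamp, value) pairs sorted by timestamp,
    e.g. one reading roughly every 30 s (hypothetical layout).
    """
    times = [ts for ts, _ in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled interval")
    hi = bisect_right(times, t)
    if hi == len(times):  # t equals the last timestamp
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[hi - 1], samples[hi]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical speed readings taken every 30 seconds
readings = [(0, 10.0), (30, 16.0), (60, 13.0)]
speed_at_15s = interpolate(readings, 15)
```

Linear interpolation is the simplest choice; for quantities such as GPS position during a turn, a scheme that takes heading into account may be more appropriate.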

B. Architecture

The gathered data is first saved to the smart phone's internal storage space. Then, whenever study participants take their e-bikes to the University of Waterloo campus, the smart phone connects to the University's Wi-Fi network to upload the sensor data to a server, where the raw data is stored in a MySQL database.

This raw data is used to detect trips, which are then stored in different tables per rider in order to get quick access to all data points belonging to a particular trip.

C. Web platform for rider statistics

The participants of the study have access to basic statistics of their riding behaviour. They can view a histogram of the distance traveled per day over a period of several weeks. They can also choose to have a map of trips for a particular day displayed. Both of these statistics are derived from the GPS sensor of the smart phone.


Fig. 2. The smart phone in the sensor kit attached to the battery connects to the University's Wi-Fi network to upload the collected sensor data as a batch to the server database

Fig. 3. A histogram of distance traveled per day is shown on the web

Additionally, an estimate of the state of charge (SOC) of the battery over time is provided. This cannot be measured directly but must be calculated from charge/discharge current and voltage.
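One common way to derive such an estimate is coulomb counting, i.e. integrating the measured current over time. The sketch below assumes this technique; the source does not state which method the WeBike platform actually uses. The function name, the sign convention (positive current means charging) and the 10 Ah capacity are hypothetical, and the voltage measurement is ignored here.

```python
def estimate_soc(samples, capacity_ah, initial_soc=1.0):
    """Estimate state of charge by coulomb counting (an assumed method).

    samples: list of (time_s, current_a) pairs sorted by time;
             positive current charges, negative current discharges.
    capacity_ah: nominal battery capacity in ampere-hours.
    Returns one SOC value in [0, 1] per sample.
    """
    capacity_as = capacity_ah * 3600.0  # ampere-seconds
    soc = [initial_soc]
    for (t0, i0), (t1, i1) in zip(samples, samples[1:]):
        # Trapezoidal integration of the current between two samples
        charge = 0.5 * (i0 + i1) * (t1 - t0)
        soc.append(min(1.0, max(0.0, soc[-1] + charge / capacity_as)))
    return soc


# Hypothetical trace: constant 5 A discharge for one hour
trace = [(0, -5.0), (1800, -5.0), (3600, -5.0)]
soc_trace = estimate_soc(trace, capacity_ah=10.0)
```

Pure current integration drifts over time, which is one reason a practical estimator would typically also use the measured voltage to recalibrate.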

D. WeBike Data Exploration

In this section, we give an outline of the data collected in the WeBike project. Subsequently, we discuss an online algorithm that can be applied to the data for various learning tasks.

1) Data Exploration: Figure 4 shows a histogram of the trip frequency for different times of day. We can see that the peak hours are during the morning and late evening, indicating usage by commuters.

Fig. 4. Trip timings in a day

Figure 5 shows the frequency of average speeds for the users of the electric bike. The figure suggests that most of the users are conservative riders, with only a few riders having a very high average speed.

Fig. 5. Average Speed Frequency

Figure 6 shows a histogram of the trip frequency in a month. Such data exploration also allows us to make reasonable assumptions for the classification algorithm. In a later section we will try to estimate the distribution of the data assuming a Gaussian Mixture Model (GMM).

Fig. 6. Trip frequency in a month


III. SMART WATCH

Smart watches and fitness trackers, though still rather uncommon, have become increasingly popular in recent years, with a multitude of major tech companies providing their own versions. These gadgets are characterized by a digital display that can show dynamic content and by a plethora of sensors. As we predict their popularity to pick up further in the future, we look to benefit from this development by introducing smart watches into our system architecture. There are two major benefits gained from this:

• Using the smart watch's sensors in addition to the e-bike's sensor kit to either get additional metrics (e.g., heart rate) or improve existing ones by having redundant sensors (e.g., GPS)

• Providing immediate feedback to the e-bike rider via the smart watch's display

A. Introduction into existing Architecture

For the purposes of this project, a single smart watch of the type Motorola Moto 360 is introduced into the system as a prototype. Like the smart phones in the sensor kit, it runs Android as its operating system, making it easier to integrate.

Instead of connecting to the server via Wi-Fi, the smart watch is paired with a specific smart phone via Bluetooth. It therefore receives updates from and sends sensor updates to the server only indirectly, via the phone.

B. Data Transfer

As mentioned before, both smart phone and smart watch run Android and therefore have access to the same APIs. We use the Teleport API [18], which encapsulates the GoogleApiClient with predefined setups, to create the connection between the two devices. This creates a listener service on the smart phone to which multiple clients (in our case a single smart watch) subscribe. This makes for an easy event-driven (e.g., start or end of a trip) data transfer.

IV. POTENTIAL APPLICATIONS

Throughout the course, we have seen several applications using big data to develop intelligent transport systems [5],[6],[7],[8],[9],[11]. This project also has potential for a wide range of applications for electric bikes. In this section we outline some of the possible applications that can be developed using big data analytics and WeBike data.

Fig. 7. Smart watches integrated into the architecture. Each watch is connected to a specific smart phone of an e-bike's sensor kit via Bluetooth. [17]

A. Rider Classification

The data collected by the sensors of WeBike can be used for several types of classification.

• Using the trip logs and the timings of the trips, the riders can be classified into different categories like commuters, weekend bikers or both. Commuters would be people who ride their bike daily to commute to work. This would entail scheduled, repetitive trips. Weekend bikers would mostly ride on weekends to go on biking trails.

• Using the data collected about speed, acceleration and battery, we can further classify riders as aggressive or conservative. Aggressive riders would be users who ride with higher speed and more acceleration. This classification can also help to establish a relation between rider behaviour and bike performance.

• Such classification will help to develop leader board statistics. Each user would compete against other users in the same class. The leader boards would cater to different classes differently. This would help users to gauge their performance in comparison to other users in the same class.

B. Route Planning

Since the sensors collect data about individuals, the GPS data can be used for route planning. Using the trip logs, we can determine the destinations that a person visits often. Once this information is learned, we can send real-time alerts to the user about the routes. This information can also be used to suggest alternate routes by comparing the trip logs of different users with the same or nearby destinations.

Fig. 8. For every selected day, riders can view a map of their trips on the web

C. Tracking Body Vitals

The data collected via the smart watch can help to monitor the body vitals of the riders. This will help the riders to keep track of their daily routine. It can also be used to alert riders while they are riding. For example, if a rider is riding aggressively for a substantial period, this would affect the rider's heart rate and blood pressure. The smart watch can then alert the rider to slow down.

D. Relationship between Bike Performance and Rider Behaviour

Electric bikes use batteries. The performance of a battery is related to how the bike is used. The data collected about current, voltage and battery charge can be used to learn optimal rider behaviour to maximize battery performance. This can later be extended to use other features like tyre pressure and road conditions to analyze their effect on the battery.

All of the above applications will help to provide further incentives for users to use e-bikes more often. The leader boards would provide inherent motivation to use e-bikes for the daily commute. This will also promote healthy living. Further, applications like route planning and alerts, tracking body vitals, and bike performance analysis strive to develop an intelligent system for e-bikes, tailor-made to suit the needs of users.

V. BAYESIAN MOMENT MATCHING ALGORITHM

All the data collected by the various sensors in the WeBike project is real-time data. The need is, therefore, for algorithms that can classify the data and generate meaningful information in real time. That is, the project requires a scalable online algorithm for learning and classification. In this section, we describe a new online algorithm for clustering called the Bayesian Moment Matching algorithm. We will explain the algorithm in the following sections and show results on simulated datasets. Subsequently, we will explain different areas where we can use the algorithm for the WeBike project.

A. Introduction

In this section we describe a Bayesian moment matching technique to estimate the parameters of the Gaussian Mixture Model (GMM). The GMM is one of the most popular frameworks for soft clustering. Most of the existing algorithms to estimate the parameters of the GMM assume a batch setting, since they typically make multiple passes over the data. We develop an online algorithm that incrementally estimates the parameters of the GMM in a single pass. This is particularly useful in the streaming setting, where data is not stored due to space and/or privacy reasons. The algorithm takes a Bayesian approach whereby the posterior over the parameters is updated after each data point. Since the exact posterior is intractable, an approximate posterior is computed by matching the first and second order moments. The approach compares favorably to existing parameter estimation techniques both in terms of accuracy and time on synthetic data.

The key challenge for parameter estimation in Gaussian Mixture Models is to update the underlying distribution in real time as more data is received. If we use exact Bayesian learning, the computation grows exponentially in the amount of data. In practice, various approximation algorithms have been proposed. Variational Bayes [11], Expectation Maximization (EM) [13] and Gibbs sampling [12] stand out as the most popular classes of algorithms. However, they lack robustness in the sense that Variational Bayes and the EM algorithm may get stuck in local optima, and Gibbs sampling is a stochastic procedure in which convergence is difficult to assess. Additional techniques have been proposed, including Belief Propagation [15], Expectation Propagation [14] and Spectral Learning [16].

We have developed an online Bayesian moment matching technique for GMMs that approximates the posterior after each observation with a simpler distribution by matching a few moments of the posterior. The approach can process a dataset in one sweep with a constant amount of computation per data point.

B. Background

A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. Given a set of data points $S = \{y_1, y_2, y_3, \ldots, y_n\}$, we want to estimate the parameters of the Gaussian mixture model from which the data has been derived. More formally,

Let $Y$ be a random variable with density

$$P(Y = y \mid \Theta) = \sum_{i=1}^{M} w_i \, \mathcal{N}(y \mid \mu_i, \sigma_i^2)$$

for any data point $y \in S$, where

$$\Theta = \{(w_1, \mu_1, \sigma_1^2), (w_2, \mu_2, \sigma_2^2), \ldots, (w_M, \mu_M, \sigma_M^2)\}.$$

We want to find the estimates

$$\hat{\Theta} = \{(\hat{w}_1, \hat{\mu}_1, \hat{\sigma}_1^2), (\hat{w}_2, \hat{\mu}_2, \hat{\sigma}_2^2), \ldots, (\hat{w}_M, \hat{\mu}_M, \hat{\sigma}_M^2)\}$$

of $\Theta$.

We can determine the number of components $M$ by using cross-validation.

$\Theta$ can be estimated by computing the posterior $P_n(\Theta) = \Pr(\Theta \mid y_{1:n})$ using Bayesian learning. However, the number of terms in the posterior grows exponentially as new data points are observed. This makes exact Bayesian learning intractable. In the next section, we describe an algorithm to circumvent this problem.
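The mixture density defined above can be evaluated directly once the parameters are fixed. The following is a minimal sketch, assuming a univariate GMM whose parameters are given as a list of (weight, mean, variance) triples; the function names and the example parameters are illustrative and not taken from the project code.

```python
import math


def normal_pdf(y, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(y, theta):
    """Weighted sum of Gaussian component densities.

    theta: list of (w, mu, var) triples whose weights sum to 1.
    """
    return sum(w * normal_pdf(y, mu, var) for w, mu, var in theta)


# Hypothetical two-component mixture
theta = [(0.25, 0.0, 1.0), (0.75, 5.0, 1.0)]
density_at_zero = gmm_pdf(0.0, theta)
```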

In mathematics, a moment is, loosely speaking, a quantitative measure of the shape of a distribution. Let $f(\mathbf{y}; \beta)$ be a distribution over an $m$-dimensional random variable $\mathbf{y} = \{y_1, \ldots, y_m\}$ and let $\mu_j(\mathbf{y})$ be a monomial of degree $j$ of the form $\mu_j(\mathbf{y}) = \prod_i y_i^{n_i}$ such that $\sum_i n_i = j$. Then, the $j$th order moment $M_{\mu_j}(f)$ is the expectation of $\mu_j(\mathbf{y})$ with respect to $f$:

$$M_{\mu_j}(f) = \int_{\mathbf{y}} \mu_j(\mathbf{y}) \, f(\mathbf{y}; \beta) \, d\mathbf{y}.$$

For the Gaussian mixture model we only need to calculate the first two moments of a Gaussian distribution.
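For reference, the first two raw moments of a univariate Gaussian $\mathcal{N}(y \mid \mu, \sigma^2)$ and their inversion are standard results and can be written as:

```latex
\begin{align}
M_{y}     &= \mathbb{E}[y]   = \mu, \\
M_{y^{2}} &= \mathbb{E}[y^2] = \mu^2 + \sigma^2, \\
\Rightarrow\quad \mu &= M_{y}, \qquad \sigma^2 = M_{y^{2}} - M_{y}^{2}.
\end{align}
```

These two equations are what allow a Gaussian to be recovered from its first and second moments.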

C. Bayesian Moment Matching

In this section, we describe the moment matching algorithm in detail. For some distributions $f$, there exists an alternative parametrization of the distribution based on a set of sufficient moments. For example, for the normal distribution it is easy to set up a system of equations using $M_y$ and $M_{y^2}$ to calculate $\mu$ (mean) and $\sigma^2$ (variance). More concretely, for some distributions $f$ there exists a set of monomials $S(f)$ such that knowing $M_\mu(f)$ for all $\mu \in S(f)$ allows us to calculate the parameters of $f$; e.g., for the normal distribution, $S(f) = \{y, y^2\}$.

The Bayesian moment matching algorithm approximates the posterior after each observation with fewer terms in order to prevent the number of terms from growing exponentially. Algorithm 1 describes a generic procedure to approximate the posterior $P_i$ after each observation with a simpler distribution $Q_i$ by moment matching. More specifically, a set of moments sufficient to define $Q_i$ is matched to the moments of the exact posterior $P_i$. In Line 4, we calculate the exact posterior $P_i(\Theta) \propto \Pr(y_i \mid \Theta) \times P_{i-1}(\Theta)$ based on the $i$th data point. Then, we compute some moments of $P_i$ (Line 5). Specifically, we compute the moments $S(f)$ that are sufficient to define a distribution in the family $f$. For each $\mu \in S(f)$, we calculate the moment $M_\mu(P_i)$ of the posterior exactly.

Next, we compute the parameters $\alpha$ and $\beta$ based on the set of sufficient moments (Line 6). This determines a specific distribution $Q_i$ in the family $f$ that we use to approximate $P_i$ (Line 7). Note that the moments in the sufficient set $S(f)$ of the approximate posterior are the same as those of the exact posterior. However, the moments outside of the sufficient set are not necessarily the same (i.e., for all $\mu' \notin S(f)$, $M_{\mu'}(Q_i)$ may differ from $M_{\mu'}(P_i)$).

Algorithm 1 Bayesian Moment Matching
1: Let f(Θ|α, β) be a family of distributions with parameters α and β
2: Initialize the prior P_0(Θ)
3: for i = 1 to N do
4:     Evaluate P_i(Θ) from P_{i−1}(Θ)
5:     ∀µ ∈ S(f), calculate E_µ[P_i(Θ)]
6:     Compute α and β using E_µ[P_i(Θ)]
7:     Approximate P_i(Θ) using Q_i(Θ) = f(Θ|α, β)
8: end for

D. Results on Synthetic Data

In this section, we compare the Bayesian Moment Matching (BMM) algorithm and the Expectation Maximization (EM) algorithm on estimating the parameters of the underlying distribution from which the data has been derived. We compare the results of BMM against both an offline and an online version of EM. We test all the algorithms on two kinds of datasets: separated and overlapping. In a separated dataset the components of the GMM are well separated, whereas in an overlapping dataset the components of the GMM overlap, which makes density estimation harder. However, before we present the results, we illustrate Algorithm 1 from the previous section with a simple example. Let us assume that the only parameter we want to estimate is $\mu$ of the GMM.

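Let the approximating family be a single Gaussian over $\mu$, whose mean $\tilde{\mu}$ and variance $\tilde{\sigma}^2$ play the roles of $\alpha$ and $\beta$:

$$f(\Theta \mid \alpha, \beta) = \mathcal{N}(\mu \mid \tilde{\mu}, \tilde{\sigma}^2)$$

Prior:

$$P_0(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)$$

Likelihood:

$$P(y_1 \mid \mu) = \tfrac{1}{4}\, \mathcal{N}(y_1 \mid \alpha_1 \mu, \sigma_1^2) + \tfrac{3}{4}\, \mathcal{N}(y_1 \mid \alpha_2 \mu, \sigma_2^2)$$

Posterior:

$$P_1(\mu \mid y_1) = c_1\, \mathcal{N}(\mu \mid \mu_a, \sigma_a^2) + c_2\, \mathcal{N}(\mu \mid \mu_b, \sigma_b^2)$$

$$\Rightarrow S(f) = \{\mu, \mu^2\}$$

$$\Rightarrow M_1 = E_{\mu}[P_1(\mu \mid y_1)] = c_1 \mu_a + c_2 \mu_b$$

$$\Rightarrow M_2 = E_{\mu^2}[P_1(\mu \mid y_1)] = c_1 (\mu_a^2 + \sigma_a^2) + c_2 (\mu_b^2 + \sigma_b^2)$$

$$\Rightarrow \mu_1 = M_1 \quad \text{and} \quad \sigma_1^2 = M_2 - M_1^2$$

$$\Rightarrow P_1(\mu) \simeq \mathcal{N}(\mu \mid \mu_1, \sigma_1^2)$$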
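The worked example above can be traced numerically. The following is a minimal sketch under stated assumptions: only the mean is learned, the likelihood is a known mixture of components N(y | alpha_k * mu, var_k), and the posterior terms are obtained from the standard closed form for a Gaussian prior combined with a linear-Gaussian likelihood. The function name `bmm_update`, the component parameters and the data values are hypothetical and are not taken from the authors' implementation.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bmm_update(mu0, var0, y, components):
    """One Bayesian moment matching step for the mean parameter mu.

    Prior: N(mu | mu0, var0).
    components: list of (weight, alpha, var) so that the likelihood is
                sum_k weight_k * N(y | alpha_k * mu, var_k).
    Returns the mean and variance of the matched Gaussian.
    """
    terms = []
    for w, alpha, var in components:
        # A Gaussian prior times a linear-Gaussian likelihood is Gaussian in mu
        post_var = 1.0 / (1.0 / var0 + alpha ** 2 / var)
        post_mean = post_var * (mu0 / var0 + alpha * y / var)
        # Unnormalised mixture weight: w_k times the marginal likelihood of y
        evidence = normal_pdf(y, alpha * mu0, var + alpha ** 2 * var0)
        terms.append((w * evidence, post_mean, post_var))
    total = sum(c for c, _, _ in terms)
    # First and second moments of the exact (mixture) posterior
    m1 = sum(c / total * m for c, m, _ in terms)
    m2 = sum(c / total * (m * m + v) for c, m, v in terms)
    return m1, m2 - m1 * m1


# Hypothetical likelihood with weights 1/4 and 3/4, processed in one pass
components = [(0.25, 1.0, 1.0), (0.75, 2.0, 1.0)]
mean, var = 0.0, 10.0
for y in [2.1, 3.9, 1.8, 4.2]:
    mean, var = bmm_update(mean, var, y, components)
```

Each observation costs a constant amount of work, since the two-term posterior is collapsed back into a single Gaussian before the next data point arrives.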

In the next section we present some results of learning the parameters of the Gaussian mixture model on synthetic datasets. We look at the mean squared error and the likelihood on test data to assess performance.

1) Comparison with Offline EM: Figure 9 shows that the mean squared error converges to 0 for the Bayesian moment matching algorithm. This illustrates that, given enough data, the Bayesian moment matching algorithm will return the true parameters of the Gaussian mixture model from which the data has been derived.

Fig. 9. Mean squared error for the estimates

Figure 10 and Figure 11 show that the Bayesian moment matching algorithm outperforms the EM algorithm on two different synthetic datasets. The comparison is made on the ability of the algorithms to return the true parameters of the underlying distributions. Although both Offline EM and the Bayesian Moment Matching algorithm converge to the correct estimates, the two figures show that the Bayesian Moment Matching algorithm converges faster and with more robust confidence bounds than Offline EM. Another advantage of the Bayesian Moment Matching algorithm is that, while the EM algorithm returns a point estimate, the BMM algorithm returns a distribution over the parameters that are being learned. This associates a model with the parameters and gives a sense of confidence in the parameters learned by the algorithm.

Fig. 10. Comparison of EM and Bayesian Moment Matching

Fig. 11. Comparison of EM with Bayesian Moment Matching

2) Comparison with Online EM: In the previous sections we highlighted the need for an online learning algorithm for classification. We developed an online algorithm based on Bayesian moment matching. In this section, we compare the results of the Bayesian Moment Matching algorithm with another online algorithm, Online Expectation Maximization. We compare the two methods on their ability to explain the data they are modelling.

Fig. 12. Comparison of Online EM and Bayesian Moment Matching on an overlapping dataset. An overlapping dataset is derived from a Gaussian mixture model whose components overlap with each other. Such data helps to measure whether an algorithm can detect clusters even if they overlap.

Fig. 13. Comparison of Online EM and Bayesian Moment Matching on a separated dataset. A separated dataset is derived from a Gaussian mixture model whose components are well separated from each other.

From Figure 12 and Figure 13, we see that the Bayesian Moment Matching algorithm not only explains the data better, owing to its larger likelihood values, but also converges faster than the Online Expectation Maximization algorithm.

VI. LEADER BOARDS

We decided in the scope of this project to start with the implementation of leader boards as a form of gamification based on rider classification. In the following we will explain why we made that choice, why we think the classification plays an important part, and how we implemented this feature.

A. Choice of Feature

As stated in the Introduction, the purpose of this project is to extend the WeBike project to study behavioural changes through non-monetary incentives. Leader boards work on two axes: gamification through competition with other riders and, simultaneously, conformity with social norms, since the performance of other riders is always visible.

Secondly, it is not yet clear how well the study participants will adopt an additional gadget to carry around. Therefore, we found it prudent to choose a feature that they can use whenever they choose instead of being prompted by the device (e.g., route planning alerts).

Lastly, the current sensors aren't very reliable most of the time, especially the GPS sensor of the smart phone and the heart rate sensor of the smart watch. While the GPS data could be improved by utilizing the redundancy of the smart watch GPS sensor, we were not able to improve upon the heart rate sensor data. Therefore, tracking of body vitals proved not to be feasible.

That means choosing a relatively simple feature such as leader boards to introduce the new device into the study appears to be the best course of action.

B. Value of Classification

For the social engineering concepts of gamification and social norm to work, riders must be able to realistically relate to and compete with other riders. It follows that riders have to be grouped into subsets so they see that their actions matter instead of feeling meaningless because they are stuck on rank 10000 or lower. This might not be of great importance for a small participant group as in our study, but it becomes increasingly relevant if this implementation becomes publicly available for everyone to sign up. In order to keep people invested, these subsets have to consist of riders with similar characteristics so that they are comparable. Therefore, the rider classification is a valuable tool to keep participants engaged.

C. Implementation

There are several points we needed to consider: choosing a metric for the point reward for riders, dealing with changing rider behaviour, keeping over- or underachievers invested, including latecomers, and considering seasonal differences in behaviour.

For the first run, we simply chose distance traveled as our metric for rewarding points. As stated in section V, we could not test the implementation with real data in the end, so we have no results on the validity of this naive approach. Nevertheless, should this become a concern for a working implementation, the metric can be exchanged and the points recalculated as a background process without an outage of the service provided to the participants.
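A minimal sketch of an exchangeable metric, assuming the raw trips are kept so that totals can be recomputed at any time. The `Trip` fields, the function names, and the rate of one point per kilometre are our assumptions for illustration; the text only fixes distance traveled as the metric.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Trip(NamedTuple):
    rider: str            # hypothetical rider identifier
    distance_km: float    # distance covered on this trip
    duration_min: float   # trip duration; unused by the distance metric


# A metric maps a single trip to a number of points.
Metric = Callable[[Trip], float]


def distance_metric(trip: Trip) -> float:
    """Naive metric: one point per kilometre (the rate is an assumption)."""
    return trip.distance_km


def recalculate_points(trips: Iterable[Trip], metric: Metric) -> Dict[str, float]:
    """Recompute every rider's total from the raw trips with the given metric."""
    totals: Dict[str, float] = {}
    for trip in trips:
        totals[trip.rider] = totals.get(trip.rider, 0.0) + metric(trip)
    return totals


trips = [Trip("anna", 4.0, 15.0), Trip("ben", 2.5, 12.0), Trip("anna", 1.5, 6.0)]
points = recalculate_points(trips, distance_metric)
```

Because the totals are derived from the stored trips rather than accumulated in place, swapping the metric only requires calling `recalculate_points` again with a different function, which could run in the background while the old totals are still being served.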

We propose to introduce seasons of three months' length and different leagues into the leader board implementation. This solves several problems at once. Every rider starts a season with a clean slate, so newly signed-up riders can compete with longtime participants. At the end of the season, the best riders get promoted to a higher league and the lowest-ranked riders get relegated to a lower one, which ensures that competitors are on the same level and that participants stay engaged. Additionally, before the end of the season, the riders can be classified anew based on their behaviour during that season, which ensures that riders with similar characteristics share a leader board. Lastly, riders could choose to "hibernate" during seasons (i.e., winter) when they will not use their bike.
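The end-of-season step can be sketched as follows. The number of riders moved per league (`moves`), the cap of half a league, and the tie-breaking by name are our assumptions; reclassification and hibernation are not modelled here.

```python
from typing import Dict, List


def rank(points: Dict[str, float]) -> List[str]:
    """Riders ordered by points, highest first; ties broken by name."""
    return sorted(points, key=lambda rider: (-points[rider], rider))


def end_of_season(
    leagues: List[Dict[str, float]], moves: int = 1
) -> List[Dict[str, float]]:
    """Promote the top `moves` riders of each league, relegate the bottom
    `moves`, and reset all points to zero. leagues[0] is the highest league."""
    next_season: List[Dict[str, float]] = [{} for _ in leagues]
    for level, league in enumerate(leagues):
        ordered = rank(league)
        # Never move more than half of a league in either direction.
        k = min(moves, len(ordered) // 2)
        for position, rider in enumerate(ordered):
            target = level
            if position < k and level > 0:
                target = level - 1  # promotion to the next higher league
            elif position >= len(ordered) - k and level < len(leagues) - 1:
                target = level + 1  # relegation to the next lower league
            next_season[target][rider] = 0.0  # clean slate for the new season
    return next_season


# Simulated season results for two leagues.
leagues = [
    {"anna": 120.0, "ben": 80.0, "cem": 30.0},
    {"dana": 90.0, "eli": 60.0, "fay": 10.0},
]
new_leagues = end_of_season(leagues)
```

With these simulated results, the last rider of the top league and the first rider of the lower league swap places, and everyone starts the next season with zero points.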

Fig. 14. Leader board view with simulated data (shown in an Android Wear emulator). The list is scrollable by swiping up or down and shows the rank, the name of the participant, their points, and their recent development in the list.

VII. LESSONS LEARNED

This project was a first for us in many different aspects. Neither of us was familiar with the WeBike project and specifically its data. We had not done data analysis on real-world (noisy) sensor data before. Besides, developing a distributed Android app, specifically with a wearable device in the loop, was a new challenge.

We learned that there is a huge difference between the conception of a big data software project and its realization (albeit in our case the project is a big data prototype project with 100-200 GB of data). A trade-off has to be made between upfront planning, in order to decrease the cost of ineffective data analysis calculations, and rapid iteration, in order to respond to unforeseen challenges (e.g., too noisy data, breaking changes in an Android API update, restricted access to servers).

The biggest takeaway for us is how important it is to develop an intimate understanding of the underlying data so that we are able to judge whether an analysis is sensible. If the amount of data is big enough, nearly anything can be deduced from it. Therefore, it is paramount to critically question findings and always check for actual significance.

VIII. CONCLUSION

We have created a proof of concept for a non-monetary incentive application running on a wearable device. It is built into a distributed system architecture consisting of a server, multiple smart phones and, as a prototype, one smart watch, based on an existing structure. The communication between the various parts of the system is carried via Wi-Fi and Bluetooth. We detailed all parts of our system. We chose a seasonal leader board implementation as the incentive on the platform and explained our reasoning behind the choice. However, the level of (GPS) sensor failures made it impossible to build clusters and classify riders without severely altering the clustering algorithm, which was beyond the scope of this project. Therefore, we could only test the application in simulations and not in a real-life environment.

IX. FUTURE WORK

A. Classification

In V-D, we demonstrated empirically that the Bayesian moment matching algorithm performs reasonably well. We also highlighted the need for an online algorithm for learning and classification in the WeBike project. However, the data collected by the sensors is very noisy and often contains empty points. This makes the application of the Bayesian moment matching algorithm to the data troublesome. In the future, the model for Bayesian moment matching would need to include a noise model to account for noisy data. Further, techniques need to be developed to tackle noisy data in real time so that the algorithm returns meaningful results.

B. Smart watch app

Since we could not classify the participants within the scope of this project, the app can only be seen as a proof of concept, as we only tried it with simulated data. In the future, we want to develop the app further to achieve a real roll-out to a greater number of study participants once more smart watches are included in the WeBike project. After this roll-out, the actual study of changing rider behaviour can be started. Before that, however, the current choice of smart watch type has to be reviewed and changed if a more suitable product with better sensor quality is available.

C. Further features

As we concentrated on one possible use case, other features detailed in section IV could be added step by step to offer participants a richer experience and make the platform more versatile.

X. ACKNOWLEDGEMENT

We would like to express our gratitude to Prof. Paulo Alencar for guiding us throughout the term with this project. We further acknowledge the guidance provided by Prof. S. Keshav, ISS4E Lab and Prof. Pascal Poupart, Artificial Intelligence and Computational and Health Informatics Lab. Further, we thank Prof. S. Keshav for providing us with access to the WeBike project data. We would also like to thank the ISS4E group members. Lastly, we thank all the people who have helped us during the course of this project.

REFERENCES

[1] United Nations. Adoption of the Paris Agreement. 2015 United Nations Climate Change Conference, 2015.

[2] IPCC, 2013: Summary for Policymakers. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Stocker, T.F., D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex and P.M. Midgley (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.

[3] Tommy Carpenter, Measuring & Mitigating Electric Vehicle Adoption Barriers, Doctoral thesis, University of Waterloo, 2015.

[4] WeBike web page, http://blizzard.cs.uwaterloo.ca/iss4e/?page_id=3661, 2015

[5] R. Li, A. Kido, S. Wang, Evaluation index development for intelligent transportation system in smart community based on big data, Advances in Mechanical Engineering 7 (2) (2015) 541651.

[6] C. D. Cottrill, S. Derrible, Leveraging big data for the development of transport sustainability indicators, Journal of Urban Technology 22 (1) (2015) 45-64.

[7] Fiosina, M. Fiosins, J. Müller, Big data processing and mining for the future ICT-based smart transportation management system.

[8] C. E. Otero, M. Rossi, A. Peter, R. Haber, Determining human-perceived level of safety in transportation systems using big data analytics, in: Proceedings on the International Conference on Internet Computing (ICOMP), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2014, p. 1.

[9] X. Xu, W. Dou, An assistant decision-supporting method for urban transportation planning over big traffic data, in: Human Centered Computing, Springer, 2015, pp. 251-264.

[10] K. F. Yan, Using crowdsourcing to establish the big data of the intelligent transportation system, in: Advanced Materials Research, Vol. 791, Trans Tech Publ, 2013, pp. 2118-2121.

[11] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

[12] Thomas L Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1):5228-5235, 2004.

[13] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39(1):1-38.

[14] Thomas Minka and John Lafferty. Expectation-propagation for the generative aspect model. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 352-359. Morgan Kaufmann Publishers Inc., 2002.

[15] Zeng, William K Cheung, and Jiming Liu. Learning topic models by belief propagation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(5):1121-1134, 2013.

[16] Anima Anandkumar, Dean P Foster, Daniel Hsu, Sham Kakade, and Yi-Kai Liu. A spectral algorithm for latent Dirichlet allocation. In NIPS, pages 926-934, 2012.

[17] Parts of this illustration are licensed under Creative Commons Attribution 3.0 http://creativecommons.org/licenses/by/3.0/us/: Sherrinford, Smartwatch, https://thenounproject.com/term/smartwatch/162395/

[18] Mariuxtheone, Teleport, GitHub, https://github.com/Mariuxtheone/Teleport, 2015