



Semantic Aggregation of Memorable Activities

Jean-Eudes Ranvier
I&C, EPFL

Abstract—Human memory is prone to forgetting, and the avalanche of user-centric information gathered by today's devices can be used to keep track of memorable events. We propose to combine virtual and physical sensors from mobile devices to infer digital memories of user activities. In this approach, sensor data are processed through a space- and energy-efficient algorithm to recognize basic activities. We then use semantic reasoning to aggregate these activities into the digital equivalent of a human episodic memory, which can, in turn, be further aggregated.

Index Terms—Activity recognition, data fusion, semantic aggregation, episodic memory

I. INTRODUCTION

The evolution of mobile devices in the last decade has led to an increasing quantity of information produced every day by end users. This information is stored on the local device or on remote platforms that are independent and opaque to each other, which prevents both the querying and the long-term archival of the different pieces of data.

On the other hand, neuroscientists have recently been studying episodic memory. Introduced in 1972 by Tulving, this memory system enables human beings to remember past experiences. Conway, in [4], defines the organisation of episodic memories and how they can be accessed.

Proposal submitted to committee: August 21st, 2013; Candidacy exam date: August 28th, 2013; Candidacy exam committee: Prof. Boi Faltings, Prof. Karl Aberer, Dr Pearl Pu.

This research plan has been approved:

Date: ————————————

Doctoral candidate: ————————————(name and signature)

Thesis director: ————————————(name and signature)

Thesis co-director: ————————————(if applicable) (name and signature)

Doct. prog. director:————————————(B. Falsafi) (signature)

EDIC-ru/05.05.2009

Based on these two observations, we want to study the relation between the memories that can be generated from both virtual and physical sensors of a mobile device and the memories that the user retains. This study leads to several questions: How can activities be generated from raw sensor data? How can these activities be semantically tagged, and how can the way they intertwine be modelled?

Related work

Gathering and indexing all personal information on a single system is a problem that has already been covered by several projects. In 1945, Vannevar Bush, in his article "As We May Think" [6], had already envisioned the Memex: a hypothetical system which would allow the user to compress and store all her books, records, and communications. More recently, Haystack [8] offers a semistructured data model based on a standard RDF representation in which the user can create her own properties. In reality mining [9], Eagle and Pentland propose to sense and analyze complex social behaviours based on mobile phone log data. However, no approach proposed by the research community so far combines virtual and physical sensor data into digital memories of human activities and contexts.

II. EPISODIC MEMORY MODEL

Defined by Tulving in 1972, episodic memory is the system responsible for the storage and recollection of past experiences (e.g., "I went to the supermarket last Saturday"). This system is to be contrasted with semantic memory, which is responsible for the storage of structured records of facts (e.g., "The LSIR offices are located in the BC building of EPFL"). In [4], Conway theorizes the way episodic memories are stored and retrieved and introduces the idea that episodic memories are stored in a way that preserves the temporal dimension. Conway presents the decomposition model shown in figure 1.

The atomic elements are the Episodic Elements (EE), which represent specific details of an experience. These EE can be composed with a contextual frame to form a Simple Episodic Memory (SEM); EE without frames are called free radicals in memory, as for example the memory for jokes. The SEM represents the memory at a fine-grained level. A SEM can be accessed through its frame (which is what happens when one tries to remember a particular event) or directly through an EE (in the case of incidental access). In turn, SEM can be aggregated within a more general contextual frame, forming a Complex Episodic Memory (CEM), which is a coarser-grained view of the memory. Conway continues the aggregation process when describing the autobiographical memory model, in which


Fig. 1. Episodic memory model

CEM are seen as part-of more general events, themselves part-of lifetime periods, which are in turn part-of the conceptual self.
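Conway's decomposition maps naturally onto a data model. As a minimal sketch (the class and field names are ours, not Conway's), the EE / SEM / CEM hierarchy and its two access paths could look like:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EpisodicElement:
    """A specific detail of an experience (an EE)."""
    detail: str


@dataclass
class SimpleEpisodicMemory:
    """EEs bound to a contextual frame (a SEM).

    An EE left without a frame would be a 'free radical'."""
    frame: str
    elements: List[EpisodicElement] = field(default_factory=list)


@dataclass
class ComplexEpisodicMemory:
    """SEMs aggregated under a more general contextual frame (a CEM)."""
    frame: str
    memories: List[SimpleEpisodicMemory] = field(default_factory=list)

    def find_by_element(self, detail: str) -> Optional[SimpleEpisodicMemory]:
        # Incidental access: reach a SEM directly through one of its EEs,
        # rather than through its frame.
        for sem in self.memories:
            if any(ee.detail == detail for ee in sem.elements):
                return sem
        return None


cem = ComplexEpisodicMemory(
    frame="Saturday errands",
    memories=[SimpleEpisodicMemory(
        frame="supermarket, last Saturday",
        elements=[EpisodicElement("bought milk")])],
)
```

Frame-based access would simply filter `memories` on `frame`; the method above illustrates the incidental path.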

III. FROM EPISODIC MEMORY TO DIGITAL MEMORIES

A. Physiological and activity context recognition in wearable computing

In [1], Krause et al. want to capture the state of a user using a wearable device. They propose a non-intrusive, unsupervised and online system which, based on sensor outputs, identifies the activity context of the user. The device used in this paper is an armband, worn at all times by the user, which carries several sensors:

• a 2D accelerometer detects arm and body movements
• a galvanic skin response sensor measures skin conductivity to assess the user's level of sweat, linked to her emotional stimuli
• a heat flux detector detects the amount of heat the body dissipates
• a skin thermometer measures internal temperature
• a near-body ambient thermometer detects changes in the temperature of the user's close environment
• a heart beat detector measures the pulse of the user.

Method

The sensors' output is processed using a machine learning algorithm. The authors first develop an offline algorithm before transforming it into an online one. The offline algorithm is composed of four phases. The data gathered by the sensors is first preprocessed: the accelerometer, which has a high sampling rate, is processed in the frequency domain by applying a Fast Fourier Transform, while the other sensors are sampled at a lower rate. The data is then normalized to have unit coordinate-wise variance, and a Principal Component Analysis (PCA) is performed. This technique consists in projecting the data into a different basis and selecting only the subset of components which accounts the most for its variance (i.e. the components which characterize the data best). It has the advantage of 1. reducing the dimensionality of the data, easing its processing, and 2. eliminating noise (provided that this noise does not perturb the variance of the dataset too much). Using PCA is especially useful here, since the authors state that heart rate and energy release have a linear relationship. The results show that the first five components resulting from the PCA describe the original dataset accurately enough; therefore only these five components are kept.
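The preprocessing chain described above (unit-variance normalization, then PCA, keeping the five strongest components) can be sketched in a few lines of numpy; the sensor matrix below is a synthetic stand-in:

```python
import numpy as np


def preprocess(X, n_components=5):
    """Normalize to unit coordinate-wise variance, then project onto the
    principal components that account for the most variance."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via eigendecomposition of the covariance matrix.
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # strongest components first
    components = eigvecs[:, order[:n_components]]
    return X @ components


# Hypothetical sensor matrix: 1000 samples x 8 sensor channels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
Z = preprocess(X)
```

In the paper the number of retained components is justified empirically (five suffice); here it is simply a parameter.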

Once the data is preprocessed, it is fed into a Kohonen map (self-organizing map), which pushes the dimensionality reduction further by mapping the 5-dimensional input onto a 2-dimensional self-organizing map of 20x23 hexagonal cells.

An additional clustering is performed on the codebook vectors output by the Kohonen map in order to achieve representative cluster sizes. A K-means algorithm is used, with K chosen as the square root of the number of vectors in the codebook, as prescribed by a standard heuristic in the absence of further information. The Davies-Bouldin index (DBI) in equation 1 is a metric of the quality of the clustering and is used to refine the number of clusters. The DBI between two clusters can be seen as the ratio between the sum of the dispersions of both clusters and the distance between those clusters.

DBI_{i,j} = (Disp_i + Disp_j) / Distance_{i,j}    (1)
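Equation 1 is easy to turn into code; the sketch below checks, on synthetic clusters, that well-separated clusters score lower than overlapping ones (the cluster data is hypothetical):

```python
import numpy as np


def dbi_pair(points_i, points_j):
    """Equation 1: (Disp_i + Disp_j) / Distance_{i,j}, where dispersion is
    the mean distance of a cluster's points to its centroid."""
    ci, cj = points_i.mean(axis=0), points_j.mean(axis=0)
    disp_i = np.linalg.norm(points_i - ci, axis=1).mean()
    disp_j = np.linalg.norm(points_j - cj, axis=1).mean()
    return (disp_i + disp_j) / np.linalg.norm(ci - cj)


rng = np.random.default_rng(1)
origin_cluster = rng.normal(0.0, 0.1, size=(50, 2))   # tight cluster at the origin
distant = rng.normal(5.0, 0.1, size=(50, 2))          # well separated from it
overlapping = rng.normal(0.5, 0.1, size=(50, 2))      # almost touching it
```

A high pairwise DBI flags cluster pairs that should be merged, which is how the K chosen by the square-root heuristic can be refined.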

The last step of the algorithm removes transient states from the data. Indeed, the authors claim that the preprocessing of the data, namely the PCA technique, smooths the context changes, resulting in extra clusters when applying K-means. By definition, these transient states lie between two context states and are very brief; in other words, transient states have a very low probability of looping on themselves. To detect them, the authors define a Markov model built on the cluster transition probabilities. Clusters with a looping probability (i.e. the probability of going from this cluster to itself) below a certain threshold are removed, and their "probability share" is distributed among their predecessors using formula 2.

P(X|Y) ← P(X|Y) + P(X|R) · P(R|Y) / Σ_i P(R|i)    (2)

X and Y are clusters with a looping probability above the threshold, and R is a cluster with a looping probability below the threshold. P(X|Y) is the conditional probability of transfer from Y to X: it represents the probability of being in cluster X at time t having been in cluster Y at time t-1. Some additional iterations of K-means are finally necessary to take into account the vector-points which were associated with the removed clusters, and consequently a new Markov model needs to be built.
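A sketch of this pruning step on a transition matrix (the threshold value is arbitrary, and the redistribution follows our reading of formula 2):

```python
import numpy as np


def remove_transient(T, threshold=0.2):
    """Remove clusters whose self-loop probability T[r, r] is below
    `threshold`, redistributing their probability share among the
    remaining transitions (our reading of formula 2).

    Convention: T[y, x] = P(x | y), the probability of moving from y to x."""
    T = T.astype(float).copy()
    n = len(T)
    transient = [r for r in range(n) if T[r, r] < threshold]
    keep = [k for k in range(n) if k not in transient]
    for r in transient:
        share = T[:, r].sum()          # Σ_i P(R|i): total inflow into R
        if share == 0:
            continue
        for y in keep:
            for x in keep:
                # Hand R's share of the Y → X transition back to Y → X.
                T[y, x] += T[r, x] * T[y, r] / share
    return T[np.ix_(keep, keep)], keep


# Three clusters; cluster 1 has a self-loop of 0.05, so it is transient.
T = np.array([[0.80, 0.20, 0.00],
              [0.05, 0.05, 0.90],
              [0.00, 0.10, 0.90]])
Tn, kept = remove_transient(T)
```

As in the paper, K-means would then be re-run on the points of the removed clusters and the Markov model rebuilt.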

This technique of transient state detection is relevant to us for detecting transient states while changing location. Although the time a user spends at a location cannot determine whether an event is memorable (i.e. a short event can be more memorable than a long one), it can determine that an event is not memorable. Indeed, if we exclude free radicals (from the episodic memory model), events require a frame to be transformed into a memory, and this frame is strongly dependent on the location. Therefore, we need to rule out transient states resulting from location transitions.

This algorithm, however, requires all the inputs in advance in order to perform each of its steps. This is clearly not suitable for a wearable device, which should be able to analyse data online. The authors tackle this problem by introducing buffers. While this technique, which is also used in [2], is not per se an online algorithm, the delay induced by the buffer period is considered negligible.

The online version of the algorithm goes as follows: the authors consider the principal component directions used in the preprocessing phase to be a property of the sensors and


Fig. 2. Online algorithm

their orientation. Therefore, a bootstrap phase can be used to determine the directions of the principal components. These directions are then used to project the new data; however, a recalibration may be necessary as soon as the components no longer account for enough variance in the sensor input. As described in Figure 2, the Kohonen map is trained with the dataset collected during the last buffer period, along with a sample of the previous buffers. These samples correspond to a portion of the previous buffers, decaying as the age of the buffer increases; the data points in old buffers are taken at equally spaced intervals. We can note that the authors took inspiration from human memory for the buffer technique and the decaying sampling: humans process the data gathered during the day at night time (the buffer) and have a forgetting memory which aims at keeping track of important memories while deleting unimportant ones.
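The buffer-plus-decaying-samples scheme might be sketched as follows; the paper gives no exact decay formula, so the geometric decay here is purely an assumption:

```python
import numpy as np


def training_set(buffers, decay=0.5):
    """Combine the freshest buffer with equally spaced samples of the older
    ones, keeping a fraction of each that decays with the buffer's age.
    `buffers[0]` is the most recent buffer."""
    current, *past = buffers
    parts = [np.asarray(current)]
    for age, buf in enumerate(past, start=1):
        buf = np.asarray(buf)
        n = max(1, int(len(buf) * decay ** age))
        idx = np.linspace(0, len(buf) - 1, n).astype(int)  # equally spaced
        parts.append(buf[idx])
    return np.concatenate(parts)


buffers = [np.arange(100), np.arange(100, 200), np.arange(200, 300)]
ts = training_set(buffers)   # 100 fresh points + 50 + 25 decayed samples
```

Older buffers thus contribute progressively fewer points, mimicking the forgetting behaviour described above.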

Although this behaviour is important in the paper's context, to allow the algorithm to be online, it is not necessarily desirable in our case. While this forgetting strategy allows control over the memory (hardware) usage, it also removes information which may be important in the future. Typically, if the user is explicitly asked for a piece of information, this piece should not be forgotten on the pretext that it is too old; otherwise it could be asked again in the future, making the user experience cumbersome. Instead, as described in section III-C, we prefer to optimize memory by compressing the data until it reaches an atomic size; we then keep this atomic element without degrading it any further.

Experiments

The authors evaluate their system with two experiments. The first experiment aims at evaluating the quality of the classification. To this purpose, two users, A and B, wore the armband system for 4 and 22 days respectively (74 hours and 168 hours of data). The armband gathered data at a relatively low sampling rate (one sample per minute). The users were asked to manually classify their daily activities in

order to provide a ground truth against which the system could be compared. The evaluation metrics are the number of contexts belonging to the same category according to the user but to two different clusters according to the system, as well as the difference between the number of clusters provided by the user and by the system. The authors do not present a quantitative study of the accuracy of the clusters, but rather display a timeline of the contexts annotated by the user superposed with the clusters detected by the system. This visualization allows us to assess the presence of false detections of context changes by the system (e.g. for the working context), while other contexts are accurately detected.

We can also notice the difference between the time boundaries of contexts detected by the system and those annotated by the user. The authors are also interested in comparing the number of contexts defined by the user with the number of clusters provided by the system. Although these numbers are rarely equal, the number of clusters detected always differs by at most one from the real number of contexts determined by the users. Finally, the authors tested the online and offline versions of the algorithm and concluded that they perform similarly, but without giving any further information.

This study was also an occasion to test the fitness of the device, and the users reported it to be unobtrusive. However, having to wear an extra device continuously might be cumbersome. Although the device presented in [1] is only a prototype, the armband is quite noticeable and it might be unsuitable to wear on certain occasions. In addition, smartphones already provide a cheap and ever more complete suite of virtual and physical sensors, which is adequate for our purpose.

The second study is designed to investigate the performance of different aspects of the system, such as the sampling rate of the sensors, the influence of the buffer size, sensor quality, and computation and memory efficiency. Increasing the sampling rate of the sensors by a factor of 10 allows for a more detailed clustering, due to relatively less importance being given to transient states. The impact of the decay of buffers was also evaluated by inputting a day of captured data three times successively and then using a buffer of 4 hours plus 4 additional hours for the decaying buffers.

Although this experiment does not investigate different buffer sizes, the stability of the results shows that the fractions of the past buffers ensure the remembrance of past contexts. To further justify the use of buffers in the online algorithm, the authors compared the overall quantization error of the Kohonen map when using the buffer against an offline map trained on the full day sample. The results show an increase of 10% in the quantization error in the case of the online algorithm, while the buffer size was 12% smaller than the dataset used in the offline algorithm. However, these results are to be put in perspective given the constraints of this experiment (i.e. only one day of data, and repetition of the data in the buffer case).

The authors also assess the pertinence of the sensor selection by performing a cross-validation. They train a Kohonen map with a 20-hour sample of all the sensors and compare it with maps trained with one of the sensors deactivated, showing that each sensor has its utility in characterizing different contexts.


TABLE I
PARAMETERS OF THE ESOINN ALGORITHM

Parameter | Influence
λ         | number of signals to process before removing noise
agemax    | maximum age of an edge before it is deleted
c1 & c2   | coefficients used in the noise removal process

Finally, the authors evaluate the computational performance of their system: a training time of 68 seconds for 4800 data points (the equivalent of 80 hours of data at a sampling rate of 1 point per minute) remains acceptable. Deleting data points as soon as the map is trained (except the samples selected for the decaying buffers) also helps to reduce memory consumption. This idea is relevant to our subject, and we will see in the following paper (ESOINN) how data can be discarded.

B. ESOINN

For the purpose of our project, we need to determine the type of location at which the user is. Since GPS coordinates are given as real values, we need to cluster them for two reasons: 1. to define when the user is within an area considered as a unique location, and 2. to limit the computational cost of determining the type of the location by avoiding categorizing GPS coordinates belonging to an already classified location. To this purpose, we need to cluster GPS coordinates belonging to the same location. The GPS data being collected continuously, an online algorithm is required. It also needs to be unsupervised in order to limit the burden on the user. Finally, given the unpredictability of users, we cannot decide in advance on the number of places a user will visit; therefore the number of clusters must be set dynamically.

ESOINN, the online, unsupervised learning algorithm defined in [2] by Furao et al., fits our needs and has been selected. In the paper, the authors describe an "enhanced self-organizing incremental neural network" (ESOINN), which they present as a clustering algorithm with the following properties:

• Online: does not require all the data before starting classification.
• Incremental: does not have a specific training phase and can train with new data as they arrive.
• Unsupervised: does not require training data which has already been labelled, allowing signals to be clustered without any ground truth. The counterpart of this property is that it is hard to determine what each cluster represents without manually tagging the cluster. We will see in the last part of this paper how we plan to tackle this problem.
• Unknown number of clusters: some clustering algorithms require the number of clusters to be found as a parameter. ESOINN detects this number dynamically; the algorithm is only configured by the parameters defined in III-B.

The algorithm is presented as an enhanced version of SOINN, the original clustering algorithm defined by the authors, which shows worse performance (as demonstrated in the experiments section) and a more complicated structure. The final goal of the paper is to show the difference in terms

Fig. 3. ESOINN algorithm’s flow

of architecture and performance between the two algorithms.

Algorithm

The general flow of the algorithm is described in Figure 3. The algorithm transforms input signals into networks of nodes based on the density distribution of the input data. Each node represents the aggregation of one or more signals and is characterized by a weight representing the density at the node's location and a feature vector in the same feature space as the signals (i.e. signals and nodes can be compared in terms of distance). Nodes can be connected by edges, which are removed if their age exceeds a certain threshold.

For each new input signal, the algorithm updates the network of nodes. Then, on a regular basis (every λ signals), a cleaning task is performed to remove unnecessary edges and nodes and to distinguish overlapping clusters.

The algorithm also requires a notion of distance between points (input signals or nodes) in the feature space; at the time of writing, the Euclidean distance is implemented. It further requires a notion of closeness, which is handled using the density of each node. However, the density at a node is not trivially taken as the number of signals for which this node was the closest. The density of a node is instead defined as a function of the mean distance from this node to its neighbours.

d̄_i = (1/m) Σ_{j=1..m} dist(W_i, W_j)    (3)

From equation 3 we see that the denser the underlying signals are, the smaller d̄_i is. The authors then define a quantity named point, which is a function of d̄_i if the node is one of the 2 nearest neighbours of the new input signal, and 0 otherwise:

p_i = 1/(1 + d̄_i)²   if i is among the 2 nearest neighbours of the input signal
p_i = 0               otherwise    (4)

Finally, the density of a node i is defined as

h_i = (1/N) Σ_{j=1..n} p_i    (5)

where N corresponds to the number of λ periods during which p_i was not null, and n represents the total number of inputs.

This atypical definition of the density has the advantage of preventing the density value of nodes in rarely activated areas from decreasing.
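Equations 3 and 4 translate into a few lines of numpy; this is only a sketch of the scoring, not the full ESOINN node bookkeeping:

```python
import numpy as np


def mean_neighbour_distance(W, i, neighbours):
    """Equation 3: mean distance from node i to its m neighbours."""
    return np.mean([np.linalg.norm(W[i] - W[j]) for j in neighbours])


def point(W, i, neighbours, winners):
    """Equation 4: only the two winners for the current signal score,
    and denser surroundings (smaller mean distance) score higher."""
    if i not in winners:
        return 0.0
    d_bar = mean_neighbour_distance(W, i, neighbours)
    return 1.0 / (1.0 + d_bar) ** 2


# Equation 5 then accumulates these points over time: h_i is the total
# divided by N, the number of λ-periods in which the node scored at all,
# so nodes in rarely activated areas keep their density.

W = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
p_dense = point(W, 0, [1], winners=(0, 1))   # winner with a tight neighbourhood
p_none = point(W, 2, [0], winners=(0, 1))    # not a winner for this signal
```

The feature vectors and neighbour lists above are illustrative, not taken from the paper.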

When training on a new signal, ESOINN starts by finding the nearest node (the winner) and the second nearest node (the second winner) to the signal.

1. If the signal is further from these nodes than their similarity threshold value (defined as the maximum distance between a node and its neighbours), a new node is created. This is called a between-class insertion, and it ends the current iteration, continuing with the next input signal.

2. Otherwise, the age of all the winner's edges is increased by 1. Then the algorithm checks whether an edge needs to be created, using the following rules:

• If one of the two winners is not yet attributed to a cluster, add an edge between them.
• If the two winners belong to the same subclass, add an edge between them.
• If the two winners are from different subclasses but the minimum of their densities is above a threshold depending on the average density of the neighbourhood, add an edge between them and merge the two subclasses.

If an edge is created between two nodes that are already connected, the age of this edge is reset to 0. The new signal being similar to the winner node, it is assimilated by this node, contributing to the winner's feature vector and updating its density according to formula 5.
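The winner search and the between-class insertion test of step 1 can be sketched as follows (subclass and density bookkeeping are omitted, and requiring the signal to exceed both winners' thresholds is our reading of the rule):

```python
import numpy as np


def two_winners(nodes, x):
    """Indices of the nearest (winner) and second nearest node to x."""
    order = np.argsort(np.linalg.norm(nodes - x, axis=1))
    return order[0], order[1]


def similarity_threshold(nodes, i, neighbours):
    """Maximum distance between node i and its topological neighbours."""
    return max(np.linalg.norm(nodes[i] - nodes[j]) for j in neighbours)


def needs_new_node(nodes, x, neighbours_of):
    """Between-class insertion test: create a new node when the signal
    lies outside the similarity threshold of both winners."""
    w1, w2 = two_winners(nodes, x)
    d = np.linalg.norm(nodes - x, axis=1)
    return (d[w1] > similarity_threshold(nodes, w1, neighbours_of[w1])
            and d[w2] > similarity_threshold(nodes, w2, neighbours_of[w2]))


# Two connected nodes one unit apart, so both thresholds are 1.
nodes = np.array([[0.0, 0.0], [1.0, 0.0]])
neighbours_of = {0: [1], 1: [0]}
```

A signal far from both nodes triggers an insertion; a signal between them is assimilated by the winner instead.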

Every λ signals, the algorithm checks for composite classes (classes whose density distribution contains more than one local maximum). To handle this problem, the algorithm first detects the apexes (nodes with a local maximum of density) and attributes a different subclass label to each of them. Each label is spread to all nodes connected to its apex. If some nodes lie in an overlapping area between two apexes, the algorithm checks whether the minimum density of the overlapping nodes is above a fraction of the apexes' densities. If it is, the two overlapping subclasses are merged; if not, the nodes are disconnected, thereby creating two classes.

Finally, nodes resulting from noise are eliminated from the model. A node is considered noise if it lies in a low-density area and is weakly connected to other nodes (two or fewer edges). The fewer neighbours a node has, the higher its density must be in order to be kept in the network.

A direct consequence is that weakly connected clusters will slowly vanish (e.g. in figure 4). While this property is interesting in some cases to introduce a forgetting mechanism, it is an important shortcoming in our case. Indeed, as described

Fig. 4. Cluster disappearance in 4 λ

in the previous section, although forgetting is a property of the brain, it is not suitable for us to forget. Information which may have required some user input could be deleted and, at a later time, the user would be prompted again for the same information because the cluster would have reappeared.

Another problem regards the distance measure used in ESOINN. While the Euclidean distance is suitable for GPS coordinates, this distance cannot take categorical data into account in the model. At the time of writing, no practical solution has been found.

Experiments

The authors experiment with their algorithm on four different datasets: two artificial datasets to test the soundness of the approach, and two real-world datasets to assess how well the algorithm performs with real data. The first artificial dataset is composed of two-dimensional vectors representing artificial clusters, as depicted in Figure 5.a. The authors train two models. The first one is trained in a so-called stationary environment: the input signals are randomly selected from the entire dataset. The visual result is displayed in Figure 5.b. The algorithm detects five clusters and is able to distinguish the two left-most overlapping clusters.

The authors also trained a model by inputting the signals sequentially, one cluster after the other. This training strategy simulates incremental learning, where the sensors would stay in the area of one cluster for some time, then move to another cluster. The visual result is displayed in figure 5.c; we notice that the networks are denser than in the stationary case. This is due to the decaying of the network, which is reduced in the non-stationary case: the λ period is long enough to allow the nodes to connect to each other, preventing them from being deleted. For these two trainings, the period between two noise removals is 100 signals (λ = 100), the maximum age of edges is also 100 (agemax = 100), and c1 = 0.001, c2 = 1. A second experiment on an artificial dataset shows that while the original SOINN does not detect three overlapping Gaussian clusters, ESOINN manages to distinguish the three clusters.

The authors then evaluate their algorithm using two real-world datasets. The first one, a subset of an AT&T dataset,


Fig. 5. a. Artificial dataset b. Stationary training c. Non-stationary training

contains 10 classes (the faces of 10 different people), each with 10 samples. After resizing the 100 images to 23x28 pixels (to obtain a reasonable size for the feature vector), samples are picked randomly 1000 times per class to train the model. To compare the stability of ESOINN with respect to SOINN, the authors performed 1000 trainings and reported the frequency at which the right number of classes was detected. ESOINN outperforms SOINN once again. However, no inter-cluster agreement is reported, so we do not know the number of misclassified samples.

The last experiment was conducted on the optdigits dataset, which contains 10 classes representing handwritten digits (in the form of an 8x8 pixel map). While SOINN detects a number of classes between 6 and 13 (i.e. for some trainings, two different digits can be detected as being the same, and two identical digits can be seen as two different ones), ESOINN detects between 10 and 13 classes (i.e. two identical digits can be seen as two different ones). Once more, the authors do not report the number of misclassifications; it is therefore not possible to assess whether two different digits can be detected as being the same.

To conclude, ESOINN is an online, incremental and unsupervised clustering algorithm that does not need an a priori number of clusters. It displays good clustering quality and is fit for processing part of the raw sensor data.

C. Semantic approach for activity recognition

The two previous sections were essentially focused on the activity recognition and information processing part of our project. However, one interesting part beyond activity recognition concerns the building of memories from the recognized basic activities. Building these memories requires knowledge about the relations between activities, as well as their relation with the contextual frame (i.e., where the activity occurred, with whom, etc.). To this purpose, we add semantics to the recognized activities in order to use general knowledge to aggregate them into coarser-grained memories.

In [3], Chen et al. devise a knowledge-driven approach to recognize activities. Although their domain of application, smart homes, is different from ours, the ideas exposed in the paper are relevant to us. The authors investigate how domain knowledge and semantic reasoning can be used to recognize activities of daily living (ADL) at multiple levels of granularity. They model the context of an activity, along with the activity itself, in ontologies and then propose a system architecture to recognize activities and adapt the activity definitions to the user.

Fig. 6. Conceptual activity model

Activities are strongly related to their context, just as an episodic element is associated with a contextual frame. The authors define the different elements included in a context:

• Temporal: time of the day and duration
• Spatial: location and surrounding entities
• Environmental: temperature, humidity, general weather
• Events: background activities, state changes of appliances

This context is explicitly included in a sensor model, which allows domain knowledge to be encoded directly into the sensor. The authors therefore define context ontologies based on classes and properties in order to define every contextual entity and their interrelationships.

For practical reasons, the authors equate a sensor activation with a user-object interaction. In order to detect higher-level activities, one has to aggregate several sensor activations. The context associated with an activity is taken as the fusion of the contexts of all the sensors involved in this activity.
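This fusion can be illustrated as a simple merge of per-sensor context descriptions; the dictionary-based representation below is our own illustrative stand-in for the paper's ontological contexts, and the sensor names are invented.

```python
# Illustrative context fusion: the activity context is the union of the
# contexts of the sensors involved (keys and values are invented).
def fuse_contexts(sensor_contexts):
    fused = {}
    for ctx in sensor_contexts:
        for key, value in ctx.items():
            values = value if isinstance(value, set) else {value}
            fused.setdefault(key, set()).update(values)
    return fused

kettle_sensor = {"location": "kitchen", "objects": {"kettle"}}
cup_sensor = {"location": "kitchen", "objects": {"cup", "teabag"}}
activity_context = fuse_contexts([kettle_sensor, cup_sensor])
```

Here the fused context keeps a single location and the union of the touched objects, which is the information the recognition step reasons over.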

The authors also state that the way of performing an activity is not unique. The order in which the actions are performed, or the list of items used for the activity, can vary (e.g., to make pasta, one person could start the stove before putting on the saucepan of water while someone else could do the opposite; one might add some salt while another person would not). Therefore, ADL should be definable generally for all users and then specialized to reflect a specific user's habits. To this purpose, the authors define a conceptual activity model, presented in figure 6.

This model defines an activity by its name and description and attaches to it a certain number of properties:

• Context (in green)
• Causal and functional relations (in blue)
• Relations with other activities (in yellow)

From this model, the authors generate an ontology. This ontology makes the link between activities and their context.

Description logic (DL) is then used to reason over the context and activity ontologies. The authors define an algorithm to recognize activities or, in case the activity is


not over yet, to predict the potential activities. The algorithm follows these steps:

1) Detect sensor activations and fetch the corresponding ADL property in the ontology.

2) Aggregate multiple sensors to define the context of the activity.

3) Build a new activity (named ATV) with two levels of abstraction: a conceptual description and an instance specific to the user. Define the activity's properties according to the previous steps.

4) Use description logic (equivalence and subsumption) to check whether ATV is equivalent to or subsumes any atomic activity concept in the ontology.

5) If so, verify whether it is an abstract activity by looking at its underlying activities.

a) If it has none, then this is the actual activity performed by the user.

b) Otherwise, additional discriminating sensor data is required to decide which specific activity the user is doing.

6) If not, use semantic retrieval to find the most atomic activity subsuming ATV (i.e., its "parent" in the hierarchy of activities).

a) If it has none, then this is the actual activity performed by the user, but the activity is not completed yet.

b) Otherwise, additional discriminating sensor data is required to decide which specific activity the user is doing.
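The steps above can be sketched in a simplified form. In this sketch, which is ours rather than the authors', DL equivalence and subsumption are replaced by plain set comparison over required sensor activations, and the tiny ontology is an invented example:

```python
# Simplified sketch of the recognition steps above. DL reasoning is
# replaced by set comparison; the ontology below is an invented example.
ONTOLOGY = {
    # activity: (required sensor activations, sub-activities)
    "MakeHotDrink": ({"kettle"}, {"MakeTea", "MakeCoffee"}),
    "MakeTea": ({"kettle", "teabag"}, set()),
    "MakeCoffee": ({"kettle", "coffee"}, set()),
}

def recognize(observed):
    """Map a set of aggregated sensor activations (the ATV) to an
    activity concept and a status."""
    for name, (props, subs) in ONTOLOGY.items():       # step 4: equivalence
        if props == observed:
            # step 5: an abstract activity needs more discriminating data
            return (name, "needs more sensor data" if subs else "recognized")
    # step 6: most specific concept whose definition the ATV extends
    candidates = [n for n, (p, _) in ONTOLOGY.items() if p < observed]
    if candidates:
        best = max(candidates, key=lambda n: len(ONTOLOGY[n][0]))
        return (best, "candidate")
    return (None, "unknown")
```

For instance, observing only a kettle activation matches the abstract MakeHotDrink and asks for discriminating data, while kettle plus teabag is recognized as MakeTea.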

One important flaw of this algorithm is that it cannot detect two activities run in parallel. Indeed, the signals from both activities would collide and the assistive agent could recognize neither of the two activities. To overcome this problem, the activity model should be extended, but this is outside the scope of the paper. On the other hand, this algorithm allows for detecting unfinished activities with an increasing confidence as the activity comes to an end.

In order to be practical, ADL need to be recognized while being performed; real-time, continuous recognition is therefore necessary. The ontological activity model does not explicitly take temporal aspects into account. However, since the properties of an activity are derived from sensors and sensor activations are timestamped, the temporal aspects are implicitly taken into account. This allows online activity recognition using two mechanisms. The first is a sliding window with a fixed duration: the window aggregates the sensor activations within its time range. If an activity is successfully detected at time t, all the sensors activated before t are discarded and the window slides. If no activity has been detected during the time window, the system provides a reminder to the user about the hypothetical activity; if the user does not reply, the activity is considered aborted. The size of the window is originally fixed; however, as soon as an activity is detected (even if it is not finished), the size of the window is adjusted according to the duration property of the activity. The second key to continuous, real-time activity recognition is that the recognition operation is performed each time a sensor is

Fig. 7. System architecture

triggered. This ensures that the previously described algorithm is performed recurrently as the activity is performed.
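The window mechanism can be sketched as follows; the `recognize` callback and all names are our own illustration of the idea, not the authors' implementation.

```python
# Sketch of the duration-adaptive sliding window (illustrative names;
# `recognize` is a stand-in returning (activity, duration) or None).
from collections import deque

class SlidingWindow:
    def __init__(self, duration, recognize):
        self.duration = duration        # window length in seconds
        self.buffer = deque()           # (timestamp, sensor) activations
        self.recognize = recognize

    def on_sensor(self, t, sensor):
        """Run recognition on every sensor trigger (continuous mode)."""
        self.buffer.append((t, sensor))
        while self.buffer and t - self.buffer[0][0] > self.duration:
            self.buffer.popleft()       # activation fell out of the window
        result = self.recognize([s for _, s in self.buffer])
        if result is not None:
            activity, expected_duration = result
            self.duration = expected_duration  # adapt window to the activity
            self.buffer.clear()         # discard activations before time t
        return result
```

The two mechanisms of the paper appear here as the eviction loop (fixed, then adapted, window size) and the fact that recognition runs inside the per-trigger callback.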

Including both a coarse grain (the generic activity) and a fine grain (the user-specific instance of the activity), the algorithm is also able to assist the user in a more personal fashion. Indeed, after the generic activity has been recognized using the algorithm above, the instance of the activity that the user is performing can be retrieved from their profile. If they have a particular way of performing the activity, recommendations can be given according to their own way of doing it.

System architecture

The authors propose a system architecture for the activity recognition system. Depicted in Figure 7, it is built around an assistive agent which gathers inputs from the sensors and processes them using the ADL models and repositories. The agent performs coarse-grained activity recognition using the ADL ontologies and fine-grained recognition using the user profile (containing instances of ADL classes that retain the user's preferences while performing an activity). It can therefore provide both generic and personalized assistance to the user.

Experiments

In order to assess the performance of their system, the authors equipped a home with 40 sensors deployed on items involved in eight different activities (wash hands, make pasta, make a hot chocolate, watch TV, have a bath, make coffee, brush teeth, make tea). In order to test the robustness of the algorithm, the activities are performed in three different ways: a regular way (1), a way (2) in which the actions are performed in a different order, and a last way (3) in which sensor noise (extra activations) is introduced. The activities are performed by three different users for two rounds, leading to 2 (rounds) × 3 (users) × 24 (activities) = 144 activities. Taking the average recognition accuracy over the three users, the activities performed in the first way have an accuracy of 100% while the activities performed in the second and third ways have an accuracy of 91.66%. This gives an overall recognition accuracy of 94.44%. However, the authors acknowledge the


fact that the first way of performing the activities was expected to give perfect accuracy, since it was the way the activities were defined in the ontology. This is, in my opinion, the main flaw of this paper: given that the activities have to be performed in specific ways by the user, this experiment does not benefit from being run in a real smart-home environment and could have been done by simulation (simulating sensor activations).
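As a quick arithmetic check, the totals and the overall accuracy reported above are internally consistent:

```python
# Consistency check of the reported experiment figures.
total_activities = 2 * 3 * 24             # rounds x users x activities
per_way_accuracy = [100.0, 91.66, 91.66]  # ways (1), (2), (3)
overall = sum(per_way_accuracy) / len(per_way_accuracy)
print(total_activities, round(overall, 2))
```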

A second experiment aims at assessing the recognition accuracy of fine-grained (user-specific) activities. To this purpose, 2 users perform 3 activities in 2 different ways. First, the users perform the activities in the way they are defined in the user's profile (the specific instance of the activity). Then, the users perform the activities replacing an object with a different object of the same type (e.g., replacing whole milk with semi-skimmed milk) to test how the system reacts to noise. The result of this experiment is that in both cases the system first managed to recognize the generic activity in all cases, and secondly managed to provide information to the users regarding their preferred way of performing the (specific) activity.

IV. THESIS PLAN

Mimicking the human memory, we want to aggregate the user's events into coherent semantic frames and build a digital episodic memory. To this purpose, we plan to build a two-layer architecture. The bottom layer is the closest to the sensors and is responsible for recognizing activities from raw sensor data. It is composed of an activity recognition system, mainly based on clustering techniques, and a trigger-based activity detection system which should cover the detection of free-radical activities (activities without a contextual frame, e.g., receiving a call from a relative). The upper layer is responsible for transforming the previously detected activities into memories using semantic reasoning.

The activity recognition brick contains an implementation of the ESOINN algorithm, clustering essentially GPS coordinates. The algorithm needed several changes in order to fit our needs.

Because ESOINN waits for the end of every λ period to label the clusters, a strict online classification would produce input signals without labels whenever the node closest to a signal has not been classified yet. To answer this problem, we keep a window of at most λ signals and classify them all at once at the end of the λ period. This way, every signal is labelled.
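A minimal sketch of this buffering strategy (the function and parameter names are ours, and `classify_batch` stands in for the cluster-labelling step):

```python
# Sketch of the lambda-window buffering: signals are held until the end
# of each lambda period and labelled together, so none stays unlabelled.
def label_in_batches(signals, lam, classify_batch):
    buffer, labelled = [], []
    for signal in signals:
        buffer.append(signal)
        if len(buffer) == lam:            # end of a lambda period
            labelled.extend(classify_batch(buffer))
            buffer = []
    if buffer:                            # flush a trailing partial window
        labelled.extend(classify_batch(buffer))
    return labelled
```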

Another problem concerns the way ESOINN forgets nodes. In order to prevent the network of nodes from growing too fast, and also in order to remove noise, ESOINN possesses a mechanism to delete nodes with a low density or a low degree. As explained in the ESOINN section, this mechanism is not suitable in our case. To prevent this behavior, we disallow the removal of a node if it is the last one of its cluster. Therefore, a cluster with low connectivity will shrink until it becomes a single node, which is then maintained in the network. Initial results of the algorithm give an accuracy of 100% on a GPS traces dataset.
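The guard on node removal amounts to a one-line check before deletion; the cluster bookkeeping below is our own illustrative sketch, not the implementation.

```python
# Sketch of the modified deletion rule: a candidate node survives if it
# is the last member of its cluster (data structures are illustrative).
def can_remove(node_id, cluster_of, cluster_sizes):
    """Allow deletion only when the node's cluster keeps other members."""
    return cluster_sizes[cluster_of[node_id]] > 1

cluster_of = {"a": 0, "b": 0, "c": 1}    # node -> cluster id
cluster_sizes = {0: 2, 1: 1}             # cluster id -> member count
# "c" is the last node of cluster 1, so it must be kept
```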

A second work in progress concerns the way to semantically tag a recognized activity. Indeed, in order to be used in the creation of memories, activities need to be semantically enhanced. Using the location types of the Google Places API, we are currently running a crowd-sourced experiment in which, for each location type, we ask the crowd to provide an action and an object to which the action applies (e.g., at an aquarium I can look at fish). In order to keep the list of actions and objects under control, the words are picked from two structured sources: the verbs are picked from the WordNet ontology, while the objects are instances of classes from DBpedia. This experiment should allow us to build a first ontology of everyday activities.
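The shape of the collected data can be illustrated as follows; the entries below are invented examples, with the verbs meant to come from WordNet and the objects from DBpedia classes.

```python
# Illustrative shape of the crowd-sourced tags: each Google Places
# location type maps to (action, object) pairs; the entries are invented.
PLACE_ACTIVITIES = {
    "aquarium": [("look_at", "Fish")],
    "restaurant": [("eat", "Meal"), ("meet", "Friend")],
    "stadium": [("watch", "Game")],
}

def actions_at(place_type):
    """Return the crowd-sourced (action, object) pairs for a place type."""
    return PLACE_ACTIVITIES.get(place_type, [])
```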

Several future steps are considered at the time of writing. The first one concerns the creation of a dataset. Indeed, to the best of our knowledge, there exists no dataset with a ground truth for the activities of the users. The dataset should contain both virtual (e.g., list of running applications, state of charge) and physical (e.g., GPS, accelerometer) data, along with an activity label provided by the user. Gathering the dataset takes time; we should therefore start this task soon.

A second step concerns the upper layer defined above. The reasoning leading to the creation of memories with multiple levels of abstraction should be inspired by [3].

Another step regards the definition of a memorable event. It is still not clear which features make an event memorable. To this purpose, we plan to run a user study once the system is operational. This study would consist in analysing users' feedback in order to detect the key features (if any) that make an event memorable. The study could take the form of a game or be part of the user input routine.

A last step concerns the way the digital memories can be accessed by the user. How should a memory be displayed and retrieved? This question is related to the previous step in the sense that knowing the key features of a memorable event would provide directions for defining a potent user experience for accessing specific memories.

REFERENCES

[1] A. Krause, D. P. Siewiorek, A. Smailagic, and J. Farringdon, "Unsupervised, dynamic identification of physiological and activity context in wearable computing," in ISWC, vol. 3, p. 88, 2003.

[2] S. Furao, T. Ogura, and O. Hasegawa, "An enhanced self-organizing incremental neural network for online unsupervised learning," Neural Networks, vol. 20, no. 8, pp. 893–903, 2007.

[3] L. Chen, C. D. Nugent, and H. Wang, "A knowledge-driven approach to activity recognition in smart homes," Knowledge and Data Engineering, IEEE Transactions on, vol. 24, no. 6, pp. 961–974, 2012.

[4] M. A. Conway, “Episodic memories,” Neuropsychologia, vol. 47, no. 11,pp. 2305–2313, 2009.

[5] S. Furao and O. Hasegawa, "An incremental network for on-line unsupervised classification and topology learning," Neural Networks, vol. 19, no. 1, pp. 90–106, 2006.

[6] V. Bush, "As we may think," 1945.

[7] J. Gemmell, G. Bell, R. Lueder, S. Drucker, and C. Wong, "MyLifeBits: fulfilling the Memex vision," in Proceedings of the tenth ACM international conference on Multimedia, pp. 235–238, ACM, 2002.

[8] D. R. Karger, K. Bakshi, D. Huynh, D. Quan, and V. Sinha, "Haystack: A customizable general-purpose information management tool for end users of semistructured data," in Proc. of the CIDR Conf., 2005.

[9] N. Eagle and A. Pentland, "Reality mining: sensing complex social systems," Personal and Ubiquitous Computing, 2006.