Rewind: Leveraging Everyday Mobile Phones for Targeted Assisted Recall

Donnie H. Kim, Nicolai M. Petersen, Mohammad Rahimi, Jeff Burke, Deborah Estrin
Center for Embedded Networked Sensing, UCLA

{dhjkim, nmp}@cs.ucla.edu, [email protected], [email protected], [email protected]

Lenore Arab

David Geffen School of Medicine, UCLA
[email protected]

This paper presents an assisted recall system, Rewind, that employs automatic image capture on mobile phones and filtering of images for end-user viewing. The usability of image-based assisted recall systems is limited by the large number of images through which the user must navigate. Rewind is a scalable system of everyday mobile phones and supporting web services that we developed to explore how client- and server-side image processing can be used to both lower bandwidth needs and streamline user navigation. It has been in use since August 2007 as part of a pilot study supervised by an epidemiologist to evaluate its utility for improving recall of dietary intake, as well as other shorter and more exploratory trials. While the system is designed to accommodate a range of image processing algorithms, in this first prototype we rely on simple filtering techniques to evaluate our system concept. We present performance results for a configuration in which the processing occurs on the server, and then compare this to processing on the mobile phones. Simple image processing on the phone can address the narrow-band and intermittent upload channels that characterize cellular infrastructure, while more sophisticated processing techniques can be implemented on the server to further reduce the number of images displayed to users.

Categories and Subject Descriptors
J.3 [Life and Medical Sciences]: Health

General Terms
Design, Experimentation, Human Factors

Keywords
Urban Sensing, Urban Computing, Mobile phones, Automatic image capture, Image processing, Prioritized upload

1. INTRODUCTION
People increasingly carry mobile phones with them everywhere and all the time, across cultures, countries, and demographics. These mobile phones increasingly contain imaging and location capabilities (i.e., high-resolution cameras and GPS). We are exploring innovative applications that leverage the use of image and location data captured by mobile phones as they move around in the environment with their users. In this paper we document and evaluate one such application type: assisted recall.

Assisted recall systems support individuals by recording aspects of the environment around them for later playback. They can assist people with healthy memories who want to track additional details, those with memory impairments or perceptual problems due to injury or illness, or individuals engaged in self-improvement who want to become more conscious of their habits. Additionally, scientists can use them as tools to study the impact of individual behaviors for which there are no direct measures suitable for longitudinal study, such as foods consumed or numbers of people interacted with [5, 15, 21].

In contrast to lifelogging systems such as MyLifeBits [9, 10] and SenseCam [13], which attempt to collect as much data as possible for future mining, Rewind, our system for image-based assisted recall, supports data collection on specific behaviors or situations. By targeting recall, we can take advantage of the application's constraints in capture, presentation, processing, and upload. In particular, our system (i) reduces burden on the end user by employing application-specific signal processing to decrease the amount of information presented, (ii) lowers bandwidth through prioritized or selective upload based on this processing, and (iii) pursues a conservative, application-specific approach to personal data collection and retention, thus reducing privacy risks.

To evaluate its utility in improving dietary recall, Rewind has been in use since August 2007 as a supplemental tool in an NIH-funded study on dietary intake. We have also designed it to explore other image-based targeted recall applications, such as the PosterSessionCapture Pilot described in Section 4.3. Our use of everyday mobile phones and web services is motivated by low cost and easy availability for the populations in our specific studies, as well as a desire for extensibility to a broad range of applications in the future.

The use model is simple. For example, in the dietary intake pilot, study participants launch the Rewind application on their mobile phone before eating, place the phone around their neck on a lanyard with the camera facing out and down, and then turn it off at the end of the meal. When they visit the study facility on the next day, images captured during their meals will already have been automatically uploaded and tagged with image-quality metrics. Participants log in to a secure image viewer to browse the best of their own food images while taking a dietary survey.


To be effective for this application, Rewind must sample images at a high enough rate to capture the "dynamics" of eating, such as the participant taking additional servings, leaving items on plates that are returned, etc., and it must often run for an hour or more at a time, several times a day. It must automatically upload and then process the images to select a small representative set for the users to view during their visit. At the rapid sampling rate needed, a single user captures too many images for manual investigation. In the DietaryRecall Pilot, each user generated an average of more than 100 images per meal.

Based on this and other similar scenarios, we designed Rewind to:

  • Automatically capture and cache images at a rate that would not be tolerable if the user had to do it manually,

  • Perform both handset- and server-side processing of images to generate simple classifications and quality metrics selected and tuned for the specific application target,

  • Prioritize images for upload based on mobile-phone-based annotations to mitigate the impact of channel congestion on the rapid availability of representative images to the end user,

  • Support rapid experimentation with different image processing algorithms as part of an iterative design process informed by application experience,

  • Display images for the user in a private on-line gallery, where they are clustered in time and filtered based on the application-specific annotations, and

  • Enable the user to explicitly select images to be stored or shared within the target applications' privacy constraints, while assuring that all images not marked for sharing are deleted.

Contributions. To the best of our knowledge, Rewind is the first targeted assisted recall system using everyday mobile phones. Our contributions include:

  • Conceiving, building, and deploying this type of platform in a "live" medical study, with configurability and extensibility enabling the other pilots that we also describe.

  • The use of web services and image processing techniques for server-side operations.

  • The use of a password-protected, user-data workspace model for staging of potentially private data until it is reviewed and released or deleted by the user.

  • Demonstration and evaluation of on-device processing of images to reduce and prioritize upload over bottlenecked channels when real-time and interactive access to the data is needed.

This continues our early experimentation with using mobile phones for monitoring dietary intake [20]. Several significant new challenges are addressed through a new system design with modular and flexible components that will accommodate larger numbers of phones and improve performance and scalability. Additionally, we gained a wealth of practical experience through collaboration with medical research partners and real deployment. Human subjects research requirements for the study required the development of a model for personal data capture addressing privacy and security issues.

2. RELATED WORK
There is significant mobile and pervasive computing research that explores wearable image and audio recording devices. Several efforts investigate the use of wearable image capture devices for creating a multimedia diary of one's life. Typically, there is no a priori focus on particular situations or behaviors to capture; rather, the goal is to archive the data, and thus life, in its entirety. The many challenges include meaningful organization and indexing of the huge image corpus, user interface design, and effective navigation of the information [4, 8, 11]. Notably, SenseCam [13, 10] explores usage of a wearable device worn on a pendant hung from one's neck that has a combination of image, accelerometer, infrared, and temperature sensors. It captures images either continuously or based on triggering events from its other sensors, and images can be subsequently transferred to a host PC for archival and visualization [12, 14, 10, 19]. Kapur et al. tested the SenseCam to aid autobiographical memory in patients with severe memory impairment [5]. Subsequent research developed tools to navigate through the SenseCam images and effectively visualize the captured information [6].

Many factors distinguish our work from other assisted recall research. First is the usage of the phone itself. By using mobile phones that are ubiquitously available, we enable future deployments and larger scale experimentation. Therefore we have designed our system to support a large number of phones and users. A second distinction is the usage of the information in our system. In our research, the objective is capturing images to improve a specific, targeted application. For instance, in the dietary experimentation, the objective is to improve the ability of the participants to recall their daily food intake. This targeted capture also lends itself to third-party usage of the data, such as by a medical researcher once the subject has reviewed and released the data. The selective sharing of the image data is critical to individual privacy protection and has implications for both system design and evaluation.

Several works in the image processing and computer vision literature are relevant to the assessment of application-specific quality of the captured images. This includes characterization of both the quality and the relevance of the content of the images. The main difficulty in assessing image quality is finding a consistent metric independent of illumination changes. Many image processing techniques rely on holistic illumination-independent features, such as a histogram of coefficients of the Discrete Cosine Transform (DCT) or the Haar wavelet transform, to assess the quality of the images [18, 23]. There is also a multi-decade research thread in content-based assessment of image information. These research efforts have roots in pattern recognition and object classification and use low-level features to find similarity in the content of visual information [2, 7, 17, 24]. In the version of our system described here we have integrated the simplest of image processing filters, and we hope to use our pilot deployment experiences and image collections to engage image processing and computer vision researchers in the exploration and development of more sophisticated and effective techniques.

Finally, we acknowledge the deep challenges in understanding the process of memory and integrating that understanding into the design of an effective assisted recall system. Related work in cognitive psychology on human autobiographical memory, the factors that influence various forms of human memory, and the effectiveness of multimedia diaries in improving human recall is of significant importance in designing future deployments and experiments [15, 21]. We hope Rewind can serve as an experimental tool for psychologists to both test and integrate their knowledge in future experiments and applications. With respect to dietary recall mechanisms in particular, several recent studies have indicated the superiority of 24-hour recall over longer-time-period food frequency surveys, and we designed our system to work alongside these computerized 24-hour recall applications [22].

3. SYSTEM DESIGN
The system (Figure 1) comprises 1) mobile phones, carried by the users, running Campaignr data collection software [16] to capture and upload data, 2) a Data Management Server (DMS) managing data flow among system components and providing a user interface to view captured images, and 3) an Image Processing Server (IPS) tasked by the Data Management Server to process, classify, and annotate the images. The Data Management Server interacts with the rest of the components through a RESTful HTTPS interface, and Rewind uses industry-standard SSL/TLS encryption and authenticated transmission of the data from the phone to the Data Management Server.

Campaignr is configured to capture images (or audio) and metadata at predefined intervals using a simple XML configuration file. When connectivity is available, it pushes data to the Data Management Server, which stores media files in a secure file system and all other data in a relational database. An asynchronous batch process extracts unannotated images sequentially, requests image analysis and annotation by the Image Processing Server, and generates a thumbnail. The Data Management Server provides a password-protected web interface to each user's system-annotated data. When users log in, the Data Management Server uses the image annotations to select which data to display. In the case of the dietary intake pilot, these are the "highest quality" images, as will be explained below. This interface enables users to decide what data is saved permanently and what is shared with selected users of the system. All other data is deleted.
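As a concrete illustration, a campaign configuration of the kind described above might look like the following. The element and attribute names here are hypothetical, since the actual Campaignr schema is not reproduced in the paper; the sketch only shows the shape of such a file: which sensors to sample, at what interval, and where to upload.

```xml
<!-- Hypothetical campaign file; element names are illustrative,
     not the actual Campaignr schema. -->
<campaign name="DietaryRecallPilot">
  <sensors>
    <sensor type="camera" interval="10s" resolution="640x480"/>
    <sensor type="time"/>
  </sensors>
  <upload url="https://dms.example.edu/upload" protocol="https"/>
</campaign>
```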

By using low-cost and reliable mobile handsets, interconnecting system components through standard web services, and providing a web-based viewer for end users, we hope to enable future adoption and scaling up in the number of users. By separating the time-consuming and application-dependent image processing tasks from the core data flow services, it is straightforward to add additional Image Processing Servers to support a larger number of images and thus a larger number of mobile phones operating simultaneously. Later in the paper, we explore additional benefits and trade-offs in pushing image processing out to the mobile handsets themselves. The remainder of this section describes each component of Rewind in more detail.

3.1 Mobile Phones
Rewind uses Nokia N80 phones running the Symbian v9.1 Operating System with the S60 3rd Edition User Interface [3]. Each has a 3-megapixel camera and 802.11b/g connectivity in addition to a GSM modem. These mobile phones run Campaignr [16], an application written for S60 phones that can acquire data from the many hardware sensors on the cell phone, including the camera and microphone. Campaignr can also acquire data from software sensors such as date, time, and text strings. Campaignr stores the data in an internal database on the mobile phone immediately after collecting it from the sensors, so that no successfully sensed data is lost due to external factors such as bad connectivity. After the data is stored, Campaignr attempts to upload it to the Data Management Server described below. Campaignr reads a few rows from the database, packages them, and attempts to transmit the data over HTTP. Only upon a success signal from the Data Management Server will Campaignr delete the corresponding data in the database. This may lead to duplicate entries in the on-line database, but they are easy to distinguish by users or by automated processes running on the Data Management Server. The cost of removing duplicates, or the additional memory requirements of storing redundant images, is insignificant compared to irrecoverable data lost between the phone and the server. An application-specific XML file is used to specify which sensors to collect data from and where to upload the data. Each XML file captures the details of a specific data collection campaign. This allows many different types of data collection experiments to be run without having to modify the application itself.

Figure 2: Mobile Phone Composite. (a) The mobile phone is worn around the participant's neck. (b) A start and stop sticker is attached to assist participants. (c) A clip is attached on the back of the phone for proper phone positioning targeting dietary capture. (d) The phone includes a 3-megapixel camera and flash.
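The store-then-upload discipline described above, in which local records are deleted only after the server acknowledges receipt, can be sketched roughly as follows. This is an illustrative Python reimplementation, not Campaignr's actual Symbian code; the table layout and the `upload` callback are assumptions.

```python
import sqlite3

def store_sample(db, payload):
    """Persist a sensed sample locally before any upload attempt."""
    db.execute("CREATE TABLE IF NOT EXISTS samples "
               "(id INTEGER PRIMARY KEY, payload BLOB)")
    db.execute("INSERT INTO samples (payload) VALUES (?)", (payload,))
    db.commit()

def try_upload(db, upload, batch_size=3):
    """Read a few rows, attempt transmission, and delete them only on success.

    `upload` stands in for the HTTP POST to the Data Management Server and
    returns True on a success signal. On failure the rows stay queued, which
    can produce duplicates server-side but never loses data on the phone.
    Returns the number of rows successfully uploaded and deleted.
    """
    rows = db.execute(
        "SELECT id, payload FROM samples ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    if not rows:
        return 0
    if upload([payload for _, payload in rows]):
        db.executemany("DELETE FROM samples WHERE id = ?",
                       [(rid,) for rid, _ in rows])
        db.commit()
        return len(rows)
    return 0
```

With a flaky link the rows simply remain queued until a later attempt succeeds, which matches the paper's preference for tolerating duplicates over losing data.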

The DietaryRecall, FullDayDietary, and PosterSessionCapture Pilots described in this paper prompted the addition of configurable options that can be expressed in the campaign XML file. The DietaryRecall Pilot, designed for a broad population of untrained users, needed a configurable user interface for Campaignr; elements such as the font size, notification area, and soft-key options were added to support this. The FullDayDietary Pilot exposed Campaignr database corruption issues, resulting in data loss when the memory is full, because it involved the longest period of continuous collection coinciding with key periods of disconnection. For the PosterSessionCapture Pilot, we added an audio annotation feature to the Rewind campaign to let participants record audio annotations about the research posters. The PosterSessionCapture Pilot required user feedback options that are not visual, as the phone was being worn in such a way that the screen was not easily accessible. To support this, each time an audio recording was initiated or terminated by the user, the phone would vibrate, letting the user know the event had occurred without having to check the phone's screen. Figure 2 shows images of the mobile phone used for Rewind applications. While there are significant differences in the needs of each campaign, we were able to make only slight modifications to the existing system to accommodate them.


Figure 1: Rewind system components

3.2 DMS: Data Management Server
In this section, we describe the Data Management Server, which provides the core functionality of the system, including a web interface used for data communications, an asynchronous batch image processing component called the Image Handling Service, standard security practices used for data protection and user authentication, and finally a user interface developed for presentation and personal annotation.

3.2.1 Web Interface
The Data Management Server receives images from Campaignr-equipped phones over HTTPS as described above. Received images and tagging information (time stamp, audio, etc.) are stored in a secure file system and a relational database. Image handling services that schedule processing based on these data also communicate through this web interface to retrieve images and their metadata from the secure repository, as well as to send back the processing results for storage. All communications are through URLs. The web interface was chosen over a custom-designed server protocol because of its simplicity and ease of adaptation and extension.

3.2.2 Image Handling Services
Image handling services periodically check whether processing is required for each image. A batch process wakes up and implements the work flow of the system. As shown in Figure 1, three batch processes are currently implemented as cron jobs. The Resizing process generates a thumbnail of each image to allow many images to be displayed in a single web page without stressing the web browser. It also improves download performance by reducing the bandwidth consumed. The Reaping process permanently deletes images that users marked for deletion through the Image Viewer, as well as images that were never shown to the user due to the image filtering process. The Reaping process requests a feed from the web interface to learn which files should be preserved and assumes all others should be deleted. The Image processing process pushes all not-yet-annotated images to the Image Processing Server and retrieves annotation results for each image. This annotation information is used by the Image Viewer to filter and prioritize the images that are displayed to the users. All three of the currently implemented image handling services interact with the secure repository through the DMS's web interface.
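The keep-list semantics of the Reaping process — preserve exactly what the feed names, delete everything else — can be sketched as below. The feed format and function name are illustrative, not taken from the actual DMS.

```python
def select_for_reaping(stored_files, keep_feed):
    """Return the files the Reaping process should permanently delete.

    `stored_files` is every image currently in the repository;
    `keep_feed` is the list of filenames the web interface says to
    preserve. Anything not explicitly preserved is assumed deletable,
    which covers both user-marked deletions and images that were
    filtered out before ever being shown.
    """
    keep = set(keep_feed)
    return sorted(f for f in stored_files if f not in keep)
```

This default-delete posture is notable: an empty or missing entry in the feed means deletion, which is the conservative choice for a system staging potentially private data.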

3.2.3 Data Protection/Security/Privacy
Rewind potentially captures significant personal information, from the faces of family members across the breakfast table, to the details of email and e-commerce displayed on the user's computer screen, and even the occasional image accidentally taken within a restroom. Therefore, privacy of automatically collected personal data had to be addressed even in the earliest prototyping phases of these systems. Rewind web services adopt standard security practices. All communication uses secure HTTP connections, in particular, HTTP over SSL with an X.509 public key certificate created for the web server. The web server does not store any identifiable information about the users. Images are viewable only by the individual who owns them, unless they explicitly choose to share the images. In particular, for the DietaryRecall Pilot discussed below, images were deleted automatically if not viewed, or if marked for deletion, within 72 hours after capture.

3.2.4 The Image Viewer
Users are given authenticated access to the Image Viewer (Figure 3) with a user ID and a password. As the camera phone operates in an autonomous collection mode, gathering data without user intervention, many images might not be useful for assisting recall. To support image viewing, annotation, and recall more effectively, the Image Viewer filters out low-quality images and displays only a selected set of images to the user. Each image is converted into a thumbnail to enhance download performance and labeled automatically with four image feature metadata tags by the image handling services. These tags are explained in more detail below and are used for image filtering prior to presenting images to the user. The Image Viewer first automatically filters out blurred, over-exposed, and under-exposed images, and then temporally clusters the images to reduce redundancy by showing only a few representative images from each time cluster. We have currently implemented a simple uniform time interval clustering for display, but other more sophisticated, and likely more application-specific, clustering methods could be substituted and should be developed as future work to increase the information content of each image displayed to the user. Our modular system design will facilitate exploration and introduction of improved techniques.
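Uniform time-interval clustering of the kind the viewer uses can be sketched as follows. The window width, the quality labels, and the rule for picking a representative are assumptions for illustration, not the deployed parameters.

```python
def cluster_and_select(images, window_s=600):
    """Group (timestamp, label) pairs into fixed-width time windows and
    pick one representative per window, preferring images labeled Clear.

    `images` is a list of (unix_timestamp, quality_label) tuples; the
    600-second window and the Clear-first preference are illustrative
    choices.
    """
    windows = {}
    for ts, label in images:
        windows.setdefault(ts // window_s, []).append((ts, label))
    reps = []
    for key in sorted(windows):
        members = sorted(windows[key])
        clear = [m for m in members if m[1] == "Clear"]
        # Fall back to the earliest image if no Clear image exists.
        reps.append(clear[0] if clear else members[0])
    return reps
```

More application-specific schemes (e.g., clustering on visual similarity rather than time alone) would slot into the same place in the pipeline.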

3.3 IPS: Image Processing Server



Figure 3: Rewind Image Viewer. Users can view (a) the entire image set, or (b) one representative image from each cluster.

Our manual inspection of the images captured by the mobile phones during the FullDayDietary Pilot indicated a distinctive difference in the quality of the images. In particular, a large subset of the images were under-exposed or over-exposed due to sudden changes in luminance or non-ideal dark or bright lighting. In addition, there were many cases where the images were blurred due to the movement of the individual who was carrying the camera or of the subjects captured in the image. We developed the following approach to support this process, but generalized the system so as to enable more experimentation and other applications in the future.

3.3.1 Image Processing Server Design
A primary objective in designing the Image Processing Server was to separate the time-consuming image processing operation from the core functionality provided by the Data Management Server, and thereby support replication of the server to accommodate higher processing demands or a larger number of phones in the future. The second objective in the design of the Image Processing Server was to enable rapid and convenient integration of application-specific image processing and computer vision techniques.

The Image Processing Server was designed such that no global memory is needed to synchronize the individual servers, hence eliminating the migration complications inherent in volatile data sharing. Instead, the Data Management Server is responsible for scheduling and gathering data. The scheduling function uses a round-robin and asynchronous gathering methodology for communication with the Image Processing Server. This ensures the Image Processing Server is not burdened by other tasks, and the centralized post-processing eases tuning of the decision algorithms.

To accommodate rapid exploratory integration of application-specific processing, we used Matlab as the computation engine for the Image Processing Server. However, since Matlab does not provide an interface for external applications to access its internal functions, we extended it with an internal TCP/IP server. The internal server provides a common interface for the local Java Servlet-based application server to interact with Matlab's internal function libraries.

To analyze the images, the Data Management Server negotiates access to the services of the Image Processing Server through a standardized HTTP-based interface. The interface accepts RESTfully encoded function calls, and the results are returned as a JSON-encoded text string. The URL encodes the name of the requested functionality and a unique ID for each requested image. The image processing process, one of the Data Management Server's image handling services, retrieves the image from local disk and sends the image, together with the processing request of its choice, indicated by a function name, to the Image Processing Server. The Image Processing Server parses the request, saves the file to the local disk for access by Matlab, and passes the function name to Matlab's internal TCP/IP server. Matlab looks up the function and either returns the result of the processing, if the function is available, or otherwise returns an error message to the Data Management Server.
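The request/response shape described above might look roughly like the following. The path layout, query parameter names, and JSON fields are assumptions, since the paper does not spell out the exact wire format.

```python
import json
from urllib.parse import urlencode

def build_request_url(base, function_name, image_id):
    """Encode an IPS function call as a URL, as the DMS is described doing.

    The path layout and parameter name are hypothetical.
    """
    return "%s/%s?%s" % (base, function_name, urlencode({"image_id": image_id}))

def parse_annotation(json_text):
    """Decode a JSON-encoded IPS reply into (image_id, category), raising
    on the error case; the field names are illustrative."""
    reply = json.loads(json_text)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["image_id"], reply["category"]
```

Keeping the interface at this level of simplicity is what lets additional Image Processing Servers be added without touching the phones or the viewer.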

The Data Management Server can be configured for two independent scenarios. In one scenario, new images are handled at scheduled intervals, preferably at times when the system would not be occupied by intensive tasks. In the second, a semi-real-time scenario, incoming images are fed to the Image Processing Server in a FIFO manner. In either case, the logic on the Data Management Server handles all filtering and segmentation. The file descriptors and data provided by the Image Processing Server are used as input to the filtering and segmentation logic.

3.3.2 Image Processing Techniques
To filter out images before presenting them through the Image Viewer, the Image Processing Server processes each image in the gallery and determines its quality by classifying it into one of four categories: Clear, Blurred, Exposed, and BlurExposed. The resulting category is recorded back as an annotation to the image and subsequently used by the Data Management Server to select images for presentation to the user.

To determine the class of each image, we use four well-known features: the mean of intensity, the standard deviation of intensity, the number of edges in the image, and the sum of the high-frequency coefficients of the Discrete Cosine Transform (DCT) of the image. We refer to these as the mean feature, standard deviation feature, edge feature, and DCT feature, respectively, throughout the paper. While these features have been widely used individually in previous work [18, 23] to determine image quality, we use their combination to enhance the performance of the image-quality estimation. The Image Processing Server uses a decision tree classifier that determines the class of each image based on the above four features. We evaluate the performance of the classifier, in terms of both classification accuracy and the latency cost of extracting the four features and determining the class of the images, in Section 5.1.
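The four features and a threshold-based decision tree can be sketched as below. This is an illustrative reconstruction only: the thresholds are invented for the example, and the FFT-based high-frequency energy is a stand-in for the paper's DCT coefficient sum, not the actual values or cut-points used in Rewind.

```python
import numpy as np

# Illustrative sketch of the four image-quality features and a simple
# threshold decision tree. All thresholds are assumptions for demonstration.

def extract_features(img):
    """img: 2-D grayscale array with values in [0, 255]."""
    mean = img.mean()
    std = img.std()
    # Edge count: pixels whose finite-difference gradient magnitude is large.
    gy, gx = np.gradient(img.astype(float))
    edges = int(np.count_nonzero(np.hypot(gx, gy) > 30))
    # High-frequency energy (an FFT stand-in for the DCT coefficient sum).
    spectrum = np.abs(np.fft.fft2(img))
    h = img.shape[0]
    high_freq = float(spectrum[h // 4: -h // 4 or None, :].sum())
    return mean, std, edges, high_freq

def classify(mean, std, edges, high_freq):
    exposed = mean < 40 or mean > 215   # under- or over-exposed overall
    blurred = edges < 50 and std < 20   # few edges and low contrast
    if exposed and blurred:
        return "BlurExposed"
    if exposed:
        return "Exposed"
    if blurred:
        return "Blurred"
    return "Clear"
```

A dark, featureless frame would land in BlurExposed, while a well-lit, high-contrast frame with many edges would be kept as Clear; the real classifier's tree structure was instead learned from the ground-truth gallery described in Section 5.1.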

3.4 Mobile Phone Processing
In this section, we describe extensions to the system design to support local image processing on the mobile phone as a means of increasing the efficiency and interactivity of recall applications. Our early pilot image sets had many low-quality images, as well as some congestion on the wireless upload link. Therefore we were motivated to reduce the congestion by using in situ filtering of the extremely low-quality images that were in any case filtered out on the back-end server once uploaded. Our pilots also prompted interesting requirements for prioritized and real-time upload, particularly when participants connected to the system after an extended period of disconnected operation. The functionality added includes uploading and viewing the most recently captured images first, and rapid viewing of a subset of images that represents the entire day, rather than an equal number of images that covers just a small portion of the day. To facilitate this, we have modified the functionality of the Campaignr application running on the mobile phone to classify the captured images based on the extracted features, and implement reverse chronological order upload and prioritized upload policies. This section describes the simple image processing and prioritized upload mechanisms implemented on the mobile phone.


Figure 4: Extension of the Campaignr application to support processing on the mobile phone.

3.4.1 Image Processing on Mobile Phones
Simple image features can be used to determine the quality of the image, as described in Section 3.3.2. However, image processing overhead is a more serious concern on the mobile phone than it was on the back-end server because of the device's relatively constrained resources.

To extract image features and classify the acquired images on the mobile phones, we have extended the Campaignr application that runs on the mobile phone. The new Image Annotation module extracts image features, and the Image Classification module classifies the images based on the features extracted by the Image Annotation module. The modules are designed to easily accommodate additional or alternative image processing functions where a binary decision on the feature can improve classification. The software was designed so that application designers can conveniently select the features to be extracted from the captured images and configure the thresholds of the chosen decision tree used for classification via the local Campaignr XML configuration file. Campaignr currently supports extracting the mean and standard deviation of intensity, the number of edges in the image using well-known edge detection algorithms (i.e., Sobel, Canny, and Prewitt), and the JPEG file size of the captured image. We have also implemented a Clustering module that clusters the images based on their capture time. This module can also be easily extended to use other available information, such as image features, for more sophisticated clustering algorithms. Lastly, an Upload Ranker module implements different types of upload policies.
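A configuration fragment of the kind described above might look like the following. This is purely a hypothetical sketch: the element and attribute names are our own, not the actual Campaignr XML schema.

```xml
<!-- Hypothetical sketch of a Campaignr configuration fragment; element and
     attribute names are illustrative assumptions, not the actual schema. -->
<image_processing enabled="true">
  <features>
    <feature name="mean"/>
    <feature name="std"/>
    <feature name="jpeg_size"/>
  </features>
  <decision_tree>
    <threshold feature="mean" low="40" high="215"/>
    <threshold feature="std" min="20"/>
  </decision_tree>
  <upload policy="prioritized" cluster_by="capture_time"/>
</image_processing>
```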

The data flow for each captured image is shown in Figure 4. Whenever Campaignr captures an image, it temporarily stores the image in the database along with other sensing data, including the capture time. If activated by the XML configuration file, the Image Annotation module fetches the stored image, extracts the selected features, and stores the results in the database. Then, the Image Classification module reads the computed features and classifies the image using the configured decision tree. The Clustering module generates the cluster ID using the capture time. Lastly, the Upload Ranker module ranks the images based on the configured upload policy. The policy uses the annotations provided by the earlier modules to discard or upload the images. The Upload Ranker module may also modify the ranks of previously captured images, as these can change depending on subsequent images.

We have implemented a decision tree on the mobile phone that classifies the images into the four categories of Clear, Blurred, Exposed, and BlurExposed, as we did in the back-end Image Processing Server described earlier.

Figure 5: Prioritized upload policy example. The prioritized upload policy consists of three steps: 1) clustering and classifying each image (class membership is indicated by a number from 1-4 for BlurExposed, Exposed, Blurred, and Clear images, respectively), 2) assigning a rank to each image based on the cluster ID and class, and 3) uploading the images sequentially based on their rank.

However, to minimize the cost of classification on the phone, we profiled the cost of extracting each feature and use only the subset of features that provides maximum classification performance with minimum overhead. We discuss the performance and cost of the classifier we used on the phone in Section 5.2.

3.4.2 Upload Policy for Mobile Phones
As expected, mobile phones continuously collecting multiple images per minute can easily congest the relatively narrow and intermittent upload channels that characterize cellular infrastructure. Simple image processing on the mobile phone can address this by selectively uploading informative and high-quality images, rather than uploading every single image.

Prioritizing the upload order can also improve the usability of the system. When mobile phones are disconnected from the network, automatically collected images are stored in the phone's memory until network connectivity becomes available and the phone resumes upload. In the event of extended disconnection, uploading the accumulated images takes a significant amount of time. This upload delay is not a concern when users do not attempt to view images until a later time. However, if the usage model is such that users want to review their images rapidly after starting the application or after reconnecting to the wireless infrastructure, selectively uploading images can provide enhanced interactivity and responsiveness.

For example, during the DietaryRecall Pilot, which we describe in more detail in the next section, we received 3502 images that were classified as low quality, equivalent to 33% of the total number of captured images. We also experienced many cases where the participants were disconnected from the network for an extended period of time at the end of their recording day, and needed to upload the previously captured images after they arrived at the lab to participate in the dietary recall experiment the next morning. This introduced a considerable amount of waiting time before they could actually run the web-based assisted recall process. Also, during the image capture part of this pilot, users or experiment facilitators occasionally needed to view captured images right away to determine whether the mobile phone was being worn properly (e.g., at a height and angle that allowed the device to capture images of food for the particular user's stature and posture).

As another example of the use of this feature, consider the benefit of displaying a subset of images that represents the user's entire day, rather than an equal number of images that covers just a small portion of the day. This can be achieved by showing only the representative, high-quality images from each cluster of images collected during a time slot. Thus, uploading the best-quality images uniformly distributed over the entire capture time first provides a more interactive interface for the user, compared to sequentially uploading all quality images from each time cluster in chronological order. To provide better interactivity, we implemented the ability to selectively upload images based on different upload policies, rather than naively uploading all unfiltered images in chronological order. We have implemented two upload policies.

The reverse chronological order upload policy uploads the most recently captured images first. Configuring this policy requires activating the Image Annotation module and Image Classification module to discard low-quality images. However, other modules are deactivated so that Campaignr can simply upload the images based on the capture time of each image.

The prioritized upload policy improves the responsiveness that users experience after an extended period of disconnection (Figure 5). The images on the phone are clustered by the Clustering module (step 1). Subsequently, the images within each cluster are classified by the Image Classification module and ranked by the Upload Ranker (step 2). The Upload Ranker assigns a rank to the images based on the combination of their cluster ID and class membership and records the rank in the phone database. The ranking system guarantees that higher-quality images are uniformly taken from each cluster (starting from the most recent cluster). Finally, when the phone connects to the system, the Upload Controller gets a rank-sorted list of the images from the local database and uploads them sequentially to the server (step 3).
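The two policies can be sketched as follows, under our own simplifying assumptions: each image is represented as a `(capture_time, cluster_id, quality)` tuple, quality follows the 1-4 numbering of Figure 5 (4 = Clear, 1 = BlurExposed), and more recent clusters carry larger IDs. This is an illustration of the ranking idea, not the Upload Ranker's actual implementation.

```python
from itertools import zip_longest

# Sketch of the two upload policies. Images are (capture_time, cluster_id,
# quality) tuples, with quality 4 = Clear down to 1 = BlurExposed.

def reverse_chronological(images):
    """Most recently captured first; lowest-quality images discarded."""
    kept = [im for im in images if im[2] > 1]  # drop BlurExposed
    return sorted(kept, key=lambda im: im[0], reverse=True)

def prioritized(images):
    """Round-robin over clusters (most recent cluster first), best quality
    first within each cluster, so early uploads span the whole day."""
    clusters = {}
    for im in images:
        clusters.setdefault(im[1], []).append(im)
    # Each cluster sorted best-quality-first; clusters ordered newest-first.
    ordered = [sorted(clusters[cid], key=lambda im: -im[2])
               for cid in sorted(clusters, reverse=True)]
    ranked = []
    for round_ in zip_longest(*ordered):  # one pick per cluster per round
        ranked.extend(im for im in round_ if im is not None)
    return ranked
```

With this interleaving, the first few uploads already contain one good image from every time cluster, which is what gives the Image Viewer its early whole-day coverage.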

4. USE CASES, DATA COLLECTED
Rewind was tested in a real epidemiological research setting to evaluate its utility for assisted recall. We also performed other short-term trials. Results gathered from the first three pilots were used to evaluate the server-side operations, while a fourth was specially designed to evaluate on-device processing.

4.1 Dietary Recall Study Pilot (DietaryRecall)
To test Rewind's feasibility in assisting previous-day dietary recall, we distributed camera phones to subjects who agreed to participate in the DietaryRecall Pilot study under the direction of Dr. Lenore Arab. The participants began using the system in August. To date, 10 have consented and received phones. Our first concern was that subjects might not find wearing the phone around their neck acceptable. However, all subjects, save two, consented to the add-on camera phone study. They were provided with phones on a lanyard so that the camera would be directed downwards and in front of the subject. This is the first work of its kind that uses worn mobile phones angled downward to autonomously capture food items, and it established the importance of proper phone positioning and end-to-end system interactivity. Consent to conduct the study was received from both the UCLA medical IRB and the General Clinical Research Center at UCLA.

Ten users ran the system during the course of the initial phase of the DietaryRecall Pilot. Participants were asked to wear the device, turned on, only when eating. Across those users there were 110 distinct eating episodes and a total of 11090 images uploaded. There were on average 101 images per episode, ranging from a maximum of 775 to a minimum of 1, reflecting the diversity of short snacks and longer meals across study participants. The images were uploaded using the phone's GSM modem.

Dr. Arab wished to keep the participant experience as simple as possible. Based on her experience with participant compliance, the system was configured to display a single screen with a maximum of 35 images per eating episode, thus eliminating any scrolling by the user while completing the dietary recall survey. Rewind selected images to present based on image quality and uniqueness features that could be automatically determined by the image processing software. The system deleted images that fell outside a specified quality threshold; these images were considered of such low quality that they would not provide any useful information. The Reaper function deleted 3502 such images, equivalent to 33% of the total number of captured images. Due to privacy concerns it was not possible to access the files for later evaluation of the image filtering techniques. This large number of unusable images is likely exaggerated in this first study because we did not have extensive experience in training users on device placement and had not optimized the lanyard used. It will be interesting to observe in future deployments whether we see a significant reduction in this percentage as training techniques are improved.

4.2 Full-day Dietary Pilot (FullDayDietary)
Fourteen volunteers from our research lab ran the Rewind application to capture images during their meals to test its feasibility before the study described above. This pilot was conducted during a workday while the participants were outside the home, capturing six images a minute throughout their day.

All of the phones captured and transferred images as expected. Both the secure image repository and the image processing algorithms handled the full load of incoming images without incident at the end of the pilot day. We were also able to test a primitive version of our Image Viewer and verify that both the image deletion and the eating-episode-identification mechanisms functioned; at the same time, we obtained feedback from the local users that led to design refinements during the early development phase of the system. Unlike the DietaryRecall Pilot participants, who had to remain anonymous and whose images we could not analyze due to study design and Institutional Review Board restrictions, the local users were given the opportunity to release all or some of their images for use by the system designers. Six participants consented to release a total of 14958 images. The users also provided us with detailed verbal feedback on the experience.

A nutritionist analyzed the released images to determine the number of images taken of each individual food reported during these days. The median number of images per food was 28 for this population, and the range was 1 to 211. Most foods had much duplication.

Significant information on phone use, management of the Image Viewer, and presentation of images was obtained from the FullDayDietary Pilot study. For example, users experienced extended upload times when phones were disconnected during operation for significant periods, and could not obtain rapid feedback on the quality of their uploaded images once connected. The users also suggested providing a feedback mechanism for correct data capture and upload (such as messages on the screen or audible sounds), and a more stable and convenient physical mounting on the lanyard. We have also provided access to the images that were approved for sharing to computer vision researchers interested in category recognition, to begin developing targeted algorithms for identifying images that contain food items.

4.3 Poster Session Interactions Pilot (PosterSessionCapture)

Fifteen visitors participated in a technical feasibility exercise designed to test the ability of our system to handle uploaded images collected by many simultaneous and co-located users for an hour during a research conference poster session in October 2007. Participants were asked to wear a phone configured to capture six images a minute on average while they were visiting student research posters, amidst a large number (~200) of conference attendees. Participants were also asked to make audio annotations occasionally to provide additional context for later discussion. Audio samples were collected and uploaded by Campaignr on the same phones that were used to capture images. The captured images and audio samples were used in a plenary session as a vehicle for prompting discussion of the student posters and associated research. In addition, we presented system usage statistics, such as a ranking of the 15 participants in terms of the number of images uploaded and filtered, as well as the number of audio annotations provided.

In this pilot, the 15 users uploaded 4342 images during the course of one hour. Each participant uploaded 289 images on average, ranging from a maximum of 351 to a minimum of 162. As we wanted to present the results from the pilot and have discussions with visitors immediately following the poster session, we ran two Image Processing Servers in parallel to handle all the images in semi-real time. The filtering rate was fairly low relative to the DietaryRecall Pilot: only 2.64% of each user's images were filtered out on average, ranging from a maximum of 17.41% to a minimum of 0%. The low filtering rate was in part due to the fact that we ran only the standard deviation filter, in order to process all images in semi-real time without applying significantly more computational resources. In addition, we configured the filter thresholds to be the same as those used in our earlier pilot studies, but found that the lighting conditions and continuous movement of the participants were quite different. The average file size of the captured 800 x 600 images was 35 kB, yielding an average total data rate of 3.5 kB/s. A total of 118 audio samples were collected, ranging from 1 to 40. The images were uploaded using the phone's 802.11 radio. In this setup, the system was able to handle 1.5 images/sec with a small safety margin. This is by no means an upper bound on the capacity of the system; there are significant opportunities for performance tuning and optimization that we have not begun to explore, let alone implement.

4.4 Pilot Testing for On-Device Processing
Images collected from the DietaryRecall, FullDayDietary, and PosterSessionCapture pilots were processed entirely on the server. During the pilots, we verified the effectiveness of the chosen image processing techniques. However, three problems were observed. First, the filtered images made up a significant portion of the images collected. Second, at times the wireless upload channel was congested; redundant uploads consumed unnecessary bandwidth over a wireless channel that is narrow relative to the number and size of images. Third, simultaneously processing a large number of images requires non-negligible processing time when there are many simultaneous users. While parallelization of Image Processing Servers alleviates the third issue, pushing some of the image filtering to each mobile phone is a complementary step and contributes to the ultimate scalability of the application to many simultaneous users. Moreover, local processing enables more sophisticated and adaptive behavior, such as the prioritized upload described earlier.

To evaluate the effectiveness as well as the cost of pushing processing to each mobile phone, we loaded two Nokia N80 smart phones with the new version of Campaignr described above, which provides local image processing and prioritized upload functionality. Two participants wore the phones for three hours, collecting 626 images on average. Participants were asked to go about their daily routines as they wore the camera; however, as in the FullDayDietary Pilot, it was worn only outside of the house so as to reduce privacy concerns. To evaluate the effectiveness of prioritized upload, upload was disabled during the data capture phase of the pilot. To fairly compare basic upload and prioritized upload, we collected the images from one phone, copied the files before upload, and configured two phones to upload the same image set using different upload policies. We present the results from this pilot in Section 5.3.

5. EVALUATION
We evaluated system performance with respect to the following metrics: the ratio of images classified as "high quality" to the total number of images, which measures the efficacy of applying image processing filters on the server and mobile device; the performance of the Image Processing Server in identifying and annotating low-quality images compared to manually annotated "ground truth"; and the time required to determine image quality. Since the latter is the dominant latency in the system, it is indicative of the overall latency of our system and can ultimately be used to determine the number of required Image Processing Servers.

We furthermore evaluate the performance of the in situ processing of the images on the phone with respect to the following metrics: the performance of the local image classifier, its latency, and its energy consumption. The evaluation of the energy consumption of the local classifier determines the impact of the in situ image processing on the lifetime of the phone. Finally, we investigate the advantages of our prioritized image upload in improving the interaction of the user with the system.

5.1 Image Processing Server Performance
The performance evaluation was performed on three individual machines. A central Data Management Server maintained a MySQL 5.0 DBMS with access to all the image information. The server also ran the orchestration application that delegated the image processing tasks to the Image Processing Servers and updated the local database. A high-performance setup, a Dual-Core AMD Opteron at 1800 MHz with 2048 MB DDR RAM, ensured minimal turnaround time for any requests to the management server. The Image Processing Servers were run in parallel on the local Ethernet network.

5.1.1 Server-side Image-Quality Classification
To investigate the performance of server-side image-quality classification, we used 200 randomly selected images from the FullDayDietary Pilot for classifier training and evaluation. To determine the



Figure 6: Sample images from the FullDayDietary Pilot. (a) and (b) Clear images of participants' food intake, (c) a Blurred image due to the motion of the individual carrying the phone, (d) a Blurred image due to the motion of the subjects, (e) an under-exposed image due to poor lighting (Exposed), (f) an exposed and blurred image (BlurExposed).

Figure 7: Percentage of each image class in the ground truth gallery from the FullDayDietary Pilot.

ground truth class membership of each, we asked three people to browse the data and mark the category of each image from the four categories of Clear, Blurred, Exposed, and BlurExposed. We then picked the popular vote as the category of each image in the ground truth data. We furthermore divided the data set into two equally sized, randomly selected parts for training and evaluating the performance of the classification.
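The popular-vote step can be sketched in a few lines. Note the paper does not specify how ties among three mutually disagreeing annotators were broken; the sketch below simply falls back to the first-marked label in that case.

```python
from collections import Counter

# Minimal sketch of the popular-vote labeling step: each image's ground-truth
# category is the one marked by the majority of the three annotators.
# Tie-breaking is unspecified in the paper; Counter.most_common falls back
# to first-marked (insertion) order here.

def popular_vote(labels):
    """labels: the annotators' categories for one image, e.g. three strings."""
    return Counter(labels).most_common(1)[0][0]
```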

Figure 6 illustrates a sample set of images taken during the FullDayDietary Pilot. As described earlier, many images were blurred either due to the motion of the individual carrying the phone or due to the motion of the subjects in the images, and many other images were deemed to be either under- or over-exposed due to poor lighting conditions. Figure 7 illustrates the percentage of each class in the ground truth data as marked by the individuals. In this case, only 45% of the images were marked as clear, while more than 24% were marked as blurred and 31% as exposed. The high number of blurred, under-, or over-exposed images motivated us to design the in situ image processing and prioritized upload features on the mobile phone to filter out the images that would otherwise be uploaded and then filtered in the Image Processing Server. We also note that the number of blurred, under-, or over-exposed images is application specific. For instance, the FullDayDietary Pilot resulted in a higher potential filtering rate, since images were captured throughout the entire day, during which the mobile phone's camera was more exposed to private situations where people intentionally blocked the camera, or to activities involving frequent movement. The PosterSessionCapture Pilot, in contrast, resulted in a lower filtering rate, since images were taken during a short period of time (an hour) in a very public setting where participants often stayed stationary while visiting the research posters.

Figure 8: The decision tree classifier on the Image Processing Server classifies the images into four different classes.

Figure 8 illustrates the decision tree classifier that runs on the Image Processing Server. The classifier uses four features, namely the mean and standard deviation of intensity, the number of edges in the image, and the sum of a subset of Discrete Cosine Transform (DCT) coefficients of the image, to classify each image. To determine the structure of the tree, we used the ground truth data from the FullDayDietary Pilot as marked by the individuals.

The confusion matrix shown in Table 1 summarizes the performance of the Image Processing Server in classifying and labeling

                           Inferred Label
Truth          Clear   Blurred   Exposed   BlurExposed    FN
Clear            42       3         0           1
Blurred           2      16         5           1          2
Exposed           0       1        22           0          0
BlurExposed       0       1         2           3          0
FP                        3         0           1

Table 1: Confusion Matrix for the back-end Image Processing Server classifier. (FN: False Negative, FP: False Positive)


the test images. Since the emphasis is on recognizing the clear images, an instance was considered a False Positive if a clear image was detected when the image was not clear, and a False Negative if a clear image was not detected when it was clear. The table illustrates that more than 83% of the images were correctly classified. It also shows that 93% of the clear images were correctly classified as clear, and only 2% of the low-quality images were incorrectly classified as clear and hence potentially shown to the user.

5.1.2 Server-side Image Processing Performance
To determine the image processing overhead of the overall system when processing occurs on the server side, we benchmarked the latency of each component of our system for processing each image. This includes 1) the overhead of the Data Management Server in looking up the local database for images and sending the image to the Image Processing Server, 2) the latency of the web interface in the Image Processing Server, and 3) the latency of the Matlab environment in loading the image and converting it to gray-scale for processing. In addition, we benchmarked the time taken to extract the features of the images.

The parallel system design allowed us to use two low-performance servers for the image processing. Although these machines were equipped with relatively outmoded hardware, the service was able to run in semi-real time. A quad-processor Pentium III machine running at 700 MHz and a Dual-Core AMD Opteron at 1000 MHz were initially chosen as experimental machines. The throughput rate made it unnecessary to upgrade the machines as more users were connected to the system.

Figure 9 illustrates the image processing latency of our system. It shows that latency is dominated by image processing computation, namely extracting the DCT feature, edge feature, and mean/standard deviation feature. Furthermore, it illustrates a very small deviation in the processing latency of each image. Consequently, we were able to process all images within a roughly fixed, predetermined time period.

The Image Processing Servers were used in two scenarios: a real-time application and a scheduled application. We required the real-time application to handle 1.5 images/sec. In this case, the expected processing time per image exceeded the interval at which images were acquired, and a large array of Image Processing Server machines would have been required to maintain real-time latency. Instead, we chose to reduce the number of processing steps to include only the standard deviation and mean intensity features. We estimated the total expected end-to-end latency at 0.95 seconds. In the scheduled application setting, the off-line process was allowed to run for 7 hours. The large time span allowed us to apply all filters to each image. The average total latency per image was estimated at 8.02 seconds. Consequently, two Image Processing Server machines running in parallel could process 6300 images each night. This approximate throughput far exceeded the actual demand per user, each of whom had an upper bound of 1000 images/day, and enabled support for multiple users per day.
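The scheduled-scenario capacity figure follows directly from the numbers quoted above, as this back-of-the-envelope check shows:

```python
# Back-of-the-envelope check of the scheduled-processing capacity quoted
# above: two servers, a 7-hour nightly window, and an average total latency
# of 8.02 seconds per image.

WINDOW_S = 7 * 3600      # off-line processing window, in seconds
LATENCY_S = 8.02         # average total latency per image, in seconds
SERVERS = 2

per_server = WINDOW_S / LATENCY_S   # about 3142 images per server per night
total = int(SERVERS * per_server)   # about 6284, i.e. roughly 6300 a night
```

At an upper bound of 1000 images per user per day, this nightly budget comfortably covers several users per pair of servers.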

5.2 Image Processing On Mobile Phones
In our study, we use the Emulator Debugger from the Nokia S60 Development Tool to measure the classifier performance, and the Nokia N80 itself to profile the cost of the classifier. The emulation enables scalable collection of training data and measurement of classification performance on a large set of images, while the performance measurements from the Nokia N80 phone enable accurate measurement of the classification cost on the phone

Figure 9: Components of image feature extraction latency on the Image Processing Server.

hardware. To implement image processing tasks on the device, we used the NokiaCV library (Nokia Computer Vision Library), which was designed by Nokia Research Center to support computer vision applications in the Symbian environment [1].

5.2.1 Client-side Image Processing Performance
We considered the constraints of the phone in designing the local on-device classifier. The latency of feature extraction and classification should be well under the image capture interval. As described previously, in the DietaryRecall Pilot, Rewind captures images at 10-second intervals; hence the local classification latency should be well under the 10-second image capture interval.

To design a low-cost classifier on the phone, we first benchmarked the overhead of extracting each feature of interest on the phone. Table 2 illustrates the overhead of extracting the mean/standard deviation of intensity, the number of edges based on the Sobel edge detection algorithm, and the JPEG image file size on the device. In addition, for the mean/standard deviation of intensity and edge count features, there is an additional fixed delay for converting the images from RGB to gray-scale format. Although we implemented the edge count feature on the phone, we did not consider it in our final local classifier due to its high overhead of 32 seconds. In addition, since our server-side benchmarking of the DCT feature (Figure 9) indicated an even higher computation latency than the edge count feature, we did not consider the DCT feature for the local device classifier: it was unlikely to be an effective choice given its overhead, and thus did not warrant the development time.

In our scheme, we used the JPEG image file size (which is readilyavailable when the image is captured) instead of using the expen-

Feature          Color Conversion   Mean and Std.   Edge Count   File Size
Latency (sec)         1.91               2.97          32.13         0

Table 2: The latency of extracting each feature on the phone.


Figure 10: This figure illustrates a strong relationship between the DCT-based feature used on the Image Processing Server and the JPEG file size used on the phone. It also indicates the distribution of image classes vs. the features.

                           Inferred Label
Truth          Clear   Blurred   Exposed   BlurExposed    FN
Clear            43       3         0           0
Blurred           3      19         2           0          3
Exposed           2       3        17           1          2
BlurExposed      3        1         0           2          3
FP                        3         0           0

Table 3: Confusion Matrix for the classifier used on the phone. (FN: False Negative, FP: False Positive)

sive edge count and DCT features. The intuition behind using file size is that JPEG compression is pipelined through a process that consists of applying the DCT to multiple 8 by 8 macro-blocks of the image, quantization, and coding. We also found a high correlation between edge count and JPEG file size: images with more edges tend to yield a larger file size after JPEG compression. Hence, we exploited these high correlations by replacing the edge count and DCT features used on the back-end server with the JPEG image size, and obtained reasonable performance. Figure 10 illustrates the scatter plot of the DCT-based feature and the JPEG size feature, and Figure 11 illustrates the scatter plot of the edge count feature and the JPEG size feature, for images of different classes. They suggest a strong linear relationship between these two features and the JPEG image size. Additionally, the distribution of the points among classes suggests that both features can be exploited to distinguish among different image classes.
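The compressed-size intuition can be demonstrated with a few lines of code. Here zlib serves as a crude, lossless stand-in for JPEG (the actual system reads the JPEG size directly from the captured file): an edge-rich image compresses to more bytes than a smooth one, so compressed size tracks image detail.

```python
import zlib
import numpy as np

# Demonstration of the file-size-as-detail-proxy intuition, with zlib as a
# lossless stand-in for JPEG compression (an assumption for this sketch).

def compressed_size(img):
    """Bytes needed to losslessly compress the raw 8-bit pixel data."""
    return len(zlib.compress(img.astype(np.uint8).tobytes()))

smooth = np.full((128, 128), 128)                  # flat, blur-like content
rng = np.random.default_rng(0)
detailed = rng.integers(0, 256, size=(128, 128))   # edge-rich, noisy content
```

The flat image compresses to a handful of bytes while the detailed one barely compresses at all, which is the same monotonic relationship the classifier exploits between image content and JPEG file size.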

5.2.2 Client-side Image-Quality Classification

The most discriminative feature, as shown in Figure 8, is the number of edges. However, due to the significant latency of computing this feature (as well as the DCT feature), we built a low latency classifier (Figure 12) that uses only the mean, the standard deviation, and the JPEG file size, and compared its performance and latency. To train and evaluate the performance of the local classifier, we used the same gallery of images from the Full Day Dietary Pilot study that we used for training the server-side classifier.
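A minimal stand-in for such a low-latency classifier might look like the following. The split structure and every threshold below are illustrative placeholders, not the trained values from Figure 12:

```python
# Hypothetical three-feature decision tree in the spirit of the phone-side
# classifier; thresholds are invented for illustration only.
def classify(mean, std, jpeg_bytes,
             dark=40.0, bright=215.0, flat_std=15.0, small_file=12_000):
    """Label an image using only the three cheap features."""
    if mean < dark or mean > bright:      # badly exposed
        if jpeg_bytes < small_file:       # little detail survived exposure
            return "BlurExposed"
        return "Exposed"
    if std < flat_std or jpeg_bytes < small_file:
        return "Blurred"                  # low contrast or few edges
    return "Clear"
```

For example, `classify(128, 40, 30000)` returns `"Clear"`. Because no pixel traversal beyond the mean/std pass is needed, such a tree evaluates in microseconds once the features are in hand.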

The confusion matrix shown in Table 3 summarizes the performance of the in situ classifier in classifying and labeling the test images. The results show that more than 82% of the images were correctly classified. More significantly, only 14% of the low quality images were incorrectly classified as clear images on the phone.

Figure 11: This figure illustrates a strong relationship between the edge count feature used on the Image Processing Server and the JPEG file size used on the phone. It also indicates the distribution of image classes vs. the features.

5.3 Upload Policy Evaluation

In this section, we first evaluate the effectiveness of reverse chronological order upload in reducing the waiting time before viewing the most recently captured image. Second, we evaluate the effectiveness of prioritized upload in reducing the waiting time before seeing at least one image from every time cluster at the Image Viewer. Lastly, we investigate how prioritized upload can improve the quality of the first uploaded image in each time cluster. Campaignr was configured to upload five images every five seconds when an upload connection was available. To simulate an extended disconnection, we disabled upload while capturing images for one to three hours. After the collection, the captured images were copied to several other mobile phones with different upload policies. We present results from the Pilot described in Section 4.4 using both GSM modem and 802.11b/g connectivity.

5.3.1 Reverse Chronological Order Upload

We compare the waiting time of chronological order upload versus reverse chronological order upload in Table 4. The latency reported here is measured from the time when Campaignr starts uploading images until the back-end server receives the most recently captured image. Because chronological order upload must upload the previously captured images first, its waiting time was proportional to the disconnection time. As expected, reverse chronological order upload showed a negligible waiting time, independent of the disconnection time.
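The two orderings amount to sorting the pending upload queue by capture timestamp in opposite directions; a minimal sketch (tuples and file names are hypothetical):

```python
# Pending images modeled as (capture_timestamp, filename) pairs.
def chronological(pending):
    """Oldest first: the backlog delays the newest capture."""
    return sorted(pending, key=lambda img: img[0])

def reverse_chronological(pending):
    """Newest first: the viewer sees the most recent capture almost
    immediately after reconnection, regardless of backlog size."""
    return sorted(pending, key=lambda img: img[0], reverse=True)

backlog = [(10, "a.jpg"), (30, "c.jpg"), (20, "b.jpg")]
```

With a multi-hour backlog the difference is exactly what Table 4 shows: the newest image either waits behind the whole queue or jumps to its head.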

5.3.2 Prioritized Upload

To evaluate the effectiveness of Rewind's prioritized upload in reducing this waiting time, we define completion delay as the time duration from when one of the clusters receives an image for the first


Figure 12: The low cost decision tree classifier used on the phone.

(a) 802.11b/g
                        1 hour      2 hours     3 hours
chronological           201 (sec)   408 (sec)   561 (sec)
reverse chronological     5 (sec)     4 (sec)     5 (sec)

(b) GSM modem
                        1 hour      2 hours     3 hours
chronological          1022 (sec)  2192 (sec)  7399 (sec)
reverse chronological    25 (sec)    24 (sec)    29 (sec)

Table 4: Waiting time before viewing the most recently captured image: chronological order upload versus reverse chronological order upload following different disconnection times (1, 2, 3 hours).

time until every cluster receives at least a single image. Uniform time clustering with a two-minute interval was used. First, we measured the completion delay when images were collected but not uploaded for one to three hours using basic chronological order upload. Then, we measured the completion delay with the same settings, this time using the prioritized upload described in Section 3.4.2. Table 5 shows that the prioritized upload policy requires only about 13 to 25% of the time that a simple chronological order upload requires.
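The prioritized ordering can be sketched as follows, assuming uniform two-minute clusters, best-quality-first within each cluster, and round-robin across clusters. Function names, the rank table, and the tuple layout are ours, not from the Rewind codebase:

```python
from collections import defaultdict

# Lower rank = higher upload priority within a cluster (hypothetical mapping).
QUALITY_RANK = {"Clear": 0, "Blurred": 1, "Exposed": 2, "BlurExposed": 3}

def prioritized_order(images, cluster_sec=120):
    """images: list of (timestamp_sec, quality_label, name) tuples.
    Returns names in upload order: one image per cluster per round,
    best quality first inside each cluster."""
    clusters = defaultdict(list)
    for ts, label, name in images:
        clusters[ts // cluster_sec].append((QUALITY_RANK[label], ts, name))
    for members in clusters.values():
        members.sort()                    # best quality first per cluster
    order = []
    while any(clusters.values()):         # round-robin across clusters
        for key in sorted(clusters):
            if clusters[key]:
                order.append(clusters[key].pop(0)[2])
    return order
```

The breadth-first pass guarantees every cluster is represented at the viewer early, which is exactly the completion-delay metric Table 5 measures.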

Rewind’s prioritized upload policy can also order the upload by image quality, based on the class membership of each image. Within the same cluster, the prioritized upload policy uploads

(a) 802.11b/g
                1 hour      2 hours     3 hours
Chronological   191 (sec)   394 (sec)   545 (sec)
Prioritized      25 (sec)    99 (sec)   121 (sec)

(b) GSM modem
                1 hour      2 hours     3 hours
Chronological   953 (sec)  2126 (sec)  7282 (sec)
Prioritized     124 (sec)   242 (sec)   380 (sec)

Table 5: Completion delay comparison of chronological order upload versus prioritized upload following different disconnection times (1, 2, 3 hours).

high quality images before low quality ones. Table 6 briefly illustrates the effect of prioritized upload on the quality of the earliest uploaded image in each cluster. While the results show that prioritized upload does improve the distribution of high quality images, we also observed that the performance gain can be insignificant when the quality of every image within a cluster is low.

                 BlurExposed   Exposed   Blurred   Clear
Non-Prioritized       0           12        3       65
Prioritized           0            4        1       75

Table 6: The quality distribution of the first uploaded image from 80 clusters.

6. CONCLUSION AND FUTURE WORK

We developed a targeted assisted recall system providing image capture, upload, filtering, and presentation of images from Symbian S60 handsets, and deployed it in several tests including a “live” epidemiological study on dietary intake. The system was designed to handle many simultaneous users by distributing the image processing logic to network-accessible servers using a standardized web interface, and in later designs leveraged image processing on the mobile phones themselves to reduce wireless-channel congestion. Image filtering was first implemented on the server to reduce image browsing overhead for the user. In the enhanced version of the system, a time-efficient classification algorithm was implemented on the mobile phones to filter out poor-quality images by characterizing the mean intensity, the standard deviation of intensity, and the JPEG file size of each image. Processing on the phones also enabled the system to adapt to the varying connectivity experienced with wireless infrastructure by using prioritized upload mechanisms once connectivity is reestablished.

The system’s web-based processing framework supports rapid prototyping of server-side image processing. The core application framework was built on top of Matlab using well defined interfaces that can be extended in the future to plug in R or custom processing routines. The scalability of the server-side image handling framework was addressed by centralizing decision processes onto a dedicated Data Management Server. The image processing itself was delegated in round-robin fashion by the Data Management Server to multiple instances of the Image Processing Server. Centralizing the decision processes eliminated the need for global cache migration, allowing more Image Processing Servers to be added to the system as the need arises.
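The round-robin delegation can be sketched as follows; the class and method names are hypothetical, not from the Rewind implementation:

```python
from itertools import cycle

class Dispatcher:
    """Stand-in for the Data Management Server's delegation logic."""

    def __init__(self, servers):
        self._ring = cycle(servers)   # endless rotation over IPS instances

    def assign(self, image_id):
        """Pick the next Image Processing Server for this image. Because
        the assignment decision lives entirely here, no per-server cache
        state needs migrating when instances are added."""
        return (image_id, next(self._ring))

d = Dispatcher(["ips-1", "ips-2", "ips-3"])
```

Adding capacity then reduces to constructing the dispatcher with a longer server list.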

The system was successfully used in a dietary intake study in collaboration with UCLA’s medical school as part of an ongoing NIH study on the effectiveness of web-based dietary recall survey techniques. The high sensitivity of the data gathered in this study mandated the implementation of firm security policies to avoid data leakage and to empower the research subjects to fully control their own information. Privacy protection must be a foremost concern for designers and users of systems that capture personally identifiable data. Images in particular are a highly revealing, and therefore sensitive, form of documentation. We designed our system to require the end-user to approve any information before it could be released to the personnel conducting and coordinating the study, let alone into the public domain. Additionally, all communication was encrypted and data was automatically reaped if no user action was taken to explicitly share an image. Two internal pilots identified the performance metrics of the system and proved the scalability and


flexibility of the developed end-to-end system.

Our future work will focus on:

• More extensive evaluation of the system’s effectiveness in dietary and other targeted recall applications

• Larger scale systematic studies to identify where additional technical advances are most needed

• Simple mechanical improvements such as the addition of a fish-eye lens and an improved lanyard

• Integration of our system with other existing image browsing and navigation tools such as Picasa and Flickr

• Development of more sophisticated image filtering and clustering mechanisms, including those that take advantage of the specificity of particular target applications

7. ACKNOWLEDGEMENTS

We acknowledge Joe Kim, August Joki, and Haleh Tabrizi for their implementation work, and Teresa Ko and Brian Fulkerson from the Vision Lab at UCLA for their insightful comments and opinions. We also acknowledge Sasank Reddy, Andrew Parker, and Josh Hyman for their contributions to the design of the initial prototype of DietSense. This work was supported partially by CENS Cooperative Agreement #CCR-0120778, NSF-FIND, Nokia Research Center, and UC Micro.

8. REFERENCES

[1] Nokia computer vision library. http://research.nokia.com/research/projects/nokiacv/.

[2] S. Antani, R. Kasturi, and R. Jain. A survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video. Pattern Recognition, 35(4):945–965, 2002.

[3] S. Babin. Developing Software for Symbian OS. Symbian Press/Wiley, second edition.

[4] B. Bederson. PhotoMesa: A zoomable image browser using quantum treemaps and bubblemaps. 2001.

[5] E. Berry, N. Kapur, L. Williams, S. Hodges, P. Watson, G. Smyth, J. Srinivasan, R. Smith, B. Wilson, and K. Wood. The use of a wearable camera, SenseCam, as a pictorial diary to improve autobiographical memory in a patient with limbic encephalitis. Special issue of Neuropsychological Rehabilitation, 2007.

[6] A. R. Doherty, A. F. Smeaton, K. Lee, and D. P. Ellis. Multimodal segmentation of lifelog data. Pittsburgh, PA, 2007.

[7] R. Fablet, P. Bouthemy, and P. Pérez. Nonparametric motion characterization using causal probabilistic models for video indexing and retrieval. IEEE Transactions on Image Processing, 11(4):393–407, 2002.

[8] U. Gargi, Y. Deng, and D. R. Tretter. Managing and searching personal photo collections. HP Labs Technical Report HPL-2002-67, 2002.

[9] J. Gemmell, G. Bell, and R. Lueder. MyLifeBits: a personal database for everything. Commun. ACM, 49(1):88–95, 2006.

[10] J. Gemmell, L. Williams, K. Wood, R. Lueder, and G. Bell. Passive capture and ensuing issues for a personal lifetime store. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, New York, USA.

[11] A. Girgensohn, J. Adcock, M. Cooper, J. Foote, and L. Wilcox. Simplifying the management of large photo collections. In Proc. of INTERACT ’03, 2003.

[12] J. Healey and R. Picard. StartleCam: A cybernetic wearable camera. In Proceedings of the Second International Symposium on Wearable Computing, 1998.

[13] S. Hodges, L. Williams, E. Berry, S. Izadi, J. Srinivasan, A. Butler, G. Smyth, N. Kapur, and K. Wood. SenseCam: a retrospective memory aid. Springer-Verlag Berlin Heidelberg, 2006.

[14] T. Hori and K. Aizawa. Context-based video retrieval system for the life-log applications. In Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, California, USA.

[15] R. R. Hunt and H. C. Ellis. Fundamentals of Cognitive Psychology. McGraw-Hill Humanities/Social Sciences/Languages, 2003.

[16] A. Joki, J. A. Burke, and D. Estrin. Campaignr: A framework for participatory data collection on mobile phones.

[17] S. Lefèvre, J. Holler, and N. Vincent. A review of real-time segmentation of uncompressed video sequences for content-based search and retrieval. Real-Time Imaging, 9(1):73–98, 2003.

[18] X. Marichal, W.-Y. Ma, and H. Zhang. Blur determination in the compressed domain using DCT information. 1999.

[19] S. Niedzviecki and H. Mann. Doubleday Canada, Limited, 2001.

[20] S. Reddy, A. Parker, J. Hyman, J. Burke, M. Hansen, and D. Estrin. Image browsing, processing, and clustering for participatory sensing: Lessons from a DietSense prototype.

[21] D. C. Rubin. Remembering Our Past: Studies in Autobiographical Memory. Cambridge University Press, 1999.

[22] N. Slimani, P. Ferrari, M. Ocke, A. Welch, H. Boeing, M. van Liere, V. Pala, P. Amiano, A. Lagiou, I. Mattisson, C. Stripp, D. Engeset, R. Charrondière, M. Buzzard, W. van Staveren, and E. Riboli. Standardization of the 24-hour diet recall calibration method used in the European Prospective Investigation into Cancer and Nutrition (EPIC): general concepts and preliminary results, 2000.

[23] H. Tong, M. Li, H. Zhang, and C. Zhang. Blur detection for digital images using wavelet transform. 2004.

[24] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. Segmentation of dynamic scenes from the multibody fundamental matrix, 2002.