How People Habituate to Mobile Security Warnings in Daily Life: A Longitudinal Field Study

Jeff Jenkins, Brock Kirwan, Daniel Bjornn, Bonnie Brinton Anderson, Anthony Vance Brigham Young University

{jeffrey_jenkins, kirwan, dbjornn, bonnie_anderson, anthony.vance}@byu.edu

Abstract

Research in the fields of information security and human–computer interaction has shown that habituation—decreased response to repeated stimulation—is a serious threat to the effectiveness of security warnings. Although habituation is a phenomenon that develops over time, past studies have only examined this problem cross-sectionally. Further, past studies have not examined how habituation influences actual security warning behavior in the field. For these reasons, the full extent of the problem is unknown. We addressed these gaps by conducting a three-week field experiment in which users were naturally exposed to privacy permission warnings as they installed apps on their mobile devices. We found that (1) users’ warning adherence substantially decreased over the three weeks, validating previous cross-sectional studies, (2) the general decline in warning adherence was partially offset by a recovery effect—a key characteristic of habituation—when permission warnings were not displayed between days, and (3) for users who received polymorphic permission warnings—warnings that update their appearance with each repeated exposure—adherence dropped at a substantially lower rate and remained high after three weeks compared to users who received standard warnings. These findings provide the most complete view yet of the problem of habituation to security warnings and demonstrate that polymorphic warnings can substantially improve warning adherence behavior.

Keywords: habituation, security warning, longitudinal field experiment, mobile devices.

1. Introduction

Research in the fields of information systems and human–computer interaction has shown that habituation—“decreased response to repeated stimulation” [26, p. 419]—is a serious threat to the effectiveness of security warnings. However, past studies share three critical limitations. First, they only examined habituation cross-sectionally (see Table 1). This is a substantial limitation, because habituation is a phenomenon that develops over time [17]. Furthermore, a key characteristic of habituation is recovery—the increase of a response after a rest period in which the stimulus is absent [17]. Without a longitudinal design, it is not possible to examine whether recovery can sufficiently counteract the effect of habituation to warnings.

Second, past studies did not examine how habituation influences actual warning adherence behavior in the field but instead used laboratory experiments that presented unrealistically high numbers of warnings to participants in a short session. Because users typically receive security warnings infrequently, presenting an artificially high number of warnings in a short time is too far removed from real life to be ecologically valid [24]. Consequently, the full extent of the problem of habituation is unknown.

Third, previous research [3; 4] proposed that repeatedly updating the appearance of a warning (i.e., a polymorphic warning design) can be effective in reducing habituation. However, those findings were subject to the same limitations described above. Therefore, it is not clear (1) whether polymorphic warnings are effective over time or whether users will quickly learn to ignore them and (2) whether the polymorphic design can actually lead to better security warning behavior.

We address these gaps in this paper by presenting the results of a longitudinal three-week field experiment in which users were naturally exposed to privacy permission warnings as they installed apps on their mobile devices. Consistent with previous cross-sectional experimental results, users’ warning adherence behavior substantially decreased over the three weeks. However, for users who received polymorphic permission warnings, adherence dropped at a substantially lower rate and remained high after three weeks compared to users who received standard warnings. Together, these findings provide the most complete view yet of the problem of habituation to security warnings and demonstrate that polymorphic warnings can substantially improve warning adherence behavior.

2. Literature Review and Theory

Habituation has been identified as a key contributor to the failure of warnings [14; 19; 20]. Several researchers have inferred warning habituation in cross-sectional laboratory experiments [2-4; 8; 11; 12; 15; 19; 23; 25]. In addition, two studies supported cross-sectional habituation in warning adherence behavior using Amazon Mechanical Turk [6; 7]. While these studies provide important insights into the problem of habituation to security warnings, they share a fundamental limitation: they only examine a single point in time. However, in the fields of neuroscience and neurobiology, it is well recognized that the effects of habituation change over time [17]. For this reason, cross-sectional studies can only provide a partial view of the effects of habituation. For example, the two most prevalent characteristics of habituation are (1) response decrement—an attenuation of a response after multiple exposures—and (2) response recovery—the increase of a response after a rest period in which the stimulus is absent [17]. Without a longitudinal design, it is not possible to observe how (or whether) users recover from habituation to warnings between exposures. Thus, it is not clear from previous cross-sectional research whether response recovery can offset the negative impact of the response decrements observed in previous habituation research.

Hypotheses 1 and 2 explore how users become less accurate in rejecting risky permissions over time with repeated viewings and how polymorphic warnings can mitigate this effect. Hypotheses 3 and 4 explore how users’ responses to warnings recover after the warning is withheld and how polymorphic warnings enhance this recovery.

2.1 Response Decrement

We first hypothesize that users’ accuracy in rejecting risky app permissions will decrease when viewing multiple warnings across days. Dual-process theory (DPT) [13] states that when users see a repeated stimulus, they compare it to a mental model of that stimulus. If the two match, users evaluate the actual stimulus less carefully and rely on the mental model instead. This is referred to as a “response decrement,” and may result in paying less attention and responding less thoughtfully to the stimulus. In the context of mobile app permission warnings, users will unconsciously compare warnings to their mental model of warnings they have seen when previously downloading other apps. If users determine that a warning is similar to the mental model (even if, in fact, it lists different permissions), they will give it less attention. In future exposures, users will rely even more on the model and respond even less thoughtfully. As a result, users who view similar permission warnings over time will give less attention to them, and habituation will inhibit the ability to identify and reject risky permission warnings.

H1: Multiple exposures to permission warnings over time will decrease users’ accuracy in rejecting risky permission warnings.

We hypothesize that users will habituate more slowly to polymorphic permission warnings—permission warnings that change their appearance with repeated exposures [2]—than to static permission warnings. Wogalter states that “habituation can occur even with well-designed warnings… Where feasible, changing the warning’s appearance may be useful in reinvigorating attention switch previously lost because of habituation” [28, p. 55]. Changing the appearance of a warning creates novelty, and the warning will therefore be less similar to existing mental models. As a result of this dissimilarity, the response strength will recover [22]. DPT describes this as sensitization, an energizing process that strengthens attention [13]. Sensitization counterbalances or decreases habituation [17]. As a result of sensitization, users will pay closer attention to warnings and reject risky permissions more accurately.

H2: Users’ accuracy in rejecting risky permission warnings over time will decrease more slowly when viewing polymorphic warnings as compared to static warnings.

2.2 Response Recovery

Although users will habituate to warnings, we predict that they will partially recover from the habituation after a rest period. Decay theory [5] explains that memory becomes weaker with the passage of time. When a warning is withheld for some time, the mental model of the warning weakens. Therefore, when users see a warning in the future, it will be less likely to match the mental model and will appear novel, increasing sensitization and users’ attention to the warning [9]. This time between warnings should thus result in an increase in accuracy in rejecting risky permissions.

H3: Time between warnings will improve users’ accuracy in rejecting risky permission warnings.

We predict that the amount of recovery after a time period will be greater for polymorphic warnings than for static warnings. As previously discussed, the mental models of polymorphic warnings are weaker and less stable than the models of static warnings. Less stable mental models (i.e., mental models that have not received as much reinforcement) fade more quickly than stable models [17]. Thus, after users have not seen a warning for a time period, they are more likely to perceive the polymorphic warning as novel. As a result, the responses of users who view polymorphic warnings will recover to a greater degree than the responses of users who view static warnings.

H4: Users’ accuracy in rejecting risky permission warnings over time will increase more after a withholding time period when viewing polymorphic warnings as compared to static warnings.

3. Experimental Design

3.1 Motivation

To test our hypotheses in an ecologically valid context, we examined longitudinal habituation to mobile app permission warnings. Users see many notifications on mobile devices daily. An analysis of 40,191 Android users suggests that users encounter an average of nearly 100 notifications per day on their mobile phones (app notifications, email notifications, system notifications, etc.) [21].

A subset of these notifications reflects app permission warnings (i.e., warnings that are shown before an app is granted access to information or resources). These warnings can be shown when the app is downloaded or when the app attempts to access a resource (i.e., just-in-time warnings). Permission warnings are frequent. In 2015, 25 billion iOS apps and 50 billion Android apps were downloaded by smartphone users. The average Android user has 95 apps installed on his or her mobile device, most of which displayed a permission warning during installation or use [18]. Furthermore, users often see multiple permission warnings in a short period of time. For example, when configuring a new phone, people may download many apps (and see many warnings) in quick succession. When using apps with just-in-time warnings, it is typical for the user to see a series of separate permission requests when first opening the app. Similarly, when evaluating apps, it is common for people to download several apps in one sitting. Mobile apps therefore represent a realistic scenario in which people frequently see a given warning (the permission warning), making this an appropriate context for studying longitudinal habituation to security messages.

In this experiment, we asked participants to evaluate apps at a third-party Android app store. Third-party app stores are common on the Android platform (e.g., Amazon Underground, Getjar, Mobogenie, Slideme, Appbrain, Aptoide Cloud Store, BAM, Top Apps, AppGratis, Myapp, MIUI, Baidu, and F-Droid). Some of these compete with the Google Play app store by offering app specials (e.g., free or reduced-price apps) and serve markets that have restricted access to Google Play (e.g., GetJar in China). Others complement Google Play by providing customized experiences (e.g., app of the day, in-depth app reviews, categorized apps, or recommended apps) with apps that link directly to Google Play (e.g., AppGratis). Some of these stores are standalone apps that can be downloaded (e.g., Amazon Underground), while others must be accessed via a web browser on the Android phone (e.g., Mobogenie). In our experiment, we created a browser-based third-party app store and monitored participants’ responses to permission warnings across time.

3.2 Participants

Participants were students from a variety of majors recruited at a university in the western United States. They received course credit for their participation in the experiment. Of an initial group of 134 subjects, 26 failed to participate past the first week, so we had 108 valid responses. These subjects were 63% male and had an average age of 21.9 years (SD 2 years). In addition to the course credit, and to encourage them to continue participating in the study, participants were given $10 for completing the first week, $10 for completing the second week, and an additional $20 if they completed all days in the third week, for a total of $40.

3.3 Ethics

The university’s Institutional Review Board approved the deception protocol described below. After the experiment, participants were debriefed on its true purpose.

3.4 Study Design

Participants were asked to rank apps on an app store created specifically for this study (see Figure 1). This operated as a legitimate app store, and participants were unaware that it was created and managed by the research team for the purpose of the experiment. The study used a deception protocol to increase realism. The store was presented as a third-party app store not affiliated with the research team. We told participants that the purpose of our study was to observe how people rank Android apps in various categories.

The app store presented apps from a different category (e.g., utilities, education, entertainment, travel, finance) each day. Participants were instructed to download, install, and evaluate three apps within the daily category on their personal Android device and rank each app from 1 (best) to 3 (worst). Participants then completed an apparently unaffiliated daily survey from the research team that allowed them to report their results. These steps were repeated each day for three weeks (excluding weekends).

Figure 1. A screenshot of the web-based app store created for the field experiment.

When participants clicked to download an app, they were shown a warning listing the app’s permissions. These permissions were randomly drawn from two categories: safe and risky (see Table 1). Safe permissions were taken from the Android Developer Guide [1] and were selected because we determined participants would consider these to be low-risk across app categories. We also created four risky permissions to (1) heighten respondents’ perception of risk in ignoring the permission warning and installing the app and (2) ensure that the requested permission was not appropriate, regardless of the type of app that was being downloaded.

Safe permissions:
• Send notifications.
• Set an alarm.
• Pair with Bluetooth devices.
• Alter the phone’s time zone.
• Change the size of the status bar.
• Change the phone’s displayed wallpaper.
• Install shortcut icons.
• Uninstall shortcut icons.
• Connect to the internet.
• Use vibration for notifications or interactions.
• Change phone volume and audio settings.
• Temporarily prevent the phone from sleeping (for viewing videos).
• Ask permission to download additional features.

Risky permissions:
• Charge purchases to user’s credit card.
• Record microphone audio at any time.
• Delete photos.
• Sell web-browsing data.

Table 1. Safe and risky permissions displayed in the app store permission warnings.

As a second deception, although the research team did in fact control the apps and validated their security, participants were told:

“Be aware that the research team is not affiliated with App-Review.org in any way, so we cannot verify that the apps are all safe. Before you download an app, be sure to check the permissions that the app requires. This app store displays the permissions before directing you to the Google Play store.

Make sure that the permissions required by the app do not contain any of the following:

• Charge purchases to your credit card.
• Delete your photos.
• Record microphone audio any time.
• Sell your web-browsing data.

If the app has any of these permissions, DO NOT download it. These apps are potentially dangerous and can harm your privacy and/or phone. Not only is your own device at risk if you install these apps, but if you positively review these apps, it will also put future users at risk. Therefore, if you review too many apps with dangerous permissions, you may not receive the course credit for this experiment.”

Before starting the experiment, participants were required to pass a short quiz verifying they knew which permissions were considered risky by the researchers. This allowed for an objective measure of security behavior—whether people knowingly installed apps with these risky permissions. After participants completed the quiz, we provided the mobile URLs for the app store and the separate daily survey. In addition, we sent two daily reminders in the morning and evening telling participants to download and review three apps from the app store.

We only allowed participants to submit one category evaluation each day (we told them that this was a feature of the app store). This required them to view a realistic number of permission warnings each day and reduced the effects of fatigue. Furthermore, participants had to evaluate a new category each day, ensuring they did not interact with the same app twice. Apps in each category were not well known, reducing the likelihood that participants trusted a particular brand.

Each category had ten apps to choose from. When the “Download” button on an app was clicked, the app store displayed a permission warning (see Figure 2, leftmost screenshot). If the user accepted the permission warning, the app installation was completed through Google Play—a pattern shared by many of the “customized experience” app stores (e.g., Mobile App Store, Cloud Store, BAM, Top Apps, AppGratis). Participants did not see a permission warning again through Google Play when the app was downloaded.

3.4.1 Dependent Variable

Our dependent variable was whether or not participants rejected apps with risky permissions. We randomized the permissions for each warning to ensure that participants would encounter at least one risky permission among the first three apps they selected. Each app beyond the first three had a 50% chance of displaying a risky permission. We recorded whether or not participants ignored warnings containing a risky permission.
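The randomization scheme above can be sketched as follows. This is a minimal sketch: the function name, the three-permission warning length, and the 50% odds for the non-forced slots among the first three warnings are our assumptions, not details reported in the paper.

```python
import random

# Permissions from Table 1 (a subset of the safe list, for brevity)
SAFE = ["Send notifications", "Set an alarm", "Connect to the internet",
        "Change the phone's displayed wallpaper"]
RISKY = ["Charge purchases to user's credit card",
         "Record microphone audio at any time",
         "Delete photos", "Sell web-browsing data"]

def daily_permission_sets(rng, n_warnings=5):
    """Permission sets for one participant-day: one of the first three
    warnings is forced to contain a risky permission, and every other
    warning is risky with 50% probability (the odds for the non-forced
    slots among the first three are an assumption)."""
    forced = rng.randrange(3)           # guaranteed risky slot
    sets = []
    for i in range(n_warnings):
        perms = rng.sample(SAFE, k=3)   # three safe permissions per warning
        if i == forced or rng.random() < 0.5:
            perms[rng.randrange(3)] = rng.choice(RISKY)
        sets.append(perms)
    return sets
```

Forcing one risky slot among the first three guarantees that every participant-day yields at least one observation of the dependent variable.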

3.4.2 Manipulations

We implemented a between-subject study design for our manipulations. Participants were randomly assigned to either the static warning or the polymorphic warning condition when they first visited the app store. Users were required to log in to the app store, which ensured they had the same condition across all three weeks. The static warning condition always had the same look and feel for the duration of the experiment, although the requested permissions changed. In contrast, the polymorphic warning condition randomly changed the appearance of the permission warning each time it was shown.

We created 16 variations of the polymorphic warning; half of these involved animations and half did not. Furthermore, for each participant, we randomly iterated through four polymorphic warning variations per week, with a new set of four variations introduced every four days. This was done to maintain the novelty of the polymorphic treatment. We deliberately set the interval for changing the polymorphic versions at the fourth day of each set so we would be able to detect if the level of habituation changed due to the warning treatments or merely due to the weekend and time away from the task. Examples of static and polymorphic warnings are shown in Figures 2 and 3.
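The rotation schedule described above can be sketched roughly as follows. The seeding scheme, the deterministic cycling within a set, and the function name are illustrative assumptions; the paper states only that four of the 16 variations were in use at a time, with a new set introduced every four days.

```python
import random

N_VARIATIONS = 16    # 8 animated + 8 non-animated polymorphic designs
SET_SIZE = 4         # variations in use at any given time
DAYS_PER_SET = 4     # a fresh set was introduced every four days

def variation_for(day, exposure, participant_seed):
    """Pick a polymorphic variation for a given weekday (0-based) and
    exposure count. Each participant draws a set of four variations that
    stays fixed for four days and then changes."""
    block = day // DAYS_PER_SET
    rng = random.Random(participant_seed * 1000 + block)
    current_set = rng.sample(range(N_VARIATIONS), k=SET_SIZE)
    return current_set[exposure % SET_SIZE]
```

Keying the random draw on the participant and the four-day block keeps the set stable within a block while guaranteeing novelty when a new block begins.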

Figure 2. Sample static and polymorphic permission warnings. The top left warning shows the appearance of the warning in the static condition. The other 3 warnings represent examples of the 16 variations used in the polymorphic condition.

Figure 3. Three stages of “flip,” one of eight animated polymorphic warning variations.

3.4.3 Daily Survey

After downloading and installing three apps, participants completed a survey from the research team. The survey asked participants to list and rank the three apps that they downloaded from #1 (best) to #3 (worst) for the daily category. We deliberately branded the survey as coming from the researchers and not the app store, so it would appear to be a part of a separate research project. This helped promote the story that the app store was not affiliated with the researchers and therefore could indeed have risky apps. To ensure that participants actually downloaded the apps, we enabled a “sharing” feature in the app store through which participants shared which apps they downloaded with us via the app store website. Although we actually captured all behavioral data in the app store regardless of this functionality, this again strengthened the idea that the app store was not associated with the research team.

3.4.4 Debriefing Survey

At the end of the three-week experiment, participants received a debrief survey. In the survey, we asked participants “How concerned were you about each of the following permissions?” for the safe and risky permissions listed in Table 1. The response scale was 1, “Not at all concerned,” to 7, “Extremely concerned.” The average for risky permissions was 6.03, while the average for safe permissions was 1.97, a significant difference (t = -38.9, df = 653.9, p < .001). This indicated that participants did see a difference in concern levels for the two categories of permissions. We also asked a manipulation check question to ensure that participants in the experimental condition noticed the polymorphic treatment [24]. All participants in the polymorphic condition responded affirmatively.
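The fractional degrees of freedom (df = 653.9) indicate a Welch unequal-variance t-test. A minimal sketch of that computation; the sample ratings in the usage example are invented for illustration, since the paper's raw data are not available:

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's unequal-variance t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)   # sample variances
    se2 = vx / nx + vy / ny             # squared standard error of the difference
    t = (mean(x) - mean(y)) / se2 ** 0.5
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

# Hypothetical 1-7 concern ratings (invented for illustration only):
safe_ratings = [1, 2, 2, 1, 3, 2, 1, 2]
risky_ratings = [6, 7, 6, 5, 7, 6, 7, 6]
t, df = welch_t(safe_ratings, risky_ratings)   # t is large and negative
```

Welch's test is the appropriate choice here because the concern ratings for safe and risky permissions need not have equal variances.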

4. Analysis

We limited our data to participants who completed at least half (seven or more) days of the experiment. This resulted in 102 participants—55 in the static condition and 47 in the polymorphic condition—who viewed 7,248 warnings over three weeks, or 15 weekdays. This averaged 4.74 (SD = 1.29) permission warnings viewed per weekday, per participant. Of these, 2,695 (about one third) were apps with risky permissions. Thus, the N for our analysis was 2,695.
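The reported descriptives are internally consistent, which can be verified with quick arithmetic:

```python
# Cross-check the descriptive statistics reported for the final sample.
participants = 55 + 47            # static + polymorphic conditions = 102
warnings_total = 7248
weekdays = 15
risky_warnings = 2695

per_day = warnings_total / participants / weekdays
risky_share = risky_warnings / warnings_total

print(round(per_day, 2))          # 4.74 warnings per participant per weekday
print(round(risky_share, 2))      # 0.37, i.e., about one third risky
```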

To analyze our data, we specified a logistic linear mixed-effects model, because it is robust to uneven observations [10]. In other words, the analysis was robust even if participants saw a different number of warnings each day. Linear mixed-effects modeling also allows for the inclusion of fixed effects (effects assumed to be constant across participants) and random effects (effects allowed to vary across participants). We accounted for the within-subject nature of our experiment by including the participant identifier as a random effect. Finally, the logistic linear mixed-effects model is designed to handle binary dependent variables, such as ours [16].

To test H1, we included the warning number (how many warnings the participant had seen up to that point) as a fixed effect to measure the stimulus repetition. We also included the treatment as a binary fixed effect (1 = polymorphic warnings, 0 = static warnings). We then included an interaction effect between the warning number and treatment to test H2. To test H3, we included the time since seeing the last warning (i.e., withholding time) as a fixed effect. Finally, we included an interaction effect between the withholding time and the treatment to test H4. As stated previously, our dependent variable was whether the user adhered to a warning containing a risky permission and cancelled their installation of the app (coded as 1) or disregarded the warning and installed the app anyway (coded as 0).

The results are shown in Table 2. The warning number negatively predicted whether the user rejected the app with the risky permission (β = -0.028, p < .001). Thus, H1 was supported. Likewise, the interaction between the warning number and treatment was significant (β = 0.013, p < .01). Participants’ accuracy in rejecting risky permission warnings decreased more slowly when viewing polymorphic warnings compared to static warnings, supporting H2. The withholding time positively influenced accuracy, supporting H3 (β = 0.419, p < .05). However, the interaction between withholding time and the polymorphic treatment was not significant, and H4 was not supported (β = 0.506, p > .05).

                                            Estimate   Std. Error   z-value
Intercept                                    2.122      0.209       10.145***
Warning Number                              -0.028      0.003       -9.414***
Polymorphic Treatment                        0.323      0.324        0.997 (ns)
Withholding Time                             0.419      0.183        2.294*
Warning Number × Polymorphic Treatment       0.013      0.005        2.679**
Withholding Time × Polymorphic Treatment     0.506      0.293       -1.726 (ns)

* p < .05; ** p < .01; *** p < .001; (ns) = not significant

Table 2. Logistic mixed-effects model results predicting participants’ rejection of apps with risky permissions.

To explore the extent of the interaction between the warning number and treatment, we graphed the trends. Figure 4 displays how each treatment group’s accuracy rate (percentage correct in rejecting risky apps) changed over the three-week (15-day) experiment, along with trend lines fitted to the data. Interestingly, after the three weeks, the accuracy rate of participants in the polymorphic condition was 76%, whereas the accuracy of participants in the static condition was 55%. This difference of 21 percentage points was significant (χ2 = 7.172, df = 1, p < .01). Overall, accuracy in the polymorphic condition dropped from 87% at the start of the three weeks to 76% at the end. In contrast, accuracy in the static condition dropped from 87% to 55%.
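Setting the random participant effects to zero, the fixed-effect estimates in Table 2 imply predicted adherence probabilities through the logistic link. A sketch (the function name is ours, and fixing withholding time at zero is a simplifying assumption, since its scaling in the fitted model is not reported):

```python
from math import exp

def adherence_prob(warning_number, polymorphic, withholding_time=0.0):
    """Predicted probability of rejecting an app with a risky permission,
    computed from the fixed-effect estimates in Table 2 (random participant
    effects set to zero)."""
    logit = (2.122
             - 0.028 * warning_number
             + 0.323 * polymorphic
             + 0.419 * withholding_time
             + 0.013 * warning_number * polymorphic
             + 0.506 * withholding_time * polymorphic)
    return 1 / (1 + exp(-logit))

# Roughly 71 warnings seen over the study (15 weekdays x ~4.74 per day):
p_static = adherence_prob(71, polymorphic=0)
p_poly = adherence_prob(71, polymorphic=1)
```

Under these assumptions the model-implied end-of-study probabilities (about 0.53 static and 0.80 polymorphic) are broadly in line with the observed accuracies of 55% and 76%.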

Figure 4. Percentage correct in rejecting risky warn-ings across 15 weekdays for each treatment group.

5. Discussion

This study makes several contributions. First, this is the first longitudinal examination of privacy or security warnings. Although the value of longitudinal studies is widely recognized [27], a longitudinal design is especially useful in the context of habituation because habituation is a neurobiological phenomenon that develops over time [17]. Consequently, cross-sectional examinations of habituation necessarily provide only a limited view of the phenomenon, and it was unclear how the results of previous habituation studies applied to repeated warnings outside of a single experimental session. Consistent with habituation theory, our results corroborate those of previous studies by showing that the general pattern of habituation reported in cross-sectional studies does in fact carry over to longitudinal exposures to warnings, at least in our three-week design. This also suggests that cross-sectional examinations of habituation are valid proxies for longitudinal designs.

Second, we provide the first longitudinal examination of how users habituate to permission warnings on mobile devices in the field. Previous studies of habituation to security warnings have only observed behavior in laboratory settings, which may not generalize to real-life usage. In addition, habituation to warnings may occur at a different rate on mobile devices than on desktop devices: mobile devices are used in a wider array of contexts than desktop computers and are therefore more prone to the influence of interruptions and other competing demands. Furthermore, users may receive notifications from mobile devices more frequently, because the mobile computing paradigm encourages notifications triggered by location, movement, and other factors. Our results address these concerns by showing that habituation to mobile permission warnings is consistent with that observed in desktop computing contexts.
Third, our results show that polymorphic warnings are resistant to habituation over time. Although Authors [3] demonstrated that polymorphic warnings are effective in reducing habituation in a single laboratory session, it was unclear whether polymorphic warnings would sustain their efficacy over time as they lose their novelty or as users become accustomed to them. Our results show that they do: although users also habituate to polymorphic warnings, they do so at a substantially slower rate than to traditional warnings that do not change their appearance, so much so that the gap in adherence between polymorphic and static warnings widens over time. This finding demonstrates that polymorphic warnings are an effective and low-cost solution for mobile developers and security practitioners.

Finally, we extend prior research by capturing not only the response decrement in the habituation process but also the daily recovery, or increase in response strength. Although our findings do not support a greater recovery associated with polymorphic warnings, they do demonstrate that withholding warnings for a time can help restore users' sensitivity and response.

6. Conclusion

This study is the first to examine security/privacy warnings in a longitudinal field experiment and the first to address habituation to permission warnings on mobile devices. In our 15-day field experiment involving 108 people, users habituated to permission warnings over time. However, the results suggest that the rate of habituation can be reduced substantially by employing polymorphic warnings that continually update their appearance. Our longitudinal field study corroborates previous cross-sectional experiments on habituation to warnings. In addition, it offers new insight into how users habituate to warnings on mobile devices in everyday usage and provides strong evidence that polymorphic warnings sustain their advantage over time.

References

1 Android. Android developers guide. (2016). https://developer.android.com/guide/topics/manifest/manifest-intro.html

2 Authors. How polymorphic warnings reduce habituation in the brain—insights from an fMRI study. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI) (2015).

3 Authors. From warnings to wallpaper: Why the brain habituates to security warnings and what can be done about it. Journal of Management Information Systems 33, 3 (2016), 713-743.

4 Authors. Your memory is working against you: How eye tracking and memory explain habituation to security warnings. Decision Support Systems 92 (2016), 3-13.

5 M.G. Berman, J. Jonides and R.L. Lewis. In search of decay in verbal short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 35, 2 (2009), 317-333.

6 C. Bravo-Lillo, L. Cranor, S. Komanduri, S. Schechter and M. Sleeper 2014. Harder to ignore? Revisiting pop-up fatigue and approaches to prevent it. In Proceedings of the Tenth Symposium on Usable Privacy and Security (SOUPS 2014). USENIX Association, 105-111.

7 C. Bravo-Lillo, S. Komanduri, L.F. Cranor, R.W. Reeder, M. Sleeper, J. Downs and S. Schechter. Your attention please: Designing security-decision UIs to make genuine risks harder to ignore. In Proceedings of the Ninth Symposium on Usable Privacy and Security (2013).

8 J.C. Brustoloni and R. Villamarín-Salomón 2007. Improving security decisions with polymorphic and audited dialogs. In Proceedings of the Third Symposium on Usable Privacy and Security (SOUPS 2007). ACM, New York, NY, USA, 76-85.

9 M.Ö. Çevik. Habituation, sensitization, and Pavlovian conditioning. Frontiers in Integrative Neuroscience 8 (2014), 13.

10 A. Cnaan, N.M. Laird and P. Slasor. Tutorial in biostatistics: Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Statistics in Medicine 16 (1997), 2349-2380.

11 S. Egelman, L.F. Cranor and J. Hong 2008. You've been warned: An empirical study of the effectiveness of web browser phishing warnings. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, Florence, Italy, 1065-1074.

12 S. Egelman and S. Schechter. The importance of being earnest [in security warnings]. In Financial Cryptography and Data Security, A.-R. Sadeghi, Ed. Springer Berlin Heidelberg, 52-59, 2013.

13 P.M. Groves and R.F. Thompson. Habituation: A dual-process theory. Psychological Review 77 (1970), 419-450.

14 M. Kalsher and K. Williams. Behavioral compliance: Theory, methodology, and results. In Handbook of warnings, M. Wogalter, Ed. Lawrence Erlbaum Associates, Mahwah, NJ, 313-331, 2006.

15 K. Krol, M. Moroz and M.A. Sasse. Don't work. Can't work? Why it's time to rethink security warnings. In 7th International Conference on Risk and Security of Internet and Systems (CRiSIS) (2012).

16 C.E. McCulloch and J.M. Neuhaus. Generalized linear mixed models. John Wiley & Sons, Hoboken, NJ, 2001.

17 C.H. Rankin, T. Abrams, R.J. Barry, S. Bhatnagar, D.F. Clayton, J. Colombo, G. Coppola, M.A. Geyer, D.L. Glanzman, S. Marsland, F.K. McSweeney, D.A. Wilson, C.-F. Wu and R.F. Thompson. Habituation revisited: An updated and revised description of the behavioral characteristics of habituation. Neurobiology of Learning and Memory 92, 2 (2009), 135-138.

18 P. Sawers. Android users have an average of 95 apps installed on their phones, according to Yahoo Aviate data. (2014). http://thenextweb.com/apps/2014/08/26/android-users-average-95-apps-installed-phones-according-yahoo-aviate-data, accessed 2/16/2017.

19 S.E. Schechter, R. Dhamija, A. Ozment and I. Fischer 2007. The emperor's new security indicators. In Proceedings of the 2007 IEEE Symposium on Security and Privacy (SP '07). IEEE, Berkeley, CA, 51-65.

20 D. Sharek, C. Swofford and M. Wogalter 2008. Failure to recognize fake internet popup warning messages. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Sage Publications, New York, NY, 557-560.

21 A.S. Shirazi, N. Henze, T. Dingler, M. Pielot, D. Weber and A. Schmidt 2014. Large-scale assessment of mobile notifications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '14), M. Jones and P. Palanque, Eds. ACM, Toronto, Ontario, Canada, 3055-3064.

22 E. Sokolov. Higher nervous functions: The orienting reflex. Annual Review of Physiology 25 (1963), 545-580.

23 A. Sotirakopoulos, K. Hawkey and K. Beznosov 2011. On the challenges in usable security lab studies: Lessons learned from replicating a study on SSL warnings. In Proceedings of the Seventh Symposium on Usable Privacy and Security (SOUPS). ACM, Menlo Park, CA, 3:1-3:18.

24 D. Straub, M.-C. Boudreau and D. Gefen. Validation guidelines for IS positivist research. Communications of the Association for Information Systems 13, 24 (2004), 380-427.

25 J. Sunshine, S. Egelman, H. Almuhimedi, N. Atri and L.F. Cranor 2009. Crying wolf: An empirical study of SSL warning effectiveness. In Proceedings of the 18th Conference on USENIX Security Symposium (SSYM '09), Montreal, Canada, 399-416.

26 R.F. Thompson and W.A. Spencer. Habituation: A model phenomenon for the study of neuronal substrates of behavior. Psychological Review 73, 1 (1966), 16-43.

27 R.T. White and H.J. Arzi. Longitudinal studies: Designs, validity, practicality, and value. Research in Science Education 35 (2005), 137-149.

28 M.S. Wogalter. Communication-human information processing (C-HIP) model. In Handbook of warnings, M.S. Wogalter, Ed. Lawrence Erlbaum Associates, Mahwah, NJ, 51-61, 2006.