9
Evaluating Human-Agent Interaction in the Wild Alper Alan 1 , Enrico Costanza 1 , Sarvapali D. Ramchurn 1 , Joel Fischer 2 , Tom Rodden 2 , and Nicholas R. Jennings 1 1 Agents, Interaction and Complexity Group, University of Southampton, Southampton, UK [ata1g11, ec, sdr, nrj]@ecs.soton.ac.uk 2 Mixed Reality Lab, University of Nottingham, Nottingham, UK [jef, tar]@cs.nott.ac.uk Abstract. Interactive agent-based systems are becoming increasingly ubiquitous in our everyday lives, helping people in many domains including healthcare, transportation, and energy. As such, there is a need to investigate how humans and agents interact with each other to maximize the benefit that such systems can offer. In this paper, we present a field study that lasted for six weeks, in which 12 different households were required to interact on a daily basis with an agent-based system in order to manage their electricity pricing scheme. The main goal of this study was to explore long term interactions between human users and an agent to understand how people’s trust towards an agent may change over time, and consequently affect their autonomy and interaction preferences. Our results suggest that flexible autonomy shows promise for sustaining users’ trust and engagement with an agent, despite its occasional mistakes. Keywords: Human-Agent Interaction, Autonomous Agents, Flexible Autonomy, Energy. 1 Introduction The increasing availability and miniaturisation of low-cost sensing, actuation and computational devices is likely to result in the wide adoption of sensor-based ”smart” applications and systems that leverage large amounts of data. Artificial Intelligence (AI) researchers need to face the challenge of designing interactive systems to help people take advantage of this data in a domestic context, often with the ambition to help us more efficiently utilise the limited resources of our planet. Recently, a small number of studies have investigated systems that automatically respond to sensed environmental changes such as ”smart” thermostats that take into account our whereabouts and learn our preferences [6, 20]. We are interested in the potential of domestic interactive technology to operate autonomously, and in users’ incli- nation, or resistance, to deliberately transfer control to such systems. On the one hand, autonomous operation may be desirable, or even essential, if we are to harness the capabilities offered by increasing amounts of data. On the other hand, autonomous operation may not be the best choice due to noise and biases in real world data, the limited size of training datasets, and the discrepancies between computationally feasible models and complex real-life systems, that result in the operation of these ”smart” autonomous systems being, at times, sub-optimal or, in the worst case, detrimental. For example, a recent work examining the real-world uptake of a smart thermostat highlighted how such errors are likely to cause users’ frustration and may lead them to abandon the technology [20]. It is therefore crucial that researchers and designers understand how to best design autonomous systems and interaction techniques that make the system status and operation clearly readable, and that allow its users to easily shift between autonomous and manual operation, a notion known in the Multi-Agent Systems community as ”adjustable autonomy” [14] or ”flexible autonomy” [8]. In particular, our focus is on how interactive autonomous computing systems may mediate user interaction with future energy infrastructure: the ”smart” grid. The choice of such a setting is driven by two main factors. First, energy as an application is important in itself for its societal and economic implications. Second, energy systems provide an opportunity to study rich interactions with prototypes of future autonomous systems. Hence, in this paper, we present a field evaluation of an agent-based home energy management application, which monitors household energy consumption, as well as available energy pricing schemes, and calculates the best one, and (optionally) automatically switches to it. The field trial, which lasted for six weeks and involved a diverse group of 12 participants, is reported through an analysis of automatic interaction logs and qualitative analysis of post-trial semi-structured interviews, aimed at uncovering long-term orientations towards smart systems in the wild, by taking AI beyond lab simulations.

Evaluating Human-Agent Interaction in the Wild

Embed Size (px)

DESCRIPTION

From the Proceedings of HAIDM 2015.

Citation preview

Page 1: Evaluating Human-Agent Interaction in the Wild

Evaluating Human-Agent Interaction in the Wild

Alper Alan1, Enrico Costanza1, Sarvapali D. Ramchurn1, Joel Fischer2, Tom Rodden2, and Nicholas R. Jennings1

1 Agents, Interaction and Complexity Group, University of Southampton, Southampton, UK[ata1g11, ec, sdr, nrj]@ecs.soton.ac.uk

2 Mixed Reality Lab, University of Nottingham, Nottingham, UK[jef, tar]@cs.nott.ac.uk

Abstract. Interactive agent-based systems are becoming increasingly ubiquitous in our everyday lives, helpingpeople in many domains including healthcare, transportation, and energy. As such, there is a need to investigatehow humans and agents interact with each other to maximize the benefit that such systems can offer. In this paper,we present a field study that lasted for six weeks, in which 12 different households were required to interact on adaily basis with an agent-based system in order to manage their electricity pricing scheme. The main goal of thisstudy was to explore long term interactions between human users and an agent to understand how people’s trusttowards an agent may change over time, and consequently affect their autonomy and interaction preferences. Ourresults suggest that flexible autonomy shows promise for sustaining users’ trust and engagement with an agent,despite its occasional mistakes.

Keywords: Human-Agent Interaction, Autonomous Agents, Flexible Autonomy, Energy.

1 Introduction

The increasing availability and miniaturisation of low-cost sensing, actuation and computational devices is likely toresult in the wide adoption of sensor-based ”smart” applications and systems that leverage large amounts of data.Artificial Intelligence (AI) researchers need to face the challenge of designing interactive systems to help people takeadvantage of this data in a domestic context, often with the ambition to help us more efficiently utilise the limitedresources of our planet. Recently, a small number of studies have investigated systems that automatically respondto sensed environmental changes such as ”smart” thermostats that take into account our whereabouts and learn ourpreferences [6, 20].

We are interested in the potential of domestic interactive technology to operate autonomously, and in users’ incli-nation, or resistance, to deliberately transfer control to such systems. On the one hand, autonomous operation may bedesirable, or even essential, if we are to harness the capabilities offered by increasing amounts of data. On the otherhand, autonomous operation may not be the best choice due to noise and biases in real world data, the limited sizeof training datasets, and the discrepancies between computationally feasible models and complex real-life systems,that result in the operation of these ”smart” autonomous systems being, at times, sub-optimal or, in the worst case,detrimental. For example, a recent work examining the real-world uptake of a smart thermostat highlighted how sucherrors are likely to cause users’ frustration and may lead them to abandon the technology [20]. It is therefore crucialthat researchers and designers understand how to best design autonomous systems and interaction techniques thatmake the system status and operation clearly readable, and that allow its users to easily shift between autonomous andmanual operation, a notion known in the Multi-Agent Systems community as ”adjustable autonomy” [14] or ”flexibleautonomy” [8].

In particular, our focus is on how interactive autonomous computing systems may mediate user interaction withfuture energy infrastructure: the ”smart” grid. The choice of such a setting is driven by two main factors. First, energyas an application is important in itself for its societal and economic implications. Second, energy systems providean opportunity to study rich interactions with prototypes of future autonomous systems. Hence, in this paper, wepresent a field evaluation of an agent-based home energy management application, which monitors household energyconsumption, as well as available energy pricing schemes, and calculates the best one, and (optionally) automaticallyswitches to it. The field trial, which lasted for six weeks and involved a diverse group of 12 participants, is reportedthrough an analysis of automatic interaction logs and qualitative analysis of post-trial semi-structured interviews,aimed at uncovering long-term orientations towards smart systems in the wild, by taking AI beyond lab simulations.

Page 2: Evaluating Human-Agent Interaction in the Wild

1.1 Flexible Autonomy

Autonomous agents (software agents, robots or other systems) have long been the object of study of AI. Althoughdefinitions of agents can vary depending on the domain, they are commonly referred to as intelligent systems thatperform a number of operations on behalf of human users [19]. Today, systems based on human-agent collectives,where humans and agents establish a range of collaborative relationships and share responsibility for accomplishing aspecific goal, are continually growing [8]. In systems where humans rely on autonomous systems to complete complextasks, the relationship between humans and agents is typically defined by the level of autonomy attributed to the agents[14]. Traditional approaches to defining this relationship assume that humans are somehow more knowledgeable aboutthe task and therefore more apt at defining when the agent should autonomously act and when it should request manualoperation. This is typically referred to as ”adjustable autonomy”. However, in many cases, the points at which guidanceneeds to be requested may not be well-defined. Moreover, humans may instead be guided by agents to complementtasks undertaken by the agents. Thus the locus of control may change between humans and agents at any point intime. Such interactions put humans and agents on the same level and are, hence, defined as realisations of flexibleautonomy. The concept of ”flexible autonomy” indicates that the responsibility and authority of both humans andagents can be configured at any time depending on the current situation to optimise the overall performance. To studyflexible autonomy and people’s interactions with it in their everyday lives, we use electricity tariff switching as anexample application scenario, which is described in the following.

1.2 Electricity Tariff Switching

An energy tariff is a pricing scheme by which consumers are charged for the energy they use. Today, there are 34energy providers that the consumer in the UK faces when she needs to pick an electricity tariff. The energy providersoffer various tariff structures, where the most common ones are fixed tariffs and tiered tariffs. The fixed tariffs usuallyinclude two guaranteed rates: a daily standing charge that is the price for the energy service such as distribution andmaintenance, and a unit rate that is the price per unit of energy. The tiered tariffs mostly have multiple tiers where eachtier is charged with different rate. For example, ’Economy 7’ is one of the well-known tiered tariffs, which offers acheaper rate for a seven-hour period during the night, where consumers are required to have a special meter installed tobe able to use such tiered tariffs. Given that there are many tariff options available in the energy market, finding the bestenergy tariff may be a daunting task for most customers. Indeed, in the UK, most customers (70%) are on the wrongtariff and tend to stick to the same tariff without spending much time searching for a better option. 3 Agent-basedsolutions can help people to react to these different tariffs to get actual benefit out of them.

1.3 Previous Work and Overview of New Results

For agent-based systems to become practical and embraced in reality, many research challenges have to be addressed.In [1] an initial iteration of tariff-switching application was described and a set of design guidelines derived from ashort field study were presented. This paper complements that work: we focus on long-term interactions of automatedelectricity tariff switching through removing an emphasis on renewable energy purposefully.

In this paper, we make three new contributions. First, we propose a principled way of applying flexible autonomyfor the development of agent-based applications. Second, we introduce a new application and UI design that allowsusers to determine their interaction and autonomy preferences flexibly. In particular, the new UI enables users to decidehow frequent the agent will be interacting with them in addition to what level and way they will be interacting withthe agent. Our third and main contribution is a long-term field study which was conducted to learn whether users’preferences for autonomy and interaction vary over time in connection with their trust towards an autonomous agent.Our findings, from automatically recorded user interactions and semi-structured interviews, demonstrate that none ofour participants disabled or even reduced the frequency of the updates they received from the agent once every dayfor the entire six weeks. Furthermore, our results show that all users preferred control over the agent and at least someof them gave away more control over time, which provides the first promising results that flexible autonomy might beeffective in developing autonomous agent-based systems that are perceived more useful and trustworthy.

3http://www.bbc.co.uk/news/uk-22745110.

2

Page 3: Evaluating Human-Agent Interaction in the Wild

2 Background

In AI, a number of agent-based solutions have recently been introduced to assist humans managing residential energyuse with real-time electricity prices [18, 11]. However, these approaches have only been tested in simulation, assumingthat users have predictable energy consumption profiles and users will trust the agent to make the right decisions whenenergy prices are not completely predictable. Closer to our work, researchers proposed an agent-based recommendersystem without automatic tariff-switching [10], and only performed a usability test [5].

To date, there have been few studies in which agent-based systems are evaluated with actual end-users throughreal-world deployments. [14, 16, 9] presented a seminal work on adjustable autonomy where a number of personalassistant agents are deployed to alleviate the burden on humans in an office environment. [2] tested their intelligentalarm triage system with real network operators through a user study. [4] explored an agent-based system with a fieldstudy. In this study, software agents assisted users by charging their home batteries in lower priced hours. [3] deployedenergy-aware washing machines that provide users suggestions on when best to do their laundry based on the amountof green energy generated locally, and tested various intervention techniques. Differently from these work, our workgoes one step further, and investigates the dynamics around human-agent interactions with flexible autonomy in longterm. In particular, we aim to shed light on the challenge of designing autonomous agent-based systems that are ableto establish a long-term relationship with their humans users by sustaining their usefulness and credibility.

3 Managing Tariffs with TariffAgent

TariffAgent is developed as a combination of a web application and off-the-shelf sensors. It is inspired by the scenariodepicted in a recent work [12], where autonomous agents embedded in households have the ability to switch theenergy providers based on their offered rates and the user’s consumption routines. In our scenario, we consider a dailyelectricity tariff-switching problem so as to be able to create a realistic field study (as depicted in Section 4). Therefore,we only concerned with the daily energy usage of a home for selecting the best tariff. In what follows, we elaborateon how this daily consumption will be billed by different tariffs and hence define the challenge of choosing betweenthese tariffs based on the predictions of it.

3.1 Daily Tariff Switching

Let the energy consumption of a home on day d be denoted as cd ∈ <+ kWh, where d ∈ {0, · · · , D}, and D ∈ Z+.Then, let the set of tariffs provided by suppliers in the energy market be denoted as t1, ..., ts ∈ T and for each tariffthere exists a function F : T ×< → < that takes the predicted energy consumption for the next day cd+1 of the homeand returns the predicted cost for that tariff. For example, given a standard tariff from typical UK retailers where acustomer is charged a fixed rate r1 and a standing charge s1, function F would return cd+1 × r1 + s1.

To create a more challenging decision environment for users of the application (and therefore incentivize them todelegate their tariff decisions to an agent), we assume there exists eight suppliers, each with their own tariffs similarto real-world tariffs. In particular, each tariff represents the best value for a particular consumption range so that it isnot easy to decide which tariff is the cheapest as it may change every day unless the user is able to accurately predicther own consumption.

3.2 Software Agent

In our scenario, planning which tariff to change to and when to change is a well-suited task for a software agentsince it is required to continuously monitor the changing consumption to predict the best tariff. Energy consumption ismonitored by the agent through off-the-shelf home energy monitoring devices (AlertMe4). These monitoring devicesmeasure the total consumption of the household through a current clamp, and make the data available through anHTTP API.

As was shown in a previous study, applying complex ma-chine learning techniques to predict day-ahead usageaccurately is a challenging problem [17]. Here we also do not aim to test the user’s trust in the accuracy of predicted

4http://www.alertme.co.uk.

3

Page 4: Evaluating Human-Agent Interaction in the Wild

Table 1. Autonomy Levels.

Autonomy Levels What human does What agent doesHuman-guided Changes tariff Provides suggestion

Semi-autonomous Monitors changes Changes tariffFully autonomous Does not monitor Changes tariff

consumption (since this is likely to be low when using state-of-the-art algorithms in any case) and instead focus ontrust in the agent that may make mistakes at times. Thus, the software agent uses a very simple prediction algorithmto predict the day-ahead consumption on day d that simply uses the previous day’s consumption as a prediction forthe next day’s consumption (i.e., cd+1 = cd−1). For all tariff switching suggestions and those autonomously enacted,the agent calculates the cheapest tariff for the day-ahead based on the next day’s predicted consumption and the costfunction defined in Section 3.1.

3.3 Applying Flexible Autonomy

To enable users to flexibly specify their relationship with our agent, we provide a number of autonomy levels forvarious interaction modalities that go beyond the simple notion of moving between human-controlled and fully au-tonomous tariff switching. These autonomy levels allow users to dynamically arrange the authority and responsibilityof their own and the agent. In addition, with these levels they can decide how the human-agent interactions will beheld. Table 1 below shows the three different autonomy levels designed for this study.

Human-guided: If the agent detects that the current tariff is different from the one predicted to be the best for thenext day, it sends an SMS suggesting a tariff change. Users can accept the suggestion by replying ”Yes” via SMS.

Semi-autonomous: The agent automatically switches to the predicted best tariff and informs users of the changevia SMS. If the users are not happy with the change they can go to the website and manually change the tariff there.This level is semi-autonomous in that it automatically switches tariffs, but it allows users to easily regain control.

Fully autonomous: The agent automatically switches to the predicted best tariff but does not inform the user of thechange. This level is fully autonomous in that it completely offloads users of the burden of tariff switching.

In the next section, we present the interaction modalities that allow users to communicate with the agent.

3.4 Human-Agent Interactions

Interactions between users and the agent are supplied through two mediums: they can interact with the agent eitherthrough SMS, or through a web site that represents the main ”face” of the system. The site includes two pages: Homeand Details, which are described in what follows.

The Home page, illustrated in Figure 1, comprises four components: tariff, setting, reports and budget. Throughthe Tariff component users can see the current tariff and manually select the tariff for the next day. The current andthe next day’s tariffs are displayed on the top. The next day’s tariff is highlighted in green if it is the same as the onethe agent suggested, otherwise in orange, to emphasise the difference. In the middle of the component, the predictedvalues for the user’s consumption on the next day are shown. Below the predictions, the eight tariffs are listed, from theone predicted to be the cheapest to the most expensive. The suggested tariff (the cheapest) is marked as such througha text label. Users can select a tariff through a button that is placed next to each tariff.

The predicted amount of energy consumption for the next day can be modified through radio buttons. Changesin the consumption prediction are immediately reflected in the estimated costs for each tariff. The manually selectedprediction can be confirmed, or ”told”, to the agent by clicking a button. By so doing, users can understand how theagent uses their predicted consumption to make a better choice on their behalf and therefore inspire confidence in thesystem.

Setting is the second component of the Home page. It allows users to select one of the three autonomy levelsdescribed in the previous section. Because the delegation of autonomy is central to our study, we included this settingon the Home page, to make it easy for participants to adjust the level of autonomy during the trial.

4

Page 5: Evaluating Human-Agent Interaction in the Wild

Fig. 1. Home Page

The third component, Reports, enables users to decide how often to receive an SMS report, where the optionevery day is initially selected by default. The other options are: every 3 days, every 5 days and every week. The lastcomponent on the home page is the Budget, which displays how much was spent and how much remains of the budgetallocated at the beginning of the study (see the next section for more details). It also provides a link to the other webpage of the system, described in the following.

Fig. 2. Details View

The Details Page provides historical information about the operation of TariffAgent, with the aim of allowing usersto evaluate their performance, and make the system transparent. In particular, for each past day the predicted and actualvalues for energy consumption are shown, together with the suggested and actual tariff selection, the budget, the costand the savings or loss incurred. To facilitate the understanding of the information displayed, table cells are colour-coded. Tariffs are displayed in green/red depending on whether the selection was optimal or suboptimal. Consumption

5

Page 6: Evaluating Human-Agent Interaction in the Wild

predictions are shown in green when they turned out to be accurate (within 10%) and the resulting tariff suggestionis optimal. They are shown in red when they are inaccurate (outside 10%) compared to the realised values and theresulting tariff suggestion is suboptimal. They are shown in orange if the predictions are off (outside 10%), but theresulting tariff suggestion was optimal.

Agent Interacting with Human User The agent can send three different types of notifications via SMS: reports,suggestions and confirmations. Reports provide information on how much energy was consumed, how much the costwas, which tariff was selected, and how much was saved or lost (compared to the optimal or the worst tariff). Reportswere sent to all users regardless of their setting or tariff. An example report is: ”Hello, yesterday your tariff was Tariff-A, your consumption was 4.4 kWh and it cost you 0.69 pound. You saved 1.30 pound with Tariff-A.” The system sendssuggestions to users who are on setting S1, when their tariff for the next day is predicted not to be optimal, for example:”Hello, your tariff needs to be changed from Tariff-A to Tariff-B for tomorrow. If you confirm the change please replyas ’Yes’. ” Saving assumptions are only presented in the web UI for brevity, rather than in the SMS. Confirmationmessages are sent only to users on setting S2 to inform them of an automatic tariff switch, such as: ”Hello, I switchedyour tariff from Tariff-A to Tariff-B for tomorrow.”

Fig. 3. SMS Dialogue with Semi-Autonomy

4 Evaluation in the Wild

Our focus is to study how people interact with an autonomous system, such as TariffAgent, which can influencepeople’s financial situation and possibly daily routines. In order to obtain meaningful results from such a study, it iscrucial to offer a high degree of ecological validity for a long period of time (also to reduce novelty effects). There-forea real world deployment, in the form of a field trial, was designed and carried out to evaluate how people experiencedthe system by using it as part of their everyday life.

4.1 Participants

We recruited 12 participants (6 female, 6 male) to cover a range of lifestyles, as detailed in Table 2 with pseudonyms,through word of mouth and snowball sampling. The only requirement to take part in the trial was to have a broad-bandInternet connection and a mobile phone.

4.2 Method

The trial involved participants installing the meters in their own homes and using the system for a period of 6 weeks (42days). The system used the real electricity consumption data collected from the participants’ homes to calculate their

6

Page 7: Evaluating Human-Agent Interaction in the Wild

Table 2. Participants’ Details.

Participant Gender Age OccupationAdam Male 67 PriestArthur Male 66 RetiredChloe Female 48 Community Mgr

Dionisia Female 25 PhD Stu. (Chemistry)Evelyn Female 66 RetiredGerard Male 26 Media Production MgrGonca Female 23 MSc Stu. (Social Policy)Hiroko Female 37 HousewifeLucy Female 25 Estate MgrPeter Male 33 Unemployed

Stewart Male 33 Software ConsultantTamer Male 30 PhD Stu. (Chemistry)

daily energy cost, based on the energy consumption and the selected tariff (as detailed in the previous section). It wasparticularly emphasized that the system does not affect their actual energy tariffs and bills. To motivate participants toengage with the system and experience the use of an autonomous system, we provided monetary incentives based ontheir performance according to the following setup.

At the beginning of the study, all participants were allocated an online budget of 80, and their daily consumptioncost was reduced from this budget over the period of the trial. At the end of the study, participants received payments(in the form of a gift voucher) according to the amount left on their budget. By so doing, we not only aimed toencourage participants to engage with the system, but also make saving have a real and tangible impact. This idea ofusing monetary incentives to mimic energy pricing was partly inspired from an earlier study around energy pricing[15], where participants were rewarded according to their study performance.

4.3 Data Analysis

Data was collected through automatic recording of interaction logs and individual semi-structured exit interviews. Theinterview focused on people’s use, adoption and under-standing of the system, and lasted between 20 and 30 minutes.To analyse the qualitative data collected through the interviews we adopted an approach inspired by grounded theory,a qualitative analysis method characterized by an absence of predefined assumption [13]. The interviews were docu-mented through audio recording (later fully transcribed); analysis started by categorizing the material at the sentencelevel through open codes grouped in broader categories, such as perception, orientation and responsibility

5 Results

Most participants used the web interface through desktop computers, only 3 accessed the website from smart phones(34 hits from iPhone and 4 hits from Android). The Home page was loaded more frequently, with 420 page loads overthe course of the study, while the Details page was loaded overall 184 times, still accounting for approximately 30%of the total page views (see Figure 4 for more details).

The figures are based on the auto-recorded user interactions. Figure 4 shows the change in the total number ofusers opted for each type of autonomy level over the 42 days. Figure 5 shows the overall page loads of home anddetails page during the 6 weeks. Figure 6 shows the acceptance rate of the agent’s suggestions with percentages.

The default autonomy level at the beginning of the trial, for all participants, was human-guided: the system wouldsend SMS suggestions about tariff switching but it would not automatically switch. This default option was chosenbecause it is the one that requires users the most interaction, so we wanted to see whether they would change to a lessdemanding one by time. Four of the participants modified the autonomy level to semi-autonomous option, where thesystem automatically changes to the predicted best tariff and informs the user of the change via SMS. The remainingeight users kept using the default autonomy level. No one selected full autonomy (where the system changes the tariff

7

Page 8: Evaluating Human-Agent Interaction in the Wild

Fig. 4. Page Loads Fig. 5. Autonomy Preferences Fig. 6. Trust in Suggestions

without informing the user). We report in Figure 5 for how the number of users keeping different autonomy levelsvaried over the course of the study.

In terms of tariff selection, the system’s predictions of the optimal tariff were correct on average 65% of the time(SD:12.3%); for reference users’ actual selections throughout the study (a combination of automatic and manual)corresponded to the optimal tariff 61% of the time (SD:15.4%). Overall 10 participants received SMS tariff switchingsuggestions from the agent (while on the human-guided). The eight participants accepted them by replying ”Yes” viaSMS at least once, the other two participants never accepted any tariff switching suggestion but changed their tariffmanually on the website at least once. In total 139 SMS suggestions were sent from the system to the users, with a38% acceptance rate. Additionally, 59 times users changed their tariffs manually from the website and in 49 cases theyaccepted the agent’s suggestions.

All users except one took advantage of the web UI to provide manual estimates of their electricity consumptionprediction for the following day at least once. In total this explicit input was provided 110 times during the study, andresulted in 85 times correct tariff selections. The one person who did not use the input feature turned out to have a veryregular consumption profile, within a range of 2.1 kWh throughout the entire duration of the study.

The figures above demonstrate that the interactions of users were drastically dropped after the first week, yetmaintained at considerable level by each user till the end of the study. Everyone either accessed the web interface orreplied to SMS suggestions with some regularity that is on average at least once every 2.5 days (SD: 1.3 days). Moreinterestingly, none of the users changed the frequency of reports from daily to a less frequent option, namely opted forthe agent interacting with them every day.

In the interviews, most participants appeared to hold an understanding that mirrors quite closely the actual designof the agent. For example, Gerard eloquently described, ”It looks at what usage you think you are gonna be using inthe next day, which is some thing you either tell it or it just guesses itself, and then it looks at various tariffs basedon the standing costs and the per unit costs, and works out which one is going to be the cheapest.”, or more conciselyHiroko told us, ”It decides the tariff and recommends me.” All participants commented that they perceived the systemas helping them save money, and most of them stated that they trust the system’s tariff decisions. When asked aboutexperienced mistakes in tariff suggestions or selections, they mostly considered it to be their own responsibility. Forinstance, Chloe said, ”It requires individual responsibility to make the system work, it sort of gives responsibility backto the energy user.” In an even more drastic way Gerard stated: ”I think it was always my fault. So I just felt annoyedwith myself and I started trusting the machine more than I trusted myself, I was like oh the machine knows what it isdoing, just leave it alone.”

6 Discussion and Conclusion

In this paper, we present a field trial that exposed 12 participants to a prototyped future energy scenario, where anautonomous system helps in energy cost management. Our results suggest that participants appear to understand andappreciate the autonomous nature of TariffAgent, which proactively sends users suggestions about switching energytariffs. In particular, this is demonstrated by the general and prolonged engagement that most participants had inswitching tariffs and in providing input to the system to improve its consumption forecast performance, and hence the

8

Page 9: Evaluating Human-Agent Interaction in the Wild

quality of the suggestions. Indeed all participants accepted tariff suggestions, and all except one provided consumptionpre- dictions input at least once. Additionally, none of the participants reported to find the SMS tariff switching promptsbothersome, supporting the results of recent studies [1, 3].

Overall four participants delegated more responsibility to the system to automatically switch their tariff. Thus webelieve that it is important to offer autonomy delegation as an option, as some users took advantage of it. In otherwords, the decision of balancing control and autonomy should be left to users where appropriate to cater for individualdifferences in what levels of system autonomy people are comfortable with. In so doing, users will may continue tomake use of relevant parts of the system autonomy, rather than taking over full control. While existing trends in mixed-initiative systems put the onus of requesting human control or input on the system [7, 14], we suggest that systemsinvolving humans and agents should enable users to enact control or to adjust autonomy by default. Moreover, wepropose that such systems should also enable users to provide input to agents at any point in time to improve theiroperation. In turn, this makes legibility of system state an essential requirement for mixed-initiative systems. As futurework, we aim to apply flexible autonomy to applications in other domains so as to explore interactions among multiplehumans and agents.

References1. A. Alan, E. Costanza, J. Fischer, S. Ramchurn, T. Rodden, and N. Jennings. A field study of human-agent interaction for

electricity tariff switching. In Proc. AAMAS. ACM Press, 2014.2. S. Amershi, B. Lee, A. Kapoor, R. Mahajan, and B. Christian. Human-guided machine learning for fast and accurate network

alarm triage. In Proc. IJCAI, 2011.3. J. Bourgeois, J. Linden, G. Kortuem, B. Price, and C. Rimmer. Conversations with my washing machine: an in the wild study

of demand shifting with self-generated energy. In Proc. UbiComp, 2014.4. E. Costanza, J. E. Fischer, J. A. Colley, T. Rodden, S. Ramchurn, and N. R. Jennings. Doing the laundry with agents: a field

trial of a future smart energy system in the home. In Proc. CHI. ACM, 2014.5. J. Fischer, S. Ramchurn, M. Osborne, O. Parson, T. Huynh, M. Alami, N. Pantidi, S. Moran, K. Bachour, S. Reece, E. Costanza,

T. Rodden, and N. Jennings. Recommending energy tariffs and load shifting based on smart household usage profiling. InProc. IUI. ACM, 2013.

6. M. Gupta, S. S. Intille, and K. Larson. Adding gps-control to traditional thermostats: an exploration of potential energy savingsand design challenges. In Proc. Pervasive, pages 95–114, Berlin, Heidelberg, 2009. Springer-Verlag.

7. E. Horvitz. Principles of mixed-initiative user interfaces. In Proc. CHI, pages 159–166. ACM, 1999.8. N. Jennings, L. Moreau, D. Nicholson, S. Ramchurn, S. Roberts, T. Rodden, and A. Rogers. Human-Agent Collectives.

Communications of the ACM, 2014.9. J. Kwak, P. Varakantham, R. Maheswaran, Y. Chang, M. Tambe, B. Becerik-Gerber, and W. Wood. Tesla: an extended study

of an energy-saving agent that leverages schedule flexibility. Journal of Autonomous Agents and Multiagent Systems, 2013.10. S. Ramchurn, M. Osborne, O. Parson, T. Rahwan, S. Maleki, S. Reece, T. Huynh, M. Alam, J. Fischer, T. Rodden, L. Moreau,

and S. Roberts. Agentswitch: towards smart energy tariff selection. In Proc. AAMAS. ACM, 2013.11. S. D. Ramchurn, P. Vytelingum, A. Rogers, and N. R. Jennings. Agent-based control for decentralised demand side manage-

ment in the smart grid. In Proc. AAMAS. ACM, 2011.12. T. A. Rodden, J. E. Fischer, N. Pantidi, K. Bachour, and S. Moran. At home with agents: exploring attitudes towards future

smart energy infrastructures. In Proc. CHI. ACM, 2013.13. Y. Rogers, H. Sharp, and J. Preece. Interaction design: beyond human-computer interaction. Wiley, 2011.14. P. Scerri, D. Pynadath, and T. Milind. Towards adjustable autonomy for the real world. Journal of Artificial Intelligence

Research, 2(50), 2002.15. R. E. Slavin, J. S. Wodanski, and B. L. Blackburn. A group contingency for electricity conservation in master-metered apart-

ments. Journal of Applied Behavior Analysis, 14(3), 1981.16. M. Tambe, E. Bowring, J. Pearce, P. Varakantham, P. Scerri, and D. Pynadath. Electric elves: what went wrong and why. AI

Magazine, 29(2), 2008.17. N. C. Truong, J. McInerney, L. Tran-Thanh, E. Costanza, and S. D. Ramchurn. Forecasting multi-appliance usage for smart

home energy management. In Proc. IJCAI, 2013.18. P. Vytelingum, T. D. Voice, S. D. Ramchurn, A. Rogers, and N. R. Jennings. Theoretical and practical foundations of large-scale

agent-based micro-storage in the smart grid. Journal of Artificial Intelligence Research, 42, 2010.19. M. Wooldridge and N. Jennings. Intelligent agents: theory and practice. The Knowledge Engineering Review, 10(2):115–152,

1995.20. R. Yang and M. W. Newman. Learning from a learning thermostat: lessons for intelligent systems for the home. In Proc.

UbiComp, pages 93–102, New York, NY, USA, 2013. ACM.

9