Evaluating the safety and effectiveness of mHealth apps
Jeremy Wyatt DM FRCP ACMI Fellow
Professor of Digital Healthcare & Director
Wessex Institute of Health Research
Overview
Why mHealth apps ?
What should a safe app look like ?
• Examples of studies examining safety
What should an effective app look like ?
• Examples of studies examining effectiveness
Should we evaluate safety & effectiveness for every mHealth app ?
How to disseminate the result of our studies ?
Conclusions
Why use digital channels ?
Face to face: £8.60   Letter: £5.00   Telephone: £2.83   Digital: £0.15
[Bar chart: cost in £ per encounter]
Mean UK public sector cost per completed encounter across 120 UK councils
Source: Cabinet Office Digital efficiency report, 2013
Why bother with mHealth apps ?
1. Face-to-face contacts with health professionals do not scale
2. Smartphone hardware used by 65%+ of UK adults:
• Cheap, convenient, fashionable
• Inbuilt sensors / wearables allow easy measurements
• Multiple communication channels: SMS, voice, video, apps…
3. mHealth apps enable:
• Unobtrusive alerts to record data, take action
• Incorporation of Susan Michie’s behaviour change techniques
• Tailoring, which makes behaviour change more effective (d=0.16, Lustria, J H Comm 2013)
A safe mHealth app should…
1. Respect the privacy of sensitive user data
2. Be based on sound evidence (not just opinion)
3. Be usable & behave predictably
4. Give accurate output or advice
Privacy and mHealth apps
Permissions requested: use accounts, modify USB, read phone ID, find files, full net access, view connections…
Our study of 80 apps: average of 4 clear privacy breaches for health apps, only 1 for medical apps
We know that - we read the Terms & Conditions ! (this one only 1200 words, but many much longer…)
[Image: First Folio "As You Like It", public domain photo taken by Cowardly Lion; Folio Society edition of 1996]
With Hannah Panayiotou & Anam Noel, Leeds medical students
Evidence on privacy & mHealth apps
Huckvale et al 2015 “man in the middle” study of 79 accredited lifestyle apps from the NHS Apps library:
• Only 53 (67%) had a privacy policy: policies vague, did not explain types of data being shared
• No app encrypted data held on device
• 70 (89%) of apps leaked confidential data over network
• 35 included identifiers, 23 sent IDs without encryption
• 4 (5%) apps sent both IDs and health information without encryption
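The man-in-the-middle findings above come down to one check: does identifying data leave the device over an unencrypted channel? A minimal sketch of that check, where the captured-request records and the identifier names are hypothetical illustrations (the real audit inspected live network traffic):

```python
# Flag captured requests that carry a known identifier without TLS.
# The request records and identifier list below are hypothetical.
from urllib.parse import urlparse

IDENTIFIERS = {"email", "dob", "nhs_number", "device_id"}

def leaking_requests(captured):
    """Return URLs of requests that send an identifier over plain HTTP."""
    leaks = []
    for req in captured:
        scheme = urlparse(req["url"]).scheme
        fields = set(req.get("params", {}))
        if scheme != "https" and fields & IDENTIFIERS:
            leaks.append(req["url"])
    return leaks

captured = [
    {"url": "http://api.example.com/sync", "params": {"email": "a@b.c", "steps": 9000}},
    {"url": "https://api.example.com/sync", "params": {"email": "a@b.c"}},
    {"url": "http://cdn.example.com/banner.png", "params": {}},
]
print(leaking_requests(captured))  # → ['http://api.example.com/sync']
```

Note that the second request carries the same identifier but over TLS, so it is not flagged: the study's distinction between leaking and merely transmitting.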
Does higher price correlate with sound evidence base for mHealth apps ?
[Scatter plot: app price (US dollars, y-axis) versus evidence score (x-axis); regression y = 3.7 - 0.1x, R² = 0.016]
Evidence score: high score means app adheres to US Preventive Service Task Force guidelines
Price ($US) of 47 smoking cessation apps versus evidence score (data from Abroms et al 2013)
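The slide's conclusion can be reproduced on any price/score dataset with an ordinary least-squares fit; a minimal sketch with invented numbers (not the Abroms et al. data):

```python
# Fit a least-squares line and compute R² for price vs. evidence score.
# The price and score values below are invented for illustration.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

prices = [0.0, 0.99, 1.99, 4.99, 9.99, 29.99]   # hypothetical app prices ($US)
scores = [4, 2, 6, 1, 5, 3]                     # hypothetical evidence scores
slope, intercept, r2 = fit_line(prices, scores)
```

An R² near zero, as on the slide (0.016), means price explains almost none of the variation in evidence score.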
User ratings: app display rank versus app adherence to evidence
Study of 47 smoking cessation apps (Abroms, 2013)
Engineering approaches to safety
Usability evaluation methods eg. expert inspection; scripted user-centred design workshops with think aloud etc.; user testing in the wild with log analysis, interviews etc.
Technology resilience: failure mode analysis, PFMEA etc. [see PAS 277]
Human reliability assessment
Source: Ann Blandford, UCL
Do CE marked apps behave predictably ?
Sepsis 6 app: screenshot shows data from two patients
A different Sepsis 6 app: screenshot shows clipped list of key actions to complete – cannot scroll down
Source: Harold Thimbleby, Swansea
Other evidence on app safety
Apps for insulin dosage adjustment (n = 46, Huckvale 2015):
14 (30%) declared source of algorithm, 3 (9%) validated input data, 27 (59%) allowed calculation with missing data
Only 1 app was free of issues
17 (37%) did not update when input data was changed
Asthma apps (Huckvale 2015):
Number doubled from 93 in 2011 to 191 in 2013
23 (25%) of the first group withdrawn; 147 new apps in 2 years
Newer apps not more evidence based: only 75 (50%) of 147 gave basic info on asthma, 36 (24%) had diary functions
Only 4 (17%) of 23 apps advising on asthma management were consistent with guidelines
Accuracy of CVD risk apps for public
We located 21 apps: only 19 (7 paid) gave figures
All 19 communicated risk using percentages (cf. Gigerenzer, BMJ 2004: use numbers)
One app said see your GP every time; none of the rest gave advice
Some apps refused to accept key data, eg. age > 74, diabetes
Heart Health App
Error rates
Of 19 apps, 8 (42%) misclassified at least 20% of scenarios
Median error rate: free apps 13%, paid apps 27% (p = 0.026)
[Bar chart: per-app misclassification rates, 20% threshold marked; paid apps shown in orange]
Error rates varied from 7% (safe ?) to 33% (unsafe !)
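The group comparison behind these figures is simple to compute; a sketch with invented per-app error rates (the study's actual rates are not reproduced here):

```python
# Summarise per-app error rates by price group: median rate and how many
# apps misclassify at least 20% of scenarios. All rates below are invented.
from statistics import median

free = [0.07, 0.10, 0.13, 0.15, 0.22, 0.09]
paid = [0.20, 0.27, 0.31, 0.33, 0.25, 0.12, 0.28]

def summarise(rates, threshold=0.20):
    """Return (median rate, count at/above threshold, group size)."""
    over = sum(r >= threshold for r in rates)
    return median(rates), over, len(rates)

for name, group in (("free", free), ("paid", paid)):
    med, over, n = summarise(group)
    print(f"{name}: median {med:.0%}, {over}/{n} apps misclassify ≥20% of scenarios")
```

With real data one would add a rank-based test (the study reports p = 0.026 for the free-vs-paid median comparison), but the descriptive summary above is the core of the slide.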
Assessing app accuracy
1. Only applies to apps that give advice, calculate a risk, drug dose etc.
2. Need a gold standard for correct advice / risk [QRisk2 in our case]
3. Need a representative case series, or plausible simulated cases
4. Ideally, users should enter case data – or their own data
5. How accurate is “accurate enough”:
• Accurate enough to get used ?
• Accurate enough to encourage user to take action ?
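Steps 2 and 3 above can be sketched in code: run each scenario through the app and through the gold standard (QRisk2 in our study), and count a scenario as misclassified when the two disagree on the risk category. The risk-band cut-offs and the app's +5-point bias below are illustrative assumptions, not the study's definitions:

```python
# Misclassification rate of a risk calculator against a gold standard.
# risk_band thresholds and the biased "app" are hypothetical stand-ins.
def risk_band(percent):
    """Bin a 10-year CVD risk percentage into advice categories."""
    return "high" if percent >= 20 else "moderate" if percent >= 10 else "low"

def misclassification_rate(scenarios, app_risk, gold_risk):
    errors = sum(
        risk_band(app_risk(s)) != risk_band(gold_risk(s)) for s in scenarios
    )
    return errors / len(scenarios)

# Hypothetical example: an app that overstates risk by 5 percentage points.
gold = {"s1": 8, "s2": 12, "s3": 19, "s4": 25}
rate = misclassification_rate(gold, lambda s: gold[s] + 5, lambda s: gold[s])
print(rate)  # → 0.5 (s1 and s3 cross a band boundary; s2 and s4 do not)
```

The example also shows why "accurate enough" depends on the banding: a fixed numeric bias only matters where it pushes a case across an advice threshold.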
An effective app should…
1. Deliver on its claims
2. Offer the user more benefits than harms
3. EITHER:
a) Be equivalent to current alternatives but less costly, OR
b) Be better than alternatives at the same cost
…and therefore be prescribable
Options for evaluating effectiveness
1. Psychological experiments: within user change in knowledge / views / decisions / certainty [“Intervention modelling experiments”]
2. Exploiting big health data: instrumental variable, regression discontinuity etc designs
3. Engineering methods: SMART, A-B testing; testing & disseminating generic design principles, not apps
4. and online or face-to-face randomised trials !
Online RCT to measure impact of Fogg’s persuasive technology theory on NHS organ donation register sign up rates
Persuasive features:
1. URL includes https, dundee.ac.uk
2. University logo
3. No advertising
4. References
5. Address & contact details
6. Privacy statement
7. Articles all dated
8. Site certified (W3C / Health on Net)
Source: Nind, Sniehotta et al 2010
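The primary analysis such an online RCT implies is a comparison of sign-up proportions between arms. A minimal sketch using a two-proportion z-test (normal approximation); the counts are invented, not the Nind et al. results:

```python
# Two-proportion z-test for a binary outcome (e.g. register sign-up).
# Arm counts below are hypothetical.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF, via the error function
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical: 120/1000 sign-ups with persuasive features vs 90/1000 without.
z, p = two_proportion_z(120, 1000, 90, 1000)  # roughly z ≈ 2.2, p ≈ 0.03
```

In practice an online trial would also randomise assignment and log exposures, but the arithmetic of the comparison is no more than this.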
Trials of app effectiveness
In 2016 there were 21 published randomised trials of apps used by patients / the public:
3 studies were confounded (used app + much else besides)
3 were equivalence studies (does the app save resources with the same outcomes ?): 2 were positive
Of the remaining 15 trials*:
8 studied health behaviours: 7 positive, 1 worse (compared with SMS for smoking cessation)
5 studied clinical processes: 3 positive, 2 equal
5 studied patient outcomes: 3 positive, 2 equal
Overall (inc. equivalence trials): 15 positive, 4 equal, 1 worse
* 3 studies measured more than one of these
Source: JW work in progress to date
Why bother with impact evaluation – can’t we just predict the results ?
No: the real world of people and organisations is too messy and complex to predict whether a technology will work:
• Diagnostic decision support (Wyatt, MedInfo ‘89)
• Integrated medicines management for a children’s hospital (Koppel, JAMA 2005)
• MSN messenger for nurse triage (Eminovic, JTT 2006)
What to evaluate, for which apps?
[2×2 matrix: potential risk from using the app (low → high) against likely benefit from using the app (low → high)]
• Low risk, low benefit: developer self-declaration on key features
• Low risk, high benefit: effectiveness
• High risk, low benefit: safety
• High risk, high benefit: both safety & effectiveness
RCP quality criteria for physician apps, based on Donabedian 1966
Structure = the app development team, the evidence base, use of an appropriate behaviour change model etc. …
Processes = app functions: usability, accuracy etc.
Outcomes = app impacts on user knowledge & self efficacy, user behaviours, resource usage
Wyatt JC, Thimbleby H, Rastall P, Hoogewerf J, Wooldridge D, Williams J. What makes a good clinical app? Introducing the RCP Health Informatics Unit checklist. Clin Med (Lond). 2015 Dec;15(6):519-21. doi: 10.7861/clinmedicine.15-6-519
RCP app checklist part 1
RCP app checklist part 2
Good evaluation practice for digital health interventions
1. Know why you are evaluating: who are the stakeholders, what decision do they face ?
2. Understand stakeholder questions and the level of evidence they need to answer them
3. Design your study with:
• Enough participants of the right kind
• The right intervention
• The right control
• Validated outcome measures
4. Check for biases and confounders & that you will learn something if study negative
5. Run the study & report your results
See: Murray E et al. Design & evaluation of digital interventions. Am J Prev Med Nov 2016
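"Enough participants" in step 3 can be estimated before the study starts. A minimal sketch of the standard two-proportion sample-size formula (normal approximation, two-sided alpha = 0.05, power = 0.80; the example rates are hypothetical):

```python
# Sample size per arm to detect a difference between two proportions.
# z-values are hard-coded for two-sided alpha = 0.05 and power = 0.80.
from math import ceil, sqrt

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Participants needed in each arm to detect p1 vs p2."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2)

# e.g. to detect a hypothetical improvement in quit rates from 10% to 15%:
print(n_per_arm(0.10, 0.15))  # → 683
```

The quadratic dependence on the difference p1 - p2 is why trials of modest behaviour-change effects need hundreds of participants per arm.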
Conclusions
1. The quality of mHealth apps varies too much
2. User rating, professional reviews, developer self-certification and regulation are not enough
3. To help reduce “apptimism” and strengthen other methods, we should agree criteria for safety & effectiveness, evaluate selected apps against them and label these apps with the results
4. This will support patients, health professionals, health systems and app developers to maximise the benefits of mHealth
Email [email protected]
Spare slides
A proposed evaluation cascade for mHealth Apps
Area | Topics | Methods
Source | Purpose, sponsor; user, cost | Inspection
Safety | Data protection; usability | Inspection; HCI lab / user tests
Content | Based on sound evidence; proven behaviour change methods | Inspection
Accuracy | Calculations; advice | Scenarios with gold standard
Potential impact | Ease of use in the field; understanding of output | Intervention modelling experiments
Impact | Knowledge, attitudes, self-efficacy; health behaviours, outcomes | Within-subject experiments; field trials
Which outcomes to measure ?
Depends on app purpose. E.g. to assess the impact of a health promotion app, consider measuring:
• User knowledge and attitudes
• Self-efficacy (validated tool)
• Short-term behaviour change
• Long-term behaviour change
• Individual health-related outcomes
• Population health
(the earlier measures serve as surrogate outcomes for the later ones)
Possible quality approval processes to improve mHealth
Methods | Advantages | Disadvantages | Examples
Wisdom of the crowd | Simple user ranking | Hard for users to assess quality; click-factory bias | Current app stores; MyHealthApps
Users apply quality criteria | Explicit | Requires widespread dissemination; can everyone apply them ? | RCP checklist
Classic peer-reviewed article | Rigorous (?) | Slow, resource intensive, doesn't fit the app model | 47 PubMed articles
Physician peer review | Timely; dynamic | Not as rigorous; scalable ? | iMedicalApps, MedicalAppJournal
Developer self-certification | Dynamic | Requires developers to understand & comply; checklist must fit apps | HON Code ?; RCP checklist
Developer support | Resource light | Technical knowledge needed; multitude of developers | BSI PAS 277
CE marking, external regulation | Credible | Slow, expensive; apps don't fit the national model | NHS App Store, FDA, MHRA
We need to think differently…
Old think | New think
Paternalism: we know & determine what is best for users | Self-determination: users decide what is best for them
Regulation will eliminate harmful apps after release | Prevent bad apps: help app developers understand safety & quality
The NHS must control apps, apply rules and safety checks | Self-regulation by the developer community; consumer choice informed by truth in labelling
App developers are in control | Aristotle's civil society* is in control
Quality is best achieved by laws and regulations | Quality is best achieved by consensus and culture change
The aim of apps is innovation (sometimes above other considerations) | App innovation must balance benefits and risks
An apps market driven by viral campaigns, unfounded claims of benefit | An apps market driven by fitness for purpose (ISO) & evidence of benefit
* The elements that make up a democratic society, such as freedom of speech, an independent judiciary, collaborating for common wellbeing
Some criteria for an mHealth quality approval process
1. Empower patient & professional choice
2. Promote survival of the fittest and a proper market, not just innovation for its own sake
3. Use criteria that make sense to patients / professionals / health systems / industry
4. Scalable to thousands of apps
5. Proportionate to clinical risk
6. Resistant to manipulation, and auditable