Evaluating the safety and effectiveness of mHealth apps
Jeremy Wyatt DM FRCP ACMI Fellow
Professor of Digital Healthcare & Director
Wessex Institute of Health Research
Overview
Why mHealth apps ?
What should a safe app look like ?
• Examples of studies examining safety
What should an effective app look like ?
• Examples of studies examining effectiveness
Should we evaluate safety & effectiveness for every mHealth app ?
How to disseminate the result of our studies ?
Conclusions
Why use digital channels ?
Face to face: £8.60   Letter: £5.00   Telephone: £2.83   Digital: £0.15
[Bar chart: cost in £ per encounter]
Mean UK public sector cost per completed encounter across 120 UK councils
Source: Cabinet Office Digital efficiency report, 2013
Why bother with mHealth apps ?
1. Face-to-face contacts with health professionals do not scale
2. Smartphone hardware used by 65%+ of UK adults:
• Cheap, convenient, fashionable
• Inbuilt sensors / wearables allow easy measurements
• Multiple communication channels: SMS, voice, video, apps…
3. mHealth apps enable:
• Unobtrusive alerts to record data, take action
• Incorporation of Susan Michie’s behaviour change techniques
• Tailoring, which makes behaviour change more effective (d=0.16, Lustria, J H Comm 2013)
A safe mHealth app should…
1. Respect the privacy of sensitive user data
2. Be based on sound evidence (not just opinion)
3. Be usable & behave predictably
4. Give accurate output or advice
Privacy and mHealth apps
Permissions requested: use accounts, modify USB, read phone ID, find files, full net access, view connections…
Our study of 80 apps: average of 4 clear privacy breaches for health apps, only 1 for medical apps
We know that - we read the Terms & Conditions ! (this one only 1200 words, but many much longer…)
[Image: First Folio "As You Like It", public domain photo taken by Cowardly Lion; Folio Society edition of 1996]
With Hannah Panayiotou & Anam Noel, Leeds medical students
Evidence on privacy & mHealth apps
Huckvale et al 2015 “man in the middle” study of 79 accredited lifestyle apps from the NHS Apps library:
• Only 53 (67%) had a privacy policy: policies vague, did not explain types of data being shared
• No app encrypted data held on device
• 70 (89%) of apps leaked confidential data over network
• 35 included identifiers, 23 sent IDs without encryption
• 4 (5%) apps sent both IDs and health information without encryption
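The man-in-the-middle findings above come down to one check: does identifying data leave the device over an unencrypted channel? A minimal sketch of that check, where the captured-request records and the identifier names are hypothetical illustrations (the real audit inspected live network traffic):

```python
# Flag captured requests that carry a known identifier without TLS.
# The request records and identifier list below are hypothetical.
from urllib.parse import urlparse

IDENTIFIERS = {"email", "dob", "nhs_number", "device_id"}

def leaking_requests(captured):
    """Return URLs of requests that send an identifier over plain HTTP."""
    leaks = []
    for req in captured:
        scheme = urlparse(req["url"]).scheme
        fields = set(req.get("params", {}))
        if scheme != "https" and fields & IDENTIFIERS:
            leaks.append(req["url"])
    return leaks

captured = [
    {"url": "http://api.example.com/sync", "params": {"email": "a@b.c", "steps": 9000}},
    {"url": "https://api.example.com/sync", "params": {"email": "a@b.c"}},
    {"url": "http://cdn.example.com/banner.png", "params": {}},
]
print(leaking_requests(captured))  # → ['http://api.example.com/sync']
```

Note that the second request carries the same identifier but over TLS, so it is not flagged: the study's distinction between leaking and merely transmitting.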
Does higher price correlate with sound evidence base for mHealth apps ?
[Scatter plot: app price (US dollars, y-axis) versus evidence score (x-axis); regression y = 3.7 - 0.1x, R² = 0.016]
Evidence score: high score means app adheres to US Preventive Service Task Force guidelines
Price ($US) of 47 smoking cessation apps versus evidence score (data from Abroms et al 2013)
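The slide's conclusion can be reproduced on any price/score dataset with an ordinary least-squares fit; a minimal sketch with invented numbers (not the Abroms et al. data):

```python
# Fit a least-squares line and compute R² for price vs. evidence score.
# The price and score values below are invented for illustration.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

prices = [0.0, 0.99, 1.99, 4.99, 9.99, 29.99]   # hypothetical app prices ($US)
scores = [4, 2, 6, 1, 5, 3]                     # hypothetical evidence scores
slope, intercept, r2 = fit_line(prices, scores)
```

An R² near zero, as on the slide (0.016), means price explains almost none of the variation in evidence score.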
User ratings: app display rank versus app adherence to evidence
Study of 47 smoking cessation apps (Abroms, 2013)
Engineering approaches to safety
Usability evaluation methods eg. expert inspection; scripted user-centred design workshops with think aloud etc.; user testing in the wild with log analysis, interviews etc.
Technology resilience: failure mode analysis, PFMEA etc. [see PAS 277]
Human reliability assessment
Source: Ann Blandford, UCL
Do CE marked apps behave predictably ?
Sepsis 6 app: screenshot shows data from two patients
A different Sepsis 6 app: screenshot shows clipped list of key actions to complete – cannot scroll down
Source: Harold Thimbleby, Swansea
Other evidence on app safety
Apps for insulin dosage adjustment (n = 46, Huckvale 2015):
14 (30%) declared source of algorithm, 3 (9%) validated input data, 27 (59%) allowed calculation with missing data
Only 1 app was free of issues
17 (37%) did not update when input data was changed
Asthma apps (Huckvale 2015):
Number doubled from 93 in 2011 to 191 in 2013
23 (25%) of the first group withdrawn; 147 new apps in 2 years
Newer apps not more evidence based: only 75 (50%) of 147 gave basic info on asthma, 36 (24%) had diary functions
Only 4 (17%) of 23 apps advising on asthma management were consistent with guidelines
Accuracy of CVD risk apps for public
We located 21 apps: only 19 (7 paid) gave figures
All 19 communicated risk using percentages (cf. Gigerenzer, BMJ 2004: use numbers)
One app said see your GP every time; none of the rest gave advice
Some apps refused to accept key data, eg. age > 74, diabetes
Heart Health App
Error rates
Of 19 apps, 8 (42%) misclassified at least 20% of scenarios
Median error rate: free apps 13%, paid apps 27% (p = 0.026)
[Bar chart: per-app misclassification rates, 20% threshold marked; paid apps shown in orange]
Error rates varied from 7% (safe ?) to 33% (unsafe !)
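The group comparison behind these figures is simple to compute; a sketch with invented per-app error rates (the study's actual rates are not reproduced here):

```python
# Summarise per-app error rates by price group: median rate and how many
# apps misclassify at least 20% of scenarios. All rates below are invented.
from statistics import median

free = [0.07, 0.10, 0.13, 0.15, 0.22, 0.09]
paid = [0.20, 0.27, 0.31, 0.33, 0.25, 0.12, 0.28]

def summarise(rates, threshold=0.20):
    """Return (median rate, count at/above threshold, group size)."""
    over = sum(r >= threshold for r in rates)
    return median(rates), over, len(rates)

for name, group in (("free", free), ("paid", paid)):
    med, over, n = summarise(group)
    print(f"{name}: median {med:.0%}, {over}/{n} apps misclassify ≥20% of scenarios")
```

With real data one would add a rank-based test (the study reports p = 0.026 for the free-vs-paid median comparison), but the descriptive summary above is the core of the slide.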
Assessing app accuracy
1. Only applies to apps that give advice, calculate a risk, drug dose etc.
2. Need a gold standard for correct advice / risk [QRisk2 in our case]
3. Need a representative case series, or plausible simulated cases
4. Ideally, users should enter case data – or their own data
5. How accurate is “accurate enough”:
• Accurate enough to get used ?
• Accurate enough to encourage user to take action ?
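Steps 2 and 3 above can be sketched in code: run each scenario through the app and through the gold standard (QRisk2 in our study), and count a scenario as misclassified when the two disagree on the risk category. The risk-band cut-offs and the app's +5-point bias below are illustrative assumptions, not the study's definitions:

```python
# Misclassification rate of a risk calculator against a gold standard.
# risk_band thresholds and the biased "app" are hypothetical stand-ins.
def risk_band(percent):
    """Bin a 10-year CVD risk percentage into advice categories."""
    return "high" if percent >= 20 else "moderate" if percent >= 10 else "low"

def misclassification_rate(scenarios, app_risk, gold_risk):
    errors = sum(
        risk_band(app_risk(s)) != risk_band(gold_risk(s)) for s in scenarios
    )
    return errors / len(scenarios)

# Hypothetical example: an app that overstates risk by 5 percentage points.
gold = {"s1": 8, "s2": 12, "s3": 19, "s4": 25}
rate = misclassification_rate(gold, lambda s: gold[s] + 5, lambda s: gold[s])
print(rate)  # → 0.5 (s1 and s3 cross a band boundary; s2 and s4 do not)
```

The example also shows why "accurate enough" depends on the banding: a fixed numeric bias only matters where it pushes a case across an advice threshold.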
An effective app should…
1. Deliver on its claims
2. Offer the user more benefits than harms
3. EITHER:
a) Be equivalent to current alternatives but less costly, OR
b) Be better than alternatives at the same cost
…and therefore be prescribable
Options for evaluating effectiveness
1. Psychological experiments: within user change in knowledge / views / decisions / certainty [“Intervention modelling experiments”]
2. Exploiting big health data: instrumental variable, regression discontinuity etc designs
3. Engineering methods: SMART, A-B testing; testing & disseminating generic design principles, not apps
4. and online or face-to-face randomised trials !
Online RCT to measure impact of Fogg’s persuasive technology theory on NHS organ donation register sign up rates
Persuasive features:
1. URL includes https, dundee.ac.uk
2. University logo
3. No advertising
4. References
5. Address & contact details
6. Privacy statement
7. Articles all dated
8. Site certified (W3C / Health on Net)
Source: Nind, Sniehotta et al 2010
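The primary analysis such an online RCT implies is a comparison of sign-up proportions between arms. A minimal sketch using a two-proportion z-test (normal approximation); the counts are invented, not the Nind et al. results:

```python
# Two-proportion z-test for a binary outcome (e.g. register sign-up).
# Arm counts below are hypothetical.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF, via the error function
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical: 120/1000 sign-ups with persuasive features vs 90/1000 without.
z, p = two_proportion_z(120, 1000, 90, 1000)  # roughly z ≈ 2.2, p ≈ 0.03
```

In practice an online trial would also randomise assignment and log exposures, but the arithmetic of the comparison is no more than this.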
Trials of app effectiveness
In 2016 there were 21 published randomised trials of apps used by patients / the public:
3 studies were confounded (used app + much else besides)
3 were equivalence studies (does the app save resources with the same outcomes ?): 2 were positive
Of the remaining 15 trials*:
8 studied health behaviours: 7 positive, 1 worse (compared with SMS for smoking cessation)
5 studied clinical processes: 3 positive, 2 equal
5 studied patient outcomes: 3 positive, 2 equal
Overall (inc. equivalence trials): 15 positive, 4 equal, 1 worse
* 3 studies measured more than one of these
Source: JW work in progress to date
Why bother with impact evaluation – can’t we just predict the results ?
No: the real world of people and organisations is too messy and complex to predict whether a technology will work:
• Diagnostic decision support (Wyatt, MedInfo ‘89)
• Integrated medicines management for a children’s hospital (Koppel, JAMA 2005)
• MSN messenger for nurse triage (Eminovic, JTT 2006)
What to evaluate, for which apps?
[2×2 matrix: potential risk from using the app (low → high) against likely benefit from using the app (low → high)]
• Low risk, low benefit: developer self-declaration on key features
• Low risk, high benefit: effectiveness
• High risk, low benefit: safety
• High risk, high benefit: both safety & effectiveness
RCP quality criteria for physician apps, based on Donabedian 1966
Structure = the app development team, the evidence base, use of an appropriate behaviour change model etc. …
Processes = app functions: usability, accuracy etc.
Outcomes = app impacts on user knowledge & self efficacy, user behaviours, resource usage
Wyatt JC, Thimbleby H, Rastall P, Hoogewerf J, Wooldridge D, Williams J. What makes a good clinical app? Introducing the RCP Health Informatics Unit checklist. Clin Med (Lond). 2015 Dec;15(6):519-21. doi: 10.7861/clinmedicine.15-6-519
RCP app checklist part 1
RCP app checklist part 2
Good evaluation practice for digital health interventions
1. Know why you are evaluating: who are the stakeholders, what decision do they face ?
2. Understand stakeholder questions and the level of evidence they need to answer them
3. Design your study with:
• Enough participants of the right kind
• The right intervention
• The right control
• Validated outcome measures
4. Check for biases and confounders & that you will learn something if study negative
5. Run the study & report your results
See: Murray E et al. Design & evaluation of digital interventions. Am J Prev Med Nov 2016
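"Enough participants" in step 3 can be estimated before the study starts. A minimal sketch of the standard two-proportion sample-size formula (normal approximation, two-sided alpha = 0.05, power = 0.80; the example rates are hypothetical):

```python
# Sample size per arm to detect a difference between two proportions.
# z-values are hard-coded for two-sided alpha = 0.05 and power = 0.80.
from math import ceil, sqrt

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Participants needed in each arm to detect p1 vs p2."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2)

# e.g. to detect a hypothetical improvement in quit rates from 10% to 15%:
print(n_per_arm(0.10, 0.15))  # → 683
```

The quadratic dependence on the difference p1 - p2 is why trials of modest behaviour-change effects need hundreds of participants per arm.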
Conclusions
1. The quality of mHealth apps varies too much
2. User rating, professional reviews, developer self-certification and regulation are not enough
3. To help reduce “apptimism” and strengthen other methods, we should agree criteria for safety & effectiveness, evaluate selected apps against them and label these apps with the results
4. This will support patients, health professionals, health systems and app developers to maximise the benefits of mHealth
Email [email protected]
Spare slides
A proposed evaluation cascade for mHealth Apps
Area | Topics | Methods
Source | Purpose, sponsor; user, cost | Inspection
Safety | Data protection; usability | Inspection; HCI lab / user tests
Content | Based on sound evidence; proven behaviour change methods | Inspection
Accuracy | Calculations; advice | Scenarios with gold standard
Potential impact | Ease of use in the field; understanding of output | Intervention modelling experiments
Impact | Knowledge, attitudes, self-efficacy; health behaviours, outcomes | Within-subject experiments; field trials
Which outcomes to measure ?
Depends on app purpose. E.g. to assess the impact of a health promotion app, consider measuring:
• User knowledge and attitudes
• Self-efficacy (validated tool)
• Short-term behaviour change
• Long-term behaviour change
• Individual health-related outcomes
• Population health
(the earlier measures serve as surrogate outcomes for the later ones)
Possible quality approval processes to improve mHealth
Methods | Advantages | Disadvantages | Examples
Wisdom of the crowd | Simple user ranking | Hard for users to assess quality; click-factory bias | Current app stores; MyHealthApps
Users apply quality criteria | Explicit | Requires widespread dissemination; can everyone apply them ? | RCP checklist
Classic peer-reviewed article | Rigorous (?) | Slow, resource intensive, doesn't fit the app model | 47 PubMed articles
Physician peer review | Timely; dynamic | Not as rigorous; scalable ? | iMedicalApps, MedicalAppJournal
Developer self-certification | Dynamic | Requires developers to understand & comply; checklist must fit apps | HON Code ?; RCP checklist
Developer support | Resource light | Technical knowledge needed; multitude of developers | BSI PAS 277
CE marking, external regulation | Credible | Slow, expensive; apps don't fit the national model | NHS App Store, FDA, MHRA
We need to think differently…
Old think | New think
Paternalism: we know & determine what is best for users | Self-determination: users decide what is best for them
Regulation will eliminate harmful apps after release | Prevent bad apps: help app developers understand safety & quality
The NHS must control apps, apply rules and safety checks | Self-regulation by the developer community; consumer choice informed by truth in labelling
App developers are in control | Aristotle's civil society* is in control
Quality is best achieved by laws and regulations | Quality is best achieved by consensus and culture change
The aim of apps is innovation (sometimes above other considerations) | App innovation must balance benefits and risks
An apps market driven by viral campaigns, unfounded claims of benefit | An apps market driven by fitness for purpose (ISO) & evidence of benefit
* The elements that make up a democratic society, such as freedom of speech, an independent judiciary, collaborating for common wellbeing
Some criteria for an mHealth quality approval process
1. Empower patient & professional choice
2. Promote survival of the fittest and a proper market, not just innovation for its own sake
3. Use criteria that make sense to patients / professionals / health systems / industry
4. Scalable to thousands of apps
5. Proportionate to clinical risk
6. Resistant to manipulation, and auditable