Design of an Adaptive System for Upper-Limb Stroke Rehabilitation
by
Patricia Wai Ling Kan
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Institute of Biomaterials and Biomedical Engineering
University of Toronto
© Copyright by Patricia Wai Ling Kan 2008
Design of an Adaptive System for Upper-Limb Stroke
Rehabilitation
Patricia Wai Ling Kan
Master of Applied Science
Institute of Biomaterials and Biomedical Engineering
University of Toronto
2008
Abstract
Stroke is a leading cause of adult disability. To support this large population in recovery,
robotic technologies are being developed to assist in the delivery of rehabilitation. A partially
observable Markov decision process (POMDP) system was designed for a rehabilitation robotic
device that guides stroke patients through an upper-limb reaching task. The performance of the
POMDP system was evaluated by comparing the decisions made by the POMDP system with
those of a human therapist. Overall, the therapist agreed with the POMDP decisions
approximately 65% of the time. The therapist thought the POMDP decisions were believable
and could envision this system being used in both the clinic and home. The patient indicated that
they would use this system as the primary method of rehabilitation. Limitations of the current system have been
identified which require improvement in future research stages. This research has shown that
POMDPs have promising potential to facilitate upper extremity rehabilitation.
Acknowledgments

First and foremost, I would like to thank my supervisor, Dr. Alex Mihailidis, for his continuous
advice, guidance, and support throughout the project. I thank Debbie Hébert for sharing her
expertise in the field of occupational therapy, especially in the area of upper-limb stroke
rehabilitation. I am especially grateful to Dr. Jesse Hoey for teaching me all I know about
POMDPs, as well as his endless patience and assistance in the construction of the POMDP
model used in this project – all while living overseas! Thank you to Dr. Jacob Apkarian, Hervé
Lacheray, and Don Gardner from Quanser Inc. for all their technical support on the robotic
device and virtual environment. I also thank the members of my thesis committee, Dr. Milos
Popovic, Dr. Craig Boutilier, and Dr. Tom Chau, for their time and helpful comments. And, of
course, a big thank you to everyone in the IATSL lab for their help and friendship, especially Jen
Boger for her constant advice and assistance!
To my girls, thanks for all your support over these past few years and for keeping me sane! A
special thanks to my fiancé, Michael Liau, for encouraging me to pursue my Master’s degree
even though we’d be apart, and for always believing in me. I also want to thank my family –
Mom, Dad, Christine – for their unfailing love, support, and encouragement. I love you all!
I would like to recognise the CITO-Precarn Alliance Program and Quanser Inc. for funding this
project. Finally, I would like to thank my only therapist-patient pair at TRI for participating in
the study and providing some insight into the future development of this rehabilitation system.
Table of Contents

ABSTRACT .......... ii
ACKNOWLEDGMENTS .......... iii
TABLE OF CONTENTS .......... iv
LIST OF TABLES .......... vii
LIST OF FIGURES .......... viii
LIST OF APPENDICES .......... xii
LIST OF ACRONYMS .......... xiii
LIST OF SYMBOLS .......... xv
CHAPTER 1 INTRODUCTION .......... 1
1.1 PROBLEM STATEMENT .......... 1
1.2 OBJECTIVES .......... 3
1.3 RESEARCH QUESTIONS .......... 3
1.4 SCOPE OF RESEARCH .......... 3
1.4.1 Development of the Intelligent System .......... 5
1.4.2 Integration of the POMDP Model with the Robotic System .......... 5
1.4.3 Development of the Evaluation Study .......... 5
1.4.4 Conducting the Evaluation Study .......... 5
1.5 CONTRIBUTIONS .......... 5
CHAPTER 2 BACKGROUND .......... 7
2.1 STROKE .......... 7
2.2 STROKE RECOVERY .......... 7
2.3 REHABILITATION OF MOTOR SKILLS .......... 8
2.4 ROLE OF THERAPISTS .......... 10
2.5 REDUCING HEALTH CARE AND THERAPIST BURDEN .......... 11
CHAPTER 3 LITERATURE REVIEW .......... 12
3.1 CURRENT REHABILITATION ROBOTIC SYSTEMS FOR UPPER EXTREMITIES .......... 12
3.2 DISCUSSION .......... 19
CHAPTER 4 PARTIALLY OBSERVABLE MARKOV DECISION PROCESS .......... 21
4.1 ARTIFICIAL INTELLIGENCE .......... 21
4.2 DEFINITION OF PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES .......... 22
4.2.1 Components .......... 22
4.2.2 Acting Optimally .......... 24
4.2.2.1 Computing the Belief State .......... 25
4.2.3 Finding the Optimal Policy: Value Iteration .......... 26
4.3 EXAMPLES OF POMDPS IN REAL-WORLD APPLICATIONS .......... 29
4.4 JUSTIFICATION FOR USING A POMDP TO MODEL REACHING REHABILITATION .......... 30
CHAPTER 5 DESIGN OF THE POMDP REACHING EXERCISE MODEL .......... 33
5.1 REQUIREMENTS SPECIFICATION .......... 33
5.1.1 Definition of the Reaching Exercise .......... 33
5.1.2 Development of the Robotic System .......... 36
5.1.3 Definition of the POMDP Model .......... 39
5.2 STRENGTHEN MODEL .......... 40
5.2.1 Definition of the Variables .......... 41
5.2.2 Definition of the Actions .......... 43
5.2.3 Definition of the Observation Variables and Observation Function .......... 44
5.2.4 Definition of the Transition Function .......... 44
5.2.5 Definition of the Reward Function .......... 46
5.2.6 Computation of the STRENGTHEN Model .......... 50
5.2.6.1 Selection of the Solution Method .......... 50
5.2.6.2 Iteration Process and Solving the Model .......... 53
5.3 ISTRETCH MODEL .......... 54
5.3.1 Definition of the Variables .......... 55
5.3.2 Definition of the Actions .......... 57
5.3.3 Definition of the Observation Variables and Observation Function .......... 58
5.3.4 Definition of the Transition Function .......... 58
5.3.5 Definition of the Reward Function .......... 63
5.3.6 Computation of the iSTRETCH Model .......... 65
5.3.6.1 Selection of the Solution Method .......... 65
5.3.6.2 Iteration Process and Solving the Model .......... 65
5.4 COMPARISON OF STRENGTHEN AND ISTRETCH MODELS .......... 66
CHAPTER 6 INTEGRATION OF THE POMDP MODEL WITH THE ROBOTIC SYSTEM .......... 81
6.1 ACQUISITION OF DATA FROM THE ROBOTIC SYSTEM .......... 82
6.2 SETTING THE VALUE RANGES FOR THE OBSERVATION VARIABLES .......... 84
6.3 MERGING THE POMDP AGENT WITH THE ROBOTIC DEVICE CONTROLLER .......... 85
CHAPTER 7 EVALUATION STUDY .......... 88
7.1 QUESTIONS TO BE ANSWERED BY THE STUDY .......... 88
7.2 PARTICIPANTS .......... 88
7.3 TESTING METHODOLOGY .......... 90
7.4 MODIFICATION OF INTEGRATED SYSTEM .......... 92
7.5 CAPTURING DECISIONS MADE BY POMDP AND THERAPIST .......... 95
7.6 QUESTIONNAIRE .......... 96
7.6.1 Questionnaire for Therapists .......... 96
7.6.2 Questionnaire for Patients .......... 97
7.7 ETHICS APPROVAL .......... 97
CHAPTER 8 RESULTS .......... 98
8.1 SUBJECT DATA .......... 98
8.2 DECISIONS FROM POMDP AND THERAPIST .......... 99
8.3 QUESTIONNAIRE DATA .......... 102
CHAPTER 9 DISCUSSION .......... 107
9.1 STUDY ANALYSIS .......... 107
9.2 ANALYSIS OF OTHER UPPER EXTREMITY REHABILITATION ROBOTIC SYSTEMS .......... 109
9.3 LIMITATIONS .......... 110
9.4 RECOMMENDATIONS FOR FUTURE WORK .......... 110
CHAPTER 10 CONCLUSION .......... 112
REFERENCES .......... 114
APPENDIX I – EXAMPLE CONSTRUCTION OF A CONDITIONAL PROBABILITY TABLE .......... 118
APPENDIX II – SIMULATION EXAMPLES OF THE STRENGTHEN AND ISTRETCH MODELS .......... 120
APPENDIX III – MICRO-CONTROLLER SOFTWARE CODE .......... 137
APPENDIX IV – QUESTIONNAIRE FOR THE THERAPIST .......... 139
APPENDIX V – QUESTIONNAIRE FOR THE PATIENT .......... 143
APPENDIX VI – RAW QUANTITATIVE DATA ON DECISIONS MADE BY POMDP AND THERAPIST .......... 149
APPENDIX VII – RAW QUANTITATIVE AND QUALITATIVE DATA ON THERAPIST’S RATINGS PER SESSION .......... 191
List of Tables

Table 1.1: Contributions in the development of the upper-limb rehabilitation system .................. 6
Table 5.1: Description of the variable dynamics in the reaching exercise ................................... 45
Table 5.2: Reward function for STRENGTHEN model............................................................... 47
Table 5.3: Reward function for iSTRETCH model ...................................................................... 64
Table 5.4: Summary of pros and cons of both models during simulation .................................... 79
Table 5.5: Summary of computational aspects of each model ..................................................... 80
Table 8.1: Therapist information .................................................................................................. 98
Table 8.2: Patient information ...................................................................................................... 99
Table 8.3: Percentage of agreement over all sessions................................................................. 102
Table 8.4: Qualitative response from therapist for overall questionnaire................................... 104
Table 8.5: Quantitative response from patient for overall questionnaire.................................... 105
Table 8.6: Qualitative response from patient for overall questionnaire...................................... 106
List of Figures

Figure 1.1: Block diagram of the upper-limb rehabilitation system............................................... 4
Figure 1.2: Major research phases for the development and evaluation of the intelligent
reaching rehabilitation system ........................................................................................................ 4
Figure 3.1: ARM Guide (© D. Reinkensmeyer, 2000 – use of picture is by permission of the
copyright holder)........................................................................................................................... 13
Figure 3.2: MIME system in bimanual mode (© Elsevier, 2002 – use of picture is by
permission of the copyright holder) .............................................................................................. 14
Figure 3.3: GENTLE/s system (© W. Harwin, 2007 – use of picture is by permission of the
copyright holder)........................................................................................................................... 16
Figure 3.4: MIT-MANUS (© H. Krebs, 2004 – use of picture is by permission of the
copyright holder)........................................................................................................................... 18
Figure 4.1: Diagram of the relationship of the POMDP components........................................... 23
Figure 4.2: Decision cycle of a POMDP agent............................................................................. 25
Figure 4.3: An n-step policy tree ................................................................................................... 27
Figure 5.1: The reaching exercise from the initial position (a) to the final position (b) (© Lam,
2007 – use of picture is by permission of the copyright holder) .................................................. 34
Figure 5.2: Actual diagram of rehabilitation robot (A) with end-effector (B).............................. 37
Figure 5.3: Trunk photoresistor sensors placed in three locations: lower back, lower left
scapula, and lower right scapula (a) (© Lam, 2007 – use of picture is by permission of the
copyright holder) and its detection of light (b) ............................................................................. 38
Figure 5.4: Virtual environment ................................................................................................... 39
Figure 5.5: STRENGTHEN (POMDP) model as a DBN. It consists of the state, S, represented
by a combination of ten variables; the actions, A; the observations, O; the reward function, R;
and the dynamics, represented by the arrows. Variables at the next time step, t+1, are denoted
with an apostrophe (e.g. hat’). ...................................................................................................... 41
Figure 5.6: Example of an optimal value function in a two-state POMDP. The belief space is a
one-dimensional vector of two non-negative numbers that sum to 1 [b(s0) = P(s0) = 1-P(s1)].
The x-axis, therefore, represents the whole belief space on which the value function Vn(b) is
defined. The upper surface of the three α-vectors is the optimal value function, Vn*(b), which
defines the optimal action to take in a particular belief state. At the belief state, b, the action
associated with α2 should be taken. .............................................................................................. 52
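The mechanics in this caption can be sketched in a few lines. The following is an illustrative example only (the α-vectors, actions, and belief values below are made up, not taken from the thesis): the optimal value function is the upper surface over the α-vectors, and the maximizing vector selects the action.

```python
# Hypothetical two-state POMDP. Each alpha-vector is an |S|-dimensional
# hyper-plane paired with the action whose conditional plan generated it.
ALPHAS = [
    ((1.0, 0.0), "a1"),   # alpha_1: dominates when b(s0) is high
    ((0.4, 0.8), "a2"),   # alpha_2: dominates for intermediate beliefs
    ((0.0, 1.1), "a3"),   # alpha_3: dominates when b(s1) is high
]

def dot(alpha, b):
    """Inner product alpha . b over the belief simplex."""
    return sum(x * y for x, y in zip(alpha, b))

def optimal_value_and_action(b):
    """Vn*(b) = max over alpha-vectors of alpha . b; the maximizing
    alpha-vector defines the optimal action at belief state b."""
    return max((dot(alpha, b), action) for alpha, action in ALPHAS)

# At b = [P(s0), P(s1)] = [0.5, 0.5], alpha_2 forms the upper surface,
# so the action associated with alpha_2 should be taken.
value, action = optimal_value_and_action((0.5, 0.5))
```

At the extremes of the belief space the maximizing vector switches, which is exactly the piecewise linear and convex (PWLC) structure the caption describes.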
Figure 5.7: Example of a Perseus backup stage in a two-state POMDP. The x-axis represents
the belief space and the y-axis represents V(b). Solid lines are the α-vectors from the current
stage and dashed lines are the α-vectors from the previous stage. There are seven belief states
{b1,…,b7} which comprise the set of reachable belief points (B) indicated by the tick marks.
The backup stage computing Vn+1 from Vn proceeds as follows: (a) the value function at stage
n; (b) the computation of Vn+1 starts by sampling b6, which produces an α-vector that
improves the values of b6 and b7; (c) b3 is then sampled, which produces an α-vector that
improves the values of b1 through b5; and (d) the values of all b ∈ B have improved and thus the
backup stage at n+1 is completed (© AI Access Foundation, 2005 – use of picture is by
permission of the copyright holder). ............................................................................................. 53
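The sampling-based backup stage this caption illustrates can be sketched as follows. This is a simplified illustration, not the implementation used in the thesis; the `point_backup` routine (a full Bellman backup at a single belief point) is assumed to be supplied by the caller.

```python
import random

def dot(alpha, b):
    return sum(x * y for x, y in zip(alpha, b))

def best_value(alphas, b):
    return max(dot(a, b) for a in alphas)

def best_alpha(alphas, b):
    return max(alphas, key=lambda a: dot(a, b))

def perseus_stage(B, alphas_n, point_backup):
    """One Perseus backup stage: build the stage-(n+1) vector set by backing
    up randomly sampled belief points until every b in B has improved."""
    not_improved = list(B)
    alphas_next = []
    while not_improved:
        b = random.choice(not_improved)          # sample an unimproved point
        alpha = point_backup(b, alphas_n)        # full backup at b
        if dot(alpha, b) >= best_value(alphas_n, b):
            alphas_next.append(alpha)                    # backup improved b
        else:
            alphas_next.append(best_alpha(alphas_n, b))  # keep old best vector
        # a point leaves the queue once its value under the new set is at
        # least its value under the old set
        not_improved = [p for p in not_improved
                        if best_value(alphas_next, p) < best_value(alphas_n, p)]
    return alphas_next
```

The key efficiency, as in the figure, is that a single backed-up α-vector typically improves many belief points at once, so far fewer than |B| backups are needed per stage.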
Figure 5.8: iSTRETCH (POMDP) model as a DBN. It consists of the state, S, represented by
a combination of nine variables; the actions, A; the observations, O; the reward function, R;
and the dynamics, represented by the arrows. .............................................................................. 55
Figure 5.9: Example pace function for comp=yes, with φ+ = 0.9, φ- = 0.1, st+ = +3, st- = -1,
m(f=yes) = 0.8, and m(f=no) = 0.0. Shown are the upper and lower pace limits, and the pace
function for each condition of fat.................................................................................................. 61
Figure 5.10: Example pace function for ttt, with φ+ = 0.9, φ- = 0.1, and m(f=no) = 0.0. Shown
are the upper (st+ = -3) and lower (st- = +2) pace limits for ttt=norm, and the upper (st+ = +4)
and lower (st- = +1) pace limits for ttt=none. The pace function for ttt=slow gets what is left of
the probability mass. ..................................................................................................................... 62
Figure 5.11: (a) Updated belief state of n(r) and fat after the user failed to reach d=d3, had
minimum control and no compensation. The POMDP decides to set the next action at d=d1 at
the same resistance (r=none); (b) Updated belief state of n(r), stretch, fat, and learnrate after
the user failed to reach d=d3, had minimum control and no compensation. The POMDP
decides to set the next action at d=d3 at the same resistance (r=none). ....................................... 68
Figure 5.12: (a) Updated belief state after the user successfully reached d=d1, with maximum
control and no compensation; (b) Updated belief state after the user failed to reach d=d3, with
minimum control and no compensation........................................................................................ 69
Figure 5.13: (a) Updated belief state of n(r) and fat after the user successfully reached d=d3,
with maximum control and no compensation. The POMDP decides to set the next action at
d=d3 at the same resistance (r=max); (b) Updated belief state of n(r), stretch, fat, and
learnrate after the user successfully reached d=d3, with maximum control and no
compensation. The POMDP decides to set the next action at d=d3 at the same resistance
(r=max). ........................................................................................................................................ 73
Figure 5.14: (a) Updated belief state after the user successfully reached d=d3, with maximum
control but this time with compensation. The POMDP decides to set the next action again at
d=d3 at the same resistance (r=max); (b) Updated belief state after the user successfully
reached d=d3, with minimum control but this time with compensation. The POMDP decides
to set the next action again at d=d3 at the same resistance (r=max)............................................. 74
Figure 5.15: (a) Updated belief state after the user again successfully reached d=d3, with
maximum control and with compensation. The POMDP decides to set the next action again at
d=d3 at the same resistance (r=max); (b) Updated belief state after the user again successfully
reached d=d3, with minimum control and with compensation. The POMDP decides to stop
the exercise.................................................................................................................................... 75
Figure 5.16: STRENGTHEN model. Updated belief state of n(r) and fat after the user
successfully reached d=d3 in slow time, with minimum control, and with compensation. The
POMDP decides to set the next action at d=d3 at the same resistance (r=max)........................... 76
Figure 5.17: STRENGTHEN model. Updated belief state after the user successfully reached
d=d3 in slow time, with compensation but this time with no control. The POMDP decides to
set the next action at d=d3 at the same resistance (r=max). Notice the reverse in the fatigue
level from the previous Figure 5.16.............................................................................................. 77
Figure 5.18: STRENGTHEN model. Updated belief state after the user again successfully
reached d=d3 in slow time, with no control, and with compensation. The POMDP decides to
set the next action at d=d3 at the same resistance (r=max). Notice again the reverse in the
fatigue level from the previous Figure 5.17.................................................................................. 78
Figure 6.1: Diagram of the reaching rehabilitation system consisting of the robotic system (a)
and the POMDP agent (b)............................................................................................................. 82
Figure 6.2: Massachusetts Institute of Technology’s Handyboard (micro-controller)................. 83
Figure 6.3: Interaction between POMDP agent and robotic controller ........................................ 87
Figure 7.1: Interaction between POMDP agent and robotic controller, via the therapist............. 93
Figure 7.2: Therapist GUI displaying: (A) decision from POMDP, (B) therapist agreement of
decision made, (C) action choice if therapist disagrees, (D) decision from therapist, (E)
history of actions and observations, and (F) emergency stop button............................................ 94
Figure 7.3: Final rehabilitation system in use consisting of: (A) virtual environment on the
computer monitor, (B) therapist GUI on another monitor, (C) end-effector with rotational
encoder, (D) haptic-robotic device, (E) trunk photoresistor sensors (not seen – placed on
chair), and (F) robotic controller and POMDP agent ................................................................... 95
Figure 8.1: Percentage of agreement per session........................................................................ 101
Figure 8.2: Evaluation of POMDP decisions made by therapist on Likert scale with a mean
and SD of 2.833 and 0.408, respectively, for question a) and a mean and SD of 3.167 and
0.408, respectively, for question b)............................................................................................. 103
List of Appendices

APPENDIX I – EXAMPLE CONSTRUCTION OF A CONDITIONAL PROBABILITY TABLE .......... 118
APPENDIX II – SIMULATION EXAMPLES OF THE STRENGTHEN AND ISTRETCH MODELS .......... 120
APPENDIX III – MICRO-CONTROLLER SOFTWARE CODE .......... 137
APPENDIX IV – QUESTIONNAIRE FOR THE THERAPIST .......... 139
APPENDIX V – QUESTIONNAIRE FOR THE PATIENT .......... 143
APPENDIX VI – RAW QUANTITATIVE DATA ON DECISIONS MADE BY POMDP AND THERAPIST .......... 149
APPENDIX VII – RAW QUANTITATIVE AND QUALITATIVE DATA ON THERAPIST’S RATINGS PER SESSION .......... 191
List of Acronyms

2D Two-dimensional
3D Three-dimensional
ADD Algebraic decision diagram
ADL Activities of daily living
AI Artificial intelligence
ANN Artificial neural network
ARM Guide Assisted Rehabilitation and Measurement Guide
CIMT Constraint-induced movement therapy
CMSA Chedoke-McMaster Stroke Assessment
CPT Conditional probability table
DBN Dynamic Bayesian network
DOF Degree of freedom
FM Fugl-Meyer
GUI Graphical user interface
iSTRETCH intelligent STroke Rehabilitation Exercise TeCHnology
MIME Mirror Image Movement Enabler
MIT-MANUS Massachusetts Institute of Technology
OT Occupational therapist
POMDP Partially observable Markov decision process
PI Proportional-integral
PT Physical therapist
PWLC Piecewise linear and convex
SD Standard deviation
STRENGTHEN STroke REhabilitatioN Guidance Tool in Haptic ENvironment
TRI Toronto Rehabilitation Institute
VE Virtual environment
List of Symbols

A / a Action space / action
α-vector An |S|-dimensional hyper-plane
B Set of reachable belief states
b Belief state
β Discount factor
f Fatigue
Φ Pace function
Γn Set of α-vectors
h Horizon
m Mean stretch
m(f) Fatigue effect
n Finite horizon
O / o Observation space / observation
P / p Policy tree space / policy tree
R Reward function
S / s State space / state
st Stretch
σst Slope of pace function
T Transition function
t Time step
V Value function
Z Observation function
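As a worked sketch of how several of these symbols combine (illustrative numbers only, not code or data from the thesis), the belief state b is updated after taking action a and receiving observation o via the transition function T and observation function Z:

```python
def belief_update(b, a, o, T, Z):
    """b'(s') is proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s),
    normalised so the new belief sums to 1. T and Z are dicts keyed by
    (s', s, a) and (o, s', a) respectively (an illustrative convention)."""
    states = list(b)
    unnorm = {
        s2: Z[(o, s2, a)] * sum(T[(s2, s, a)] * b[s] for s in states)
        for s2 in states
    }
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

# Toy two-state example: the state never changes (identity transitions),
# but observation "o1" is four times likelier in state "s0" than in "s1",
# so the belief shifts toward "s0" after seeing "o1".
T = {("s0", "s0", "a"): 1.0, ("s0", "s1", "a"): 0.0,
     ("s1", "s0", "a"): 0.0, ("s1", "s1", "a"): 1.0}
Z = {("o1", "s0", "a"): 0.8, ("o1", "s1", "a"): 0.2}
b_new = belief_update({"s0": 0.5, "s1": 0.5}, "a", "o1", T, Z)
```

This is the computation described in Section 4.2.2.1 (Computing the Belief State): the agent never observes s directly, so it maintains a probability distribution over states and revises it with each action-observation pair.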
Chapter 1
Introduction

1.1 Problem Statement

Stroke is the leading cause of physical disability and third leading cause of death in most
countries around the world, including Canada (Canadian Stroke Network, 2007; Caplan, 2006).
Every year more than 50,000 Canadians will suffer a stroke – one person every ten minutes.
This number is expected to increase with Canada’s aging population, since the risk of stroke
doubles every decade after the age of 55 (Heart and Stroke Foundation of Canada, 2008). The
consequences of stroke are devastating with approximately 75% of stroke victims left with a
permanent disability. Statistics from the Heart and Stroke Foundation show that the general
effects after stroke are:
• 10 percent of stroke survivors recover completely
• 25 percent recover with a minor impairment
• 40 percent are left with a moderate to severe impairment
• 10 percent require long-term care as they are left with a severe impairment
• 15 percent die (Heart and Stroke Foundation of Canada, 2008)
The expense of stroke in Canada is estimated at approximately $2.7 billion a year in physician
services, hospital costs, lost wages, and decreased productivity (Heart and Stroke Foundation of
Canada, 2006).
A growing body of research has shown that stroke rehabilitation can substantially reduce the
limitations and disabilities that arise from stroke, and improve motor function, allowing stroke
survivors to regain their quality of life and independence. It is generally agreed that intensive
(e.g. constraint-induced movement therapy), repetitive, and goal-directed rehabilitation improves
motor function and cortical reorganisation in stroke patients with both acute and long-term
(chronic) impairments (Fasoli, Krebs, & Hogan, 2004). However, this long and physically
demanding rehabilitation process is both slow and tedious, usually involving extensive
interaction between one or more therapists and one patient. One of the main motivations for
developing rehabilitation robotic devices is to automate interventions that are normally repetitive
and labour-intensive. These robots can provide stroke patients with intensive, reproducible, and
task-oriented movement training in time-unlimited durations, which can alleviate physical strain
on therapists. In addition, these devices can provide therapists with accurate measures on patient
performance and function (e.g. range of motion, speed, smoothness, strength) over the course of
a therapeutic intervention, and provide quantitative diagnosis and assessments of motor
impairments such as spasticity, tone, and strength (Hilder, Nichols, Pelliccio, & Brady, 2005).
This technology makes it possible for a single therapist to supervise multiple patients
simultaneously, which can contribute to reducing health care costs. It must be
emphasised that the goal of therapy robots is not to replace physical and occupational therapists,
but rather to complement existing treatment options. The use of rehabilitation robots would
provide therapists with more freedom to apply their expertise in educating patients to live with
their new or relearned motor skills in functional activities, and on pain management (Young,
2007).
The upper extremities are typically affected more than the lower extremities after stroke. The leg
most often recovers enough to allow standing and walking, whereas arm recovery is usually not
as complete (Caplan, 2006). Stroke patients with an affected upper-limb have great difficulties
performing many activities of daily living (ADL), such as reaching to grasp objects. Although
there are many robotic systems designed to assist and improve upper-limb stroke rehabilitation
(Brewer, McDowell, & Worthen-Chaudhari, 2007), none of them are able to operate
autonomously (without any explicit feedback from the therapist) and account for the specific
needs and abilities of each individual, which will change over time. These features are
especially important to reduce stroke patients’ direct dependence on therapists in the clinic and to
eventually have patients practicing efficient therapy at home.
The main goal of this thesis was to design and develop an intelligent system to autonomously
facilitate upper-limb reaching rehabilitation for moderate level stroke survivors using a partially
observable Markov decision process (POMDP).
1.2 Objectives
The objectives of this research were to:
1. Design an adaptive system to guide stroke patients through a targeted, load-bearing,
linear-reaching exercise for the upper-limb.
2. Have professional therapists evaluate the performance of the system through the
comparison of the decisions made by the system versus those by a human therapist.
1.3 Research Questions
This study attempted to answer the following questions:
1. Can a POMDP make decisions that are in line with those made by human therapists to
guide stroke patients through a targeted, load-bearing, linear-reaching exercise for the
upper-limb?
2. What aspects of the system seem to get more positive or negative feedback from
therapists and patients?
3. What future work is needed to improve the development of the POMDP and overall
system?
1.4 Scope of Research
The overall upper-limb rehabilitation system can be represented by the block diagram in Figure
1.1. The user performs the exercise on the robotic device, and at the same time receives visual
feedback from the virtual environment on the computer display. The POMDP system analyses
performance data from the robotic system, makes a decision, and selects an action for the system
to execute (i.e. sets the exercise parameters). A therapist is present to oversee and control the
entire system.
Figure 1.1: Block diagram of the upper-limb rehabilitation system
The primary focus of this thesis was on the software aspect (POMDP) of the overall
rehabilitation system. Figure 1.2 shows the four main research phases for this project.
Figure 1.2: Major research phases for the development and evaluation of the intelligent reaching rehabilitation system
1.4.1 Development of the Intelligent System
An artificial intelligence system for guiding stroke users during the upper-limb exercise was
developed using a POMDP model. The chosen exercise was analysed and the resulting model
dynamics were defined.
1.4.2 Integration of the POMDP Model with the Robotic System
The final POMDP model was integrated with all aspects of the robotic system, including the
postural sensors, computer interface, and haptic technology.
1.4.3 Development of the Evaluation Study
The goal of the study was to obtain feedback from both therapists and stroke patients to evaluate
and enhance not only the decision-making strategy used by the system, but on the overall
rehabilitation system itself. Two questionnaires were developed: one to gather professional
therapists’ opinions on the decision-making ability of the POMDP model, and the other to gain
insight on the overall rehabilitation system from stroke patients.
1.4.4 Conducting the Evaluation Study
A pilot study of one therapist and one patient was conducted. The study sessions were held three
times a week for two weeks.
1.5 Contributions
Table 1.1 outlines the specific contributions of the author and other parties in the development of
the overall upper-limb rehabilitation system. Quanser Inc.1 was the project’s industry partner,
and Paul Lam was a previous student whose thesis focused on designing and testing the
hardware aspect (robotic device) of the system.
1 Contact Quanser Inc. by phone at +1 905 940 3575 or visit their website at www.quanser.com
Table 1.1: Contributions in the development of the upper-limb rehabilitation system
Patricia Kan
• designed and developed POMDP system
• integrated POMDP system with all aspects of robotic system
• developed and conducted evaluation study of overall integrated system

Paul Lam
• designed concept of robotic device
• developed trunk sensors (micro-controller)
• developed and conducted usability study of hardware platform

Quanser Inc.
• developed robotic device
• developed haptic controller
• developed virtual environment
Chapter 2 Background
2 Background
2.1 Stroke
Stroke is defined as injury to the brain caused by an abnormality of blood supply to part of the
brain. There are two major categories of brain damage in stroke patients: ischemia – a lack of
blood flow depriving brain cells of needed fuel and oxygen; and hemorrhage – the release of
blood either into the brain or into the extravascular spaces within the skull. Bleeding damages
the brain by tearing and disconnecting vital nerve centres and pathways, and by causing pressure
inside the cranium (Caplan, 2000; Caplan, 2006).
An ischemic stroke is the most common form of stroke, accounting for approximately 80% of all
stroke occurrences (Caplan, 2006). Ischemic infarctions can be caused by three different
mechanisms: thrombosis (obstruction of blood flow due to narrowing of blood vessels),
embolism (blockage of blood flow due to lodging of foreign materials in blood vessels), and
systemic hypoperfusion (diminished blood flow to brain due to abnormal performance of the
heart). Hemorrhagic strokes account for the remaining 20% of stroke occurrences (Caplan,
2006).
Both types of stroke cause death in brain cells, and that portion of the brain becomes unable to
perform its normal functions. The effects of stroke include motor and sensory dysfunction,
cognitive and behavioural changes, loss of memory, and language and visual abnormalities
(Caplan, 2006).
2.2 Stroke Recovery
A majority of stroke patients improve from the effects of stroke, with some even returning to
normal or near-normal functioning (Caplan, 2006). This improvement results from three general
changes in the sensorimotor networks: restitution, substitution, and compensation (Barnes,
Dobkin, & Bogousslavsky, 2005).
Restitution is an early, spontaneous recovery that is independent of external variables such as
physical and cognitive stimulation. It usually occurs within the first three to six months after a
stroke (Caplan, 2000) and is typically attributed to the biochemical and gene-induced events that
help to restore the functionality of the injured brain tissue, such as reduction of edema,
absorption of heme, and restoration of ionic currents and axonal transport (Barnes et al., 2005;
Gillen & Burkhardt, 2004).
Further intrinsic recovery is due to a reserve system that involves a redundancy built into the
central nervous system pathways. If a portion of the total pathway controlling an activity is
destroyed by a stroke, the remainder can take over the task. Alternatively, an activity may be
controlled through multiple pathways, and if the predominant pathway is destroyed by the stroke,
one of the others can take over (Caplan, 2006). This reorganisation of the undamaged system in
the brain is referred to as plasticity, and is influenced by external stimuli such as practice with the
affected hemiparetic limb during rehabilitation (Barnes et al., 2005; Gillen & Burkhardt, 2004).
This process takes time, accounting for the slow and gradual recovery of patients after the first
three months of stroke (Caplan, 2006). Essentially, a new system is “substituted” for the
function of the old one.
Compensating or adapting to the disabilities that arise from stroke is learning to function
independently using movements alternative to the ones used before the stroke (Caplan, 2006).
This can include developing a new skill that replaces the defective one such as learning to dress
with one hand as opposed to two, and adjusting intentions such as training to use a wheelchair
because walking is not feasible (Barnes et al., 2005).
2.3 Rehabilitation of Motor Skills
Knowing that functional recovery results from the ongoing reorganisational processes
in the nervous system in response to use and activity, designs for optimal stroke rehabilitation
can be created (Carr & Shepherd, 2003). Rehabilitation focuses on recovery, and helps to
minimise any handicaps that relate to neurologic impairments following a stroke (Caplan, 2006).
Studies of animals and humans with brain lesions provide insight into the process of functional
recovery and on the relationship between neural reorganisation and rehabilitation.
A study on squirrel monkeys, which modelled the effects of a focal ischemic infarct within the
hand motor area of the cortex, found that there was further loss of hand representation in the area
adjacent to the lesion when the monkeys had no post-infarct intervention (Nudo & Milliken,
1996). In the follow-up study, the monkey’s unimpaired hand was restrained while the impaired
hand had daily repetitive training in skilled use of retrieving food pellets from small wells. Not
only was tissue loss prevented, but there was also a net gain of approximately ten percent in the
total hand area adjacent to the lesion (Nudo, Wise, SiFuentes, & Milliken, 1996). In a further
study, in which the monkey’s unimpaired hand was restrained but no training was given to the
impaired hand, the size of the total hand area was decreased (Friel & Nudo, 1998). The results
of these studies show that active use, such as repetitive training and skilled use, of the limb is
necessary for the survival of undamaged neurons adjacent to those damaged by cortical injury.
Studies involving both healthy and hemiparetic subjects have also provided evidence that
functional plasticity after stroke is associated with meaningful use of a limb during task-oriented
or task-specific repetitive exercises (Carr & Shepherd, 2003), especially when performed in a
massed or contextual interference paradigm (Barnes et al., 2005). Constraint-induced
movement therapy (CIMT) is a type of massed practice, where the unaffected limb is restricted
to force training of the impaired limb. Liepert, Bauder, Miltner, Taub, and Weiller (2000)
showed the relationship between CIMT and reorganisation of the cerebral cortex in persons
several years after stroke. After CIMT, the authors reported a significant enlargement in the size
of the cortical area of the affected hand muscle, which corresponded to a greatly improved motor
performance of the impaired limb. These changes were maintained six months later in a follow-
up examination, with the area of the cortical representation in the affected hemisphere almost
identical to the unaffected side (Liepert et al., 2000). The results of several other studies also
support the idea that use-dependent activities result in functional reorganization in the adult
cerebral cortex after stroke (Carr & Shepherd, 2003). In addition, Barnes et al. (2005) suggest
that more intensive, task-oriented practice seems to enhance learning and performance.
There is evidence that brain reorganisation and functional recovery from brain lesions are
dependent on intensive, repetitive, and task-oriented movements. Thus, the rehabilitation
environment must offer possibilities for intensive and meaningful exercise and training (Caplan,
2006). The three primary means of rehabilitation are physical therapy, occupational therapy, and
speech language pathology.
2.4 Role of Therapists
The main role of a physical therapist (PT) is to train the patient for ambulation. This includes
range-of-motion, strengthening, and endurance exercises. They also instruct patients on how to
use various aids, such as canes and walkers, if needed. An occupational therapist (OT) helps to
retrain the skills needed for activities of daily living. Patients and OTs work on improving fine
motor skills so activities such as feeding, bathing, dressing, and cooking can be accomplished.
Speech language pathologists work with patients to relearn language and communication skills
such as speaking, reading, and writing (Caplan, 2000).
In general, the primary role of a therapist is to facilitate the motor relearning process. This is
done by identifying the problems faced by the patient and by analysing their movements through
observation and comparison with normal movements. The therapist also identifies components
that are lacking or poorly controlled, and teaches the patient to perform these missing
components using goal setting, instruction, feedback, and manual guidance (Barnes et al., 2005).
These movements are then practiced, followed by training of the task in a more functional
context to promote transfer or carry-over in real-life situations. The patient is encouraged by the
therapist to practice these exercises extensively, not only under the therapist’s supervision but
also independently in a variety of environments (Barnes et al., 2005).
The process of conventional therapy requires a large amount of therapist involvement. It is time
consuming and most often requires one-on-one therapist-patient interaction. This interaction
places excessive physical demands on therapists that sometimes result in repetitive strain
injuries, lower back problems, and severe fatigue (Hilder et al., 2005). Not only does
rehabilitation place a great deal of burden on therapists, it also contributes to high health care
costs, for example, during instances where more than one therapist is needed to provide a
therapeutic intervention, such as gait training in a severely impaired stroke patient (Hilder et al.,
2005).
2.5 Reducing Health Care and Therapist Burden
Efforts toward developing robotic treatments are motivated by the increasing pressure to contain
and reduce health care costs that have resulted in a cutback of the time and resources available
for post-stroke rehabilitation (Barnes et al., 2005). These factors emphasise the need for new
approaches to increase the effectiveness and efficiency for motor therapy after stroke.
Integration of robotic therapy into current practice may improve inpatient rehabilitation, where it
can unburden the therapist of repetitive, time-consuming tasks and allow more time to focus on
care delivery and individual patient needs (Young, 2007). Robots may even provide a means of
delivering high-quality outpatient treatments in places that incur lower costs, including
community care centres, skilled nursing facilities, assisted living centres, and eventually,
patients’ homes (Barnes et al., 2005).
Robotic technology may also improve the quality and effectiveness of rehabilitation in the
following ways (Barnes et al., 2005):
• provide better control on movement delivery
• allow increased intensity or dosage
• provide better responsiveness and adaptation to a patient’s changing needs and abilities
• provide accurate measures on performance and assessment
Chapter 3 Literature Review
3 Literature Review
This chapter presents a review of some of the current robotic applications in post-stroke therapy
for the upper extremity.
3.1 Current Rehabilitation Robotic Systems for Upper Extremities
There have been several types of robotic devices designed to deliver upper-limb rehabilitation
for people with paralysed upper extremities. The Assisted Rehabilitation and Measurement
(ARM) Guide developed by Reinkensmeyer et al. (2000) was designed to mimic the reaching
motion. It consists of a single motor and chain drive that is used to move the user’s hand along a
linear constraint, which can be manually oriented in different angles to allow reaching in various
directions (Figure 3.1). The ARM Guide implements a technique called “active assist therapy”,
whose essential principle is to complete a desired movement for the user if he/she is unable
to do so. This assistance is achieved with a control algorithm that:
• allows the user to initiate movement through at least one centimetre along the track in
the forward direction,
• completes the reaching movement in a smooth fashion by driving the arm along the
desired minimum-jerk trajectory if the user cannot complete the movement, and
• does not apply assistance if the user follows the desired trajectory within a one
centimetre dead-band (Kahn, Zygman, Rymer, & Reinkensmeyer, 2006).
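The minimum-jerk trajectory referenced above has a well-known closed form. The sketch below is an illustration of that standard formula in Python, not code from the ARM Guide itself; the function name and parameters are chosen for clarity.

```python
def minimum_jerk(x0: float, xf: float, duration: float, t: float) -> float:
    """Position at time t along a minimum-jerk trajectory from x0 to xf.

    Uses the standard fifth-order polynomial:
        x(t) = x0 + (xf - x0) * (10*tau**3 - 15*tau**4 + 6*tau**5)
    where tau = t / duration is normalised time, clamped to [0, 1].
    The resulting motion starts and ends with zero velocity and
    acceleration, which is why it reads as "smooth" to the user.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # clamp normalised time
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
```

A controller like the ARM Guide's could sample such a trajectory at each time step and drive the arm toward the sampled position whenever the user's hand falls outside the dead-band.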
A pilot study was performed to compare the effects of active-assisted versus free reaching
exercises to improve arm movement after stroke. Nineteen individuals with chronic stroke were randomised
into two groups: one performed the active-assisted exercises on the ARM Guide, while the other
performed a task-matched amount of unassisted training. The study concluded that the
improvements were not significantly different between the two groups. However, Kahn et al.
(2006) suggested that the inconclusive results might have been due to the study’s small sample
size.
Figure 3.1: ARM Guide (© D. Reinkensmeyer, 2000 – use of picture is by permission of the copyright holder)
The Mirror Image Movement Enabler (MIME) therapy system was designed through a
collaborative effort between the Veteran Administration Medical Center in Palo Alto and
Stanford University (Lum, Burgar, Shor, Majmundar, & Van der Loos, 2002). It consists of a
six-degree of freedom (DOF) robot manipulator, which is attached to the orthosis supporting the
user’s affected arm (Figure 3.2), and applies forces to the limb during both unimanual and
bimanual goal-directed movements in 3-dimensional (3D) space. Unilateral movements involve
the robot moving or assisting the paretic limb towards a target in pre-programmed trajectories.
The bimanual mode works in a slave configuration where the robot-assisted affected limb
mirrors the unimpaired arm movements as seen in Figure 3.2.
Figure 3.2: MIME system in bimanual mode (© Elsevier, 2002 – use of picture is by permission of the copyright holder)
A randomised controlled study involving 27 chronic stroke subjects was performed to compare
the effects of robot-assisted movement training with conventional rehabilitation techniques. For
the robot group, subjects practised shoulder and elbow movements while assisted by the robot in
four different modes of operation:
• passive – the subject’s arm was passively moved by the robot along a predetermined
trajectory, while the subject relaxed the paretic limb
• active-assisted – the subject would first initialise the movement with volitional force
toward the target, and then both the user and robot would work together to move the limb
• active-constrained – the robot provided resistive forces in the direction of the desired
movement
• bilateral – the robot assisted the affected limb by continuously moving the affected
forearm to the unaffected forearm’s mirror image position and orientation (i.e. the two
forearms were kept in mirror symmetry)
The control group practised various tasks with their arm, which targeted proximal upper-limb
function. It was found from this study that subjects who received MIME therapy made
statistically higher gains in proximal arm function (Fugl-Meyer (FM) scores), strength, and
reaching. However, at the six-month follow up, there were no statistical differences in function
between the two groups (Lum et al., 2002). Similar results were also found for individuals with
subacute stroke (Lum et al., 2006).
The GENTLE/s project, funded by the European Union under the Quality of Life initiative of
Framework Five, was also designed to deliver upper-limb robot-mediated therapy for stroke
patients (Amirabdollahian et al., 2007). The GENTLE/s system (Figure 3.3) is comprised of a
commercially available 3-DOF robot, the HapticMASTER (FCS Robotics Inc.), which is
attached to a wrist splint via a passive gimbal mechanism with 3-DOF. The gimbal allows for
pronation/supination of the elbow as well as flexion and extension of the wrist. The seated user,
whose arm is suspended from a sling to eliminate gravity effects, can perform reaching
movements through interaction with the virtual environment (VE) on the computer screen.
Figure 3.3: GENTLE/s system (© W. Harwin, 2007 – use of picture is by permission of the copyright holder)
A randomised controlled study to assess the effect of the robot-mediated therapies on the
GENTLE/s system compared to sling suspension therapies was performed with 31 chronic stroke
participants. Subjects in the robot group performed reaching tasks in three different modes:
• patient passive – the robot moved the user’s arm following a predefined path
• patient active assisted – the robot would only start to move if the user initiated a
movement by providing a nominal force in the correct direction
• patient active – the robot stayed passive until the user deviated from the planned
trajectory; only then would the robot assist the user to return to the path
Subjects in the control group practised reaching-type exercises while the paretic arm was
suspended from a frame eliminating gravity. The study results indicated that both groups
improved function, as measured by the FM scale’s upper-limb section. However, the
improvements were not significantly different between the two groups (Amirabdollahian et al.,
2007).
The rehabilitation robotic device that has received the most clinical testing is the Massachusetts
Institute of Technology (MIT)-MANUS (Krebs, Hogan, Aisen, & Volpe, 1998). The MIT-
MANUS consists of a 2-DOF robot manipulator that assists shoulder and elbow movements by
moving the user’s hand in the horizontal plane (Figure 3.4). In a previous randomised study
involving 56 subacute stroke patients, those who received 25 hours of robot exercise in addition
to their conventional therapy had greater gains in proximal arm strength, reduced motor
impairment at the shoulder and elbow, and greater recovery of ADL function when compared
with controls who received only minimal exposure (five hours) to the robot (Volpe et al., 2000).
The robot group practised goal-directed reaching movements in active or passive modes. The
robot would guide the user’s hand to the desired target if the user did not move; otherwise, the
robot would be left in passive mode. The control group interacted with the robot in passive
mode only. If the user could not perform the task with the affected limb, s/he used the
unimpaired limb to complete the movement (Volpe et al., 2000). Unfortunately, these results are
not definitive since the treatment group received five times as much robot therapy (25 hours
versus five) as the control group. The additional time spent on therapy, not the robotic device, may have
accounted for the different results.
Figure 3.4: MIT-MANUS (© H. Krebs, 2004 – use of picture is by permission of the copyright holder)
Further studies evaluating the effect of robotic therapy with the MIT-MANUS in reducing chronic
motor impairments show that there were statistically significant improvements in motor function
(Ferraro et al., 2003; Fasoli et al., 2004; MacClellan et al., 2005). These participants were at or
near a plateau in their ability to move the paretic arm at the time of study admission. However,
these studies were not compared with conventional therapy.
Researchers in the artificial intelligence community have started to design robot-assisted
rehabilitation devices that implement artificial intelligence methods to improve on the active
assistance techniques found in the previous systems mentioned above. However, very few have
been developed.
Ju, C. Lin, D. Lin, Hwang, and Chen (2005) developed an elbow and shoulder rehabilitation
robot that uses a hybrid position/force fuzzy logic controller to assist the user’s arm along
predetermined linear or circular trajectories with specified loads. The robot helps to constrain
the movements in the desired direction, if the user deviates from the predetermined path. Fuzzy
logic was incorporated in the position and force control algorithms to cope with the nonlinear
dynamics (i.e. uncertainty of the dynamics model of the user) of the robotic system to ensure
operation for different users.
Erol, Mallapragada, Sarkar, Uswatte, and Taub (2006) developed an artificial neural network
(ANN) based proportional-integral (PI) gain scheduling direct force controller to provide robotic
assistance for upper extremity rehabilitation. The controller has the ability to automatically
select appropriate PI gains to accommodate a wide range of users with varying physical
conditions by training the ANN with estimated human arm parameters. The idea is to
automatically tune the gains of the force controller based on the condition of each patient’s arm
parameters in order for it to apply the desired assistive force in an efficient and precise manner.
3.2 Discussion
Although these robotic systems have shown promising results, none of them are able to provide
an autonomous rehabilitation regime that accounts for the specific needs and abilities of each
individual. Each user progresses in different ways and thus, exercises must be tailored to each
individual differently. For example, the difficulty of an exercise should increase faster for those
who are progressing well compared to those who are having trouble performing the exercise.
The GENTLE/s system requires the user or therapist to constantly press a button in order for the
system to be in operational mode (Amirabdollahian et al., 2007). It is imperative that a
rehabilitation system can operate with no or very little feedback as any direct input from the
therapist (or user), such as setting a particular resistance level, prevents the user from performing
the exercise uninterrupted. The system should be able to autonomously adjust different exercise
parameters in accordance to each individual’s needs.
The rehabilitation systems discussed above also do not account for physiological factors, such as
fatigue, which can have a significant effect on rehabilitation progress (Barnes et al., 2005). A
system that can incorporate and estimate user fatigue can provide information as to when the
user should take a break and rest, which may benefit rehabilitation progress.
This thesis aims to fill in these existing gaps by using partially observable Markov decision
process (POMDP) techniques to autonomously guide stroke patients during upper-limb reaching
rehabilitation, tailor exercise parameters for each individual, and estimate user fatigue.
Chapter 4 Partially Observable Markov Decision Process
4 Partially Observable Markov Decision Process
4.1 Artificial Intelligence
Artificial intelligence (AI) is a field that not only tries to understand how humans think, but also
attempts to build intelligent entities (agents) that are capable of thinking and acting in a rational
manner (Russell & Norvig, 2003). AI in its formative years was influenced by ideas from many
disciplines including philosophy, mathematics, economics, neuroscience, psychology, computer
engineering, control theory, and linguistics (Russell & Norvig, 2003). However, AI has now
grown beyond these lines of work and has, in turn, occasionally influenced them. Only in the
last half century have there been computational devices and programming languages powerful
enough to create and solve experimental tests of ideas about what intelligence is (Buchanan,
2005).
For an agent to operate interactively, it must be able to perceive its environment through sensors
and act upon that environment through actuators (Russell & Norvig, 2003). Through senses such
as sight, sound, and touch, humans are able to perceive their environment, make decisions based
on this input, and then affect their environment through actuators (body parts) such as speech,
gestures, and movement. An AI agent operates in the same fashion except its sensors and
actuators may differ depending on the particular problem. For example, a robotic agent designed
to navigate through a maze may have cameras and infrared range finders for sensors and various
motors for actuators, and a software agent may receive keystrokes as sensory inputs and act on
the environment by displaying characters on the screen. In any case, to design an agent that is
rational and effective, it is important to comprehend the problem at hand, which will guide the
selection of suitable sensors and actuators, as well as the type of AI employed to solve the
problem.
There are many models and techniques of AI available to solve problems in various areas from
speech recognition to game playing. However, each type has a different technique that is better
suited to solve some problems over others. Fuzzy logic, neural networks, and decision theory are
some examples of commonly used AI techniques. A POMDP is a decision-theoretic model that
assumes partial observability of the environment. It is a combination of probability and utility
theory, and is the type of AI chosen for use in this thesis. POMDPs can provide a natural
framework for modelling complex planning problems with partial observability, uncertain action
effects, incomplete knowledge of the state of the environment, and multiple interacting
objectives.
4.2 Definition of Partially Observable Markov Decision Processes
A POMDP model can represent a planning problem under uncertainty: to optimally choose
sequences of actions in a partially observable environment that will achieve a particular goal. It
is based on decision theory, which is a combination of probability theory (describes what the
agent should believe on the basis of evidence) and utility theory (describes what the agent wants)
that describes what the agent should do. The POMDP agent uses decision theory to make
decisions by considering all possible actions and choosing the one that leads to the best expected
outcome. A POMDP is also a sequential decision model, in which the agent's utility depends on
a sequence of states (an environment history) rather than on a single state
(Russell & Norvig, 2003). This feature allows more complex, real-world problems to be solved.
For a more detailed review of POMDPs refer to Kaelbling, Littman, and Cassandra (1998). The
following equations (Equations 4.1 - 4.5) are also based on the paper by Kaelbling et al. (1998).
4.2.1 Components
POMDPs can be described as having eight components: the state space S, the action space A, the
transition function T, the observation space O, the observation function Z, the reward function R,
the horizon h, and the discount factor β. The relationship of these components can be seen in
Figure 4.1. The POMDP described below assumes discrete time steps.
Figure 4.1: Diagram of the relationship of the POMDP components
State space (S): The world is represented by a finite set (S) of distinct states (s).
Action space (A): The action space (A) is comprised of a finite number of actions (a) available to
the agent. The agent’s goal is to choose actions that will influence the world in such a way that
desirable states are visited more frequently.
Transition function (T): As opposed to classical planning models, POMDPs can model the
uncertainty in the effects of actions. This means that the current state of the world (s) has a
certain probability of making a transition to any state (s’) in S as a result of executing an action
(a). P(s’|s,a) denotes the probability of the world making a transition to state s’ when action a is
executed in state s. Note that this transition function operates under the Markov assumption,
which declares that the probability of transition to some state s’ at the next time step, t+1,
depends only on the state s and action a at the current time step, t. It is independent of the
previous states and actions.
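As a sketch, a transition function over a finite state space can be stored as a table of conditional probabilities; under the Markov assumption, each distribution P(·|s,a) depends only on the current state and action and must sum to 1. The states, actions, and numbers below are hypothetical, not taken from the thesis model:

```python
# T[(s, a)] maps each successor state s2 to P(s2 | s, a).
T = {
    ("rested", "exercise"): {"rested": 0.7, "fatigued": 0.3},
    ("fatigued", "exercise"): {"rested": 0.1, "fatigued": 0.9},
    ("fatigued", "rest"): {"rested": 0.6, "fatigued": 0.4},
}

# The Markov assumption: the next-state distribution depends only on (s, a),
# so each conditional distribution must be properly normalised.
for (s, a), dist in T.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)
print("all rows sum to 1")
```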
Observation space (O): The observation space (O) is comprised of a finite number of
observations (o) the agent can experience of its world. Observations correspond to features of
the world that are directly perceptible by the agent’s sensors.
Observation function (Z): Observations provide information about the current state of the world.
The observation function describes the probability of the agent experiencing observation o after
executing action a and making a transition to state s’ denoted by P(o|a,s’). Note that
observations only provide partial information to the agent since the same observation may be
experienced in different states.
Reward function (R): In order for the agent to decide which action to choose, there must be
motivation to pick one action over another. The reward function, R(s,a), dictates how much the
agent earns when the world is in state s and executes some action a. Knowing these rewards
allows the agent to choose which action to take by following some strategy, such as attempting
to maximise its cumulative reward. Note that rewards can be both positive and negative (i.e.
cost). The reward function can model both simple and complex concurrent goals, which allows
the agent to combine multiple goals and make rational tradeoffs with respect to those goals. For
example, the agent may take actions that will penalise it in the short term, but may yield the
agent a better probability of success in the long term.
Horizon h and discount factor β: In decision theory, the agent’s goal is to maximise the expected
utility earned over some time frame. This time frame is known as the horizon h, which specifies
the number of time steps the agent must plan for. It can be finite or infinite. A discount factor β
is used to indicate how rewards received by the agent at different time steps should be weighted.
If β is set to 1, then future rewards will be worth as much to the agent as current rewards. If 0 ≤
β < 1, then future rewards will be worth less than current ones, each scaled down for every time
step delay. This thesis assumes infinite horizon POMDPs.
4.2.2 Acting Optimally
The decision cycle of a POMDP agent can be seen in Figure 4.2. Basically, the agent makes an
observation of the world, and then generates an action. The agent’s goal remains to maximise
the expected discounted sum of future rewards.
Figure 4.2: Decision cycle of a POMDP agent
Since knowledge of the state of the world is uncertain, the POMDP agent keeps an internal belief
state, b, that represents the probability distribution over all possible states of the world (S).
These distributions encode the agent’s subjective probability about the state of the world and
provide a basis for acting under uncertainty. In addition, due to the Markov assumption, the
belief state summarises the agent's previous experiences.
Given the agent’s belief state, the policy decides which action to generate. It is a mapping of
belief states to actions. The agent then makes an observation from the resulting state of the
world. The state estimator is responsible for updating the belief state based on the last action
executed, the current observation, and the previous belief state. From the updated belief state,
the policy decides on the next action to execute. This decision cycle continues repeating until
the agent has reached its goal.
4.2.2.1 Computing the Belief State
A belief state, b, is a probability distribution over S. b(s) denotes the probability assigned to
some world state s according to the distribution of belief state b. The axioms of probability
require that:
• 0 ≤ b(s) ≤ 1 for all s ∈ S, and
• Σ_{s∈S} b(s) = 1.
Given the old belief state b, an action a, and an observation o, the state estimator must compute
the new belief state b′. The new degree of belief in some state s′, b′(s′), can be obtained using
Bayes' rule and basic probability theory as follows:

    b′(s′) = P(s′ | o, a, b)
           = P(o | s′, a, b) P(s′ | a, b) / P(o | a, b)
           = P(o | s′, a) Σ_{s∈S} P(s′ | a, b, s) P(s | a, b) / P(o | a, b)
           = Z(s′, a, o) Σ_{s∈S} T(s, a, s′) b(s) / P(o | a, b)                (4.1)
where T(s,a,s’) is the transition probability and Z(s’,a,o) is the observation probability. The
denominator, P(o|a,b), is independent of s’ and can be treated as a normalising factor to cause
b’(s’) to sum to 1.
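The belief update in Equation 4.1 can be sketched directly. The following is a minimal illustration, not the thesis software; the two states, the action, the observation, and all probabilities are hypothetical:

```python
def update_belief(b, a, o, T, Z, states):
    """Bayesian belief update (Equation 4.1):
    b'(s') is proportional to Z(s', a, o) * sum_s T(s, a, s') * b(s)."""
    unnormalised = {
        s2: Z[(s2, a, o)] * sum(T[(s, a, s2)] * b[s] for s in states)
        for s2 in states
    }
    total = sum(unnormalised.values())  # P(o | a, b), the normalising factor
    return {s2: v / total for s2, v in unnormalised.items()}

# Illustrative two-state example: is the user fatigued or rested?
states = ["rested", "fatigued"]
a, o = "exercise", "slow"
T = {("rested", a, "fatigued"): 0.3, ("rested", a, "rested"): 0.7,   # T[(s, a, s')] = P(s' | s, a)
     ("fatigued", a, "fatigued"): 0.9, ("fatigued", a, "rested"): 0.1}
Z = {("fatigued", a, "slow"): 0.8, ("fatigued", a, "norm"): 0.2,     # Z[(s', a, o)] = P(o | a, s')
     ("rested", a, "slow"): 0.2, ("rested", a, "norm"): 0.8}

b2 = update_belief({"rested": 0.5, "fatigued": 0.5}, a, o, T, Z, states)
print(b2)  # belief shifts toward "fatigued" after observing a slow reach
```

Note that the normalisation makes the new belief a proper probability distribution, as required by the axioms above.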
4.2.3 Finding the Optimal Policy: Value Iteration
The calculation of the value function, namely the expected sum of discounted rewards that the
POMDP agent will earn when starting in a belief state b, allows the agent to decide on what
action to choose next. When making a decision, the agent must take into account the future
implications of its current action since current actions influence the future belief state of the
world, and in turn, future actions, observations, and rewards. In order to do this, the agent must
have a preference or utility over the actions available. The goal of the POMDP agent is to
maximise the cumulative reward possible and therefore, will prefer courses of actions that will
net the agent the highest expected reward. Even though the agent may have many actions to
choose from in every possible state, at least one of them will have the greatest expected utility.
For an agent that has one step remaining, all it can do is take a single action. With two steps to
go, it can execute an action, receive an observation, and then execute a final action. In general,
an agent’s non-stationary n-step policy can be represented by a policy tree, p, as shown in Figure
4.3. The top node determines the first action to take. Then, depending on the resulting
observation, an arc is followed to a node on the next level, which determines the next action.
Figure 4.3: An n-step policy tree
In the simplest case, p is a 1-step policy tree (i.e. a single action). The value function is simply
the reward gained by the agent by executing that action in its present state:

    V_p(s) = R(s, a(p))

where a(p) is the action specified in the top node of p. In general, if p is an n-step policy tree, the
value function becomes:

    V_p(s) = R(s, a(p)) + β (expected value of the future)
           = R(s, a(p)) + β Σ_{s′∈S} P(s′ | s, a(p)) Σ_{o_i∈O} P(o_i | s′, a(p)) V_{o_i(p)}(s′)
           = R(s, a(p)) + β Σ_{s′∈S} T(s, a(p), s′) Σ_{o_i∈O} Z(s′, a(p), o_i) V_{o_i(p)}(s′)     (4.2)
where o_i(p) is the (n-1)-step policy subtree associated with observation o_i at the top level of an
n-step policy tree p, R(s, a(p)) is the reward incurred after performing action a(p) in state s, β is
the discount factor, T(s, a(p), s′) is the transition probability, Z(s′, a(p), o_i) is the observation
probability, and V_{o_i(p)}(s′) is the value function for being in state s′. Since the agent already
knows the value function of p for the future state s′, as well as the reward, transition, and
observation functions, the agent can calculate V_p(s).
However, most applications in the real world do not have a bound on the number of time steps
available (i.e. they have an infinite horizon). In this case, the discount factor β guarantees that the
solution of the value function eventually converges: Bellman showed that as n→∞,
V_p → V_{o_i(p)}. Thus, iteration of Equation 4.2 converges to the value function:

    V_p(s) = R(s, a(p)) + β Σ_{s′∈S} T(s, a(p), s′) Σ_{o_i∈O} Z(s′, a(p), o_i) V_p(s′)     (4.3)

which is referred to as the Bellman equation.
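One backup of this recursion can be sketched as follows; the function computes the value of a policy tree at a single state once the values of its subtrees are known. All names and numbers below are hypothetical:

```python
def policy_tree_value(s, a, subtree_values, R, T, Z, states, observations, beta):
    """One backup of Equation 4.2:
    V_p(s) = R(s, a) + beta * sum_{s'} T(s, a, s') * sum_o Z(s', a, o) * V_{o(p)}(s')
    where subtree_values[o][s'] is the value of the subtree chosen after observing o."""
    future = sum(
        T[(s, a, s2)] * sum(Z[(s2, a, o)] * subtree_values[o][s2]
                            for o in observations)
        for s2 in states
    )
    return R[(s, a)] + beta * future

# Hypothetical two-state, two-observation problem.
states, observations, a = ["s0", "s1"], ["o0", "o1"], "a"
T = {("s0", a, "s0"): 0.5, ("s0", a, "s1"): 0.5}
Z = {("s0", a, "o0"): 1.0, ("s0", a, "o1"): 0.0,
     ("s1", a, "o0"): 0.0, ("s1", a, "o1"): 1.0}
R = {("s0", a): 1.0}
subtree_values = {"o0": {"s0": 10.0, "s1": 0.0}, "o1": {"s0": 0.0, "s1": 4.0}}

v = policy_tree_value("s0", a, subtree_values, R, T, Z, states, observations, 0.9)
print(v)  # 1.0 + 0.9 * (0.5 * 10.0 + 0.5 * 4.0) = 7.3
```

Repeating this backup over every state, action, and combination of subtrees is what makes exact solution expensive, which motivates the approximate solvers discussed at the end of this section.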
Since the agent will never know the exact state of the world, it must be able to determine the
value of executing p from some belief state b. This is just an expectation over world states of
executing p in each state:
    V_p(b) = Σ_{s∈S} b(s) V_p(s)     (4.4)
Equation 4.4 is the value of executing p in every possible belief state. However, to find the
optimal value function, it is necessary to execute different policy trees from different initial
belief states. Let P be the finite set of all policy trees. Thus, the optimal value function for b can
be defined as:
    V*(b) = max_{p∈P} Σ_{s∈S} b(s) V_p(s)     (4.5)
The actions that maximise the optimal value function (Equation 4.5) give the optimal course of
action to take, known as the optimal policy, π*(b). This policy maps belief states to actions,
defining which action the agent should choose in a particular belief state b.
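Because each V_p(b) is linear in b, evaluating Equation 4.5 over a set of candidate policy trees reduces to a maximum over dot products; the trees' value vectors are often called alpha vectors in the POMDP literature. A sketch with hypothetical numbers (here the two vectors are simply asserted rather than computed by value iteration):

```python
def best_action(b, alpha_vectors, states):
    """Evaluate Equation 4.5: V*(b) = max over policy trees of sum_s b(s) * V_p(s).
    alpha_vectors maps each candidate tree's root action to its value vector."""
    def value(vec):
        return sum(b[s] * vec[s] for s in states)
    action, vec = max(alpha_vectors.items(), key=lambda item: value(item[1]))
    return action, value(vec)

# Hypothetical value vectors for two candidate policy trees.
states = ["s0", "s1"]
alpha_vectors = {
    "rest":     {"s0": 5.0, "s1": 1.0},
    "continue": {"s0": 0.0, "s1": 8.0},
}
print(best_action({"s0": 0.9, "s1": 0.1}, alpha_vectors, states))  # chooses "rest"
print(best_action({"s0": 0.2, "s1": 0.8}, alpha_vectors, states))  # chooses "continue"
```

This piecewise-linear, convex structure of the optimal value function is exactly what exact POMDP solution algorithms exploit.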
Unfortunately, the use of POMDPs in real-world systems remains limited due to the intractability
of the solution algorithms for finding an optimal policy. This has led researchers to develop
methods to deal with the complexity of policy spaces and the large number of states that exist in
a POMDP model. Several other optimal and approximate POMDP solutions are discussed in
Lovejoy (1991) and Poupart (2005).
4.3 Examples of POMDPs in Real-World Applications

An increasing number of researchers in various fields are becoming interested in the application
of POMDPs because they have been showing promise in solving real-world problems.
Researchers at Carnegie Mellon University used a POMDP to model the high-level controller for
an intelligent robot, Nursebot, designed to assist elderly individuals with mild cognitive and
physical impairments in their daily activities such as taking medications, attending appointments,
eating, drinking, bathing, and toileting (Pineau, Montemerlo, Pollack, Roy, & Thrun, 2003).
Using variables such as the robot’s location, the user’s location, and the user’s status, the robot
would decide whether to take an action to provide the user a reminder or to guide the user where
to move. By maintaining an accurate model of the user’s daily plans and tracking their execution
of the plans by observation, the robot could adapt to the user’s behaviour and make decisions
about whether and when it was most appropriate to issue reminders. For example, if the user
must be reminded to take their medication every three hours and the next reminder is scheduled
for 13:30 during the user’s favourite television program, the robot might schedule a reminder at
13:25 so as to not interrupt them during the show.
A POMDP model was also used in a guidance system to assist persons with dementia during the
handwashing task (Hoey, von Bertoldi, Poupart, & Mihailidis, 2007). By tracking the positions
of the user’s hands and towel with a camera mounted above the sink, the system could estimate
the progress of the user during the handwashing task and provide assistance with the next step, if
needed. Assistance was given in the form of verbal and/or visual prompts, or through the
enlistment of a human caregiver’s help. An important feature of this system is the ability to
estimate and adapt to user states such as awareness, responsiveness, and overall dementia level
which affect the amount of assistance given to the user during the handwashing activity.
Givon and Grosfeld-Nir (2008) developed a POMDP system that could optimally control the
running of television shows on a broadcasting network. In this application, the (uncertain) state
of a show, which could either be “good” (i.e. it should be continued) or “bad” (i.e. it should be
changed), was inferred from the partial observability of the show’s ratings (i.e. the sampled
proportion of households tuned in to their channel at their time slots). By knowing the transition
probabilities of the ratings and maximising the expected value of profits from selling advertising
time, the system could choose between the actions of continuing to run the show and changing to
a different show.
4.4 Justification for using a POMDP to Model Reaching Rehabilitation
Classical planning generally consists of agents which operate in environments that are fully
observable, deterministic, static, and discrete. Although these techniques can solve increasingly
large state-space problems, they are not suitable for most robotic applications, such as the
reaching task in upper-limb rehabilitation, as they usually have partial observability, stochastic
actions, and dynamic environments (Pineau, Gordon, & Thrun, 2006). Planning under
uncertainty aims to improve robustness by factoring in the types of uncertainties that can occur.
POMDPs are perhaps the most general representation for (single-agent) planning under
uncertainty. They surpass other techniques in representational power because they can
combine many important aspects of planning under uncertainty (Pineau et al., 2006), as
described below.
In reality, the state of the world cannot be known with certainty due to inaccurate measurements
of noisy and imperfect sensors, or instances where observations may be impossible and
inferences must be made. POMDPs can handle this uncertainty in state observability by
expressing the state of the world as a belief state – the probability distribution over all possible
states of the world – rather than actual world states. By capturing this uncertainty in the model,
the POMDP has the ability to make better decisions than fully observable techniques. For
example, the reaching rehabilitation system does not consist of physical sensors that can detect
user fatigue. By capturing observations in user compensation and control, POMDPs can use this
information to infer or estimate how fatigued the user is. Fully observable methods cannot
capture user fatigue in this way since it is impossible to observe fatigue. The only way for fully
observable techniques to work is to physically capture information about fatigue, such as using
electrical stimulation to measure muscle contractions (Dobkin, 2008). However, these
techniques are invasive and may not even guarantee full observability of the world state since
sensor measurements may be inaccurate.
The reaching exercise is a stochastic (dynamic) decision problem where there is uncertainty in
the outcome of actions and the environment is always changing. Thus, choosing a particular
action at a particular state does not always produce the same results. Instead, the action has a
random chance of producing a specific result with a known probability. POMDPs can account
for the realistic uncertainty of action effects into the decision process through its transition
probabilities and reward function. By knowing the probabilities and rewards of the outcomes of
taking an action in a specific state, the POMDP agent can estimate the likelihood of future
outcomes to determine the optimal course of action to take in the present. This ability to
consider the future effects of current actions or to “look ahead” allows the POMDP to trade off
between alternative ways to satisfy a goal and plan for multiple interacting goals. It also allows
the agent to build a policy (prescribing the choice of action for every possible belief state) that is
capable of handling unexpected outcomes far better than many classical planners.
Different stroke patients progress in different ways during rehabilitation depending on their
ability and state of health. It is imperative that the rehabilitation system is able to tailor and
adapt to each individual’s needs and abilities over time. POMDPs have the capability of
incorporating user abilities autonomously in real-time by keeping track of which actions have
been observed to be the most effective in the past. For example, the POMDP may decide to keep
the target distance at d1 for a longer period of time for patients who are progressing slowly, but
may increase it to d2 or d3 at a quicker rate for those who are progressing faster
(see Sections 5.2.1 and 5.3.1 for a description of the variables used in the reaching exercise).
Since one of the objectives of a rehabilitation robotic system is to reduce health care costs by
having one therapist supervise multiple stroke patients simultaneously, it is imperative to design
the system so that little or no explicit feedback from the therapist is required during the
therapy session. The system must be able to effectively guide the patient during the reaching
exercise without the need for explicit input (e.g. a button press to set a particular resistance
level), as any direct input from the therapist would be time consuming and prevent the user from
intensive repetition. POMDPs have this ability to operate autonomously: they estimate states and
then automatically make decisions. For eventual practice of therapy in the home
setting, it is especially important that the system does not require any explicit feedback since no
therapist will be present.
Chapter 5 Design of the POMDP Reaching Exercise Model
5 Design of the POMDP Reaching Exercise Model

The following section discusses the development of the POMDP model used in this thesis.
5.1 Requirements Specification
5.1.1 Definition of the Reaching Exercise
Discussions with a team of experienced OTs and PTs at the Toronto Rehabilitation Institute
(TRI) identified that early-to-moderate stage upper-limb exercise for stroke patients,
specifically the reaching motion, is an area of rehabilitation in need of more efficient
tools. Moreover, reaching is one of the most important abilities to possess, as it is the basic
motion used for many activities of daily living. Hence, this project focused on delivering
reaching motion therapy.
The specific reaching exercise chosen for this thesis was based on a common task delivered by
the OTs at TRI. This task involved a seated patient placing his/her hand (of the affected upper
limb) palm-down on the flat surface of a four-legged stool, and pushing into the surface of the
stool. For the duration of the exercise, the therapist ensured that the patient’s trunk was not
rotated, the shoulder was not elevated, and the elbow stayed in the sagittal plane (aligned with
the shoulder). The goal of this exercise was to rock the stool straight forward as far as possible
on its two front legs and bring it straight back in a controlled manner, ensuring that the patient’s
trunk, shoulder, and elbow stayed in the correct position. The therapist would also apply
increasing resistance as the patient showed signs of improvement (e.g. reached further).
The reaching exercise can be summarised as a targeted, load-bearing, forward-reaching motion
in which the patient must ensure proper posture and control at all times. Figure 5.1 provides a
basic overview of the reaching exercise. The reaching exercise begins with forward flexion of
the shoulder, and extension of the elbow and wrist (Figure 5.1a). Weight is translated through
the heel of the hand as it is pushed forward in the direction indicated by the arrow, until it
reaches the final position (Figure 5.1b). The reaching exercise occurs in the sagittal plane, and
the return path brings the arm back to the initial position. It is important to note that a proper
reaching exercise is performed with control and without compensation.
Figure 5.1: The reaching exercise from the initial position (a) to the final position (b) (© Lam, 2007 – use of picture is by permission of the copyright holder)
Adaptive or compensatory movements are usually present when persons with an affected upper-
limb attempt to carry out a voluntary reaching action. These patterns of adaptive movement are
due to muscle weakness, loss of interjoint coordination (between elbow and shoulder joints), and
lack of joint and muscle flexibility as a result of soft tissue length changes and increased muscle
stiffness (Carr & Shepherd, 2003). Typical examples of compensatory strategies during the
reaching motion are (Carr & Shepherd, 2003; Gillen & Burkhardt, 2004):
• flexion of the upper body at the hips instead of shoulder flexion
• lateral flexion and rotation of the trunk
• shoulder elevation
• abduction of the shoulder with elbow flexion
• internal rotation of the shoulder
• pronation of forearm
When a stroke patient encounters the motor deficits mentioned above, they have difficulty
making smooth, continuous, and accurate reaching movements (Cirstea & Levin, 2000). Stroke
patients tend to produce reaching trajectories that lack smoothness and continuity by constantly
changing the hand direction and reaching the target in a series of small sequential movements.
Stroke patients are also inclined to have higher variability along the trajectory (i.e. tend to
deviate from the straight path) and lower accuracy in reaching the target compared to able-
bodied persons. In addition, reaching movements are generally slower (Cirstea & Levin, 2000).
Therapists usually apply resistive forces (to emulate load- or weight-bearing) during the reaching
exercise to strengthen the triceps and scapula musculature, which will help to provide postural
support and anchoring for other body movements (Gillen & Burkhardt, 2004), such as pushing
down on a chair to stand up or using stair handrails for support.
There are two types of fatigue in the rehabilitation literature: 1) objective, which corresponds to
the observable and measurable decrement in performance during the repetition of a physical or
mental task; and 2) subjective, which refers to the feeling of exhaustion, weariness, and aversion
to effort caused by the neurological impairment (Dobkin, 2008). Fatigue is a very common
occurrence in post-stroke rehabilitation that can affect functional abilities and thus, the
rehabilitation progress (Barnes et al., 2005). Therefore, it is important for therapists to monitor
stroke patients for signs of fatigue during rehabilitation since the more fatigued the patient is, the
more slowly they will improve. Some signs of fatigue are an (unconscious) increase in
compensatory movements, lack of control, slowing in the pace of movement, or a disengagement
from the activity (Dobkin, 2008).
The general progression during conventional reaching rehabilitation is to gradually increase
target distance, and then to increase the resistance level (D. Hébert, personal communication,
August 10, 2007). If patients are showing signs of fatigue during the exercise, therapists will
typically let patients rest for a few minutes and then continue with the therapy session. The goal
is to have patients successfully reach the furthest target distance at the maximum resistance level,
while performing the exercise with control and proper posture.
This thesis solely focused on compensatory strategies, control, and time to provide clues about
fatigue as described above. The signs of compensation were defined as trunk rotation and
flexion, and shoulder abduction and internal rotation; control was defined as the amount of
deviation along the reaching trajectory; and time was used to calculate the duration of reaching
movements. Although motivation, or a lack thereof, can also be a sign of user fatigue, it was not
included in this thesis as it would have resulted in the addition of another unobservable variable,
further complicating the POMDP model.
In the following sections and in the rest of the thesis, a trial is defined as one repetition of the
reaching exercise from the initial position (Figure 5.1a) to the final position (Figure 5.1b), and
then back to the initial position.
5.1.2 Development of the Robotic System
A novel robotic system was designed to automate the reaching exercise as well as to capture any
compensatory events. The system is comprised of three main components: the robotic device,
which emulates the load-bearing reaching exercise with haptic feedback, the postural sensors,
which identify abnormalities in the upper extremities during the exercise, and the virtual
environment, which provides the user with visual feedback of the exercise on a computer
monitor.
Figure 5.2 shows the actual diagram of the rehabilitation device. It features a non-restraining
platform for better usability and freedom of movement, and has two active and two passive
DOFs, which allow the reaching exercise to be performed in 3D space (Lam et al., 2008). The
robotic device also incorporates haptic technology, which provides feedback through sense of
touch. Haptic refers to the modality of touch and the sensation of shape and texture of virtual
objects (McLaughlin, Hespanha, & Sukhatme, 2001). For the purpose of this thesis, the haptic
device provided resistance and boundary guidance for the user during the exercise, which was
performed only in 2D space (in the horizontal plane parallel to the floor).
Figure 5.2: Actual diagram of rehabilitation robot (A) with end-effector (B)
Rotational encoders in the end-effector of the robotic device provided data to indicate shoulder
abduction and internal rotation of the user during the exercise (Lam et al., 2008). These
compensatory strategies affect the rotational hand position on the end-effector. By monitoring
the rotational axis of the end-effector, the system could deduce when the user was exhibiting shoulder
abduction and internal rotation, as these two movements are usually coupled together (Lam et al.,
2008).
Unobtrusive trunk sensors, as seen in Figure 5.3, provided data to indicate trunk rotation and
flexion (Lam et al., 2008). The trunk sensors are comprised of three photo-sensitive resistors
that are taped to the back of a chair, each in one of three locations: the lower back, lower left
scapula, and lower right scapula (Figure 5.3a). They are placed in these locations to distinguish
between left and right rotation and more severe flexion when the lower back is displaced (Lam et
al., 2008). The detection of light during the exercise indicated trunk rotation and flexion, as it
meant a gap was present between the chair and the user (Figure 5.3b).
Figure 5.3: Trunk photoresistor sensors placed in three locations: lower back, lower left scapula, and lower right scapula (a) (© Lam, 2007 – use of picture is by permission of the copyright holder) and its detection of light (b)
Lastly, the virtual environment (VE) provided the user with visual feedback on target location and hand position during
the reaching exercise. Figure 5.4 shows a close up diagram of the VE. The reaching exercise
was disguised in the form of a 2D bull’s eye game. The goal of the game was for the user to
move the robot’s end-effector, which corresponds to the cross-tracker in the VE, to the bull’s eye
target. The rectangular box is the virtual (haptic) boundary, which kept the cross-tracker within
those walls during the exercise.
Figure 5.4: Virtual environment
For more details on the development of the robotic system, refer to Lam (2007).
5.1.3 Definition of the POMDP Model
In order for an AI system to operate effectively, the state of the world must be represented in a
way that accurately describes the task of interest to the system. The more accurately this
information is captured, the more successful the system will operate. The POMDP reaching
exercise model was constructed by:
• defining variables used to describe the state of the rehabilitation environment,
• determining appropriate actions for the system to take, and
• estimating the dynamics of the environment.
Two versions of the POMDP reaching exercise model were constructed during this thesis. The
first model, named STroke REhabilitatioN Guidance Tool in Haptic ENvironment
(STRENGTHEN), was performing fairly well, but made some decisions and estimated some
variables that were not aligned with those of conventional reaching therapy as described
previously in Section 5.1.1 (i.e. increase target distance first, then resistance level; and increase
the level of fatigue when signs were evident). A great deal of time was spent in modifying the
transition probabilities in an attempt to correct these problems. The second model, named
intelligent STroke Rehabilitation Exercise TeCHnology (iSTRETCH), attempted to improve the
performance of the first model as well as to find a more efficient way to modify the model and
its variables by eliminating the need of explicitly specifying every transition probability. This
was done by first identifying the underlying structure from the first model and then representing
it as a basic parametric form, the sigmoid function. In the end, iSTRETCH seemed to perform
more in line with conventional reaching rehabilitation than STRENGTHEN did.
The remainder of this section presents the methods used to define both models, followed by a
comparison of the two models' performance.
5.2 STRENGTHEN Model

The STRENGTHEN model was the first of two models developed during this thesis and can be
seen in Figure 5.5 as a dynamic Bayesian network (DBN). The model is described in further
detail below.
Figure 5.5: STRENGTHEN (POMDP) model as a DBN. It consists of the state, S, represented by a combination of ten variables; the actions, A; the observations, O; the reward function, R; and the dynamics, represented by the arrows. Variables at the next time step, t+1, are denoted with an apostrophe (e.g. hat’).
5.2.1 Definition of the Variables
The system was modelled as a discrete POMDP. Variables were chosen to meaningfully capture
the aspects of the reaching task that the system would require in order to effectively guide a
stroke patient during the exercise. A state is represented by a combination of instantiations of
each variable. All possible unique combinations make up the state space, which is every
possible state the system could be in. The following ten variables were chosen to appropriately
represent the exercise based on discussions with the OTs and PTs at TRI. The variable name is
shown in bold with its short form in bold parentheses. The different possible instantiations of
the variables are shown in braces, followed by a description of what the variable represents.
1. target distance (d) : {d1, d2, d3}
Denotes the locations of the targets, which are positioned only along the straight, sagittal
path. The targets are separated by equal distances from each other, where d=d1 is the
closest to the starting position and d=d3 is the furthest.
2. resistance level (r) : {none, min, max}
Denotes the level of resistance applied during the exercise, where r=none has a force of 0
Newtons (N), r=min has a force of 1 N, and r=max has a force of 3 N.
3. hand-at-target (hat) : {yes, no}
Indicates whether the user’s hand on the robot’s end-effector has reached the target or
not.
4. fatigue (fat) : {yes, no}
Indicates whether the user is fatigued or not.
5.-7. user’s range at a particular resistance level (n(r)) : {none, d1, d2, d3}
Denotes the range or ability of the user, which depends on the target distance and
resistance level as shown in the following:
• user’s range at zero resistance (n(none)) : {none, d1, d2, d3}
• user’s range at minimum resistance (n(min)) : {none, d1, d2, d3}
• user’s range at maximum resistance (n(max)) : {none, d1, d2, d3}
The range is determined by the furthest target distance the user is able to reach at a
particular resistance level. For example, if r=min and the furthest target distance the user
can reach is d=d1, then the user’s range is n(min)=d1.
8. time-to-target (ttt) : {none, slow, norm}
Denotes the time it takes the user to reach the target from the starting position. Note that
ttt=none indicates that the user has failed to reach the target.
9. control (ctrl) : {none, min, max}
Indicates the user’s control level by their ability to keep on the straight path, from the
starting position to the target.
10. compensation (comp) : {yes, no}
Indicates any compensatory action (improper posture) that the user performs during
the exercise. Signs of compensation are trunk rotation and flexion, and shoulder
abduction and internal rotation.
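The size of the state space these ten variables induce can be checked by multiplying their cardinalities. A quick sketch (the dictionary keys follow the short forms above; this is an illustration, not code from the thesis):

```python
# Cardinality of each STRENGTHEN state variable (short forms as above).
cardinalities = {
    "d": 3,        # {d1, d2, d3}
    "r": 3,        # {none, min, max}
    "hat": 2,      # {yes, no}
    "fat": 2,      # {yes, no}
    "n(none)": 4,  # {none, d1, d2, d3}
    "n(min)": 4,   # {none, d1, d2, d3}
    "n(max)": 4,   # {none, d1, d2, d3}
    "ttt": 3,      # {none, slow, norm}
    "ctrl": 3,     # {none, min, max}
    "comp": 2,     # {yes, no}
}

n_states = 1
for size in cardinalities.values():
    n_states *= size

print(n_states)  # 41472, matching the state-space size reported in Section 5.2.6.1
```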
5.2.2 Definition of the Actions
There are ten possible actions the system can take: nine actions, each a different combination
of a target distance (three values) and a resistance level (three values), plus one action to
stop the exercise when the user is fatigued. Specifically, these
actions are:
1. setd1resnone (sets d=d1 and r=none)
2. setd2resnone (sets d=d2 and r=none)
3. setd3resnone (sets d=d3 and r=none)
4. setd1resmin (sets d=d1 and r=min)
5. setd2resmin (sets d=d2 and r=min)
6. setd3resmin (sets d=d3 and r=min)
7. setd1resmax (sets d=d1 and r=max)
8. setd2resmax (sets d=d2 and r=max)
9. setd3resmax (sets d=d3 and r=max)
10. stop (terminates the exercise)
5.2.3 Definition of the Observation Variables and Observation Function
The system has four observation variables that are fully observable, making the observation
function deterministic (i.e. probabilities of 1.0). The observation variables OH={yes, no},
OT={none, slow, norm}, OCT={none, min, max}, and OCO={yes, no} correspond to the state
variables hat, ttt, ctrl, and comp, respectively. In other words, the state variables are actually the
observation variables. For example, if the system observed that the user reached the target
(OH=yes) in normal time (OT=norm) with maximum control (OCT=max) and no compensation
(OCO=no), then the state variables would be hat=yes, ttt=norm, ctrl=max, and comp=no with a
probability of 1.0. However, although the observations are fully observable, the states are still
not known with certainty since both the fatigue and user range variables are unobservable and
must be estimated.
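As a small illustration (not code from the thesis), a deterministic observation function of this kind can be written as an indicator: the probability is 1.0 exactly when each observation variable matches its corresponding state variable, and the hidden variables (fat and the three ranges) place no constraint on it:

```python
def observation_prob(state, obs):
    """Deterministic observation function: P(obs | state) is 1.0 exactly when
    each observation variable equals its corresponding state variable.
    The hidden variables (fat and the three ranges) do not appear here."""
    mapping = {"OH": "hat", "OT": "ttt", "OCT": "ctrl", "OCO": "comp"}
    match = all(obs[o] == state[s] for o, s in mapping.items())
    return 1.0 if match else 0.0

state = {"hat": "yes", "ttt": "norm", "ctrl": "max", "comp": "no",
         "fat": "no", "n(none)": "d3", "n(min)": "d2", "n(max)": "d1"}
obs = {"OH": "yes", "OT": "norm", "OCT": "max", "OCO": "no"}
print(observation_prob(state, obs))  # 1.0
```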
5.2.4 Definition of the Transition Function
The transition function of the system was determined by first defining the interaction between
the different variables, namely how each variable affects other variables at the next time step,
and then estimating the probability of those effects.
The general dynamics of the reaching exercise variables are described in Table 5.1. These
dynamics were discussed with the OTs at TRI.
Table 5.1: Description of the variable dynamics in the reaching exercise
Variable(s) Description of Dynamics
d and r • deterministically set by the action
fat • increasing the target distance and resistance level beyond the user’s current
range will increase the rate of the user becoming fatigued; in turn, setting d
and r at or below the current range will decrease the rate of the user becoming
fatigued
• the fatigue level of the user will slowly increase due to time passing (i.e.
repetition of the exercise)
hat • increasing the target distance and resistance level beyond the user’s current
range will increase the likelihood of the user failing to reach the target; in turn,
setting d and r at or below the current range will increase the chance of the
user successfully reaching the target
• the more fatigued the user is, the less likely the user will be able to reach the
target, and vice versa
n(r) • increasing the target distance and resistance level at or just beyond the user's
current range will cause their range to slowly increase; for example, if the
user's range is d=d3 at a particular resistance level, then practicing at that
distance and resistance will cause their range at the next higher resistance
level to increase from none to d1
• the range of the user will slowly increase due to time passing
ttt, ctrl, and
comp
• increasing the target distance and resistance level beyond the user’s current
range will increase the probability that the user will take longer to reach the
target, have less control, and display compensatory strategies; in turn, setting d
and r at or below the current range will increase the chance of the user
reaching the target in normal time, with maximum control, and no
compensation
• the more fatigued the user is, the less likely the user will be able to reach the
target in normal time, with maximum control, and without compensation, and
vice versa
The descriptive dynamic relationships listed above are represented as the arrows shown in Figure
5.5 for all actions except stop. The stop action simply resets the fat variable to its initial distribution
(fat=yes with probability 0.05 and fat=no with probability 0.95), and the ranges are carried
forward. It is assumed that after resting, the user is no longer fatigued and the ranges are kept
the same.
The transition probabilities for the interactions captured in the DBNs were estimated and then
specified in conditional probability tables (CPTs). A CPT describes the probability of each value
of a variable occurring at t+1 for a specific action, given the values of influencing variables at the
prior time step, t. It is not feasible to list the entire CPT for this POMDP model: it contains
approximately 678 probabilities per action, for a total of roughly 6,780 probabilities across the
ten actions. Instead of listing all the probabilities, an example of how a CPT is
constructed can be seen in Appendix I.
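To make the CPT idea concrete, a fragment of one such table might be stored and sampled as follows. The parent set and all probabilities here are invented for illustration only; the real values are those constructed in Appendix I:

```python
import random

# Illustrative (not the thesis's actual) CPT fragment for hat' under action
# setd2resmin: P(hat'=yes | n(min), fat). The numbers are made up purely to
# show the table structure.
cpt_hat = {
    # (n(min), fat) at time t  ->  P(hat'=yes) at t+1
    ("none", "no"): 0.10,
    ("none", "yes"): 0.05,
    ("d1", "no"): 0.40,
    ("d1", "yes"): 0.20,
    ("d2", "no"): 0.90,
    ("d2", "yes"): 0.60,
    ("d3", "no"): 0.95,
    ("d3", "yes"): 0.70,
}

def sample_hat(n_min, fat, rng=random):
    """Sample hat' given the parent values at the previous time step."""
    p_yes = cpt_hat[(n_min, fat)]
    return "yes" if rng.random() < p_yes else "no"

print(sample_hat("d2", "no"))  # "yes" with probability 0.9
```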
5.2.5 Definition of the Reward Function
The reward function was constructed to motivate the system to guide the user to exercise at the
maximum target distance and resistance level, with maximum control and no compensation. As
such, the system was given a large reward for getting the user to reach the furthest target distance
(d=d3) at maximum resistance (r=max). Rewards were also given when the user reached the
target in normal time, with maximum control, and without compensation. However, none was
given if ttt=none, ctrl=none, or comp=yes. The system also received little or no reward
for setting target distances and resistance levels below the user's range, as this would hinder
the system's progress towards its goal. A penalty was given when the user became fatigued.
This penalty was assigned so that the system would set the target and resistance at a level where
the user had a chance of reaching with little fatigue, rather than simply setting the target and
resistance at the maximum value where the likelihood of being fatigued would be high. Table
5.2 shows the reward function in the final version of the STRENGTHEN model, with positive
values considered a reward and negative values considered a cost.
Table 5.2: Reward function for STRENGTHEN model

Larger rewards were given for setting r higher:
• r=none → 1  • r=min → 15  • r=max → 250

Larger rewards were given for setting d higher:
• d=d1 → 1  • d=d2 → 5  • d=d3 → 10

Reward was given when user reached target:
• hat=yes → 1  • hat=no → 0

Larger rewards were given when user reached target in normal time:
• ttt=none → 0  • ttt=slow → 0.5  • ttt=norm → 1

Larger rewards were given when user had control:
• ctrl=none → 0  • ctrl=min → 1  • ctrl=max → 2

Reward was given when user did not compensate:
• comp=yes → 0  • comp=no → 2

Penalty was given for user being fatigued:
• fat=yes → -14  • fat=no → 0

Little or no reward was given for setting d and r less than n(r); the same values apply at
each resistance level r ∈ {none, min, max}, with n(r) the user's range at the chosen resistance:
• d=d1: n(r)=none → 1, n(r)=d1 → 0.4, n(r)=d2 → 0.1, n(r)=d3 → 0
• d=d2: n(r)=none → 1, n(r)=d1 → 1, n(r)=d2 → 0.4, n(r)=d3 → 0.1
• d=d3: n(r)=none → 1, n(r)=d1 → 1, n(r)=d2 → 1, n(r)=d3 → 0.4
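Because the reward in Table 5.2 decomposes into a sum of per-variable terms, it can be sketched as a lookup-and-sum. The state encoding below (in particular the `n_at_r` key for the value of n(r) at the chosen resistance) is my own shorthand, not notation from the thesis:

```python
# Per-variable reward terms from Table 5.2 (additively separable).
R_r = {"none": 1, "min": 15, "max": 250}
R_d = {"d1": 1, "d2": 5, "d3": 10}
R_hat = {"yes": 1, "no": 0}
R_ttt = {"none": 0, "slow": 0.5, "norm": 1}
R_ctrl = {"none": 0, "min": 1, "max": 2}
R_comp = {"yes": 0, "no": 2}
R_fat = {"yes": -14, "no": 0}

# Range-dependent term, indexed by (d, value of n(r) at the chosen r).
R_range = {
    "d1": {"none": 1, "d1": 0.4, "d2": 0.1, "d3": 0},
    "d2": {"none": 1, "d1": 1, "d2": 0.4, "d3": 0.1},
    "d3": {"none": 1, "d1": 1, "d2": 1, "d3": 0.4},
}

def reward(s):
    """Total reward for a state: the sum of the per-variable terms."""
    return (R_r[s["r"]] + R_d[s["d"]] + R_hat[s["hat"]] + R_ttt[s["ttt"]]
            + R_ctrl[s["ctrl"]] + R_comp[s["comp"]] + R_fat[s["fat"]]
            + R_range[s["d"]][s["n_at_r"]])

# Best case: furthest target at max resistance, reached well, no fatigue.
best = {"r": "max", "d": "d3", "hat": "yes", "ttt": "norm",
        "ctrl": "max", "comp": "no", "fat": "no", "n_at_r": "d2"}
print(reward(best))  # 250 + 10 + 1 + 1 + 2 + 2 + 0 + 1 = 267
```

This additive structure is what symbolic Perseus later exploits as "additive separability" (Section 5.2.6.1).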
5.2.6 Computation of the STRENGTHEN Model
After the dynamics of the POMDP model were defined, the model had to be solved. Solving the
model results in a fixed policy, which describes what action the agent should take in each belief state.
5.2.6.1 Selection of the Solution Method
Unique combinations of instantiations of the variables represent all the different possible states
of the rehabilitation exercise that the system can be in. For this model, there were 41,472
possible states.
There are several algorithmic methods for finding optimal POMDP policies as discussed in
Lovejoy (1991). However, the size of the rehabilitation exercise model renders exact optimal
solutions intractable. The two sources of intractability that plague classic algorithms are:
• complex value function and policy representations, where the number of α-vectors
(representing value functions) may grow exponentially with the observation space and
double exponentially with the horizon, and
• a large state space, where the number of states is exponential in the number of state
variables (recall that each state corresponds to a unique combination of variables)
(Poupart, 2005).
Hence, approximations had to be used to solve the model. There are several methods proposed
in the literature that compactly represent the complexity of policy and value function spaces with
a small number of α-vectors, and others that exploit the POMDP structure to mitigate the
complexity of state spaces (Poupart, 2005). However, none of these techniques address both
sources of intractability simultaneously and as a result, they cannot solve much larger or more
difficult POMDPs.
The method chosen to solve this POMDP model can effectively overcome both sources of
intractability concurrently and is based on an algorithm called symbolic Perseus. Developed by
Poupart et al. at the University of Toronto, this technique is able to efficiently exploit the
structure of large POMDPs by first representing the model as an algebraic decision diagram
(ADD), and then employing a randomised point-based value iteration algorithm using the ADDs
to solve the model (Poupart, 2005). This point-based value iteration is based on the Perseus
algorithm developed by Spaan et al. for flat POMDPs (Spaan & Vlassis, 2005).
ADDs are able to compactly represent the dynamics and rewards of the POMDP model by
exploiting their regularities. This can be leveraged by determining the conditional
independencies between variables in the DBN and the additive separability of the reward
function. Conditional independence refers to the fact that some variables are probabilistically
independent of each other when the values of other variables are held fixed. This feature
contributes to the reduction of the size of the CPT. Additive separability refers to the fact that
reward functions often decompose into the sum of smaller reward functions, resulting in further
space reduction (Poupart, 2005). ADDs can essentially group these regularities together, as
opposed to explicitly representing them, and hence, can translate into substantial savings in
computational time. A review of ADDs can be found in Hoey, St-Aubin, Hu, and Boutilier
(1999).
The model can now be solved by applying the Perseus method to the ADDs (Poupart, 2005).
However, the concept of α-vectors must be explained before describing the Perseus method.
The value iteration algorithm for solving optimal policies defined in Section 4.2.3 is intractable
for most application problems. Sondik (1971) showed that the value function at any finite
horizon, n, is piecewise linear and convex (PWLC) and can be expressed by a set of vectors:
Γn={α0, α1, …, αm}. Each α-vector represents an |S|-dimensional hyper-plane, and defines the
value function over a bounded region of the belief:
Vn(b) = max_{α∈Γn} Σ_{s∈S} α(s)·b(s) .    (5.1)
In addition, each α-vector maximises the value function in a certain region of the belief and has
an action associated with it, which is the optimal action to take at that particular belief region.
Thus, the optimal value function at horizon n, Vn*(b), is represented by the upper surface of the
α-vectors in Γn as shown in an illustrative example in Figure 5.6. For infinite horizon problems
such as the reaching exercise, V*(b) can be approximated well by bounding the number of α-
vectors as this only causes minimal decrease in the quality of the solution (Hoey et al., 2007).
Figure 5.6: Example of an optimal value function in a two-state POMDP. The belief space is a one-dimensional vector of two non-negative numbers that sum to 1 [b(s0) = P(s0) = 1-P(s1)]. The x-axis, therefore, represents the whole belief space on which the value function Vn(b) is defined. The upper surface of the three α-vectors is the optimal value function, Vn*(b), which defines the optimal action to take in a particular belief state. At the belief state, b, the action associated with α2 should be taken.
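Equation 5.1 and the associated action lookup can be sketched for a two-state example like the one in Figure 5.6. The α-vectors and action labels below are illustrative numbers only, not values from the solved model:

```python
import numpy as np

# Three α-vectors over a two-state POMDP (as in Figure 5.6), each paired
# with an action; the numbers are invented for illustration.
alphas = np.array([[4.0, 1.0],   # α0
                   [3.0, 3.5],   # α1
                   [0.5, 5.0]])  # α2
actions = ["setd1resnone", "setd2resnone", "stop"]

def value_and_action(b):
    """V(b) = max_alpha alpha·b (Equation 5.1); the maximising α-vector's
    associated action is the optimal action at belief b."""
    scores = alphas @ b
    best = int(np.argmax(scores))
    return scores[best], actions[best]

b = np.array([0.2, 0.8])  # b(s0) = 0.2, b(s1) = 0.8
v, a = value_and_action(b)
print(v, a)  # 4.1 stop  (α2 maximises the value at this belief)
```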
The Perseus algorithm starts by first collecting a set of reachable belief states (B) by performing
a forward search from the initial belief state. This is done by executing a random policy by
sampling the actions and observations at each step. Then, value iteration is performed on the set
of collected belief points ensuring that in each backup stage the value of each point in the belief
set is improved. A pictorial example of a backup stage is presented in Figure 5.7. The key
feature of this algorithm is that a single backup may improve the value of many points in the set,
allowing value functions to be computed with only a small number of α-vectors (relative to the
belief set size) and thus, leading to faster computation time.
Figure 5.7: Example of a Perseus backup stage in a two-state POMDP. The x-axis represents the belief space and the y-axis represents V(b). Solid lines are the α-vectors from the current stage and dashed lines are the α-vectors from the previous stage. There are seven belief states {b1,…,b7} which comprise the set of reachable belief points (B) indicated by the tick marks. The backup stage computing Vn+1 from Vn proceeds as follows: (a) the value function at stage n; (b) the computation of Vn+1 starts by sampling b6, which produces an α-vector that improves the values of b6 and b7; (c) b3 is then sampled, which produces an α-vector that improves the values of b1 through b5; and (d) the values of all b ∈ B have improved and thus, the backup stage at n+1 is complete (© AI Access Foundation, 2005 – use of picture is by permission of the copyright holder).
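The first stage of Perseus, collecting reachable beliefs under a random policy, can be sketched for a tiny two-state POMDP. The transition and observation tables below are invented, and the belief update is the standard Bayes filter b'(s') ∝ Z(o|a,s')·Σs T(s'|s,a)·b(s):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-state, two-action, two-observation POMDP with made-up dynamics.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[a, s, s']
              [[0.5, 0.5], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],    # Z[a, s', o]
              [[0.6, 0.4], [0.1, 0.9]]])

def belief_update(b, a, o):
    """b'(s') ∝ Z(o | a, s') * sum_s T(s' | s, a) * b(s)."""
    b_next = Z[a, :, o] * (b @ T[a])
    return b_next / b_next.sum()

def collect_beliefs(b0, n_steps):
    """Forward search under a random policy, as in the first stage of Perseus:
    sample an action and an observation at each step, record the updated belief."""
    beliefs, b = [b0], b0
    for _ in range(n_steps):
        a = rng.integers(2)
        s = rng.choice(2, p=b)            # sample a hidden state from the belief
        s_next = rng.choice(2, p=T[a, s])
        o = rng.choice(2, p=Z[a, s_next])
        b = belief_update(b, a, o)
        beliefs.append(b)
    return beliefs

B = collect_beliefs(np.array([0.5, 0.5]), n_steps=50)
print(len(B))  # 51 reachable belief points
```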
5.2.6.2 Iteration Process and Solving the Model
Solving the model created the policy, which maps belief states into actions. Akin to a lookup
table, the policy determines which action the agent should take next given its current belief
state. The POMDP designed for the reaching task was solved using symbolic Perseus as
described in Section 5.2.6.1.
The complete model was built up through many iterations. At each iteration, new variables were
added, variable dynamics were changed, transition probabilities were adjusted, or the reward
function was modified, and the resulting partial model was solved. This allowed for debugging
of the model dynamics through the analysis of the resulting policies of the partial models.
There were a total of 63 partial versions of the model before the complete model was
successfully solved.
The final model was sampled with a set of 3,000 belief points that was generated from 20
different initial belief states: one for every range possibility. It had a discount factor (β) of 0.95
and was solved with MATLAB® on a dual AMD Opteron™ (2.4GHz) CPU using 100 α-vectors
and 150 iterations in approximately 5.18 hours.
5.3 iSTRETCH Model

The iSTRETCH model was the second of two models developed during this thesis and can be
seen in Figure 5.8. It evolved from the STRENGTHEN model. A detailed description of
the iSTRETCH POMDP model is discussed below.
Figure 5.8: iSTRETCH (POMDP) model as a DBN. It consists of the state, S, represented by a combination of nine variables; the actions, A; the observations, O; the reward function, R; and the dynamics, represented by the arrows.
5.3.1 Definition of the Variables
From the STRENGTHEN model, it was realised that the variables d and r could be represented
by one variable, stretch, that captured the difference between the target distance set by the action
and the user’s range at the resistance set by the action.
1. stretch beyond user’s range (stretch) : {+9, +8, +7, +6, +5, +4, +3, +2, +1, 0, -1, -2}
Indicates the amount the system is asking the user to go beyond their current range. For
example, if the user’s range is n(min)=d1, then setting the target at d=d2 at resistance
r=min is a stretch of +1, while setting the target at d=d1 at resistance r=max is a stretch
of +3. Note that stretch is a direct function of both target distance, d={d1, d2, d3}, and
resistance level, r={none, min, max}: it is a joint measure of how much a particular
distance and resistance are going to push a user beyond their range.
The variables hat and ttt from the STRENGTHEN model were combined into just ttt since hat
was really subsumed by ttt. If ttt=slow or ttt=norm, this implies that the user has reached the
target (hat=yes), and ttt=none indicates that the user did not reach the target (hat=no). In the
STRENGTHEN model, the combination of hat=yes and ttt=none was not reachable, meaning
that these states were impossible to achieve. Recognising these unreachable states allowed for a
more compact representation of the model in the iSTRETCH version.
2. time-to-target (ttt) : {none, slow, norm}
Denotes the time it takes the user to reach the target from the starting position. Note that
ttt=none indicates that the user has failed to reach the target.
A new variable, learnrate, was added to the iSTRETCH model to estimate how quickly the user
progresses during the reaching exercise. In the STRENGTHEN model, this estimation was
hard-coded.
3. learning rate (learnrate) : {lo, med, hi}
Indicates how quickly the user is progressing.
The remaining six variables are the same as in the STRENGTHEN model.
4. fatigue (fat) : {yes, no}
Indicates whether the user is fatigued or not.
5.-7. user’s range at a particular resistance level (n(r)) : {none, d1, d2, d3}
Denotes the range or ability of the user, which depends on the target distance, d={d1, d2,
d3}, and resistance level, r={none, min, max}, as shown in the following:
• user’s range at zero resistance (n(none)) : {none, d1, d2, d3}
• user’s range at minimum resistance (n(min)) : {none, d1, d2, d3}
• user’s range at maximum resistance (n(max)) : {none, d1, d2, d3}
The range is determined by the furthest target distance the user is able to reach at a
particular resistance level. For example, if r=min and the furthest target distance the user
can reach is d=d1, then the user’s range is n(min)=d1.
8. control (ctrl) : {none, min, max}
Indicates the user’s control level by their ability to keep on the straight path, from the
starting position to the target.
9. compensation (comp) : {yes, no}
Indicates any compensatory action (improper posture) that the user performs during
the exercise. Signs of compensation are trunk rotation and flexion, and shoulder
abduction and internal rotation.
5.3.2 Definition of the Actions
The ten actions for iSTRETCH are the same as those in the STRENGTHEN model. Again, these
actions are:
1. setd1resnone (sets d=d1 and r=none)
2. setd2resnone (sets d=d2 and r=none)
3. setd3resnone (sets d=d3 and r=none)
4. setd1resmin (sets d=d1 and r=min)
5. setd2resmin (sets d=d2 and r=min)
6. setd3resmin (sets d=d3 and r=min)
7. setd1resmax (sets d=d1 and r=max)
8. setd2resmax (sets d=d2 and r=max)
9. setd3resmax (sets d=d3 and r=max)
10. stop (terminates the exercise)
5.3.3 Definition of the Observation Variables and Observation Function
This model has essentially the same observation variables as the previous model, except for the
OH variable since the hat variable was removed. Now there are three fully observable
observation variables, OT={none, slow, norm}, OCT={none, min, max}, and OCO={yes, no}, that
correspond to the state variables ttt, ctrl, and comp, respectively. Again, the observation function
is deterministic, thus the state variables ttt, ctrl, and comp, are actually the observation variables.
5.3.4 Definition of the Transition Function
Instead of explicitly using CPTs to describe the transition probabilities of the variables as
described in the STRENGTHEN model in Section 5.2.4, the transition probabilities in the
iSTRETCH model were automatically generated by using a simple parametric function. The
performance and fatigue variables in the iSTRETCH model were functions of stretch and fat.
For example, if the user is not fatigued and the system sets a target with a stretch of 0 (i.e. at the
user’s range), then the user might have a 90% chance of reaching the target at normal time
(ttt=norm). However, if the stretch is set to 1, then this chance might decrease to 50%. Even if
the stretch is 0, but the user is fatigued, the chance of reaching the target at ttt=norm will also
decrease. This idea was applied to the other variables modelling the user’s control and
compensation, and even their fatigue levels. Certainly, a larger stretch will increase the
probability of the user becoming fatigued.
The sigmoid function was used as the common parametric function, which relates stretch and
fatigue levels to user performance. This function, named the pace function, φ(st,f), is a function
of stretch, st, and fatigue level, f:
φ(st, f) = 1 / (1 + e^(−(st − m − m(f))/σst))    (5.2)
where m is the mean stretch (the value of stretch for which the function φ is 0.5 if the user is not
fatigued), m(f) is a shift dependent on the user’s fatigue level (e.g. 0 if the user is not fatigued),
and σst is the slope of the pace function. There was one sigmoid function for every value of each
variable.
For each pace function, there were three parameters that needed to be specified: m, σst, and m(f),
where the latter is technically a function, but since the fatigue variable is a binary value in this
model, it is a single real-valued parameter. However, it was simpler to specify the pace function
in terms of upper and lower pace limits: the values of stretch where a user’s performance will
vary by a certain probability when the user is not fatigued (m(f)=0). For example, the upper pace
limit for a user to compensate (comp=yes) when not fatigued is the stretch at which the user will
compensate with a probability of φ+. Similarly, the lower pace limit for comp=yes is the stretch
at which the user will compensate with a probability of φ- (so succeed in reaching the target with
comp=no with a probability of 1-φ-). Denoting the upper and lower pace limits by st+ and st-,
respectively, the following two equations were derived:
φ+ = 1 / (1 + e^(−(st+ − m)/σst))    (5.3)

φ− = 1 / (1 + e^(−(st− − m)/σst))    (5.4)

which could be solved for m and σst:

m = (st+·β− − st−·β+) / (β− − β+)    (5.5)

σst = (st+ − st−) / (β+ − β−)    (5.6)

where β+ = ln(φ+ / (1 − φ+)) and β− = ln(φ− / (1 − φ−)).
The fatigue effect, m(f), was the last parameter to specify, and is a negative number that shifts the
pace function downwards. The amount of shift indicates the amount the pace limits will be
shifted down when the user is fatigued. Figure 5.9 shows an example pace function for
comp=yes. Notice that both pace limits decrease when the user is fatigued (at the same
probability). In other words, the user is more likely to compensate when fatigued.
Figure 5.9: Example pace function for comp=yes, with φ+ = 0.9, φ- = 0.1, st+ = +3, st- = -1, m(f=yes) = 0.8, and m(f=no) = 0.0. Shown are the upper and lower pace limits, and the pace function for each condition of fat.
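The pace-function machinery of Equations 5.2–5.6 can be sketched directly. Using the Figure 5.9 parameters, the recovered m and σst reproduce the pace limits; the fatigue shift below follows Equation 5.2 as written, with m(f) taken from the figure:

```python
import math

def pace_params(st_plus, st_minus, phi_plus=0.9, phi_minus=0.1):
    """Recover m and sigma_st (Equations 5.5 and 5.6) from the upper and
    lower pace limits st+ and st-."""
    beta_plus = math.log(phi_plus / (1 - phi_plus))
    beta_minus = math.log(phi_minus / (1 - phi_minus))
    m = (st_plus * beta_minus - st_minus * beta_plus) / (beta_minus - beta_plus)
    sigma = (st_plus - st_minus) / (beta_plus - beta_minus)
    return m, sigma

def pace(st, fatigued, m, sigma, m_f=0.8):
    """Pace function of Equation 5.2; m_f shifts the curve when fatigued
    (m_f = 0.8 as in the Figure 5.9 example)."""
    shift = m_f if fatigued else 0.0
    return 1.0 / (1.0 + math.exp(-(st - m - shift) / sigma))

# Parameters from the comp=yes example in Figure 5.9.
m, sigma = pace_params(st_plus=3, st_minus=-1)
print(round(pace(3, False, m, sigma), 3))   # 0.9: upper pace limit recovered
print(round(pace(-1, False, m, sigma), 3))  # 0.1: lower pace limit recovered
```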
For the variables with three values, such as ttt and ctrl, two pace functions need to be specified,
one for the lowest value and one for the highest. The middle value gets what is left of the
probability mass. Figure 5.10 shows an example pace function for ttt.
Figure 5.10: Example pace function for ttt, with φ+ = 0.9, φ- = 0.1, and m(f=no) = 0.0. Shown are the upper (st+ = -3) and lower (st- = +2) pace limits for ttt=norm, and the upper (st+ = +4) and lower (st- = +1) pace limits for ttt=none. The pace function for ttt=slow gets what is left of the probability mass.
The ranges in the current model were modelled separately, although they could also use the
concept of pace functions. They were modelled such that the higher the learning rate, the faster
the user’s range increased, and vice versa. Setting target distances at or just above the user’s current
range will cause the learning rate to increase (i.e. progress towards learnrate=hi) and in turn,
cause the range to slowly increase. However, when the user is fatigued, their range does not
progress as well.
5.3.5 Definition of the Reward Function
The reward function in the iSTRETCH model was constructed in the same way as the
STRENGTHEN model – to encourage the system to guide the user to exercise at maximum
target distance and resistance level, while performing the task with maximum control and
without compensation. Therefore, the system was given a large reward for getting the user to
reach the furthest target at maximum resistance (similar to the previous model). Smaller rewards
were given when targets were set at or above the user’s current range (i.e. when stretch >= 0),
and when the user was performing well (i.e. ttt=norm, ctrl=max, comp=no, and fat=no).
However, no reward was given when the user was fatigued, failed to reach the target, had no
control, or showed signs of compensation during the exercise. The system also did not get
rewarded for negative stretches as this would delay the progress towards its goal. Table 5.3
shows the reward function in the final version of the iSTRETCH model.
Table 5.3: Reward function for iSTRETCH model

Larger rewards were given for setting r higher:
• r=none → 1  • r=min → 14  • r=max → 80

Larger rewards were given for setting d higher:
• d=d1 → 1  • d=d2 → 9  • d=d3 → 11

Larger rewards were given when user reached target in normal time:
• ttt=none → 0  • ttt=slow → 0.8  • ttt=norm → 1

Larger rewards were given when user had control:
• ctrl=none → 0  • ctrl=min → 0.3  • ctrl=max → 1

Reward was given when user did not compensate:
• comp=yes → 0  • comp=no → 1

Reward was given when user was not fatigued:
• fat=yes → 0  • fat=no → 1

Small rewards were given when d and r were set at or above n(r); none were given when they
were set below n(r):
• stretch=-2 → 0  • stretch=-1 → 0  • stretch=0 → 0.4  • stretch=+1 through +9 → 1
5.3.6 Computation of the iSTRETCH Model
5.3.6.1 Selection of the Solution Method
The model was solved using the same solution method as the previous model – symbolic
Perseus. This approximation method had to be used as iSTRETCH had 82,944 possible states,
which was double that of the STRENGTHEN model.
5.3.6.2 Iteration Process and Solving the Model
The full model was again built up through many iterations. This time at each iteration, either the
reward function was modified or the transition probabilities were changed through the
adjustment of the various pace limits. After each modification, the resulting partial model was
solved. There were a total of 64 partial versions of the model before the full model was
successfully solved.
The final model was sampled with a set of 3,000 belief points that was again generated from 20
different initial belief states (one for every range possibility). The iSTRETCH model also had a
discount factor (β) of 0.95 and was solved with MATLAB® on a dual AMD Opteron™
(2.4GHz) CPU using 150 α-vectors and 150 iterations in approximately 13.96 hours.
5.4 Comparison of STRENGTHEN and iSTRETCH Models

Once the policy of a model is solved, there needs to be a way to determine how well the model is
performing in real-time. To do this, a simulation program was developed in MATLAB® which
was based on the decision cycle of a POMDP agent described in Section 4.2.2. The simulation
starts with an initial belief state and then the POMDP decides on an action for the system to take
(predetermined by the policy). Observation data is manually entered since the POMDP model
had not yet been integrated with the robot (once integrated, the POMDP would automatically
receive the observation data from the robotic system). A new belief state is then computed, the
next action is determined, and the cycle repeats. If the action is to stop the exercise,
the simulation program resets the fatigue variable (i.e. user is un-fatigued after resting), carries
over the ranges, and the decision cycle starts once again. The following section discusses the
comparison of the STRENGTHEN and iSTRETCH models through a few simulation examples.
Note that although both models behave quite differently, every attempt was made to keep the
observation input the same for both models for direct comparison.
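The decision cycle of the simulation program can be sketched as a skeleton. The four callables stand in for the solved-model components (policy lookup, belief update, and observation entry), and the dummy components in the demo exist only to exercise the loop; none of this is the thesis's actual MATLAB code:

```python
def run_simulation(b0, select_action, update_belief, get_observation, max_steps=100):
    """Decision cycle of the simulation program: the policy picks an action
    for the current belief, observations are entered (manually, pre-robot
    integration), and the belief is updated. The stop action resets fatigue
    to its initial distribution (fat=yes: 0.05, fat=no: 0.95) and carries
    the ranges forward unchanged."""
    b, history = b0, []
    for _ in range(max_steps):
        a = select_action(b)
        history.append(a)
        if a == "stop":
            b = dict(b, fat={"yes": 0.05, "no": 0.95})  # user rests; ranges kept
            continue
        o = get_observation(a)          # entered manually in the thesis simulations
        b = update_belief(b, a, o)
    return history

# Dummy components just to exercise the loop.
acts = iter(["setd1resnone", "setd2resnone", "stop", "setd1resmin"])
history = run_simulation(
    b0={"fat": {"yes": 0.05, "no": 0.95}},
    select_action=lambda b: next(acts),
    update_belief=lambda b, a, o: b,
    get_observation=lambda a: {"OT": "norm", "OCT": "max", "OCO": "no"},
    max_steps=4,
)
print(history)  # ['setd1resnone', 'setd2resnone', 'stop', 'setd1resmin']
```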
The performance of both models was subjectively rated by the researcher, focusing on whether
the system behaved in line with conventional reaching rehabilitation by:
• gradually increasing target distance first, then resistance level as the user performed well
(i.e. reached target with normal time, maximum control, and no compensation), and
• increasing the rate of fatigue if the user was not performing well (i.e. failed to reach the
target, had no control, or compensated).
In the first simulation example (Figures 5.11 and 5.12 show a portion of the entire simulation),
the user is assumed to have trouble reaching the maximum target, d=d3, at zero resistance,
r=none. The simulation of both models starts with the same initial belief state, which assumes
that the user’s range at each resistance (i.e. n(none), n(min), and n(max)) is likely to be none, and
that the user is not fatigued with a 95% probability. There are two additional variables in the
iSTRETCH model, stretch and learnrate, that are also estimated. In both models, the POMDP
slowly increases the target distance from d1, to d2, and then to d3 while keeping at the same
resistance level (r=none) when the user successfully reaches the target in normal time, with
maximum control, and with no compensation. However, according to the initial user
assumption, at d=d3 the user fails to reach the target (i.e. OH=no and OT=none), has minimum
control (OCT=min), and does not compensate (OCO=no). The updated belief state is shown in
Figure 5.11a for the STRENGTHEN model and Figure 5.11b for the iSTRETCH model. In the
STRENGTHEN model, the POMDP decides to reduce the target to d1 since the user had trouble
reaching d3. Here, the user has no problem reaching d1 and the updated belief state is shown in
Figure 5.12a. In the iSTRETCH model, after the user failed to reach d3, the POMDP decides to
keep the same target at d3 since stretch is about 75% likely to be 0 (i.e. at the user’s range).
Again, based on the initial user assumption, the user fails to reach the target with minimum
control and no compensation. This updated belief state is shown in Figure 5.12b.
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.11: (a) Updated belief state of n(r) and fat after the user failed to reach d=d3, had minimum control and no compensation. The POMDP decides to set the next action at d=d1 at the same resistance (r=none); (b) Updated belief state of n(r), stretch, fat, and learnrate after the user failed to reach d=d3, had minimum control and no compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=none).
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.12: (a) Updated belief state after the user successfully reached d=d1, with maximum control and no compensation; (b) Updated belief state after the user failed to reach d=d3, with minimum control and no compensation.
Although the decisions of both models seem believable (i.e. gradually increased target distance
before resistance level), the belief state of the fatigue variable in the STRENGTHEN model does
not. After the user failed to reach d3 in Figure 5.11a, the user was about 30% likely to be
fatigued. However, as soon as the user reached d1 in the next time step, the percentage of the
user being fatigued was reduced to less than 10% as seen in Figure 5.12a. The dynamics of the
fatigue variable in the STRENGTHEN model do not resemble that of real-life situations. When
a user is fatigued in the real world, it is impossible for them to get un-fatigued in the next time
step. Because of these dynamics, the POMDP produces the same actions (i.e. increase target
distance from d1, to d2, to d3, and then back down to d1) over and over again if the user’s
observations are the same (i.e. successfully reaches the target set at d1 and d2, but fails to reach
the target at d3 with minimum control and no compensation). The only way the user will
become fatigued (and the action stop is determined) is if the user fails to reach the target at d2
instead of d3. The complete first simulation example of the STRENGTHEN model can be seen
in Appendix II. On the other hand, the fatigue variable in the iSTRETCH model behaved
quite well during this simulation example. Although the POMDP did not decrease the target as it
did in the other model, so a direct comparison of fatigue cannot be made this way, the
general behaviour of fatigue can still be commented on. After each time step in the beginning of the
simulation, the fatigue level increased slowly due to progression of time. When the user failed to
reach d=d3 the first time (Figure 5.11b), the level of fatigue jumped from about 10% to 25%.
After the second failure, (Figure 5.12b), the level of fatigue increased even more to about 40%.
Thus, iSTRETCH was performing quite well in terms of the performance criteria, as the rate of
fatigue increased when the user did not perform well. The complete first simulation example of
the iSTRETCH model is also shown in Appendix II.
In the second simulation example (Figures 5.13, 5.14, and 5.15 show a portion of the entire
simulation), the user is assumed to be able to reach the maximum target, d=d3, at maximum
resistance, r=max, in the beginning, but then slowly starts to compensate and lose control.
Again, the simulation of both models starts with the same initial belief state, this time assuming
that the user’s range at both zero and minimum resistance (i.e. n(none) and n(min)) is likely to be
d3, and the user’s range at maximum resistance, n(max), is likely to be d1. In addition, the initial
belief state assumes that the user is not fatigued with a 95% probability. From this initial belief
state, both models set the first action to be d=d1 and r=max. According to the initial user
assumption, the user successfully reaches this target in normal time, with maximum control, and
with no compensation. In the next time step, the STRENGTHEN model decides to set the target
at d=d3 at the same resistance, skipping d=d2, which does not follow the performance criteria of
gradually increasing the target distance. Conversely, at the next time step, the iSTRETCH model
decides to set the target at d=d2. The iSTRETCH model keeps the target at d2 for one more time
step before setting it at d=d3 (assuming the user reaches the targets at d2 successfully). Both
models decide to set the target at d=d3 for two more time steps, assuming the user successfully
reaches the target, with maximum control, and no compensation. The updated belief state is
shown in Figure 5.13a for the STRENGTHEN model and Figure 5.13b for the iSTRETCH
model. Now, during this time step when both models again decide to set the target at d=d3, the
user starts to compensate but still is able to reach the target with maximum control. The updated
belief state is shown in Figure 5.14a and Figure 5.14b for the STRENGTHEN and iSTRETCH
models, respectively. Again, both models set the same target and the user compensates once
more. Figure 5.15a and 5.15b show the updated belief state for STRENGTHEN and
iSTRETCH, respectively. This time, the iSTRETCH model decides to stop the exercise because
it believes the user is fatigued due to performing compensatory movements for two consecutive
times. However, the STRENGTHEN model continues and still decides to set the target at d=d3.
This time, in addition to reaching the target and compensating, the user starts to lose control
(OCT=min) and takes longer to reach the target (OT=slow). This combination is performed again
in the next time step and the updated belief state can be seen in Figure 5.16. The
STRENGTHEN model decides to set the same target, d=d3, at the same resistance level. The
user starts to lose even more control (OCT=none) at this time step and the updated belief state is
shown in Figure 5.17. Notice the transition in the fatigue variable from Figure 5.16 to Figure
5.17. Starting from the initial belief state until Figure 5.16, the fatigue level was gradually
increasing due to time passing, an increase in compensation, and a decrease in control and time
to target. However, as soon as the user showed no control between Figures 5.16 and 5.17, the
fatigue level decreased from a 65% to 55% chance of being fatigued. Again, this does not
follow what is typically seen during conventional therapy. A lack of control indicates the
presence of fatigue as described in Section 5.1.1. Figure 5.18 shows the next time step with the
fatigue level decreasing once again (assuming the same action and observations occurred). In
fact, this model keeps producing the same action over and over again (assuming the same
observations) since the fatigue variable will never be fat=yes. Thus, the exercise will never stop
to give the user a rest. The complete second simulation of both the STRENGTHEN and
iSTRETCH models can be seen in Appendix II.
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.13: (a) Updated belief state of n(r) and fat after the user successfully reached d=d3, with maximum control and no compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max); (b) Updated belief state of n(r), stretch, fat, and learnrate after the user successfully reached d=d3, with maximum control and no compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max).
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.14: (a) Updated belief state after the user successfully reached d=d3, with maximum control but this time with compensation. The POMDP decides to set the next action again at d=d3 at the same resistance (r=max); (b) Updated belief state after the user successfully reached d=d3, with minimum control but this time with compensation. The POMDP decides to set the next action again at d=d3 at the same resistance (r=max).
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.15: (a) Updated belief state after the user again successfully reached d=d3, with maximum control and with compensation. The POMDP decides to set the next action again at d=d3 at the same resistance (r=max); (b) Updated belief state after the user again successfully reached d=d3, with minimum control and with compensation. The POMDP decides to stop the exercise.
Figure 5.16: STRENGTHEN model. Updated belief state of n(r) and fat after the user successfully reached d=d3 in slow time, with minimum control, and with compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max).
Figure 5.17: STRENGTHEN model. Updated belief state after the user successfully reached d=d3 in slow time, with compensation but this time with no control. The POMDP decides to set the next action at d=d3 at the same resistance (r=max). Notice the reverse in the fatigue level from the previous Figure 5.16.
Figure 5.18: STRENGTHEN model. Updated belief state after the user again successfully reached d=d3 in slow time, with no control, and with compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max). Notice again the reverse in the fatigue level from the previous Figure 5.17.
Table 5.4 summarises the pros and cons for each model during both simulations. The
iSTRETCH model had more advantages compared to the STRENGTHEN model during
simulation.
Table 5.4: Summary of pros and cons of both models during simulation

STRENGTHEN
PROS:
• fatigue level slowly increases due to passing time
CONS:
• fatigue level decreases during simulation (i.e. user gets unfatigued), which does not model real-life situations
• rate of fatigue level does not increase faster when user does not perform well (i.e. fails to reach target, takes longer to reach target, has no control, compensates)
• target distance does not gradually increase

iSTRETCH
PROS:
• fatigue level slowly increases due to passing time
• fatigue level does NOT decrease during simulation
• rate of fatigue level increases faster when user does not perform well (i.e. fails to reach target, takes longer to reach target, has no control, compensates)
• target distance gradually increases
CONS:
• perhaps the exercise stops too fast when user is not performing well (i.e. fatigue level progresses too fast to fat=yes)
From a design perspective, the iSTRETCH model also had more advantages than the
STRENGTHEN model. Table 5.5 summarises the computational aspects of each model.
Although the number of partial versions (i.e. changes) for each model is the same, the
iSTRETCH model had fewer modifications relative to its number of states. Therefore, more
time was spent on developing the STRENGTHEN model. The iSTRETCH model took about
2.7 times as long as the STRENGTHEN model to compute its policy. However, as
the policy is computed offline, the difference in the number of computational hours was not
critical.
When a problem occurred during simulation, it was difficult to find where the problem lay in the
STRENGTHEN model. There were too many probabilities to work with, as they were all explicitly
defined in the CPTs, which made it difficult to pinpoint the source of the problem. iSTRETCH used
a pace function to automatically generate the transition probabilities. This made the model much
easier to modify since there were only a few parameters to change (i.e. pace limits).
Representing the model dynamics as a pace function made it easier to correct the problem.
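The idea of generating transition probabilities from a few pace parameters, rather than hand-filling every CPT entry, can be sketched as below. This is a hypothetical illustration only; the actual pace function and pace limits of the iSTRETCH model are not reproduced here.

```python
# Hypothetical sketch of generating a transition table from a single "pace"
# parameter instead of hand-authoring every CPT entry. The real pace
# function in the iSTRETCH model is not reproduced here.

def pace_transition(levels, p_up):
    """Build a transition table where each level advances to the next
    with probability p_up (the pace limit) and otherwise stays put."""
    n = len(levels)
    matrix = {}
    for i, level in enumerate(levels):
        row = {lv: 0.0 for lv in levels}
        if i < n - 1:
            row[levels[i + 1]] = p_up   # progress one level
            row[level] = 1.0 - p_up     # or stay at the current level
        else:
            row[level] = 1.0            # top level is absorbing
        matrix[level] = row
    return matrix

ranges = ["none", "d1", "d2", "d3"]
cpt = pace_transition(ranges, p_up=0.3)
```

Changing a single pace limit regenerates the entire table, which is the maintainability advantage over explicitly defined CPTs noted above.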
Table 5.5: Summary of computational aspects of each model
Computation Features STRENGTHEN iSTRETCH
Number of states 41,472 82,944
Hours to compute policy 5.18 13.96
Number of partial versions of the model 63 64
From the comparison of the two models explained above, iSTRETCH was chosen to be the final
POMDP as it seemed to be better than the STRENGTHEN model in terms of performance and
computation. However, this assumption can only be verified through a clinical evaluation
comparing the two models. Due to time constraints, this could not be performed for this
thesis – only the iSTRETCH model was tested.
Chapter 6 Integration of the POMDP Model with the Robotic System
6 Integration of the POMDP Model with the Robotic System
Figure 6.1 shows the diagram of the overall reaching rehabilitation system: the robotic system
(Figure 6.1a) and the POMDP agent (Figure 6.1b). As the user performs the reaching exercise,
data from the robotic system is used as observational input to the POMDP agent, where it
estimates the progress of the user and decides on an action for the system to take.
Figure 6.1: Diagram of the reaching rehabilitation system consisting of the robotic system (a) and the POMDP agent (b)
6.1 Acquisition of Data from the Robotic System

The controller and VE of the robotic device were developed by Quanser Inc. Both were written
in the Python programming language. The controller was responsible for providing feedback
control, rendering the VE on the computer monitor, and calculating performance statistics during
the exercise. The VE was designed to reflect the POMDP model such that it could incorporate
three linear targets and three resistance levels. In the end, the reaching exercise took the form of
a 2D, linear bull’s eye game as previously shown in Figure 5.4.
A micro-controller was used to establish the communication between the photoresistor sensors
described in Section 5.1.2 and the computer. The micro-controller (Figure 6.2), a Massachusetts
Institute of Technology’s Handyboard, was programmed in the C language. The sensitivity
threshold for light detection in the photoresistors was designed to detect a gap of approximately
2 cm (Lam et al., 2008). The output from the photoresistors was 4 bits in length and the transfer
to the computer was programmed to be bidirectional and asynchronous. The source code of the
micro-controller can be seen in Appendix III.
Figure 6.2: Massachusetts Institute of Technology’s Handyboard (micro-controller)
The following parameters from the robotic system were used as observation input to the POMDP
agent:
• Boolean flag to indicate whether target was reached: the robot tracks the position (x,y)
coordinates of the end-effector and if it gets within four millimetres (mm) of the target
position, the flag is set to 1 (i.e. target was reached)
• time: the robot keeps track of the time to determine how long it takes the user to reach the
target from the starting point
• deviation from straight path: is the average amount (in mm) the user strays from the zero
position on the y-axis, calculated from the starting point to the target
• rotation of the end-effector: is the average amount, in degrees, the user rotates the end-
effector from the zero position, calculated from the starting point to the target
• detection of light from the chair: a 4-bit data packet is continuously being sent to the
computer, with each value indicating which photoresistor sensors are detecting light (e.g.
0001 means light is detected from the right photoresistor, 0010 is from the lower back;
0011 is from the right and lower back; 0100 is from the left; 0101 is from the left and
right; 0110 is from the left and lower back; and 0111 is from the left, right, and lower
back)
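Decoding the 4-bit light-detection packet amounts to testing bit flags. The sketch below follows the bit assignments implied by the examples above (0001 = right, 0010 = lower back, 0100 = left); treating the top bit as unused is an assumption, and the function name is illustrative.

```python
# Sketch of decoding the 4-bit photoresistor packet. Bit positions follow
# the examples in the text (0001 = right, 0010 = lower back, 0100 = left);
# the top bit is assumed unused.

SENSOR_BITS = {0b0001: "right", 0b0010: "lower back", 0b0100: "left"}

def decode_packet(packet):
    """Return the list of sensors currently detecting light."""
    return [name for bit, name in SENSOR_BITS.items() if packet & bit]

# e.g. decode_packet(0b0111) lists the right, lower back, and left sensors,
# matching the 0111 example above.
```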
6.2 Setting the Value Ranges for the Observation Variables

Based on the parameters captured by the robotic system described above, the values for the
POMDP observation variables, OH (only in the STRENGTHEN model), OT, OCT, and OCO, were
first chosen by the researcher, and then evaluated by an OT at TRI to verify that the values were
suitable for moderate-level stroke patients. The following values were chosen as the final
observation variables.
The values of the OH variable were determined by the target’s Boolean flag. If the flag was set to
1, OH =yes. If not, OH =no.
The ranges for OT were determined from the time parameter as follows:
• OT = norm, if the time to reach the target was between 0 and 7.5 seconds
• OT = slow, if the time to reach the target was between 7.5 and 15 seconds
• OT = none, if the time to reach the target was longer than 15 seconds
Note that the user was given a maximum of 15 seconds to reach the target before a timeout
occurred, in which case the user failed to reach the target (OH =no).
The ranges for OCT were determined from the deviation parameter as follows:
• OCT = max, if the average deviation was between 0 and 7 mm
• OCT = min, if the average deviation was between 7 and 20 mm
• OCT = none, if the average deviation was greater than 20 mm
The values of the OCO variable were determined by both the rotation and light detection
parameters. If the end-effector was rotated by an average of greater than 45 degrees and/or any
of the photoresistors detected light, then OCO =yes. If neither occurred, OCO =no.
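The value ranges above can be collected into a single discretization step. The sketch below is a direct transcription of the listed thresholds; the function and field names are invented for illustration, and the behaviour exactly at a boundary (e.g. 7.5 seconds) is an assumption since the text gives overlapping ranges.

```python
# Sketch mapping raw robot parameters to the POMDP observation values,
# using the thresholds listed above. Names are illustrative; boundary
# handling (<=) is an assumption.

def discretize(reached, seconds, deviation_mm, rotation_deg, light_detected):
    o_h = "yes" if reached else "no"           # target's Boolean flag
    if seconds <= 7.5:                         # time to reach the target
        o_t = "norm"
    elif seconds <= 15:
        o_t = "slow"
    else:
        o_t = "none"                           # timeout
    if deviation_mm <= 7:                      # average deviation from path
        o_ct = "max"
    elif deviation_mm <= 20:
        o_ct = "min"
    else:
        o_ct = "none"
    # Compensation: end-effector rotation over 45 degrees and/or any
    # photoresistor detecting light.
    o_co = "yes" if rotation_deg > 45 or light_detected else "no"
    return {"OH": o_h, "OT": o_t, "OCT": o_ct, "OCO": o_co}
```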
6.3 Merging the POMDP Agent with the Robotic Device Controller
The integration of the POMDP agent with the robot’s controller (including the photoresistor
micro-controller) was written in MATLAB®. This MATLAB® program handled:
• the communication of data between the Python controller and the POMDP agent,
• the transfer of data from the micro-controller, and
• the decision cycle of the POMDP agent (belief state update).
The following diagram (Figure 6.3) shows the interaction between the POMDP agent and both
controllers of the robotic system. The steps performed during the reaching exercise are
described as follows:
1. From the initial belief state, the POMDP agent decides on an action for the robotic
system to take (determined by the policy computed previously offline).
2. If the action was to stop the exercise, the MATLAB® program would terminate the
exercise cleanly and start from Step 1 after the user rested. If the action was to set a
particular target distance and resistance level, the MATLAB® program would send both
those target coordinates and resistance level to the Python controller, where it would then
set the appropriate target and resistance in both the real (robotic device) and virtual
environments.
3. As the user performs the reaching exercise by trying to reach the target, parameters from
the robotic controller and micro-controller, as described above, are sent as observation
input to the POMDP agent.
4. The POMDP agent then takes this observation input and determines the values of OH
(only in the STRENGTHEN model), OT, OCT, and OCO from the prescribed values
described in Section 6.2.
5. Now, the POMDP agent has enough information (i.e. the current observation, the
previous action, and the previous belief state) to compute the new belief state (as
described in Section 4.2.2.1).
6. From the new belief state, the POMDP agent decides on the next action for the system to
take (again, determined by the policy), and steps 2-6 are repeated.
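Steps 1-6 amount to the standard POMDP belief update followed by a policy lookup. The generic sketch below shows the update of step 5 on a tiny two-state toy model; the dictionaries T and O are invented stand-ins, not the thesis model's actual tables (those are described in Section 4.2.2.1).

```python
# Generic POMDP belief update (step 5). T and O below are tiny invented
# stand-ins for the transition and observation tables of the real model.

def belief_update(belief, action, obs, T, O):
    """b'(s') is proportional to O(obs | s', action) * sum_s T(s' | s, action) * b(s)."""
    new_belief = {}
    for s2 in belief:
        prior = sum(T[(s, action)][s2] * belief[s] for s in belief)
        new_belief[s2] = O[(s2, action)].get(obs, 0.0) * prior
    total = sum(new_belief.values())            # normalise
    return {s: p / total for s, p in new_belief.items()}

# Toy model: the user is either "fresh" or "tired"; fatigue is absorbing.
T = {("fresh", "reach"): {"fresh": 0.8, "tired": 0.2},
     ("tired", "reach"): {"fresh": 0.0, "tired": 1.0}}
# A fresh user usually hits the target; a tired user usually misses.
O = {("fresh", "reach"): {"hit": 0.9, "miss": 0.1},
     ("tired", "reach"): {"hit": 0.3, "miss": 0.7}}

b = belief_update({"fresh": 0.95, "tired": 0.05}, "reach", "miss", T, O)
```

After observing a miss, the updated belief assigns more probability to the user being tired, which is the qualitative behaviour the decision cycle relies on.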
Figure 6.3: Interaction between POMDP agent and robotic controller
Chapter 7 Evaluation Study
7 Evaluation Study

Once the POMDP was successfully integrated with the robotic system, it was necessary to
develop a method for evaluating the decisions being made by the POMDP agent.
7.1 Questions to be Answered by the Study

The study was designed to provide insight into the following questions:
1. Does the POMDP make decisions comparable to those made by human therapists in
guiding stroke patients during a reaching exercise?
2. Are there any aspects of the system which seem to get more positive or negative feedback
from patients and therapists?
3. What do these results mean in terms of future development of the POMDP model and
overall system?
7.2 Participants

Twelve (12) stroke survivors (hereafter referred to as patient-participant(s) or patient(s)) and
twelve (12) occupational and/or physical therapists (hereafter referred to as therapist-
participant(s) or therapist(s)) were intended to be recruited for this study from the University
Centre of the Toronto Rehabilitation Institute, located in Toronto, Canada. The sample size was
chosen based on a statistical power analysis of a one sample, one tail t-test assuming a large
effect size of 0.8 (Cohen, 1992), with a significance level of 0.05 and meaningful power of at
least 0.8 (Faul, Erdfelder, Lang, & Buchner, 2007). The one sample t-test was chosen for this
study because it can test whether the observed mean (expressed as a percentage of agreement) is
distinctly different (large effect size) from a hypothetical value of 100%. To be included in this
study, therapists must have had at least one recent year of experience conducting reaching-motion
therapy for upper-limb stroke rehabilitation, been fluent in English, worked at TRI, and
been able to consent.
To be included in this study, patient-participants must:
• have been right-side hemiparetic resulting from unilateral stroke at least 6 months before
enrollment
• have scored between 3 to 5 (inclusive) on the arm section of the Chedoke-McMaster
Stroke Assessment (CMSA) Scale (Gowland et al., 1993).
• have been able to move to some degree, but still have impaired movements as determined
by their therapist
• have been fluent in English, such that they could understand and respond to simple
instructions
Patient-participants were excluded from the study if s/he:
• exhibited a hearing and/or visual impairment that may have interfered with their ability to
understand verbal instructions and/or observe graphics on the computer screen
• had a history of physical aggression, agitation and/or exit seeking behaviour
• experienced any upper-limb joint pain or range-of-motion limitations on the affected side
that may have interfered with the ability of the person to perform linear-reaching task
movements
Patient-participants were asked to continue any outpatient therapies in which they were enrolled
at the time of study acceptance.
7.3 Testing Methodology

Each patient-participant was randomly paired with one of the participating therapists,
creating a unique patient-therapist pair. Each session lasted for approximately 1 hour and
10 minutes and was intended to be completed once per day, three times a week during weekdays
(Monday through Friday) for four weeks.
Another therapist (hereafter referred to as the CMSA-therapist), who was not one of the
therapist-participants, was responsible for administering the CMSA on each stroke subject at
three time points in the study: at the start (week 0), in the middle (week 2), and at the end (week
4). The length of time for this assessment was approximately 20-30 minutes for the arm. While
it is to be noted that the goal of the study was not to improve upper-limb motor function in
patients but to assess the decision-making strategy used by the POMDP system, CMSA
measurements of upper-limb motor function in patients were performed throughout the study to
ensure no negative effect (decrease in motor function, occurrence of pain) occurred due to
exercising with the system. This CMSA-therapist had decision-making authority as to whether
the patient-participant should be withdrawn early from the study.
For each session, the therapist brought his/her assigned patient to the testing room. Patient
participants were seated on a regular, straight-back chair positioned to the left of the robotic
device. The therapist was responsible for adjusting the position of the chair, placing the trunk
sensors at the appropriate spots (lower back, lower left scapula, and lower right scapula), and
adjusting the height of the robot to ensure that the end-effector was correctly positioned in the
sagittal plane of the patient’s right shoulder.
At the start of the study, both participants were briefed by the researcher as to the purpose of
their participation and were encouraged to ask questions and voice any opinions at any time.
The participants were introduced to the rehabilitation system, which consisted of the haptic-
robotic device, unobtrusive trunk sensors, computer display, and POMDP. Once both
participants were familiar with the operation of the system, time was given to the patient to test
the equipment with the power off to familiarise themselves with pushing and pulling the end-
effector.
When both participants were comfortable with the device and ready to begin, the researcher
powered on the robotic device and started up the computer programs that controlled the POMDP
agent, robotic device, and virtual environment. The patient was asked to place their hand on the
end-effector, which was secured with a comfortable strap, and when ready, the researcher started
the exercise.
The exercise consisted of three parts:
Part A - after the POMDP made a decision (i.e. to set the target position and resistance
level, or to stop the exercise) the therapist either agreed or disagreed with the decision
made;
Part B - the researcher had the device either execute the decision made by the POMDP if
the therapist agreed or execute the decision made by the therapist if the therapist
disagreed; and
Part C - the patient then performed the reaching exercise by trying to reach the target on
the computer screen.
These parts were repeated in the order of A-B-C until the end of the session, which lasted for
approximately 45 minutes. Once the session had ended, each participant was asked to fill out a
questionnaire. The procedure of each session can be summarised in the following steps:
1. Introduction to study and system (5 minutes)
2. Familiarise patient participant with end-effector, if necessary (5 minutes)
3. Researcher to power on device and computer programs (5 minutes)
4. Perform exercise in order of A-B-C (45 minutes)
5. Fill out questionnaire (10 minutes)
Total duration: approximately 1 hour and 10 minutes (per session)
For the very first session of the study, all patients started with no resistance (r=none) at the
shortest target distance (d=d1). Depending on how they progressed, the POMDP agent adapted
to each patient differently and thus, each session thereafter started at a different resistance and
target distance for each patient.
7.4 Modification of Integrated System

To incorporate the therapists’ decisions in the study, the MATLAB® program explained in
Section 6.3, which integrated the POMDP agent with the robotic controller, had to be modified.
Instead of automatically sending the target coordinates and the resistance level to the Python
controller once the action was determined from the policy, an extra step had to be taken. This
step involved the therapist either agreeing or disagreeing with the decision made by the policy.
If the therapist agreed, the same target coordinates and resistance level were sent to the
controller. If the therapist disagreed with the decision made, the therapist would choose which
action they thought to be correct, and that target distance and resistance level would be sent to
the controller (if the therapist chose to stop the exercise, the program would terminate). Figure
7.1 shows the modified diagram of Figure 6.3, which includes the therapist’s decision.
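The extra agree/disagree step can be sketched as a small wrapper around the policy's action; the function names are illustrative, not the actual MATLAB® routines.

```python
# Sketch of the therapist-override step: the POMDP's action is sent to the
# controller only if the therapist agrees; otherwise the therapist's own
# choice (possibly a stop) is sent instead. Names are illustrative.

STOP = "stop"

def resolve_action(pomdp_action, therapist_agrees, therapist_action=None):
    """Return the action that should actually be executed."""
    if therapist_agrees:
        return pomdp_action
    return therapist_action  # may be STOP, which terminates the exercise

def execute(action, send_to_controller):
    """Dispatch the resolved action; return False if the exercise ends."""
    if action == STOP:
        return False                 # terminate the exercise cleanly
    send_to_controller(action)       # target distance and resistance level
    return True
```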
Figure 7.1: Interaction between POMDP agent and robotic controller, via the therapist
A graphical user interface (GUI) was developed for the therapist during the study to make the
decisions and choices clearer. Figure 7.2 shows an example GUI for the therapist, which was
also developed in MATLAB®. The GUI displays:
A. the decision made by the POMDP agent
B. a button to indicate whether the therapist agrees or disagrees with the decision made
C. a panel of buttons representing the ten actions to choose from (this is only displayed
when the therapist disagrees with the decision made)
D. the decision made by the therapist
E. a history of the previous actions and observations made during the session
F. an emergency stop button that immediately terminates the exercise if any dangerous event occurs
Figure 7.3 is a picture of the final rehabilitation system in use, with a monitor for the therapist.
Figure 7.2: Therapist GUI displaying: (A) decision from POMDP, (B) therapist agreement of decision made, (C) action choice if therapist disagrees, (D) decision from therapist, (E) history of actions and observations, and (F) emergency stop button.
Figure 7.3: Final rehabilitation system in use consisting of: (A) virtual environment on the computer monitor, (B) therapist GUI on another monitor, (C) end-effector with rotational encoder, (D) haptic-robotic device, (E) trunk photoresistor sensors (not seen – placed on chair), and (F) robotic controller and POMDP agent
7.5 Capturing Decisions Made by POMDP and Therapist

Throughout the duration of the study, every decision made by both the POMDP and the therapist
was saved to a local file for later analysis. Specifically, the following data were captured:
• the decision made by the POMDP, which consisted of setting a particular target distance
and resistance level, or of stopping the exercise
• the agreement or disagreement of the therapist to the decision made by the POMDP
• the decision made by the therapist, which consisted of setting a particular target distance
and resistance level, or of stopping the exercise
• a timestamp of when the therapist made the decision
• the observation data
• the initial belief state
• the final belief state after each session
• the final state of the user’s range after each session to be carried over to the next session
Note that the intermediate belief states of each session were not recorded as they could be re-
created later by simulation, starting from the initial belief state and entering the observations and
actions recorded from the study.
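Re-creating the intermediate belief states is then a matter of replaying the recorded action/observation pairs through the same belief update. The sketch below shows the replay loop with a trivial stand-in update function; the names are invented for illustration.

```python
# Sketch of re-creating intermediate belief states from a session log by
# replaying the recorded (action, observation) pairs through the belief
# update. The update function here is a trivial stand-in.

def replay(initial_belief, log, update):
    """Return every belief state, in order, starting from the initial one."""
    belief = initial_belief
    states = [belief]
    for action, observation in log:
        belief = update(belief, action, observation)
        states.append(belief)
    return states

# Stand-in update: count how many observations were failures.
toy_update = lambda b, a, o: {"failures": b["failures"] + (o == "miss")}
history = replay({"failures": 0},
                 [("reach", "hit"), ("reach", "miss")], toy_update)
```

Because the update is deterministic given the log, storing only the initial belief, actions, and observations is sufficient, as noted above.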
7.6 Questionnaire

Questions were asked at the end of each session and at the completion of the study for both
participants. The questionnaire for the therapist-participant was designed to focus on rating the
decision-making strategy of the POMDP system. For the patient-participant, the questionnaire
focused on gathering feedback with respect to their satisfaction in using such a robotic system.
Both questionnaires consisted of quantitative and qualitative questions for statistical analysis and
to provide insight into future design improvements, respectively.
7.6.1 Questionnaire for Therapists
The questionnaire for the therapist can be seen in Appendix IV. The first “Participant
Information” page was to gather basic personal information from the therapist. This was used as
background information only.
The next section, “Evaluation of Decisions Made by Control System”, was filled out at the end
of every session. The first two questions were rated by circling the appropriate response on a
four-point Likert scale. The four-point scale was chosen deliberately for simplicity and to discourage
neutral answers. The qualitative question encouraged elaboration on any aspects related to the
decisions made by the system.
The final “Overall Evaluation” was filled out once the study was completed. It consisted of five
qualitative questions that focused on the overall decisions made by the system and on the
potential of the system delivering upper-limb reaching rehabilitation.
7.6.2 Questionnaire for Patients
The questionnaire for the patient can be seen in Appendix V. Again, the “Participant
Information” page was to gather basic personal information from the patient.
The “Evaluation of System” section was to be filled out after every session. It consisted of five
four-point Likert scale questions. The qualitative question encouraged any other comments.
The “Overall Evaluation” form was filled out at the end of the study and consisted of eight
quantitative questions, followed by four qualitative questions. These questions focused on
different aspects of the physical system.
7.7 Ethics Approval
The study protocol described above received approval five months after the original application
from the Toronto Rehabilitation Institute Research Ethics Board and the University of Toronto
Office of Research Ethics. Unfortunately, because of this delay, performing the study as
originally intended would have postponed the completion of this project. As such, the
recruitment time was shortened and the study duration was reduced. In the
end, only one patient-participant and one therapist-participant were recruited for this study,
which lasted for six sessions (each completed once per day, three times a week for two weeks).
Furthermore, the patient recruited for this study was not fluent in English and thus, could not
answer the session questionnaires. However, a translator was hired to help the patient answer the
final questionnaire at the end of the study.
Chapter 8 Results
8 Results
The following section presents the quantitative and qualitative results from the study.
8.1 Subject Data
Due to the delay in receiving ethics approval, only one therapist and one patient were recruited
for this pilot study. The therapist recruited for this study was a physical therapist with more than
nine years of experience in post-acute upper-limb stroke rehabilitation. The patient had suffered
a stroke 227 days (7 months and 14 days) before enrollment with a Chedoke-McMaster Stroke
Assessment (CMSA) arm score of 4, indicating moderate function of the arm (i.e. was able to
perform elbow flexion and extension, and shoulder flexion) (Gowland et al., 1993). At the end
of the two-week study, the patient’s CMSA score did not change. Tables 8.1 and 8.2 show the
participant information of both therapist and patient, respectively.
Table 8.1: Therapist information
ID Gender Profession Years of Experience
T01 Female Physical Therapist 9+ in post-acute (3 in acute care)
Table 8.2: Patient information

ID: P01
Gender: Male
Age (years): 54
Days Since Stroke Occurrence: 227
Height (cm): 162
Weight (lbs): 155
Arm Length of Affected Arm – Shoulder to Wrist (inches): 19
Height of Robot – to Top of End-effector (inches): 28.5
CMSA of arm (week 0): 4
CMSA of arm (week 2): 4
8.2 Decisions from POMDP and Therapist
Every decision made by the POMDP or the therapist after a trial was broken down into three
separate decisions: 1) the distance at which to set the target, 2) the level at which to set the
resistance, and 3) whether or not to stop the exercise. If the target distance and resistance
level were set (by either the POMDP or therapist), then stopping the exercise would be set to
“no”. However, if the decision was to stop the exercise (“yes”), then this decision would only
count as one (not three) since setting the target distance and resistance level would not be
applicable in this case. Raw quantitative data on the decisions made by the POMDP and
therapist can be seen in Appendix VI.
The therapist's level of agreement with the decisions made by the POMDP was calculated
based on the three separate decisions described above. A point of agreement was given if the
therapist:
• set the same target distance as the POMDP, or
• set the same resistance level as the POMDP, or
• agreed with the POMDP to stop the exercise or not.
For example, if the POMDP decided to set the target at d1 with zero resistance (thus, not
stopping the exercise) but the therapist decided to set the target at d2 with zero resistance (again,
not stopping the exercise), then two points of agreement (out of three) would be given for setting
the same resistance level and not stopping the exercise. However, if the POMDP decided to stop
the exercise but the therapist decided to set the target and resistance at a particular level, then
zero points of agreement (out of only one) would be given in this case. The raw quantitative data
on the number of agreements between the POMDP and therapist can also be seen in Appendix
VI.
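The scoring rules above can be expressed as a short function. This is an illustrative sketch; the encoding of decisions as dictionaries is an assumption for the example, not part of the study software:

```python
def agreement_points(pomdp, therapist):
    """Score the therapist's agreement with one POMDP decision.

    Each decision is a dict with key 'stop' (bool) and, when the
    exercise continues, keys 'target' and 'resistance'. Returns a
    tuple (points_of_agreement, points_possible).
    """
    if pomdp["stop"] or therapist["stop"]:
        # A stop decision counts as a single decision point: one
        # point if both agree on stopping (or not), zero otherwise.
        return (int(pomdp["stop"] == therapist["stop"]), 1)
    # Otherwise there are three decisions: the implicit agreement
    # not to stop, the target distance, and the resistance level.
    points = 1  # both decided not to stop
    points += int(pomdp["target"] == therapist["target"])
    points += int(pomdp["resistance"] == therapist["resistance"])
    return (points, 3)
```

For the worked example in the text, the d1-versus-d2 case with equal resistance yields (2, 3), and the case where only the POMDP stops yields (0, 1).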
Figure 8.1 shows the percentage of the therapist's agreement with the decisions made by the
POMDP system on target distance, resistance level, and stopping the exercise, as well as the
overall performance of the system for each session. For each decision type, the percentage of
agreement generally improved over the course of the six sessions.
Percentage of Agreement Per Session (%):

Session   Target   Resistance   Stop    Overall
S01       59.09    59.09        30.49   40.48
S02       83.33    77.78        32.26   50.00
S03       100.00   92.00        28.72   52.08
S04       100.00   98.31        46.92   71.77
S05       100.00   96.36        54.29   76.74
S06       100.00   96.77        63.37   82.67

Figure 8.1: Percentage of agreement per session
The following table (Table 8.3) shows the percentage of agreement for all sessions. Note that
there were 636 state transitions or decision points (i.e. total number of trials) and 1,154 decisions
made during the study.
Table 8.3: Percentage of agreement over all sessions

                              Number of     Total Number    Percentage of
                              Agreements    of Decisions    Agreement (%)
Target Distance               244           259             94.208
Resistance Level              235           259             90.734
Stop the Exercise (or not)    274           636             43.082
Overall                       753           1154            65.251
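As an arithmetic check, the percentages in Table 8.3 follow directly from the counts of agreements and decisions:

```python
# (agreements, total decisions) per row of Table 8.3
rows = {
    "target": (244, 259),
    "resistance": (235, 259),
    "stop": (274, 636),
    "overall": (753, 1154),
}
percentages = {name: round(100 * agreed / total, 3)
               for name, (agreed, total) in rows.items()}
# percentages == {"target": 94.208, "resistance": 90.734,
#                 "stop": 43.082, "overall": 65.251}
```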
8.3 Questionnaire Data
Figure 8.2 summarises the therapist's session responses, in terms of mean and standard deviation
(SD), regarding the appropriateness of the decisions made during the exercise and whether the
patient was given enough time to complete each trial before the next decision was made. These
ratings corresponded to questions a) and b) in the “Evaluation of Decisions Made by Control
System” questionnaire in Appendix IV. A four-point Likert scale was used with one
representing complete disagreement and four representing complete agreement. The raw
quantitative data of the therapist’s ratings per session can be seen in Appendix VII.
Figure 8.2: Therapist's evaluation of the POMDP decisions on the four-point Likert scale. Question a), "The decisions made during the exercise were appropriate", received a mean of 2.833 and SD of 0.408; question b), "The patient was given an appropriate amount of time to complete each exercise before the next decision was made", received a mean of 3.167 and SD of 0.408.
In addition to the quantitative ratings in the session questionnaire, a qualitative question was
asked to encourage the therapist to elaborate on any aspects related to the decisions made by the
POMDP system. In general, the therapist liked how the system kept setting the target at d3 and
the resistance at the maximum level (once the patient was able to perform the exercise at these
settings) for the patient to work on strengthening. However, the therapist would have liked the
system to be able to randomise different targets and resistances for the patient to work on
control. The raw qualitative data can also be seen in Appendix VII.
The overall questionnaire was filled out by the therapist at the end of the study. Table 8.4 lists
the qualitative responses from the therapist.
Table 8.4: Qualitative response from therapist for overall questionnaire

Question: Do the decisions seem believable? If not, why?
Therapist's Response: Yes they did. Initially, it seemed to end early [i.e. stop the exercise] for the "high level" patient whereas it could have been used to strengthen perhaps.

Question: Are there any other decisions you feel the computer system should make? If so, what are they?
Therapist's Response: For this patient, the parameters of control could perhaps have been more stringent as his performance improved; i.e. "perfect" vs. min or max control.

Question: Can you envision using this system as a therapy tool? Please comment.
Therapist's Response: It would be great to use with people especially if there were other dimensions of freedom, perhaps some variety of targets just to keep interest for someone less motivated – although this client did very well and was well motivated. Independent use (in set up only) would be a helpful adjunct to therapies.

Question: Can you see this system being used in the clinic, home setting, or both? Please comment.
Therapist's Response: It could be integrated perhaps into a PS2 or Wii-type system eventually to be used at home, but it would be recommended to practice in clinic first -> Both.

Question: Please elaborate on other comments you might have.
Therapist's Response: Good for strengthening in one plane of movement. More directions would be better to have variety of movements instead of having a "stereotyped" movement.
As mentioned in Section 7.7, the patient recruited for this study was not fluent in English and
was not able to answer the session questionnaires. However, with the help of a translator, the
patient was able to answer the final questionnaire at the end of the study, which consisted of
eight quantitative four-point Likert scale questions and four qualitative questions. Tables 8.5 and
8.6 show the raw quantitative and qualitative data, respectively.
Table 8.5: Quantitative response from patient for overall questionnaire

Question (scale anchors: 1 / 4)                                         Response
How smooth do you find the quality of motion of the robotic device?
  (very jerky / very smooth)                                            4
How do you feel regarding how far the robot made you reach?
  (not far / very far)                                                  4
How do you feel regarding how much resistance the robot applied?
  (too little / too much)                                               1
How closely does the exercise resemble the reaching motion?
  (very different / very alike)                                         3
How closely does the exercise compare to regular upper-limb therapy?
  (very different / very alike)                                         3
Were you able to feel the chair (trunk) sensors during the exercise?
  (no / yes)                                                            1
How do you feel about the game display on the computer screen?
  (very boring / very interesting)                                      3
Would you use this robotic system as your primary therapy?
  (no / yes)                                                            4

(Responses as reported in Section 9.1.)
Table 8.6: Qualitative response from patient for overall questionnaire

Question                                               Patient's Response
What did you like about the system?                    Fairly good.
What did you not like about the system?                [Nothing].
Is there anything you would like to change             No.
about the system?
Please elaborate on other comments you might have.     (left blank)
Chapter 9 Discussion
9 Discussion
9.1 Study Analysis
The small sample size of the study limited the use of hypothesis testing to interpret the data.
Section 7.2 calculated that the necessary sample size was 12. Therefore, the data collected in the
study from one therapist and one patient can only provide insight into the performance of the
system.
The therapist agreed with both the target distance and resistance level decisions made by the
POMDP approximately 94% and 90% of the time, respectively, during the study (shown in Table
8.3). Most of this agreement was with the POMDP repeatedly setting the target distance at d3
and the resistance at max. Since the patient was able to reach this setting within the first session
with proper posture and control, the POMDP continued to make this decision as it was given
large rewards for getting the user to reach the furthest target at maximum resistance. The
therapist generally agreed with these decisions as she wanted the patient to work on
strengthening.
However, the therapist only agreed with the POMDP approximately 43% of the time for the stop
decision. The POMDP wanted to stop the exercise to let the user take a break far more often
than the therapist wanted. If the therapist did not see any signs of fatigue from the user, she
would have the patient continue practising the exercise for a longer period of time and not stop.
After about 50 repetitions, the therapist would stop the exercise to let the user take a break. The
dynamics of the fatigue variable in the POMDP model caused it to progress to fat=yes too
quickly. This progression could be slowed to match the therapist's stopping behaviour by
adjusting the fatigue effects in the iSTRETCH model. Since the percentage of agreement for the
stop decision was low, the overall therapist agreement with the POMDP decisions dropped to
approximately 65%.
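One way to see why the model fatigued too quickly: if fat=no progresses to fat=yes with a roughly constant per-trial probability p (Table I.1 in Appendix I lists p = 0.05 at the furthest range for one action), then the onset of fatigue is geometrically distributed with mean 1/p trials. This is a simplified sketch; the actual model conditions p on the action taken and the user's range:

```python
def expected_trials_to_fatigue(p_per_trial):
    """Mean number of trials until fat=yes, assuming a constant
    per-trial fatigue probability (geometric distribution)."""
    return 1.0 / p_per_trial

# With p = 0.05 (Table I.1, furthest range), fatigue is expected after
# about 20 trials, well short of the ~50 repetitions the therapist
# allowed before a break.
mean_trials_model = expected_trials_to_fatigue(0.05)
# Matching the therapist's behaviour would suggest lowering the
# per-trial probability to roughly 0.02.
mean_trials_therapist = expected_trials_to_fatigue(0.02)
```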
During each session, as soon as the POMDP estimated that the patient was fatigued, it
continually made the decision to stop the exercise no matter what decision the therapist entered
into the system. The therapist’s decisions alternated between having the patient work on muscle
strengthening (by repeatedly setting the distance and resistance at the highest level) and on
control (by randomising the target distance and resistance levels). However, repetition and
randomisation were not part of the POMDP’s initial objective and thus, the POMDP would never
make the decision to repeat or randomise the target distance and resistance levels. The low
percentage of agreement calculated for the stop decision may not have represented the POMDP’s
decision fairly as repetition and randomisation were not modelled. If the repeated stop decisions
were discarded, this percentage of agreement would have been approximately 94.167% (226 stop
agreements divided by 240 total stop decisions).
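The recomputed figure follows directly from the counts once the repeated stop decisions are discarded:

```python
# Stop-decision agreement with repeated stop decisions discarded
stop_agreements = 226
stop_decisions = 240
pct_stop_agreement = 100 * stop_agreements / stop_decisions  # ~94.167
```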
The therapist’s ratings on the appropriateness of the amount of time given to complete each trial
before the next decision was made were generally favourable with a mean score of more than 3.1
out of 4.0 on the Likert scale. However, the appropriateness of the decisions made by the
POMDP during the sessions was less favourable with a mean score of more than 2.8 out of 4.0.
Comments from the therapist suggested that randomising the target distance and resistance level
would be beneficial for the patient to work on control in addition to strengthening; the POMDP
already promoted strengthening by repeatedly setting the target distance at d3 and the resistance
level at max (once the patient was able to perform the exercise at these settings). These results imply
that perhaps in addition to the current model of gradually increasing target distance and then
resistance level, the rehabilitation exercise could include a timeframe where different target
distances and resistance levels would be randomly chosen.
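The randomisation phase suggested here could be as simple as uniform sampling over the exercise parameters. A sketch: the target labels d1–d3 follow the thesis notation, while the resistance labels and the uniform sampling scheme are assumptions for illustration, not part of the implemented system:

```python
import random

# Target labels follow the thesis notation; resistance labels are
# illustrative assumptions for this sketch.
TARGETS = ["d1", "d2", "d3"]
RESISTANCES = ["none", "low", "medium", "max"]

def random_trial_settings(rng=random):
    """Sample a target distance and resistance level uniformly at
    random for a control-focused (variety) trial."""
    return rng.choice(TARGETS), rng.choice(RESISTANCES)
```

A seeded `random.Random` instance can be passed in for reproducible trial sequences.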
The general qualitative results from the therapist for the final questionnaire can be summarised
as follows:
• the POMDP decisions were believable, except for the fact that the POMDP kept wanting
to stop the exercise too early
• the therapist could envision the rehabilitation system being used in both the clinic and
home setting, as long as the system could vary the target locations (rather than restricting
them to a straight path) to keep patients motivated, and was easy for therapists to set up
From the patient’s quantitative results for the final questionnaire (shown in Table 8.5), the
patient found the quality of motion of the robotic device to be very smooth with a score of 4.0
out of 4.0. The patient also felt that the robotic device made him reach very far during the
exercise with a score of 4.0 out of 4.0. However, the raw data in Appendix VI suggested that the
patient had no trouble reaching the furthest target distance at the maximum level of resistance
with proper posture, maximum control, and no compensation. This suggests that perhaps the
client did not fully understand the question. The patient also felt that the resistance applied by
the robotic device was too little with a score of 1.0 out of 4.0. Throughout the study, the patient
repeatedly commented that the exercise was “too easy”.
The patient was not able to feel the trunk sensors at all during the exercise, which suggests that
trunk compensatory movements can be captured unobtrusively. The patient also felt that the
bull’s eye game was somewhat interesting, scoring 3.0 out of 4.0.
The patient felt that the exercise closely resembled the reaching motion and conventional upper-
limb therapy, scoring 3.0 out of 4.0 for both. In addition, the patient believed he would use this
robotic system as his primary therapy, scoring 4.0 out of 4.0 on the Likert scale.
The patient did not elaborate on the qualitative questions, thus, feedback from this section of the
questionnaire was discarded.
9.2 Analysis of Other Upper Extremity Rehabilitation Robotic Systems
Compared with other rehabilitation robotic systems discussed in Section 3.1, this system was
able to operate autonomously without explicit feedback from a therapist (or user) by
automatically adjusting exercise parameters from one trial to the next. Through observation and
estimation of states, the POMDP would automatically decide which target distance and
resistance level to set next.
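The state estimation step the system performs between trials is the standard discrete POMDP belief update. A minimal sketch follows; the dictionary layout and the two-state fatigue example with its numbers are illustrative assumptions, not the thesis model:

```python
def update_belief(belief, action, observation, T, O):
    """One step of the standard discrete POMDP belief update:
    b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief: {state: probability}
    T[s][a][s2]: transition probability P(s2 | s, a)
    O[s2][a][o]: observation probability P(o | s2, a)
    """
    states = list(belief)
    unnorm = {}
    for s2 in states:
        predicted = sum(T[s][action][s2] * belief[s] for s in states)
        unnorm[s2] = O[s2][action][observation] * predicted
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

# Example with a hypothetical two-state fatigue model:
T = {"notfat": {"set_d1": {"fat": 0.2, "notfat": 0.8}},
     "fat": {"set_d1": {"fat": 1.0, "notfat": 0.0}}}
O = {"fat": {"set_d1": {"slow": 0.8, "fast": 0.2}},
     "notfat": {"set_d1": {"slow": 0.3, "fast": 0.7}}}
b = update_belief({"fat": 0.0, "notfat": 1.0}, "set_d1", "slow", T, O)
# After observing slow movement, belief in fatigue rises from 0.0 to 0.4.
```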
The results from the study cannot conclude that this rehabilitation system can tailor the exercise
to each individual differently since only one patient was recruited and a comparison between
individuals could not be made. However, the system seemed to adjust the exercise parameters
according to the progression of the patient. In the beginning of the study, the patient was able to
reach the targets at particular resistance levels with ease. Thus, within the first session, the
POMDP had already set the exercise parameters to the furthest target distance and maximum
resistance level for the patient to practice at.
The POMDP system was also able to estimate user fatigue and in turn, make decisions to stop
the exercise for the user to take a break. Although the progression of fatigue may have been too
fast, the ability to capture fatigue represents significant progress in the field of rehabilitation
robotics, as fatigue can have a significant effect on a patient's rehabilitation outcome.
9.3 Limitations
The delay in receiving ethics approval limited the time available for recruiting participants,
conducting the study, and comparing the performance of the two models, STRENGTHEN and
iSTRETCH, to determine which one was better.
Given more time, the sample size could have been increased and thus, more data from different
therapists and patients could have been collected. The results described above may be quite
different from those involving more therapists and patients. In addition, the study could have
been expanded to involve both POMDP models. A similar study could have been performed
with one group of therapists and patients performing the exercise on the STRENGTHEN model
and the other group of therapists and patients performing the exercise on the iSTRETCH model.
This would yield an evaluation of each model and a comparison between the two, with further
analysis determining which performed better.
9.4 Recommendations for Future Work
The recommended immediate next step for this project is to test this POMDP model with more
participants in order to obtain significant results. It is not recommended to change the POMDP
model before obtaining further results, as the opinions of other therapists may differ significantly
from those of this therapist.
If more time permits, it is recommended that a similar clinical evaluation be performed to
compare the performance of the STRENGTHEN and iSTRETCH models. The following
suggestions can also be considered given more time:
• enhance the user interface to provide feedback for the user such as a scoring system or
sounds to indicate that the user has reached the target
• include target distances that are not restricted to the linear path
• develop an easier way to initialise the exercise such that both Python and MATLAB®
programs start automatically
Chapter 10 Conclusion
10 Conclusion
Although no substantial conclusions can be drawn, the results from the study provided some
insight into the proposed questions.
1. Did the POMDP make decisions comparable to those made by human therapists in
guiding stroke patients during a reaching exercise? Based on one therapist's point of
view, the decisions made by the POMDP for setting the target distance and resistance
level seemed comparable to those made by the therapist. However, the therapist
disagreed with the POMDP’s decision to stop the exercise more than half the time.
Overall, the therapist agreed with the decisions made by the POMDP approximately 65%
of the time.
2. Are there any aspects of the system which seem to get more positive or negative feedback
from patients and therapists? Overall, the patient was content with the system and would
use this system as his primary therapy. The only negative feedback from the patient
was that the resistances applied by the robotic device during the exercise were too little.
The therapist thought the POMDP decisions were believable and could envision this
system being used first, in the clinic and then eventually in the home setting. The
suggestions for improvement received from the therapist were to randomise the target
distances and resistance levels during the exercise, include target distances that were not
located on the straight path, increase the amount of repetition before stopping the
exercise, and develop an easier method of initialising the exercise.
3. What do these results mean in terms of future development of the POMDP model and
overall system? These results suggest that the dynamics of the fatigue variable in the
POMDP model may need to be changed in order for the POMDP to stop the exercise less
often. In addition, the POMDP model may have to be expanded to include more targets
(not located on the linear path) and randomisation of different target distances and
resistance levels. In terms of the robotic system itself, future changes may include
increasing the resistance levels applied by the device and developing an easier way to set
up the exercise.
This research demonstrates that POMDPs have promising potential to provide autonomous
upper-limb rehabilitation for stroke patients, which may allow clients to perform guided
rehabilitation when and where they prefer and enable them to progress at the best possible pace.
References

Amirabdollahian, F., Loureiro, R., Gradwell, E., Collin, C., Harwin, W., and Johnson, G. (2007).
Multivariate analysis of the Fugl-Meyer outcome measures assessing the effectiveness of GENTLE/S robot-mediated stroke therapy. Journal of NeuroEngineering and Rehabilitation, 4(4), 1-16.
Barnes, M., Dobkin, B., and Bogousslavsky, J. (2005). Recovery after stroke. United Kingdom: Cambridge University Press.
Brewer, B. R., McDowell, S. K., and Worthen-Chaudhari, L. C. (2007). Poststroke upper extremity rehabilitation: A review of robotic systems and clinical results. Topics in Stroke Rehabilitation, 14(6), 22-44.
Buchanan, B. G. (2005). A (very) brief history of artificial intelligence. AI Magazine, 26(4), 53-60.
Canadian Stroke Network. (2007). Stroke 101. About Stroke. Retrieved July 21, 2008, from http://www.canadianstrokenetwork.ca/eng/about/stroke101.php
Caplan, L. R. (2000). Caplan’s stroke: A clinical approach, 3rd edition. Massachusetts: Butterworth-Heinemann.
Caplan, L. R. (2006). Stroke. New York: Demos Medical Publishing.
Carr, J. and Shepherd, R. (2003). Stroke rehabilitation: Guidelines for exercise and training to optimize motor skill. United Kingdom: Butterworth-Heinemann.
Cirstea, M. C. and Levin, M. F. (2000). Compensatory strategies for reaching in stroke. Brain, 123(5), 940-953.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Dobkin, B. H. (2008). Fatigue versus activity-dependent fatigability in patients with central or peripheral motor impairments. Neurorehabilitation and Neural Repair, 22(2), 105-110.
Erol, D., Mallapragada, V., Sarkar, N., Uswatte, G., and Taub, E. (2006). Autonomously adapting robotic assistance for rehabilitation therapy. Paper presented at the First IEEE/RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics, Pisa, Italy.
Fasoli, S. E., Krebs, H. I., and Hogan, N. (2004). Robotic technology and stroke rehabilitation: Translating research into practice. Topics in Stroke Rehabilitation, 11(4), 11-19.
Fasoli, S. E., Krebs, H. I., Stein, J., Frontera, W. R., Hughes, R., and Hogan, N. (2004). Robotic therapy for chronic motor impairments after stroke: Follow-up results. Archives of Physical Medicine and Rehabilitation, 85(7), 1106-1111.
Faul, F., Erdfelder, E., Lang, A.G., and Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
Ferraro, M., Palazzolo, J. J., Krol, J., Krebs, H. I., Hogan, N., and Volpe, B. T. (2003). Robot-aided sensorimotor arm training improves outcome in patients with chronic stroke. Neurology, 61(11), 1604-1607.
Friel, K. M. and Nudo, R. J. (1998). Restraint of the unimpaired hand is not sufficient to retain spared primary motor hand representation after focal cortical injury. Society for Neuroscience Abstracts, 24, 405.
Gillen, G. and Burkhardt, A. (2004). Stroke rehabilitation: A function-based approach, 2nd edition. Missouri: Mosby.
Givon, M. and Grosfeld-Nir, A. (2008). Using partially observed Markov processes to select optimal termination time of TV shows. Omega, 36(3), 477-485.
Gowland, C., Stratford, P., Ward, M., Moreland, J., Torresin, W., Van Hullenaar, S., et al. (1993). Measuring physical impairment and disability with the Chedoke-McMaster Stroke Assessment. Stroke, 24(1), 58-63.
Heart and Stroke Foundation of Canada. (2006, June 5). Tipping the scales of progress. Heart Disease and Stroke in Canada, 1-18.
Heart and Stroke Foundation of Canada. (2008). Stroke. Statistics. Retrieved July 21, 2008, from http://www.heartandstroke.com/site/c.ikIQLcMWJtE/b.3483991/k.34A8/ Statistics.htm#stroke
Hidler, J., Nichols, D., Pelliccio, M., and Brady, K. (2005). Advances in the understanding and treatment of stroke impairment using robotic devices. Topics in Stroke Rehabilitation, 12(2), 22-35.
Hoey, J., St-Aubin, R., Hu, A., and Boutilier, C. (1999, July). SPUDD: Stochastic planning using decision diagrams. Paper presented at the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden.
Ju, M. S., Lin, C. C. K., Lin, D. H., Hwang, I. S., and Chen, S. M. (2005). A rehabilitation robot with force-position hybrid fuzzy controller: Hybrid fuzzy control of rehabilitation robot. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 13(3), 349-358.
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.
Kahn, L. E., Zygman, M. L., Rymer, W. Z., and Reinkensmeyer, D. J. (2006). Robot-assisted reaching exercise promotes arm movement recovery in chronic hemiparetic stroke: A randomized controlled pilot study. Journal of NeuroEngineering and Rehabilitation, 3(12), 1-13.
Krebs, H. I., Ferraro, M., Buerger, S. P., Newbery, M. J., Makiyama, A., Sandmann, M., et al. (2004). Rehabilitation robotics: Pilot trial of a spatial extension for MIT-Manus. Journal of NeuroEngineering and Rehabilitation, 1(5), 1-15.
Krebs, H. I., Hogan, N., Aisen, M. L., and Volpe, B. T. (1998). Robot-aided neurorehabilitation. IEEE Transactions on Rehabilitation Engineering, 6(1), 75-87.
Lam, P. T. Y. (2007). Development of a haptic-robotic platform for moderate level upper-limb stroke rehabilitation. M.A.Sc. thesis, Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, Canada.
Lam, P., Hébert, D., Boger, J., Lacheray, H., Gardner, D., Apkarian, J., et al. (2008). A haptic-robotic platform for upper-limb reaching stroke therapy: Preliminary design and evaluation results. Journal of NeuroEngineering and Rehabilitation, 5(15), 1-13.
Liepert, J., Bauder, H., Miltner, W. H. R., Taub, E., and Weiller, C. (2000). Treatment-induced cortical reorganization after stroke in humans. Stroke, 31, 1210-1216.
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47-66.
Lum, P. S., Burgar, C. G., Shor, P. C., Majmundar, M., and Van der Loos, M. (2002). Robot-assisted movement training compared with conventional therapy techniques for the rehabilitation of upper-limb motor function after stroke. Archives of Physical Medicine and Rehabilitation, 83(7), 952-959.
Lum, P. S., Burgar, C. G., Van der Loos, M., Shor, P. C., Majmundar, M., and Yap, R. (2006). MIME robotic device for upper-limb neurorehabilitation in subacute stroke subjects: A follow-up study. Journal of Rehabilitation Research and Development, 43(5), 631-642.
MacClellan, L. R., Bradham, D. D., Whitall, J., Volpe, B., Wilson, P. D., Ohlhoff, J., et al. (2005). Robotic upper-limb neurorehabilitation in chronic stroke patients. Journal of Rehabilitation Research and Development, 42(6), 717-722.
McLaughlin, M. L., Hespanha, J. P., and Sukhatme, G. S. (2001). Touch in virtual environments: Haptics and the design of interactive systems. New Jersey: Prentice Hall.
Nudo, R. J. and Milliken, G. W. (1996). Reorganization of movement representations in primary motor cortex following focal ischemic infarcts in adult squirrel monkeys. Journal of Neurophysiology, 75(5), 2144-2149.
Nudo, R. J., Wise, B. M., SiFuentes, F., and Milliken, G. W. (1996). Neural substrates for the effects of rehabilitative training on motor recovery after ischemic infarct. Science, 272(5269), 1791-1794.
Pineau, J., Gordon G., and Thrun, S. (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27, 335-380.
Pineau, J., Montemerlo, M., Pollack, M., Roy, N., and Thrun, S. (2003). Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems, 42(3-4), 271-281.
Poupart, P. (2005). Exploiting structure to efficiently solve large scale partially observable Markov decision processes. Ph.D. thesis, Department of Computer Science, University of Toronto, Toronto, Canada.
Reinkensmeyer, D. J., Kahn, L. E., Averbuch, M., McKenna-Cole, A., Schmit, B. D., and Rymer, W. Z. (2000). Understanding and treating arm movement impairment after chronic brain injury: Progress with the ARM guide. Journal of Rehabilitation Research and Development, 37(6), 653-662.
Russell, S. and Norvig, P. (2003). Artificial intelligence: A modern approach, 2nd edition. New Jersey: Prentice Hall.
Sondik, E. J. (1971). The optimal control of partially observable Markov decision processes. Ph.D. thesis, Department of Electrical Engineering, Stanford University, Stanford, California.
Spaan, M. T. J. and Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24, 195-220.
Volpe, B. T., Krebs, H. I., Hogan, N., Edelstein, L., Diels, C., and Aisen, M. (2000). A novel approach to stroke rehabilitation: Robot-aided sensorimotor stimulation. Neurology, 54(10), 1938-1944.
Young, E. (2007, April 7). Tireless, reliable physio-robots take on stroke paralysis. New Scientist, 194(2598), 24-25.
Appendix I – Example Construction of a Conditional Probability Table
Conditional probability tables (CPTs) are a way of capturing the transition function of a POMDP model. A CPT describes the probability of each value of a variable occurring at time step t+1, for a specific action, given the values of its influencing variables at the prior time step t. An example of a CPT is illustrated below. Figure I.1 shows the dynamic Bayesian network (DBN) that describes the relationship of the fat [fat : {yes, no}] and n(none) [n(none) : {none, d1, d2, d3}] variables at time t to fat' at time t+1 (extracted from the POMDP model in Figure 5.5).
Figure I.1: Relationship of fat and n(none) to fat’
The dynamics for this DBN for the action setd1resnone (sets target distance at d=d1 and resistance level at r=none) are enumerated in the CPT of Table I.1.
Table I.1: CPT for fat' for action setd1resnone

fat    n(none)    fat'(yes)    fat'(no)
yes    none       1.0          0.0
yes    d1         1.0          0.0
yes    d2         1.0          0.0
yes    d3         1.0          0.0
no     none       0.2          0.8
no     d1         0.1          0.9
no     d2         0.067        0.933
no     d3         0.05         0.95
From Table I.1, it can be observed that the fat(yes) rows are deterministic: they transition to a specific future value with certainty, that is, with a probability of 1.0. Here, fat’(yes) always has a probability of 1.0 because a user who is fatigued at time t will, with certainty, still be fatigued at time t+1, regardless of the user’s range. The fat(no) rows in Table I.1 are probabilistic. The probability of fat’(yes) decreases as the user’s range increases when the action is to set d=d1 and r=none; in other words, as the user’s current range extends beyond the set target and resistance, the rate at which the user becomes fatigued decreases (i.e. the exercise becomes “easier”). It is also important to note that the probabilities for each combination of instantiations of the variables at time t sum to 1.0. The ideas presented in this example can be expanded to accommodate the interaction of the multiple variables found in the full reaching exercise model. CPTs can also be applied to the observation function in the same way.
Appendix II – Simulation Examples of the STRENGTHEN and iSTRETCH Models
Simulation 01 (STRENGTHEN Model)
Initial Belief State: [belief-state figure omitted]
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OH=no OT=none OCT=min OCO=no
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OH=no OT=none OCT=min OCO=no
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OH=no OT=none OCT=none OCO=yes
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=no OT=none OCT=none OCO=yes
ACTION: stop
Simulation 01 (iSTRETCH Model)
ACTION: d=d1; r=none OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OT=none OCT=min OCO=no
ACTION: d=d3; r=none OBSERVATION: OT=none OCT=min OCO=no
ACTION: d=d3; r=none OBSERVATION: OT=none OCT=none OCO=yes
ACTION: stop
Simulation 02 (STRENGTHEN Model)
Initial Belief State: [belief-state figure omitted]
ACTION: d=d1; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=min OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=min OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=none OCO=yes
(Note the decrease, i.e. reversal, in the fatigue level.)
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=none OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=none OCO=yes
ACTION: d=d3; r=max
(Note: the model keeps producing the same action over and over again since the belief in fat=yes keeps decreasing…)
Simulation 02 (iSTRETCH Model)
ACTION: d=d1; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d2; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d2; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=yes
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=yes
ACTION: stop
Appendix III – Micro-controller Software Code

/* serialPhotoSensor.ic
   Handyboard serial communication with Matlab
   compiled with Interactive C v4.3
   requires serial communication with PC running Matlab script
   sends one byte of photo-sensor status data to the PC
   this file is based on serialio.c, described below */

/* serialio.c
   low-level serial I/O for the Handy Board
   also works with the 6.270 board
   Dr. Fred G. Martin
   Learning and Epistemology Group
   Media Laboratory
   Massachusetts Institute of Technology
   [email protected] */

/*****************************************************************************/
/*                          function declarations                            */
/*****************************************************************************/

void disable_pcode_serial()     /* disable board handshaking with IC on the  */
{                               /* host computer, allowing user programs to  */
    poke(0x3c, 1);              /* receive serial data                       */
}

/*****************************************************************************/

void enable_pcode_serial()      /* enable board handshaking with IC on the   */
{                               /* host computer                             */
    poke(0x3c, 0);
}

/*****************************************************************************/

void serial_putchar(int c)      /* send a serial character; note: the        */
{                               /* program hangs until the character is      */
                                /* sent -- there is no timeout!              */
    while (!(peek(0x102e) & 0x80));     /* wait until it's okay to send */
    poke(0x102f, c);                    /* send the character */
}

/*****************************************************************************/

int serial_getchar()            /* read a serial character; note: the        */
{                               /* program hangs until a character arrives   */
                                /* -- there is no timeout!                   */
    while (!(peek(0x102e) & 0x20));     /* wait until a character arrives */
    return (peek(0x102f));              /* return it as an int */
}

/*****************************************************************************/

void main()
{
    int phThreshold = 130;
    int phThresholdRight = 50;
    int phRight, phLeft, phLow;
    int serialOut;
    /* int serialIn; */

    printf("Program Started\n");
    disable_pcode_serial();     /* disable handshaking to ENABLE serial use */

    while (!stop_button())
    {
        serialOut = 0;
        phRight = analog(2);    /* photo sensor behind right shoulder */
        phLeft  = analog(4);    /* photo sensor behind left shoulder */
        phLow   = analog(6);    /* photo sensor behind upper lumbar */

        if (phRight < phThresholdRight)             /* if right shoulder moved away */
        {
            serialOut = serialOut + 0b00000001;     /* right = first bit */
        }
        if (phLeft < phThreshold)                   /* if left shoulder moved away */
        {
            serialOut = serialOut + 0b00000100;     /* left = third bit */
        }
        if (phLow < phThreshold)                    /* if lower back moved away */
        {
            serialOut = serialOut + 0b00000010;     /* lower back = second bit */
        }

        serial_putchar(serialOut);  /* serial out the one byte of data */
        /* serial_putchar(10);  LF terminator NOT NEEDED */
        /* serialIn = serial_getchar(); */
        printf("%d\n", serialOut);
        sleep(0.05);                /* wait a while for next cycle */
    }

    /* when Stop is pressed */
    serialOut = 0b00001000;
    serial_putchar(serialOut);      /* serial out 8 */
    printf("%d\n", serialOut);
    sleep(1.0);
    enable_pcode_serial();
    printf("Program Terminated\n");
}
Appendix IV – Questionnaire for the Therapist

(Note: to be completed once at beginning of trials)

Date:                Participant ID: T##

Participant Information

Participant ID: T##
Gender: F / M
Profession: Occupational Therapist / Physical Therapist / Other: ______________________________
Years working with stroke patients in upper-limb rehabilitation: _______________
(Note: to be completed per session)

Start Time:          End Time:          Date:          Participant ID: T##

Evaluation of Decisions Made by Control System

Please circle one rating for each question. Given the decisions that the computer system can make:

a) The decision(s) made during the exercise(s) were appropriate.
   (1 = disagree, 4 = agree)      1   2   3   4

b) The patient was given an appropriate amount of time to complete each exercise before the next decision was made.
   (1 = disagree, 4 = agree)      1   2   3   4
Please elaborate if there were any aspects of the decisions made by the system that you felt were especially strong or weak.
__________________________________________________________________
(Note: to be completed once at end of trials)

Date:                Participant ID: T##

Overall Evaluation

Please answer the following:

Do the decisions seem believable? If not, why?
__________________________________________________________________

Are there any other decisions you feel the computer system should make? If so, what are they?
__________________________________________________________________

Can you envision using this system as a therapy tool? Please comment.
__________________________________________________________________
Can you see this system being used in the clinic, home setting, or both? Please comment.
__________________________________________________________________

Please elaborate on other comments you might have.
__________________________________________________________________
Appendix V – Questionnaire for the Patient

(Note: to be completed once at beginning of trials)

Date:                Participant ID: P##

Participant Information

Participant ID: P##
Gender: F / M
Age: _______________
Height: _______________
Weight: _______________
Arm length of affected arm (shoulder to wrist): _______________
Time since stroke occurrence: _________________
Height of robot: _________________

(To be filled in by the CMSA therapist:)
CMSA Score at Week 0: _________________
CMSA Score at Week 2: _________________
CMSA Score at Week 4: _________________
(Note: to be completed per session)

Start Time:          End Time:          Date:          Participant ID: P##

Evaluation of System

Please circle one rating for each question.

a) How comfortable did you feel during the exercise?
   (1 = very uncomfortable, 4 = very comfortable)      1   2   3   4

b) How safe did you feel during the exercise?
   (1 = very unsafe, 4 = very safe)      1   2   3   4

c) Did you feel you had a good arm workout during this session?
   (1 = not good, 4 = very good)      1   2   3   4

d) Does your arm feel tired after the exercise?
   (1 = not tired at all, 4 = very tired)      1   2   3   4

e) Do you feel pain in your arm after the exercise?
   (1 = no pain, 4 = a lot of pain)      1   2   3   4
Please elaborate on any other comments you might have.
__________________________________________________________________
(Note: to be completed once at end of trials)

Date:                Participant ID: P##

Overall Evaluation

Please circle one rating for each question.

a) How smooth do you find the quality of motion of the robotic device?
   (1 = very jerky, 4 = very smooth)      1   2   3   4

b) How do you feel regarding how far the robot made you reach?
   (1 = not far, 4 = very far)      1   2   3   4

c) How do you feel regarding how much resistance the robot applied?
   (1 = too little, 4 = too much)      1   2   3   4

d) How closely does the exercise resemble the reaching motion?
   (1 = very different, 4 = very alike)      1   2   3   4

e) How closely does the exercise compare to regular upper-limb therapy?
   (1 = very different, 4 = very alike)      1   2   3   4

f) Were you able to feel the chair (trunk) sensors during the exercise?
   (1 = no, 4 = yes)      1   2   3   4

g) How do you feel about the game display on the computer screen?
   (1 = very boring, 4 = very interesting)      1   2   3   4

h) Would you use this robotic system as your primary therapy?
   (1 = no, 4 = yes)      1   2   3   4
Please answer the following:

What did you like about the system?
__________________________________________________________________

What did you not like about the system?
__________________________________________________________________
Is there anything you would like to change about the system? If so, what would you change?
__________________________________________________________________

Please elaborate on other comments you might have.
__________________________________________________________________
Appendix VI – Raw Quantitative Data on Decisions Made by POMDP and Therapist
The following data are presented by session, exercise, and trial. An exercise is defined as a group of trials; the last trial of an exercise occurs when the therapist decides to stop the exercise to let the user take a break.

Session 01, Exercise 01

Trial #               POMDP   Therapist   Agree
1   Target            1       2           0
    Resistance        0       0           1
    Stop              No      No          1
2   Target            2       3           0
    Resistance        0       1           0
    Stop              No      No          1
3   Target            1       3           0
    Resistance        2       2           1
    Stop              No      No          1
4   Target            n/a     n/a
    Resistance        n/a     n/a
    Stop              Yes     Yes         1

Totals: 6/10 (Target 0/3, Resistance 2/3, Stop 4/4)
Session 01, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 1 3 0 Resistance 2 0 0 Stop No No 1 2 Target 1 3 0 Resistance 2 1 0 Stop No No 1 3 Target 1 3 0 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1
Stop No No 1 6 Target 3 3 1 Resistance 2 1 0 Stop No No 1 7 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 13 /19 Target 3 /6 Resistance 3 /6 Stop 7 /7
Session 01, Exercise 03
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 0 0 Stop No No 1 2 Target 3 3 1 Resistance 2 1 0 Stop No No 1 3 Target 3 3 1 Resistance 2 1 0 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 2 0 Resistance 2 2 1 Stop No No 1 6 Target n/a 2 Resistance n/a 2 Stop Yes No 0 7 Target n/a 2 Resistance n/a 2 Stop Yes No 0 8 Target n/a 2 Resistance n/a 2 Stop Yes No 0 9 Target n/a 2 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3
Resistance n/a 2 Stop Yes No 0
13 Target n/a 2 Resistance n/a 2 Stop Yes No 0
14 Target n/a 1 Resistance n/a 2 Stop Yes No 0
15 Target n/a 1 Resistance n/a 2 Stop Yes No 0
16 Target n/a 1 Resistance n/a 2 Stop Yes No 0
17 Target n/a 2 Resistance n/a 2 Stop Yes No 0
18 Target n/a 2 Resistance n/a 2 Stop Yes No 0
19 Target n/a 2 Resistance n/a 2 Stop Yes No 0
20 Target n/a 2 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 2 Resistance n/a 1 Stop Yes No 0
26 Target n/a 3 Resistance n/a 1 Stop Yes No 0
27 Target n/a 3 Resistance n/a 1 Stop Yes No 0
28 Target n/a 2 Resistance n/a 1 Stop Yes No 0
29 Target n/a 3 Resistance n/a 1 Stop Yes No 0
30 Target n/a 3 Resistance n/a 0 Stop Yes No 0
31 Target n/a 1 Resistance n/a 0 Stop Yes No 0
32 Target n/a 2 Resistance n/a 0 Stop Yes No 0
33 Target n/a 2 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 12 /50 Target 4 /5 Resistance 2 /5 Stop 6 /40
Session 01, Exercise 04
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1
Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 2 0 Resistance 2 0 0 Stop No No 1 8 Target 3 2 0 Resistance 2 0 0 Stop No No 1 9 Target n/a 2 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 0 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3 Resistance n/a 0 Stop Yes No 0
13 Target n/a 1 Resistance n/a 0 Stop Yes No 0
14 Target n/a 2 Resistance n/a 2 Stop Yes No 0
15 Target n/a 3 Resistance n/a 0 Stop Yes No 0
16 Target n/a 2 Resistance n/a 1 Stop Yes No 0
17 Target n/a 1 Resistance n/a 2 Stop Yes No 0
18 Target n/a 2 Resistance n/a 0 Stop Yes No 0
19 Target n/a 3
Resistance n/a 2 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a 3 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target 3 3 Resistance 2 2 Stop No No 0
30 Target 3 3 Resistance 2 2 Stop No No 0
31 Target 3 n/a Resistance 2 n/a Stop No Yes 0 20 /47 Target 6 /8 Resistance 6 /8 Stop 8 /31
Session 02, Exercise 01
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 2 0 Resistance 2 0 0 Stop No No 1
13 Target 3 1 0 Resistance 2 2 1 Stop No No 1
14 Target n/a 3 Resistance n/a 0 Stop Yes No 0
15 Target n/a 2 Resistance n/a 1 Stop Yes No 0
16 Target n/a 3 Resistance n/a 2 Stop Yes No 0
17 Target n/a 1 Resistance n/a 0 Stop Yes No 0
18 Target n/a 3 Resistance n/a 0
Stop Yes No 0 19 Target n/a 2 Resistance n/a 1 Stop Yes No 0
20 Target n/a 2 Resistance n/a 0 Stop Yes No 0
21 Target n/a 1 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 0 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a 3 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3
Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 37 /74 Target 11 /13 Resistance 12 /13 Stop 14 /48
Session 02, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 0 0 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target 3 3 1 Resistance 2 1 0 Stop No No 1 9 Target 3 1 0 Resistance 2 0 0 Stop No No 1
10 Target n/a 2 Resistance n/a 2 Stop Yes No 0
11 Target n/a 1 Resistance n/a 0 Stop Yes No 0
12 Target n/a 3 Resistance n/a 0 Stop Yes No 0
13 Target n/a 3 Resistance n/a 2 Stop Yes No 0
14 Target n/a 3 Resistance n/a 0 Stop Yes No 0
15 Target n/a 2 Resistance n/a 1 Stop Yes No 0
16 Target n/a 1 Resistance n/a 1 Stop Yes No 0
17 Target n/a 3 Resistance n/a 0
Stop Yes No 0 18 Target n/a 3 Resistance n/a 0 Stop Yes No 0
19 Target n/a 2 Resistance n/a 1 Stop Yes No 0
20 Target n/a 3 Resistance n/a 1 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a 3 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3
Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 0 Stop Yes No 0
43 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 24 /61 Target 8 /9 Resistance 6 /9 Stop 10 /43
Session 02, Exercise 03
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 16 /16 Target 5 /5 Resistance 5 /5 Stop 6 /6
Session 02, Exercise 04
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 0 0 Stop No No 1 6 Target 3 2 0 Resistance 2 0 0 Stop No No 1 7 Target 3 1 0 Resistance 2 0 0 Stop No No 1 8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target 3 1 0 Resistance 2 1 0 Stop No No 1
10 Target n/a 1 Resistance n/a 2 Stop Yes No 0
11 Target n/a 2 Resistance n/a 0 Stop Yes No 0
12 Target n/a 3 Resistance n/a 1
Stop Yes No 0 13 Target n/a 2 Resistance n/a 0 Stop Yes No 0
14 Target n/a 3 Resistance n/a 0 Stop Yes No 0
15 Target n/a 3 Resistance n/a 2 Stop Yes No 0
16 Target n/a 3 Resistance n/a 0 Stop Yes No 0
17 Target n/a 3 Resistance n/a 2 Stop Yes No 0
18 Target n/a 3 Resistance n/a 2 Stop Yes No 0
19 Target n/a 3 Resistance n/a 2 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 21 /45 Target 6 /9 Resistance 5 /9 Stop 10 /27
Session 03, Exercise 01
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target n/a 3 Resistance n/a 2 Stop Yes No 0 9 Target n/a 3 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3 Resistance n/a 2 Stop Yes No 0
13 Target n/a 3 Resistance n/a 0 Stop Yes No 0
14 Target n/a 1 Resistance n/a 1 Stop Yes No 0
15 Target n/a 2 Resistance n/a 0 Stop Yes No 0
16 Target n/a 2 Resistance n/a 2 Stop Yes No 0
17 Target n/a 1 Resistance n/a 1 Stop Yes No 0
18 Target n/a 3 Resistance n/a 0 Stop Yes No 0
19 Target n/a 3 Resistance n/a 1 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 1 Resistance n/a 0 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 2 Resistance n/a 0 Stop Yes No 0
24 Target n/a 1 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 2 Resistance n/a 1 Stop Yes No 0
27 Target n/a 1 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 0 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2
Stop Yes No 0 33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 0 Stop Yes No 0
40 Target n/a 1 Resistance n/a 0 Stop Yes No 0
41 Target n/a 1 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 1 Stop Yes No 0
43 Target n/a 2 Resistance n/a 0 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 1 Resistance n/a 0 Stop Yes No 0
46 Target n/a 2 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 0 Stop Yes No 0
49 Target n/a 1
Resistance n/a 0 Stop Yes No 0
50 Target n/a 3 Resistance n/a 1 Stop Yes No 0
51 Target n/a 2 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a 3 Resistance n/a 2 Stop Yes No 0
54 Target n/a 3 Resistance n/a 0 Stop Yes No 0
55 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 22 /69 Target 7 /7 Resistance 7 /7 Stop 8 /55
Session 03, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 0 0 Stop No No 1
18 Target n/a 3 Resistance n/a 2 Stop Yes No 0
19 Target n/a 3 Resistance n/a 1 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target 3 3 1 Resistance 2 0 0 Stop No No 1
23 Target n/a 3 Resistance n/a 0 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2
Stop Yes No 0 25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 0 Stop Yes No 0
27 Target n/a 2 Resistance n/a 0 Stop Yes No 0
28 Target n/a 2 Resistance n/a 2 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 53 /75 Target 18 /18 Resistance 16 /18 Stop 19 /39
Session 04, Exercise 01
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target n/a 3 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 1 Resistance n/a 0 Stop Yes No 0
13 Target n/a 2 Resistance n/a 2 Stop Yes No 0
14 Target n/a 3 Resistance n/a 1 Stop Yes No 0
15 Target n/a 3 Resistance n/a 2 Stop Yes No 0
16 Target n/a 1 Resistance n/a 0 Stop Yes No 0
17 Target n/a 3 Resistance n/a 0 Stop Yes No 0
18 Target n/a 2 Resistance n/a 0 Stop Yes No 0
19 Target n/a 3 Resistance n/a 2 Stop Yes No 0
20 Target n/a 1 Resistance n/a 1 Stop Yes No 0
21 Target n/a 2 Resistance n/a 0 Stop Yes No 0
22 Target n/a 3 Resistance n/a 0 Stop Yes No 0
23 Target n/a 1 Resistance n/a 1 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 2 Resistance n/a 2 Stop Yes No 0
27 Target n/a 1 Resistance n/a 0 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target n/a 1 Resistance n/a 1 Stop Yes No 0
30 Target n/a 2 Resistance n/a 1 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2
Stop Yes No 0 33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 25 /57 Target 8 /8 Resistance 8 /8 Stop 9 /41
Session 04, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 0 0 Stop No No 1
29 Target n/a 3 Resistance n/a 0 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 1 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 0 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 1 Resistance n/a 2 Stop Yes No 0
36 Target n/a 2 Resistance n/a 0 Stop Yes No 0
37 Target n/a 1 Resistance n/a 0 Stop Yes No 0
38 Target n/a 2 Resistance n/a 1 Stop Yes No 0
39 Target n/a 2 Resistance n/a 2 Stop Yes No 0
40 Target n/a 1 Resistance n/a 2 Stop Yes No 0
41 Target n/a 2 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 0 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 0 Stop Yes No 0
45 Target n/a 3 Resistance n/a 1 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a 3 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a 3 Resistance n/a 2 Stop Yes No 0
54 Target n/a 3 Resistance n/a 2 Stop Yes No 0
55 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 84/111 (Target 28/28, Resistance 27/28, Stop 29/55)
Session 04, Exercise 03
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3 Resistance n/a 2 Stop Yes No 0
13 Target n/a 3 Resistance n/a 2 Stop Yes No 0
14 Target n/a 3 Resistance n/a 2 Stop Yes No 0
15 Target n/a 3 Resistance n/a 2 Stop Yes No 0
16 Target n/a 3 Resistance n/a 2 Stop Yes No 0
17 Target n/a 3 Resistance n/a 2 Stop Yes No 0
18 Target n/a 3 Resistance n/a 2 Stop Yes No 0
19 Target n/a 3 Resistance n/a 2 Stop Yes No 0
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 2 1 Stop No No 1
30 Target 3 3 1 Resistance 2 2 1 Stop No No 1
31 Target 3 3 1 Resistance 2 2 1 Stop No No 1
32 Target 3 3 1 Resistance 2 2 1 Stop No No 1
33 Target 3 3 1 Resistance 2 2 1 Stop No No 1
34 Target 3 n/a Resistance 2 n/a Stop No Yes 0
Total 69/80 (Target 23/23, Resistance 23/23, Stop 23/34)
Session 05, Exercise 01
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 0 0 Stop No No 1
30 Target n/a 3 Resistance n/a 1 Stop Yes No 0
31 Target n/a 1 Resistance n/a 1 Stop Yes No 0
32 Target n/a 2 Resistance n/a 0 Stop Yes No 0
33 Target n/a 2 Resistance n/a 2 Stop Yes No 0
34 Target n/a 1 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 0 Stop Yes No 0
36 Target n/a 3 Resistance n/a 0 Stop Yes No 0
37 Target n/a 2 Resistance n/a 2 Stop Yes No 0
38 Target n/a 1 Resistance n/a 1 Stop Yes No 0
39 Target n/a 1 Resistance n/a 2 Stop Yes No 0
40 Target n/a 1 Resistance n/a 0 Stop Yes No 0
41 Target n/a 3 Resistance n/a 0 Stop Yes No 0
42 Target n/a 2 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 1 Stop Yes No 0
44 Target n/a 2 Resistance n/a 0 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a 3 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a 3 Resistance n/a 2 Stop Yes No 0
54 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 87/112 (Target 29/29, Resistance 28/29, Stop 30/54)
Session 05, Exercise 02
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 0 0 Stop No No 1
27 Target n/a 1 Resistance n/a 2 Stop Yes No 0
28 Target n/a 2 Resistance n/a 1 Stop Yes No 0
29 Target n/a 3 Resistance n/a 0 Stop Yes No 0
30 Target n/a 2 Resistance n/a 0 Stop Yes No 0
31 Target n/a 2 Resistance n/a 2 Stop Yes No 0
32 Target n/a 1 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 1 Stop Yes No 0
34 Target n/a 1 Resistance n/a 0 Stop Yes No 0
35 Target n/a 2 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 0 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 0 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 1 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 78/103 (Target 26/26, Resistance 25/26, Stop 27/51)
Session 06, Exercise 01
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 2 1 Stop No No 1
30 Target 3 3 1 Resistance 2 2 1 Stop No No 1
31 Target 3 3 1 Resistance 2 2 1 Stop No No 1
32 Target 3 3 1 Resistance 2 2 1 Stop No No 1
33 Target 3 3 1 Resistance 2 0 0 Stop No No 1
34 Target n/a 2 Resistance n/a 0 Stop Yes No 0
35 Target n/a 1 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 0 Stop Yes No 0
37 Target n/a 2 Resistance n/a 1 Stop Yes No 0
38 Target n/a 1 Resistance n/a 0 Stop Yes No 0
39 Target n/a 1 Resistance n/a 1 Stop Yes No 0
40 Target n/a 1 Resistance n/a 0 Stop Yes No 0
41 Target n/a 1 Resistance n/a 0 Stop Yes No 0
42 Target n/a 3 Resistance n/a 0 Stop Yes No 0
43 Target n/a 2 Resistance n/a 1 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a 3 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 99/119 (Target 33/33, Resistance 32/33, Stop 34/53)
Session 06, Exercise 02
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target n/a 3 Resistance n/a 2 Stop Yes No 0
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 2 1 Stop No No 1
30 Target 3 3 1 Resistance 2 0 0 Stop No No 1
31 Target n/a 3 Resistance n/a 0 Stop Yes No 0
32 Target n/a 3 Resistance n/a 0 Stop Yes No 0
33 Target n/a 1 Resistance n/a 0 Stop Yes No 0
34 Target n/a 2 Resistance n/a 2 Stop Yes No 0
35 Target n/a 2 Resistance n/a 0 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 0 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 87/106 (Target 29/29, Resistance 28/29, Stop 30/48)
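The per-exercise totals above are the agreement counts summed over the Target, Resistance, and Stop decisions, with Target and Resistance rows excluded from the denominator once the POMDP has decided to stop (the "n/a" entries). A minimal sketch of that tally logic follows; the `tally` function and the dict-based row layout are illustrative assumptions, not code from the thesis.

```python
def tally(trials):
    """Tally therapist-POMDP agreement over a list of trial rows.

    Each trial is a dict mapping decision type to a
    (POMDP decision, therapist decision, agree-flag) tuple,
    mirroring the Target / Resistance / Stop rows in the tables.
    """
    totals = {"target": [0, 0], "resistance": [0, 0], "stop": [0, 0]}
    for trial in trials:
        for decision, (pomdp, _therapist, agree) in trial.items():
            # Once the POMDP has stopped, it issues no Target/Resistance
            # decision ("n/a"), so those rows drop out of the denominator.
            if decision != "stop" and pomdp == "n/a":
                continue
            totals[decision][0] += agree   # agreements
            totals[decision][1] += 1       # decisions compared
    agreed = sum(a for a, _ in totals.values())
    compared = sum(n for _, n in totals.values())
    return totals, agreed, compared


# Example mirroring the row format above: two fully agreeing trials,
# one trial after the POMDP decided to stop, then the final mutual stop.
rows = (
    [{"target": ("3", "3", 1), "resistance": ("2", "2", 1), "stop": ("No", "No", 1)}] * 2
    + [{"target": ("n/a", "3", 0), "resistance": ("n/a", "2", 0), "stop": ("Yes", "No", 0)}]
    + [{"target": ("n/a", "n/a", 0), "resistance": ("n/a", "n/a", 0), "stop": ("Yes", "Yes", 1)}]
)
totals, agreed, compared = tally(rows)
print(totals, agreed, compared)  # overall agreement here is 7/8
```

Each exercise's "Total a/b" line then follows directly from these per-decision counts.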
Appendix VII – Raw Quantitative and Qualitative Data on Therapist’s Ratings Per Session
Session Question a) Question b) Other Comments
01 2 4 High-level patient, therefore wanted to randomise some parts to work on increasing control with different targets and different resistances. Then wanted to work on strengthening, therefore furthest target, most resistance (this is what the computer wanted to do at the beginning, before the control part).
02 3 3 Decisions at beginning were good (target 3 and max resistance) but then I wanted to do some more random targets/resistances. As [the patient] was high level, would have liked perhaps to increase speed.
03 3 3 As per [last session]. Would like to be able to randomise. Perhaps increase options to be able to increase speed of performance, change of angle…
04 3 3 It seemed to ask for more reps at higher level before stopping today, which I thought was appropriate for this particular patient.
05 3 3 Zero change from yesterday.
06 3 3 (left blank)