Design of an Adaptive System for Upper-Limb Stroke Rehabilitation
by
Patricia Wai Ling Kan
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Institute of Biomaterials and Biomedical Engineering
University of Toronto
© Copyright by Patricia Wai Ling Kan 2008
Design of an Adaptive System for Upper-Limb Stroke
Rehabilitation
Patricia Wai Ling Kan
Master of Applied Science
Institute of Biomaterials and Biomedical Engineering
University of Toronto
2008
Abstract
Stroke is a leading cause of adult disability. To support this large population in recovery,
robotic technologies are being developed to assist in the delivery of rehabilitation. A partially
observable Markov decision process (POMDP) system was designed for a rehabilitation robotic
device that guides stroke patients through an upper-limb reaching task. The performance of the
POMDP system was evaluated by comparing the decisions made by the POMDP system with
those of a human therapist. Overall, the therapist agreed with the POMDP decisions
approximately 65% of the time. The therapist thought the POMDP decisions were believable
and could envision this system being used in both the clinic and home. The patient indicated that
they would use this system as the primary method of rehabilitation. Limitations of the current system have been
identified which require improvement in future research stages. This research has shown that
POMDPs have promising potential to facilitate upper extremity rehabilitation.
Acknowledgments

First and foremost, I would like to thank my supervisor, Dr. Alex Mihailidis, for his continuous
advice, guidance, and support throughout the project. I thank Debbie Hébert for sharing her
expertise in the field of occupational therapy, especially in the area of upper-limb stroke
rehabilitation. I am especially grateful to Dr. Jesse Hoey for teaching me all I know about
POMDPs, as well as his endless patience and assistance in the construction of the POMDP
model used in this project – all while living overseas! Thank you to Dr. Jacob Apkarian, Hervé
Lacheray, and Don Gardner from Quanser Inc. for all their technical support on the robotic
device and virtual environment. I also thank the members of my thesis committee, Dr. Milos
Popovic, Dr. Craig Boutilier, and Dr. Tom Chau, for their time and helpful comments. And, of
course, a big thank you to everyone in the IATSL lab for their help and friendship, especially Jen
Boger for her constant advice and assistance!
To my girls, thanks for all your support over these past few years and for keeping me sane! A
special thanks to my fiancé, Michael Liau, for encouraging me to pursue my Master’s degree
even though we’d be apart, and for always believing in me. I also want to thank my family –
Mom, Dad, Christine – for their unfailing love, support, and encouragement. I love you all!
I would like to recognise the CITO-Precarn Alliance Program and Quanser Inc. for funding this
project. Finally, I would like to thank my only therapist-patient pair at TRI for participating in
the study and providing some insight into the future development of this rehabilitation system.
Table of Contents

ABSTRACT .......... ii
ACKNOWLEDGMENTS .......... iii
TABLE OF CONTENTS .......... iv
LIST OF TABLES .......... vii
LIST OF FIGURES .......... viii
LIST OF APPENDICES .......... xii
LIST OF ACRONYMS .......... xiii
LIST OF SYMBOLS .......... xv
CHAPTER 1 INTRODUCTION .......... 1
1.1 PROBLEM STATEMENT .......... 1
1.2 OBJECTIVES .......... 3
1.3 RESEARCH QUESTIONS .......... 3
1.4 SCOPE OF RESEARCH .......... 3
1.4.1 Development of the Intelligent System .......... 5
1.4.2 Integration of the POMDP Model with the Robotic System .......... 5
1.4.3 Development of the Evaluation Study .......... 5
1.4.4 Conducting the Evaluation Study .......... 5
1.5 CONTRIBUTIONS .......... 5
CHAPTER 2 BACKGROUND .......... 7
2.1 STROKE .......... 7
2.2 STROKE RECOVERY .......... 7
2.3 REHABILITATION OF MOTOR SKILLS .......... 8
2.4 ROLE OF THERAPISTS .......... 10
2.5 REDUCING HEALTH CARE AND THERAPIST BURDEN .......... 11
CHAPTER 3 LITERATURE REVIEW .......... 12
3.1 CURRENT REHABILITATION ROBOTIC SYSTEMS FOR UPPER EXTREMITIES .......... 12
3.2 DISCUSSION .......... 19
CHAPTER 4 PARTIALLY OBSERVABLE MARKOV DECISION PROCESS .......... 21
4.1 ARTIFICIAL INTELLIGENCE .......... 21
4.2 DEFINITION OF PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES .......... 22
4.2.1 Components .......... 22
4.2.2 Acting Optimally .......... 24
4.2.2.1 Computing the Belief State .......... 25
4.2.3 Finding the Optimal Policy: Value Iteration .......... 26
4.3 EXAMPLES OF POMDPS IN REAL-WORLD APPLICATIONS .......... 29
4.4 JUSTIFICATION FOR USING A POMDP TO MODEL REACHING REHABILITATION .......... 30
CHAPTER 5 DESIGN OF THE POMDP REACHING EXERCISE MODEL .......... 33
5.1 REQUIREMENTS SPECIFICATION .......... 33
5.1.1 Definition of the Reaching Exercise .......... 33
5.1.2 Development of the Robotic System .......... 36
5.1.3 Definition of the POMDP Model .......... 39
5.2 STRENGTHEN MODEL .......... 40
5.2.1 Definition of the Variables .......... 41
5.2.2 Definition of the Actions .......... 43
5.2.3 Definition of the Observation Variables and Observation Function .......... 44
5.2.4 Definition of the Transition Function .......... 44
5.2.5 Definition of the Reward Function .......... 46
5.2.6 Computation of the STRENGTHEN Model .......... 50
5.2.6.1 Selection of the Solution Method .......... 50
5.2.6.2 Iteration Process and Solving the Model .......... 53
5.3 ISTRETCH MODEL .......... 54
5.3.1 Definition of the Variables .......... 55
5.3.2 Definition of the Actions .......... 57
5.3.3 Definition of the Observation Variables and Observation Function .......... 58
5.3.4 Definition of the Transition Function .......... 58
5.3.5 Definition of the Reward Function .......... 63
5.3.6 Computation of the iSTRETCH Model .......... 65
5.3.6.1 Selection of the Solution Method .......... 65
5.3.6.2 Iteration Process and Solving the Model .......... 65
5.4 COMPARISON OF STRENGTHEN AND ISTRETCH MODELS .......... 66
CHAPTER 6 INTEGRATION OF THE POMDP MODEL WITH THE ROBOTIC SYSTEM .......... 81
6.1 ACQUISITION OF DATA FROM THE ROBOTIC SYSTEM .......... 82
6.2 SETTING THE VALUE RANGES FOR THE OBSERVATION VARIABLES .......... 84
6.3 MERGING THE POMDP AGENT WITH THE ROBOTIC DEVICE CONTROLLER .......... 85
CHAPTER 7 EVALUATION STUDY .......... 88
7.1 QUESTIONS TO BE ANSWERED BY THE STUDY .......... 88
7.2 PARTICIPANTS .......... 88
7.3 TESTING METHODOLOGY .......... 90
7.4 MODIFICATION OF INTEGRATED SYSTEM .......... 92
7.5 CAPTURING DECISIONS MADE BY POMDP AND THERAPIST .......... 95
7.6 QUESTIONNAIRE .......... 96
7.6.1 Questionnaire for Therapists .......... 96
7.6.2 Questionnaire for Patients .......... 97
7.7 ETHICS APPROVAL .......... 97
CHAPTER 8 RESULTS .......... 98
8.1 SUBJECT DATA .......... 98
8.2 DECISIONS FROM POMDP AND THERAPIST .......... 99
8.3 QUESTIONNAIRE DATA .......... 102
CHAPTER 9 DISCUSSION .......... 107
9.1 STUDY ANALYSIS .......... 107
9.2 ANALYSIS OF OTHER UPPER EXTREMITY REHABILITATION ROBOTIC SYSTEMS .......... 109
9.3 LIMITATIONS .......... 110
9.4 RECOMMENDATIONS FOR FUTURE WORK .......... 110
CHAPTER 10 CONCLUSION .......... 112
REFERENCES .......... 114
APPENDIX I – EXAMPLE CONSTRUCTION OF A CONDITIONAL PROBABILITY TABLE .......... 118
APPENDIX II – SIMULATION EXAMPLES OF THE STRENGTHEN AND ISTRETCH MODELS .......... 120
APPENDIX III – MICRO-CONTROLLER SOFTWARE CODE .......... 137
APPENDIX IV – QUESTIONNAIRE FOR THE THERAPIST .......... 139
APPENDIX V – QUESTIONNAIRE FOR THE PATIENT .......... 143
APPENDIX VI – RAW QUANTITATIVE DATA ON DECISIONS MADE BY POMDP AND THERAPIST .......... 149
APPENDIX VII – RAW QUANTITATIVE AND QUALITATIVE DATA ON THERAPIST’S RATINGS PER SESSION .......... 191
List of Tables

Table 1.1: Contributions in the development of the upper-limb rehabilitation system .................. 6
Table 5.1: Description of the variable dynamics in the reaching exercise ................................... 45
Table 5.2: Reward function for STRENGTHEN model............................................................... 47
Table 5.3: Reward function for iSTRETCH model ...................................................................... 64
Table 5.4: Summary of pros and cons of both models during simulation .................................... 79
Table 5.5: Summary of computational aspects of each model ..................................................... 80
Table 8.1: Therapist information .................................................................................................. 98
Table 8.2: Patient information ...................................................................................................... 99
Table 8.3: Percentage of agreement over all sessions................................................................. 102
Table 8.4: Qualitative response from therapist for overall questionnaire................................... 104
Table 8.5: Quantitative response from patient for overall questionnaire.................................... 105
Table 8.6: Qualitative response from patient for overall questionnaire...................................... 106
List of Figures

Figure 1.1: Block diagram of the upper-limb rehabilitation system............................................... 4
Figure 1.2: Major research phases for the development and evaluation of the intelligent
reaching rehabilitation system ........................................................................................................ 4
Figure 3.1: ARM Guide (© D. Reinkensmeyer, 2000 – use of picture is by permission of the
copyright holder)........................................................................................................................... 13
Figure 3.2: MIME system in bimanual mode (© Elsevier, 2002 – use of picture is by
permission of the copyright holder) .............................................................................................. 14
Figure 3.3: GENTLE/s system (© W. Harwin, 2007 – use of picture is by permission of the
copyright holder)........................................................................................................................... 16
Figure 3.4: MIT-MANUS (© H. Krebs, 2004 – use of picture is by permission of the
copyright holder)........................................................................................................................... 18
Figure 4.1: Diagram of the relationship of the POMDP components........................................... 23
Figure 4.2: Decision cycle of a POMDP agent............................................................................. 25
Figure 4.3: An n-step policy tree ................................................................................................... 27
Figure 5.1: The reaching exercise from the initial position (a) to the final position (b) (© Lam,
2007 – use of picture is by permission of the copyright holder) .................................................. 34
Figure 5.2: Actual diagram of rehabilitation robot (A) with end-effector (B).............................. 37
Figure 5.3: Trunk photoresistor sensors placed in three locations: lower back, lower left
scapula, and lower right scapula (a) (© Lam, 2007 – use of picture is by permission of the
copyright holder) and its detection of light (b) ............................................................................. 38
Figure 5.4: Virtual environment ................................................................................................... 39
Figure 5.5: STRENGTHEN (POMDP) model as a DBN. It consists of the state, S, represented
by a combination of ten variables; the actions, A; the observations, O; the reward function, R;
and the dynamics, represented by the arrows. Variables at the next time step, t+1, are denoted
with an apostrophe (e.g. hat’). ...................................................................................................... 41
Figure 5.6: Example of an optimal value function in a two-state POMDP. The belief space is a
one-dimensional vector of two non-negative numbers that sum to 1 [b(s0) = P(s0) = 1-P(s1)].
The x-axis, therefore, represents the whole belief space on which the value function Vn(b) is
defined. The upper surface of the three α-vectors is the optimal value function, Vn*(b), which
defines the optimal action to take in a particular belief state. At the belief state, b, the action
associated with α2 should be taken. .............................................................................................. 52
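The mechanics in this caption can be sketched in a few lines. The following is an illustrative example only (the α-vectors, actions, and belief values below are made up, not taken from the thesis): the optimal value function is the upper surface over the α-vectors, and the maximizing vector selects the action.

```python
# Hypothetical two-state POMDP. Each alpha-vector is an |S|-dimensional
# hyper-plane paired with the action whose conditional plan generated it.
ALPHAS = [
    ((1.0, 0.0), "a1"),   # alpha_1: dominates when b(s0) is high
    ((0.4, 0.8), "a2"),   # alpha_2: dominates for intermediate beliefs
    ((0.0, 1.1), "a3"),   # alpha_3: dominates when b(s1) is high
]

def dot(alpha, b):
    """Inner product alpha . b over the belief simplex."""
    return sum(x * y for x, y in zip(alpha, b))

def optimal_value_and_action(b):
    """Vn*(b) = max over alpha-vectors of alpha . b; the maximizing
    alpha-vector defines the optimal action at belief state b."""
    return max((dot(alpha, b), action) for alpha, action in ALPHAS)

# At b = [P(s0), P(s1)] = [0.5, 0.5], alpha_2 forms the upper surface,
# so the action associated with alpha_2 should be taken.
value, action = optimal_value_and_action((0.5, 0.5))
```

At the extremes of the belief space the maximizing vector switches, which is exactly the piecewise linear and convex (PWLC) structure the caption describes.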
Figure 5.7: Example of a Perseus backup stage in a two-state POMDP. The x-axis represents
the belief space and the y-axis represents V(b). Solid lines are the α-vectors from the current
stage and dashed lines are the α-vectors from the previous stage. There are seven belief states
{b1,…,b7} which comprise the set of reachable belief points (B) indicated by the tick marks.
The backup stage computing Vn+1 from Vn proceeds as follows: (a) the value function at stage
n; (b) the computation of Vn+1 starts by sampling b6, which produces an α-vector that
improves the values of b6 and b7; (c) b3 is then sampled, which produces an α-vector that
improves the values of b1 through b5; and (d) the values of all b ∈ B have improved and thus the
backup stage at n+1 is completed (© AI Access Foundation, 2005 – use of picture is by
permission of the copyright holder). ............................................................................................. 53
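The sampling-based backup stage this caption illustrates can be sketched as follows. This is a simplified illustration, not the implementation used in the thesis; the `point_backup` routine (a full Bellman backup at a single belief point) is assumed to be supplied by the caller.

```python
import random

def dot(alpha, b):
    return sum(x * y for x, y in zip(alpha, b))

def best_value(alphas, b):
    return max(dot(a, b) for a in alphas)

def best_alpha(alphas, b):
    return max(alphas, key=lambda a: dot(a, b))

def perseus_stage(B, alphas_n, point_backup):
    """One Perseus backup stage: build the stage-(n+1) vector set by backing
    up randomly sampled belief points until every b in B has improved."""
    not_improved = list(B)
    alphas_next = []
    while not_improved:
        b = random.choice(not_improved)          # sample an unimproved point
        alpha = point_backup(b, alphas_n)        # full backup at b
        if dot(alpha, b) >= best_value(alphas_n, b):
            alphas_next.append(alpha)                    # backup improved b
        else:
            alphas_next.append(best_alpha(alphas_n, b))  # keep old best vector
        # a point leaves the queue once its value under the new set is at
        # least its value under the old set
        not_improved = [p for p in not_improved
                        if best_value(alphas_next, p) < best_value(alphas_n, p)]
    return alphas_next
```

The key efficiency, as in the figure, is that a single backed-up α-vector typically improves many belief points at once, so far fewer than |B| backups are needed per stage.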
Figure 5.8: iSTRETCH (POMDP) model as a DBN. It consists of the state, S, represented by
a combination of nine variables; the actions, A; the observations, O; the reward function, R;
and the dynamics, represented by the arrows. .............................................................................. 55
Figure 5.9: Example pace function for comp=yes, with φ+ = 0.9, φ- = 0.1, st+ = +3, st- = -1,
m(f=yes) = 0.8, and m(f=no) = 0.0. Shown are the upper and lower pace limits, and the pace
function for each condition of fat.................................................................................................. 61
Figure 5.10: Example pace function for ttt, with φ+ = 0.9, φ- = 0.1, and m(f=no) = 0.0. Shown
are the upper (st+ = -3) and lower (st- = +2) pace limits for ttt=norm, and the upper (st+ = +4)
and lower (st- = +1) pace limits for ttt=none. The pace function for ttt=slow gets what is left of
the probability mass. ..................................................................................................................... 62
Figure 5.11: (a) Updated belief state of n(r) and fat after the user failed to reach d=d3, had
minimum control and no compensation. The POMDP decides to set the next action at d=d1 at
the same resistance (r=none); (b) Updated belief state of n(r), stretch, fat, and learnrate after
the user failed to reach d=d3, had minimum control and no compensation. The POMDP
decides to set the next action at d=d3 at the same resistance (r=none). ....................................... 68
Figure 5.12: (a) Updated belief state after the user successfully reached d=d1, with maximum
control and no compensation; (b) Updated belief state after the user failed to reach d=d3, with
minimum control and no compensation........................................................................................ 69
Figure 5.13: (a) Updated belief state of n(r) and fat after the user successfully reached d=d3,
with maximum control and no compensation. The POMDP decides to set the next action at
d=d3 at the same resistance (r=max); (b) Updated belief state of n(r), stretch, fat, and
learnrate after the user successfully reached d=d3, with maximum control and no
compensation. The POMDP decides to set the next action at d=d3 at the same resistance
(r=max). ........................................................................................................................................ 73
Figure 5.14: (a) Updated belief state after the user successfully reached d=d3, with maximum
control but this time with compensation. The POMDP decides to set the next action again at
d=d3 at the same resistance (r=max); (b) Updated belief state after the user successfully
reached d=d3, with minimum control but this time with compensation. The POMDP decides
to set the next action again at d=d3 at the same resistance (r=max)............................................. 74
Figure 5.15: (a) Updated belief state after the user again successfully reached d=d3, with
maximum control and with compensation. The POMDP decides to set the next action again at
d=d3 at the same resistance (r=max); (b) Updated belief state after the user again successfully
reached d=d3, with minimum control and with compensation. The POMDP decides to stop
the exercise.................................................................................................................................... 75
Figure 5.16: STRENGTHEN model. Updated belief state of n(r) and fat after the user
successfully reached d=d3 in slow time, with minimum control, and with compensation. The
POMDP decides to set the next action at d=d3 at the same resistance (r=max)........................... 76
Figure 5.17: STRENGTHEN model. Updated belief state after the user successfully reached
d=d3 in slow time, with compensation but this time with no control. The POMDP decides to
set the next action at d=d3 at the same resistance (r=max). Notice the reverse in the fatigue
level from the previous Figure 5.16.............................................................................................. 77
Figure 5.18: STRENGTHEN model. Updated belief state after the user again successfully
reached d=d3 in slow time, with no control, and with compensation. The POMDP decides to
set the next action at d=d3 at the same resistance (r=max). Notice again the reverse in the
fatigue level from the previous Figure 5.17.................................................................................. 78
Figure 6.1: Diagram of the reaching rehabilitation system consisting of the robotic system (a)
and the POMDP agent (b)............................................................................................................. 82
Figure 6.2: Massachusetts Institute of Technology’s Handyboard (micro-controller)................. 83
Figure 6.3: Interaction between POMDP agent and robotic controller ........................................ 87
Figure 7.1: Interaction between POMDP agent and robotic controller, via the therapist............. 93
Figure 7.2: Therapist GUI displaying: (A) decision from POMDP, (B) therapist agreement of
decision made, (C) action choice if therapist disagrees, (D) decision from therapist, (E)
history of actions and observations, and (F) emergency stop button............................................ 94
Figure 7.3: Final rehabilitation system in use consisting of: (A) virtual environment on the
computer monitor, (B) therapist GUI on another monitor, (C) end-effector with rotational
encoder, (D) haptic-robotic device, (E) trunk photoresistor sensors (not seen – placed on
chair), and (F) robotic controller and POMDP agent ................................................................... 95
Figure 8.1: Percentage of agreement per session........................................................................ 101
Figure 8.2: Evaluation of POMDP decisions made by therapist on Likert scale with a mean
and SD of 2.833 and 0.408, respectively, for question a) and a mean and SD of 3.167 and
0.408, respectively, for question b)............................................................................................. 103
List of Appendices

APPENDIX I – EXAMPLE CONSTRUCTION OF A CONDITIONAL PROBABILITY TABLE .......... 118
APPENDIX II – SIMULATION EXAMPLES OF THE STRENGTHEN AND ISTRETCH MODELS .......... 120
APPENDIX III – MICRO-CONTROLLER SOFTWARE CODE .......... 137
APPENDIX IV – QUESTIONNAIRE FOR THE THERAPIST .......... 139
APPENDIX V – QUESTIONNAIRE FOR THE PATIENT .......... 143
APPENDIX VI – RAW QUANTITATIVE DATA ON DECISIONS MADE BY POMDP AND THERAPIST .......... 149
APPENDIX VII – RAW QUANTITATIVE AND QUALITATIVE DATA ON THERAPIST’S RATINGS PER SESSION .......... 191
List of Acronyms

2D Two-dimensional
3D Three-dimensional
ADD Algebraic decision diagram
ADL Activities of daily living
AI Artificial intelligence
ANN Artificial neural network
ARM Guide Assisted Rehabilitation and Measurement Guide
CIMT Constraint-induced movement therapy
CMSA Chedoke-McMaster Stroke Assessment
CPT Conditional probability table
DBN Dynamic Bayesian network
DOF Degree of freedom
FM Fugl-Meyer
GUI Graphical user interface
iSTRETCH intelligent STroke Rehabilitation Exercise TeCHnology
MIME Mirror Image Movement Enabler
MIT-MANUS Massachusetts Institute of Technology
OT Occupational therapist
POMDP Partially observable Markov decision process
PI Proportional-integral
PT Physical therapist
PWLC Piecewise linear and convex
SD Standard deviation
STRENGTHEN STroke REhabilitatioN Guidance Tool in Haptic ENvironment
TRI Toronto Rehabilitation Institute
VE Virtual environment
List of Symbols

A / a Action space / action
α-vector An |S|-dimensional hyper-plane
B Set of reachable belief states
b Belief state
β Discount factor
f Fatigue
Φ Pace function
Γn Set of α-vectors
h Horizon
m Mean stretch
m(f) Fatigue effect
n Finite horizon
O / o Observation space / observation
P / p Policy tree space / policy tree
R Reward function
S / s State space / state
st Stretch
σst Slope of pace function
T Transition function
t Time step
V Value function
Z Observation function
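As a worked sketch of how several of these symbols combine (illustrative numbers only, not code or data from the thesis), the belief state b is updated after taking action a and receiving observation o via the transition function T and observation function Z:

```python
def belief_update(b, a, o, T, Z):
    """b'(s') is proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s),
    normalised so the new belief sums to 1. T and Z are dicts keyed by
    (s', s, a) and (o, s', a) respectively (an illustrative convention)."""
    states = list(b)
    unnorm = {
        s2: Z[(o, s2, a)] * sum(T[(s2, s, a)] * b[s] for s in states)
        for s2 in states
    }
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

# Toy two-state example: the state never changes (identity transitions),
# but observation "o1" is four times likelier in state "s0" than in "s1",
# so the belief shifts toward "s0" after seeing "o1".
T = {("s0", "s0", "a"): 1.0, ("s0", "s1", "a"): 0.0,
     ("s1", "s0", "a"): 0.0, ("s1", "s1", "a"): 1.0}
Z = {("o1", "s0", "a"): 0.8, ("o1", "s1", "a"): 0.2}
b_new = belief_update({"s0": 0.5, "s1": 0.5}, "a", "o1", T, Z)
```

This is the computation described in Section 4.2.2.1 (Computing the Belief State): the agent never observes s directly, so it maintains a probability distribution over states and revises it with each action-observation pair.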
Chapter 1
Introduction

1.1 Problem Statement

Stroke is the leading cause of physical disability and third leading cause of death in most
countries around the world, including Canada (Canadian Stroke Network, 2007; Caplan, 2006).
Every year more than 50,000 Canadians will suffer a stroke – one person every ten minutes.
This number is expected to increase with Canada’s aging population, since the risk of stroke
doubles every decade after the age of 55 (Heart and Stroke Foundation of Canada, 2008). The
consequences of stroke are devastating with approximately 75% of stroke victims left with a
permanent disability. Statistics from the Heart and Stroke Foundation show that the general
effects after stroke are:
• 10 percent of stroke survivors recover completely
• 25 percent recover with a minor impairment
• 40 percent are left with a moderate to severe impairment
• 10 percent require long-term care as they are left with a severe impairment
• 15 percent die (Heart and Stroke Foundation of Canada, 2008)
The expense of stroke in Canada is estimated at approximately $2.7 billion a year in physician
services, hospital costs, lost wages, and decreased productivity (Heart and Stroke Foundation of
Canada, 2006).
A growing body of research has shown that stroke rehabilitation can substantially reduce the
limitations and disabilities that arise from stroke, and improve motor function, allowing stroke
survivors to regain their quality of life and independence. It is generally agreed that intensive
(e.g. constraint-induced movement therapy), repetitive, and goal-directed rehabilitation improves
motor function and cortical reorganisation in stroke patients with both acute and long-term
(chronic) impairments (Fasoli, Krebs, & Hogan, 2004). However, this long and physically
demanding rehabilitation process is both slow and tedious, usually involving extensive
interaction between one or more therapists and one patient. One of the main motivations for
developing rehabilitation robotic devices is to automate interventions that are normally repetitive
and labour-intensive. These robots can provide stroke patients with intensive, reproducible, and
task-oriented movement training in time-unlimited durations, which can alleviate physical strain
on therapists. In addition, these devices can provide therapists with accurate measures on patient
performance and function (e.g. range of motion, speed, smoothness, strength) over the course of
a therapeutic intervention, and provide quantitative diagnosis and assessments of motor
impairments such as spasticity, tone, and strength (Hilder, Nichols, Pelliccio, & Brady, 2005).
This technology makes it possible for a single therapist to supervise multiple patients
simultaneously, which can contribute to reducing health care costs. It must be
emphasised that the goal of therapy robots is not to replace physical and occupational therapists,
but rather to complement existing treatment options. The use of rehabilitation robots would
provide therapists with more freedom to apply their expertise in educating patients to live with
their new or relearned motor skills in functional activities, and on pain management (Young,
2007).
The upper extremities are typically affected more than the lower extremities after stroke. The leg
most often recovers enough to allow standing and walking, whereas arm recovery is usually not
as complete (Caplan, 2006). Stroke patients with an affected upper-limb have great difficulties
performing many activities of daily living (ADL), such as reaching to grasp objects. Although
there are many robotic systems designed to assist and improve upper-limb stroke rehabilitation
(Brewer, McDowell, & Worthen-Chaudhari, 2007), none of them are able to operate
autonomously (without any explicit feedback from the therapist) and account for the specific
needs and abilities of each individual, which will change over time. These features are
especially important to reduce stroke patients’ direct dependence on therapists in the clinic and to
eventually have patients practicing efficient therapy at home.
The main goal of this thesis was to design and develop an intelligent system to autonomously
facilitate upper-limb reaching rehabilitation for moderate level stroke survivors using a partially
observable Markov decision process (POMDP).
1.2 Objectives
The objectives of this research were to:
1. Design an adaptive system to guide stroke patients through a targeted, load-bearing,
linear-reaching exercise for the upper-limb.
2. Have professional therapists evaluate the performance of the system through the
comparison of the decisions made by the system versus those by a human therapist.
1.3 Research Questions
This study attempted to answer the following questions:
1. Can a POMDP make decisions that are in line with those made by human therapists to
guide stroke patients through a targeted, load-bearing, linear-reaching exercise for the
upper-limb?
2. What aspects of the system seem to get more positive or negative feedback from
therapists and patients?
3. What future work is needed to improve the development of the POMDP and overall
system?
1.4 Scope of Research
The overall upper-limb rehabilitation system can be represented by the block diagram in Figure
1.1. The user performs the exercise on the robotic device, and at the same time receives visual
feedback from the virtual environment on the computer display. The POMDP system analyses
performance data from the robotic system, makes a decision, and selects an action for the system
to execute (i.e. sets the exercise parameters). A therapist is present to oversee and control the
entire system.
Figure 1.1: Block diagram of the upper-limb rehabilitation system
The primary focus of this thesis was on the software aspect (POMDP) of the overall
rehabilitation system. Figure 1.2 shows the four main research phases for this project.
Figure 1.2: Major research phases for the development and evaluation of the intelligent reaching rehabilitation system
1.4.1 Development of the Intelligent System
An artificial intelligence system for guiding stroke users during the upper-limb exercise was
developed using a POMDP model. The chosen exercise was analysed and the resulting model
dynamics were defined.
1.4.2 Integration of the POMDP Model with the Robotic System
The final POMDP model was integrated with all aspects of the robotic system, including the
postural sensors, computer interface, and haptic technology.
1.4.3 Development of the Evaluation Study
The goal of the study was to obtain feedback from both therapists and stroke patients to evaluate
and enhance not only the decision-making strategy used by the system, but on the overall
rehabilitation system itself. Two questionnaires were developed: one to gather professional
therapists’ opinions on the decision-making ability of the POMDP model, and the other to gain
insight on the overall rehabilitation system from stroke patients.
1.4.4 Conducting the Evaluation Study
A pilot study of one therapist and one patient was conducted. The study sessions were held three
times a week for two weeks.
1.5 Contributions
Table 1.1 outlines the specific contributions of the author and other parties in the development of
the overall upper-limb rehabilitation system. Quanser Inc.1 was the project’s industry partner,
and Paul Lam was a previous student whose thesis focused on designing and testing the
hardware aspect (robotic device) of the system.
1 Contact Quanser Inc. by phone at +1 905 940 3575 or visit their website at www.quanser.com
Table 1.1: Contributions in the development of the upper-limb rehabilitation system
Patricia Kan
• designed and developed POMDP system
• integrated POMDP system with all aspects of robotic system
• developed and conducted evaluation study of overall integrated system

Paul Lam
• designed concept of robotic device
• developed trunk sensors (micro-controller)
• developed and conducted usability study of hardware platform

Quanser Inc.
• developed robotic device
• developed haptic controller
• developed virtual environment
Chapter 2 Background
2 Background
2.1 Stroke
Stroke is defined as injury to the brain caused by an abnormality of blood supply to part of the
brain. There are two major categories of brain damage in stroke patients: ischemia – a lack of
blood flow depriving brain cells of needed fuel and oxygen; and hemorrhage – the release of
blood either into the brain or into the extravascular spaces within the skull. Bleeding damages
the brain by tearing and disconnecting vital nerve centres and pathways, and by causing pressure
inside the cranium (Caplan, 2000; Caplan, 2006).
An ischemic stroke is the most common form of stroke, accounting for approximately 80% of all
stroke occurrences (Caplan, 2006). Ischemic infarctions can be caused by three different
mechanisms: thrombosis (obstruction of blood flow due to narrowing of blood vessels),
embolism (blockage of blood flow due to lodging of foreign materials in blood vessels), and
systemic hypoperfusion (diminished blood flow to brain due to abnormal performance of the
heart). Hemorrhagic strokes account for the remaining 20% of stroke occurrences (Caplan,
2006).
Both types of stroke cause death in brain cells, and that portion of the brain becomes unable to
perform its normal functions. The effects of stroke include motor and sensory dysfunction,
cognitive and behavioural changes, loss of memory, and language and visual abnormalities
(Caplan, 2006).
2.2 Stroke Recovery
A majority of stroke patients improve from the effects of stroke, with some even returning to
normal or near-normal functioning (Caplan, 2006). This improvement results from three general
changes in the sensorimotor networks: restitution, substitution, and compensation (Barnes,
Dobkin, & Bogousslavsky, 2005).
Restitution is an early, spontaneous recovery that is independent of external variables such as
physical and cognitive stimulation. It usually occurs within the first three to six months after a
stroke (Caplan, 2000) and is typically attributed to the biochemical and gene-induced events that
help to restore the functionality of the injured brain tissue, such as reduction of edema,
absorption of heme, and restoration of ionic currents and axonal transport (Barnes et al., 2005;
Gillen & Burkhardt, 2004).
Further intrinsic recovery is due to a reserve system that involves a redundancy built into the
central nervous system pathways. If a portion of the total pathway controlling an activity is
destroyed by a stroke, the remainder can take over the task. Alternatively, an activity may be
controlled through multiple pathways, and if the predominant pathway is destroyed by the stroke,
one of the others can take over (Caplan, 2006). This reorganisation of the undamaged system in
the brain is referred to as plasticity, and is influenced by external stimuli such as practice with the
affected hemiparetic limb during rehabilitation (Barnes et al., 2005; Gillen & Burkhardt, 2004).
This process takes time, accounting for the slow and gradual recovery of patients after the first
three months of stroke (Caplan, 2006). Essentially, a new system is “substituted” for the
function of the old one.
Compensating or adapting to the disabilities that arise from stroke is learning to function
independently using movements alternative to the ones used before the stroke (Caplan, 2006).
This can include developing a new skill that replaces the defective one such as learning to dress
with one hand as opposed to two, and adjusting intentions such as training to use a wheelchair
because walking is not feasible (Barnes et al., 2005).
2.3 Rehabilitation of Motor Skills
Knowing that functional recovery results from the ongoing reorganisational processes
in the nervous system in response to use and activity, designs for optimal stroke rehabilitation
can be created (Carr & Shepherd, 2003). Rehabilitation focuses on recovery, and helps to
minimise any handicaps that relate to neurologic impairments following a stroke (Caplan, 2006).
Studies of animals and humans with brain lesions provide insight into the process of functional
recovery and on the relationship between neural reorganisation and rehabilitation.
A study on squirrel monkeys, which modelled the effects of a focal ischemic infarct within the
hand motor area of the cortex, found that there was further loss of hand representation in the area
adjacent to the lesion when the monkeys had no post-infarct intervention (Nudo & Milliken,
1996). In the follow-up study, the monkey’s unimpaired hand was restrained while the impaired
hand had daily repetitive training in skilled use of retrieving food pellets from small wells. Not
only was tissue loss prevented, but there was also a net gain of approximately ten percent in the
total hand area adjacent to the lesion (Nudo, Wise, SiFuentes, & Milliken, 1996). In a further
study, in which the monkey’s unimpaired hand was restrained but no training was given to the
impaired hand, the size of the total hand area was decreased (Friel & Nudo, 1998). The results
of these studies show that active use, such as repetitive training and skilled use, of the limb is
necessary for the survival of undamaged neurons adjacent to those damaged by cortical injury.
Studies involving both healthy and hemiparetic subjects have also provided evidence that
functional plasticity after stroke is associated with meaningful use of a limb during task-oriented
or task-specific repetitive exercises (Carr & Shepherd, 2003), especially when performed in a
massed or contextual interference paradigm (Barnes et al., 2005). Constraint-induced
movement therapy (CIMT) is a type of massed practice, where the unaffected limb is restricted
to force training of the impaired limb. Liepert, Bauder, Miltner, Taub, and Weiller (2000)
showed the relationship between CIMT and reorganisation of the cerebral cortex in persons
several years after stroke. After CIMT, the authors reported a significant enlargement in the size
of the cortical area of the affected hand muscle, which corresponded to a greatly improved motor
performance of the impaired limb. These changes were maintained six months later in a follow-
up examination, with the area of the cortical representation in the affected hemisphere almost
identical to the unaffected side (Liepert et al., 2000). The results of several other studies also
support the idea that use-dependent activities result in functional reorganization in the adult
cerebral cortex after stroke (Carr & Shepherd, 2003). In addition, Barnes et al. (2005) suggest
that more intensive, task-oriented practice seems to enhance learning and performance.
There is evidence that brain reorganisation and functional recovery from brain lesions are
dependent on intensive, repetitive, and task-oriented movements. Thus, the rehabilitation
environment must offer possibilities for intensive and meaningful exercise and training (Caplan,
2006). The three primary means of rehabilitation are physical therapy, occupational therapy, and
speech language pathology.
2.4 Role of Therapists
The main role of a physical therapist (PT) is to train the patient for ambulation. This includes
range-of-motion, strengthening, and endurance exercises. They also instruct patients on how to
use various aids, such as canes and walkers, if needed. An occupational therapist (OT) helps to
retrain the skills needed for activities of daily living. Patients and OTs work on improving fine
motor skills so activities such as feeding, bathing, dressing, and cooking can be accomplished.
Speech language pathologists work with patients to relearn language and communication skills
such as speaking, reading, and writing (Caplan, 2000).
In general, the primary role of a therapist is to facilitate the motor relearning process. This is
done by identifying the problems faced by the patient and by analysing their movements through
observation and comparison with normal movements. The therapist also identifies components
that are lacking or poorly controlled, and teaches the patient to perform these missing
components using goal setting, instruction, feedback, and manual guidance (Barnes et al., 2005).
These movements are then practiced, followed by training of the task in a more functional
context to promote transfer or carry-over in real-life situations. The patient is encouraged by the
therapist to practice these exercises extensively, not only under the therapist’s supervision but
also independently in a variety of environments (Barnes et al., 2005).
The process of conventional therapy requires a large amount of therapist involvement. It is time
consuming and most often requires one-on-one therapist-patient interaction. This interaction
places excessive physical demands on therapists that sometimes result in repetitive strain
injuries, lower back problems, and severe fatigue (Hilder et al., 2005). Not only does
rehabilitation place a great deal of burden on therapists, it also contributes to high health care
costs, for example, during instances where more than one therapist is needed to provide a
therapeutic intervention, such as gait training in a severely impaired stroke patient (Hilder et al.,
2005).
2.5 Reducing Health Care and Therapist Burden
Efforts toward developing robotic treatments are motivated by the increasing pressure to contain
and reduce health care costs that have resulted in a cutback of the time and resources available
for post-stroke rehabilitation (Barnes et al., 2005). These factors emphasise the need for new
approaches to increase the effectiveness and efficiency for motor therapy after stroke.
Integration of robotic therapy into current practice may improve inpatient rehabilitation, where it
can unburden the therapist of repetitive, time-consuming tasks and allow more time to focus on
care delivery and individual patient needs (Young, 2007). Robots may even provide a means of
delivering high-quality outpatient treatments in places that incur lower costs, including
community care centres, skilled nursing facilities, assisted living centres, and eventually,
patients’ homes (Barnes et al., 2005).
Robotic technology may also improve the quality and effectiveness of rehabilitation in the
following ways (Barnes et al., 2005):
• provide better control on movement delivery
• allow increased intensity or dosage
• provide better responsiveness and adaptation to a patient’s changing needs and abilities
• provide accurate measures on performance and assessment
Chapter 3 Literature Review
3 Literature Review
This chapter presents a review of some of the current robotic applications in post-stroke therapy
for the upper extremity.
3.1 Current Rehabilitation Robotic Systems for Upper Extremities
There have been several types of robotic devices designed to deliver upper-limb rehabilitation
for people with paralysed upper extremities. The Assisted Rehabilitation and Measurement
(ARM) Guide developed by Reinkensmeyer et al. (2000) was designed to mimic the reaching
motion. It consists of a single motor and chain drive that is used to move the user’s hand along a
linear constraint, which can be manually oriented in different angles to allow reaching in various
directions (Figure 3.1). The ARM Guide implements a technique called “active assist therapy”,
whose essential principle is to complete a desired movement for the user if he/she is unable
to do so. This assistance is achieved with a control algorithm that:
• allows the user to initiate movement through at least one centimetre along the track in
the forward direction,
• completes the reaching movement in a smooth fashion by driving the arm along the
desired minimum-jerk trajectory if the user cannot complete the movement, and
• does not apply assistance if the user follows the desired trajectory within a one
centimetre dead-band (Kahn, Zygman, Rymer, & Reinkensmeyer, 2006).
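The minimum-jerk trajectory referenced above has a well-known closed form. The sketch below is an illustration of that standard formula in Python, not code from the ARM Guide itself; the function name and parameters are chosen for clarity.

```python
def minimum_jerk(x0: float, xf: float, duration: float, t: float) -> float:
    """Position at time t along a minimum-jerk trajectory from x0 to xf.

    Uses the standard fifth-order polynomial:
        x(t) = x0 + (xf - x0) * (10*tau**3 - 15*tau**4 + 6*tau**5)
    where tau = t / duration is normalised time, clamped to [0, 1].
    The resulting motion starts and ends with zero velocity and
    acceleration, which is why it reads as "smooth" to the user.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # clamp normalised time
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
```

A controller like the ARM Guide's could sample such a trajectory at each time step and drive the arm toward the sampled position whenever the user's hand falls outside the dead-band.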
A pilot study was performed to compare the effects of active-assisted versus free reaching
exercises to improve arm movement after stroke. Nineteen individuals with chronic stroke were randomised
into two groups: one performed the active-assisted exercises on the ARM Guide, while the other
performed a task-matched amount of unassisted training. The study concluded that the
improvements were not significantly different between the two groups. However, Kahn et al.
(2006) suggested that the inconclusive results might have been due to the study’s small sample
size.
Figure 3.1: ARM Guide (© D. Reinkensmeyer, 2000 – use of picture is by permission of the copyright holder)
The Mirror Image Movement Enabler (MIME) therapy system was designed through a
collaborative effort between the Veteran Administration Medical Center in Palo Alto and
Stanford University (Lum, Burgar, Shor, Majmundar, & Van der Loos, 2002). It consists of a
six-degree of freedom (DOF) robot manipulator, which is attached to the orthosis supporting the
user’s affected arm (Figure 3.2), and applies forces to the limb during both unimanual and
bimanual goal-directed movements in 3-dimensional (3D) space. Unilateral movements involve
the robot moving or assisting the paretic limb towards a target in pre-programmed trajectories.
The bimanual mode works in a slave configuration where the robot-assisted affected limb
mirrors the unimpaired arm movements as seen in Figure 3.2.
Figure 3.2: MIME system in bimanual mode (© Elsevier, 2002 – use of picture is by permission of the copyright holder)
A randomised controlled study involving 27 chronic stroke subjects was performed to compare
the effects of robot-assisted movement training with conventional rehabilitation techniques. For
the robot group, subjects practised shoulder and elbow movements while assisted by the robot in
four different modes of operation:
• passive – the subject’s arm was passively moved by the robot along a predetermined
trajectory, while the subject relaxed the paretic limb
• active-assisted – the subject would first initialise the movement with volitional force
toward the target, and then both the user and robot would work together to move the limb
• active-constrained – the robot provided resistive forces in the direction of the desired
movement
• bilateral – the robot assisted the affected limb by continuously moving the affected
forearm to the unaffected forearm’s mirror image position and orientation (i.e. the two
forearms were kept in mirror symmetry)
The control group practised various tasks with their arm, which targeted proximal upper-limb
function. It was found from this study that subjects who received MIME therapy made
statistically higher gains in proximal arm function (Fugl-Meyer (FM) scores), strength, and
reaching. However, at the six-month follow up, there were no statistical differences in function
between the two groups (Lum et al., 2002). Similar results were also found for individuals with
subacute stroke (Lum et al., 2006).
The GENTLE/s project, funded by the European Union under the Quality of Life initiative of
Framework Five, was also designed to deliver upper-limb robot-mediated therapy for stroke
patients (Amirabdollahian et al., 2007). The GENTLE/s system (Figure 3.3) is comprised of a
commercially available 3-DOF robot, the HapticMASTER (FCS Robotics Inc.), which is
attached to a wrist splint via a passive gimbal mechanism with 3-DOF. The gimbal allows for
pronation/supination of the elbow as well as flexion and extension of the wrist. The seated user,
whose arm is suspended from a sling to eliminate gravity effects, can perform reaching
movements through interaction with the virtual environment (VE) on the computer screen.
Figure 3.3: GENTLE/s system (© W. Harwin, 2007 – use of picture is by permission of the copyright holder)
A randomised controlled study to assess the effect of the robot-mediated therapies on the
GENTLE/s system compared to sling suspension therapies was performed with 31 chronic stroke
participants. Subjects in the robot group performed reaching tasks in three different modes:
• patient passive – the robot moved the user’s arm following a predefined path
• patient active assisted – the robot would only start to move if the user initiated a
movement by providing a nominal force in the correct direction
• patient active – the robot stayed passive until the user deviated from the planned
trajectory; only then would the robot assist the user to return to the path
Subjects in the control group practised reaching-type exercises while the paretic arm was
suspended from a frame eliminating gravity. The study results indicated that both groups
improved function, as measured by the FM scale’s upper-limb section. However, the
improvements were not significantly different between the two groups (Amirabdollahian et al.,
2007).
The rehabilitation robotic device that has received the most clinical testing is the Massachusetts
Institute of Technology (MIT)-MANUS (Krebs, Hogan, Aisen, & Volpe, 1998). The MIT-
MANUS consists of a 2-DOF robot manipulator that assists shoulder and elbow movements by
moving the user’s hand in the horizontal plane (Figure 3.4). In a previous randomised study
involving 56 subacute stroke patients, those who received 25 hours of robot exercise in addition
to their conventional therapy had greater gains in proximal arm strength, reduced motor
impairment at the shoulder and elbow, and greater recovery of ADL function when compared
with controls who received only minimal exposure (five hours) to the robot (Volpe et al., 2000).
The robot group practised goal-directed reaching movements in active or passive modes. The
robot would guide the user’s hand to the desired target if the user did not move; otherwise, the
robot would be left in passive mode. The control group interacted with the robot in passive
mode only. If the user could not perform the task with the affected limb, s/he used the
unimpaired limb to complete the movement (Volpe et al., 2000). Unfortunately, these results are
not definitive since the treatment group received five times as much robot therapy (25 hours
versus five) as the control group. The additional time spent on therapy, not the robotic device, may have
accounted for the different results.
Figure 3.4: MIT-MANUS (© H. Krebs, 2004 – use of picture is by permission of the copyright holder)
Further studies evaluating the effect of robotic therapy with the MIT-MANUS in reducing chronic
motor impairments show that there were statistically significant improvements in motor function
(Ferraro et al., 2003; Fasoli et al., 2004; MacClellan et al., 2005). These participants were at or
near a plateau in their ability to move the paretic arm at the time of study admission. However,
these studies were not compared with conventional therapy.
Researchers in the artificial intelligence community have started to design robot-assisted
rehabilitation devices that implement artificial intelligence methods to improve on the active
assistance techniques found in the previous systems mentioned above. However, very few have
been developed.
Ju, C. Lin, D. Lin, Hwang, and Chen (2005) developed an elbow and shoulder rehabilitation
robot that uses a hybrid position/force fuzzy logic controller to assist the user’s arm along
predetermined linear or circular trajectories with specified loads. The robot helps to constrain
the movements in the desired direction, if the user deviates from the predetermined path. Fuzzy
logic was incorporated in the position and force control algorithms to cope with the nonlinear
dynamics (i.e. uncertainty of the dynamics model of the user) of the robotic system to ensure
operation for different users.
Erol, Mallapragada, Sarkar, Uswatte, and Taub (2006) developed an artificial neural network
(ANN) based proportional-integral (PI) gain scheduling direct force controller to provide robotic
assistance for upper extremity rehabilitation. The controller has the ability to automatically
select appropriate PI gains to accommodate a wide range of users with varying physical
conditions by training the ANN with estimated human arm parameters. The idea is to
automatically tune the gains of the force controller based on the condition of each patient’s arm
parameters in order for it to apply the desired assistive force in an efficient and precise manner.
3.2 Discussion
Although these robotic systems have shown promising results, none of them are able to provide
an autonomous rehabilitation regime that accounts for the specific needs and abilities of each
individual. Each user progresses in different ways and thus, exercises must be tailored to each
individual differently. For example, the difficulty of an exercise should increase faster for those
who are progressing well compared to those who are having trouble performing the exercise.
The GENTLE/s system requires the user or therapist to constantly press a button in order for the
system to be in operational mode (Amirabdollahian et al., 2007). It is imperative that a
rehabilitation system can operate with no or very little feedback as any direct input from the
therapist (or user), such as setting a particular resistance level, prevents the user from performing
the exercise uninterrupted. The system should be able to autonomously adjust different exercise
parameters in accordance to each individual’s needs.
The rehabilitation systems discussed above also do not account for physiological factors, such as
fatigue, which can have a significant effect on rehabilitation progress (Barnes et al., 2005). A
system that can incorporate and estimate user fatigue can provide information as to when the
user should take a break and rest, which may benefit rehabilitation progress.
This thesis aims to fill in these existing gaps by using partially observable Markov decision
process (POMDP) techniques to autonomously guide stroke patients during upper-limb reaching
rehabilitation, tailor exercise parameters for each individual, and estimate user fatigue.
Chapter 4 Partially Observable Markov Decision Process
4 Partially Observable Markov Decision Process
4.1 Artificial Intelligence
Artificial intelligence (AI) is a field that not only tries to understand how humans think, but also
attempts to build intelligent entities (agents) that are capable of thinking and acting in a rational
manner (Russell & Norvig, 2003). AI in its formative years was influenced by ideas from many
disciplines including philosophy, mathematics, economics, neuroscience, psychology, computer
engineering, control theory, and linguistics (Russell & Norvig, 2003). However, AI has now
grown beyond these lines of work and has, in turn, occasionally influenced them. Only in the
last half century have there been computational devices and programming languages powerful
enough to create and solve experimental tests of ideas about what intelligence is (Buchanan,
2005).
For an agent to operate interactively, it must be able to perceive its environment through sensors
and act upon that environment through actuators (Russell & Norvig, 2003). Through senses such
as sight, sound, and touch, humans are able to perceive their environment, make decisions based
on this input, and then affect their environment through actuators (body parts) such as speech,
gestures, and movement. An AI agent operates in the same fashion except its sensors and
actuators may differ depending on the particular problem. For example, a robotic agent designed
to navigate through a maze may have cameras and infrared range finders for sensors and various
motors for actuators, and a software agent may receive keystrokes as sensory inputs and act on
the environment by displaying characters on the screen. In any case, to design an agent that is
rational and effective, it is important to comprehend the problem at hand, which will guide the
selection of suitable sensors and actuators, as well as the type of AI employed to solve the
problem.
There are many models and techniques of AI available to solve problems in various areas from
speech recognition to game playing. However, each type has a different technique that is better
suited to solve some problems over others. Fuzzy logic, neural networks, and decision theory are
some examples of commonly used AI techniques. A POMDP is a decision-theoretic model that
assumes partial observability of the environment. It is a combination of probability and utility
theory, and is the type of AI chosen for use in this thesis. POMDPs can provide a natural
framework for modelling complex planning problems with partial observability, uncertain action
effects, incomplete knowledge of the state of the environment, and multiple interacting
objectives.
4.2 Definition of Partially Observable Markov Decision Processes
A POMDP model can represent a planning problem under uncertainty: to optimally choose
sequences of actions in a partially observable environment that will achieve a particular goal. It
is based on decision theory, which is a combination of probability theory (describes what the
agent should believe on the basis of evidence) and utility theory (describes what the agent wants)
that describes what the agent should do. The POMDP agent uses decision theory to make
decisions by considering all possible actions and choosing the one that leads to the best expected
outcome. A POMDP is also a sequential decision model, in which the agent's utility depends on
a sequence of states (an environment history) rather than on a single state
(Russell & Norvig, 2003). This feature allows more complex, real-world problems to be solved.
For a more detailed review of POMDPs refer to Kaelbling, Littman, and Cassandra (1998). The
following equations (Equations 4.1 - 4.5) are also based on the paper by Kaelbling et al. (1998).
4.2.1 Components
POMDPs can be described as having eight components: the state space S, the action space A, the
transition function T, the observation space O, the observation function Z, the reward function R,
the horizon h, and the discount factor β. The relationship of these components can be seen in
Figure 4.1. The POMDP described below assumes discrete time steps.
Figure 4.1: Diagram of the relationship of the POMDP components
State space (S): The world is represented by a finite set (S) of distinct states (s).
Action space (A): The action space (A) is comprised of a finite number of actions (a) available to
the agent. The agent’s goal is to choose actions that will influence the world in such a way that
desirable states are visited more frequently.
Transition function (T): As opposed to classical planning models, POMDPs can model the
uncertainty in the effects of actions. This means that the current state of the world (s) has a
certain probability of making a transition to any state (s’) in S as a result of executing an action
(a). P(s’|s,a) denotes the probability of the world making a transition to state s’ when action a is
executed in state s. Note that this transition function operates under the Markov assumption,
which declares that the probability of transition to some state s’ at the next time step, t+1,
depends only on the state s and action a at the current time step, t. It is independent of the
previous states and actions.
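As a sketch, a transition function over a finite state space can be stored as a table of conditional probabilities; under the Markov assumption, each distribution P(·|s,a) depends only on the current state and action and must sum to 1. The states, actions, and numbers below are hypothetical, not taken from the thesis model:

```python
# T[(s, a)] maps each successor state s2 to P(s2 | s, a).
T = {
    ("rested", "exercise"): {"rested": 0.7, "fatigued": 0.3},
    ("fatigued", "exercise"): {"rested": 0.1, "fatigued": 0.9},
    ("fatigued", "rest"): {"rested": 0.6, "fatigued": 0.4},
}

# The Markov assumption: the next-state distribution depends only on (s, a),
# so each conditional distribution must be properly normalised.
for (s, a), dist in T.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)
print("all rows sum to 1")
```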
Observation space (O): The observation space (O) is comprised of a finite number of
observations (o) the agent can experience of its world. Observations correspond to features of
the world that are directly perceptible by the agent’s sensors.
Observation function (Z): Observations provide information about the current state of the world.
The observation function describes the probability of the agent experiencing observation o after
executing action a and making a transition to state s’ denoted by P(o|a,s’). Note that
observations only provide partial information to the agent since the same observation may be
experienced in different states.
Reward function (R): In order for the agent to decide which action to choose, there must be
motivation to pick one action over another. The reward function, R(s,a), dictates how much the
agent earns when the world is in state s and executes some action a. Knowing these rewards
allows the agent to choose which action to take by following some strategy, such as attempting
to maximise its cumulative reward. Note that rewards can be both positive and negative (i.e.
cost). The reward function can model both simple and complex concurrent goals, which allows
the agent to combine multiple goals and make rational tradeoffs with respect to those goals. For
example, the agent may take actions that will penalise it in the short term, but may yield the
agent a better probability of success in the long term.
Horizon h and discount factor β: In decision theory, the agent’s goal is to maximise the expected
utility earned over some time frame. This time frame is known as the horizon h, which specifies
the number of time steps the agent must plan for. It can be finite or infinite. A discount factor β
is used to indicate how rewards received by the agent at different time steps should be weighted.
If β is set to 1, then future rewards will be worth as much to the agent as current rewards. If 0 ≤
β < 1, then future rewards will be worth less than current ones, each scaled down for every time
step delay. This thesis assumes infinite horizon POMDPs.
4.2.2 Acting Optimally
The decision cycle of a POMDP agent can be seen in Figure 4.2. Basically, the agent makes an
observation of the world, and then generates an action. The agent’s goal remains to maximise
the expected discounted sum of future rewards.
Figure 4.2: Decision cycle of a POMDP agent
Since knowledge of the state of the world is uncertain, the POMDP agent keeps an internal belief
state, b, that represents the probability distribution over all possible states of the world (S).
These distributions encode the agent’s subjective probability about the state of the world and
provide a basis for acting under uncertainty. In addition, due to the Markov assumption, the
belief state summarises the agent's previous experiences.
Given the agent’s belief state, the policy decides which action to generate. It is a mapping of
belief states to actions. The agent then makes an observation from the resulting state of the
world. The state estimator is responsible for updating the belief state based on the last action
executed, the current observation, and the previous belief state. From the updated belief state,
the policy decides on the next action to execute. This decision cycle continues repeating until
the agent has reached its goal.
4.2.2.1 Computing the Belief State
A belief state, b, is a probability distribution over S. b(s) denotes the probability assigned to
some world state s according to the distribution of belief state b. The axioms of probability
require that:
• 0 ≤ b(s) ≤ 1 for all s ∈ S, and
• Σ_{s∈S} b(s) = 1.
Given the old belief state b, an action a, and an observation o, the state estimator must compute
the new belief state b′. The new degree of belief in some state s′, b′(s′), can be obtained using
Bayes' rule and basic probability theory as follows:

    b′(s′) = P(s′ | o, a, b)
           = P(o | s′, a, b) P(s′ | a, b) / P(o | a, b)
           = P(o | s′, a) Σ_{s∈S} P(s′ | a, b, s) P(s | a, b) / P(o | a, b)
           = Z(s′, a, o) Σ_{s∈S} T(s, a, s′) b(s) / P(o | a, b)                (4.1)
where T(s,a,s’) is the transition probability and Z(s’,a,o) is the observation probability. The
denominator, P(o|a,b), is independent of s’ and can be treated as a normalising factor to cause
b’(s’) to sum to 1.
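The belief update in Equation 4.1 can be sketched directly. The following is a minimal illustration, not the thesis software; the two states, the action, the observation, and all probabilities are hypothetical:

```python
def update_belief(b, a, o, T, Z, states):
    """Bayesian belief update (Equation 4.1):
    b'(s') is proportional to Z(s', a, o) * sum_s T(s, a, s') * b(s)."""
    unnormalised = {
        s2: Z[(s2, a, o)] * sum(T[(s, a, s2)] * b[s] for s in states)
        for s2 in states
    }
    total = sum(unnormalised.values())  # P(o | a, b), the normalising factor
    return {s2: v / total for s2, v in unnormalised.items()}

# Illustrative two-state example: is the user fatigued or rested?
states = ["rested", "fatigued"]
a, o = "exercise", "slow"
T = {("rested", a, "fatigued"): 0.3, ("rested", a, "rested"): 0.7,   # T[(s, a, s')] = P(s' | s, a)
     ("fatigued", a, "fatigued"): 0.9, ("fatigued", a, "rested"): 0.1}
Z = {("fatigued", a, "slow"): 0.8, ("fatigued", a, "norm"): 0.2,     # Z[(s', a, o)] = P(o | a, s')
     ("rested", a, "slow"): 0.2, ("rested", a, "norm"): 0.8}

b2 = update_belief({"rested": 0.5, "fatigued": 0.5}, a, o, T, Z, states)
print(b2)  # belief shifts toward "fatigued" after observing a slow reach
```

Note that the normalisation makes the new belief a proper probability distribution, as required by the axioms above.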
4.2.3 Finding the Optimal Policy: Value Iteration
The calculation of the value function, namely the expected sum of discounted rewards that the
POMDP agent will earn when starting in a belief state b, allows the agent to decide on what
action to choose next. When making a decision, the agent must take into account the future
implications of its current action since current actions influence the future belief state of the
world, and in turn, future actions, observations, and rewards. In order to do this, the agent must
have a preference or utility over the actions available. The goal of the POMDP agent is to
maximise the cumulative reward possible and therefore, will prefer courses of actions that will
net the agent the highest expected reward. Even though the agent may have many actions to
choose from in every possible state, at least one of them will have the greatest expected utility.
For an agent that has one step remaining, all it can do is take a single action. With two steps to
go, it can execute an action, receive an observation, and then execute a final action. In general,
an agent’s non-stationary n-step policy can be represented by a policy tree, p, as shown in Figure
4.3. The top node determines the first action to take. Then, depending on the resulting
observation, an arc is followed to a node on the next level, which determines the next action.
Figure 4.3: An n-step policy tree
In the simplest case, p is a 1-step policy tree (i.e. a single action). The value function is simply
the reward gained by the agent by executing that action in its present state:

    V_p(s) = R(s, a(p))

where a(p) is the action specified in the top node of p. In general, if p is an n-step policy tree, the
value function becomes:

    V_p(s) = R(s, a(p)) + β (expected value of the future)
           = R(s, a(p)) + β Σ_{s′∈S} P(s′ | s, a(p)) Σ_{o_i∈O} P(o_i | s′, a(p)) V_{o_i(p)}(s′)
           = R(s, a(p)) + β Σ_{s′∈S} T(s, a(p), s′) Σ_{o_i∈O} Z(s′, a(p), o_i) V_{o_i(p)}(s′)     (4.2)
where o_i(p) is the (n-1)-step policy subtree associated with observation o_i at the top level of an
n-step policy tree p, R(s, a(p)) is the reward incurred after performing action a(p) in state s, β is
the discount factor, T(s, a(p), s′) is the transition probability, Z(s′, a(p), o_i) is the observation
probability, and V_{o_i(p)}(s′) is the value function for being in state s′. Since the agent already
knows the value function of p for the future state s′, as well as the reward, transition, and
observation functions, the agent can calculate V_p(s).
However, most applications in the real world do not have a bound on the number of time steps
available (i.e. they have an infinite horizon). In this case, the discount factor β guarantees that the
solution of the value function eventually converges: Bellman showed that as n→∞,
V_p → V_{o_i(p)}. Thus, iteration of Equation 4.2 converges to the value function:

    V_p(s) = R(s, a(p)) + β Σ_{s′∈S} T(s, a(p), s′) Σ_{o_i∈O} Z(s′, a(p), o_i) V_p(s′)     (4.3)

which is referred to as the Bellman equation.
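One backup of this recursion can be sketched as follows; the function computes the value of a policy tree at a single state once the values of its subtrees are known. All names and numbers below are hypothetical:

```python
def policy_tree_value(s, a, subtree_values, R, T, Z, states, observations, beta):
    """One backup of Equation 4.2:
    V_p(s) = R(s, a) + beta * sum_{s'} T(s, a, s') * sum_o Z(s', a, o) * V_{o(p)}(s')
    where subtree_values[o][s'] is the value of the subtree chosen after observing o."""
    future = sum(
        T[(s, a, s2)] * sum(Z[(s2, a, o)] * subtree_values[o][s2]
                            for o in observations)
        for s2 in states
    )
    return R[(s, a)] + beta * future

# Hypothetical two-state, two-observation problem.
states, observations, a = ["s0", "s1"], ["o0", "o1"], "a"
T = {("s0", a, "s0"): 0.5, ("s0", a, "s1"): 0.5}
Z = {("s0", a, "o0"): 1.0, ("s0", a, "o1"): 0.0,
     ("s1", a, "o0"): 0.0, ("s1", a, "o1"): 1.0}
R = {("s0", a): 1.0}
subtree_values = {"o0": {"s0": 10.0, "s1": 0.0}, "o1": {"s0": 0.0, "s1": 4.0}}

v = policy_tree_value("s0", a, subtree_values, R, T, Z, states, observations, 0.9)
print(v)  # 1.0 + 0.9 * (0.5 * 10.0 + 0.5 * 4.0) = 7.3
```

Repeating this backup over every state, action, and combination of subtrees is what makes exact solution expensive, which motivates the approximate solvers discussed at the end of this section.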
Since the agent will never know the exact state of the world, it must be able to determine the
value of executing p from some belief state b. This is just an expectation over world states of
executing p in each state:
    V_p(b) = Σ_{s∈S} b(s) V_p(s)     (4.4)
Equation 4.4 is the value of executing p in every possible belief state. However, to find the
optimal value function, it is necessary to execute different policy trees from different initial
belief states. Let P be the finite set of all policy trees. Thus, the optimal value function for b can
be defined as:
    V*(b) = max_{p∈P} Σ_{s∈S} b(s) V_p(s)     (4.5)
The actions that maximise the optimal value function (Equation 4.5) give the optimal course of
action to take, known as the optimal policy, π*(b). This policy maps belief states to actions,
defining which action the agent should choose in a particular belief state b.
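Because each V_p(b) is linear in b, evaluating Equation 4.5 over a set of candidate policy trees reduces to a maximum over dot products; the trees' value vectors are often called alpha vectors in the POMDP literature. A sketch with hypothetical numbers (here the two vectors are simply asserted rather than computed by value iteration):

```python
def best_action(b, alpha_vectors, states):
    """Evaluate Equation 4.5: V*(b) = max over policy trees of sum_s b(s) * V_p(s).
    alpha_vectors maps each candidate tree's root action to its value vector."""
    def value(vec):
        return sum(b[s] * vec[s] for s in states)
    action, vec = max(alpha_vectors.items(), key=lambda item: value(item[1]))
    return action, value(vec)

# Hypothetical value vectors for two candidate policy trees.
states = ["s0", "s1"]
alpha_vectors = {
    "rest":     {"s0": 5.0, "s1": 1.0},
    "continue": {"s0": 0.0, "s1": 8.0},
}
print(best_action({"s0": 0.9, "s1": 0.1}, alpha_vectors, states))  # chooses "rest"
print(best_action({"s0": 0.2, "s1": 0.8}, alpha_vectors, states))  # chooses "continue"
```

This piecewise-linear, convex structure of the optimal value function is exactly what exact POMDP solution algorithms exploit.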
Unfortunately, the use of POMDPs in real-world systems remains limited due to the intractability
of the solution algorithms for finding an optimal policy. This has led researchers to develop
methods to deal with the complexity of policy spaces and the large number of states that exist in
a POMDP model. Several other optimal and approximate POMDP solutions are discussed in
Lovejoy (1991) and Poupart (2005).
4.3 Examples of POMDPs in Real-World Applications

An increasing number of researchers in various fields are becoming interested in the application
of POMDPs because they have been showing promise in solving real-world problems.
Researchers at Carnegie Mellon University used a POMDP to model the high-level controller for
an intelligent robot, Nursebot, designed to assist elderly individuals with mild cognitive and
physical impairments in their daily activities such as taking medications, attending appointments,
eating, drinking, bathing, and toileting (Pineau, Montemerlo, Pollack, Roy, & Thrun, 2003).
Using variables such as the robot’s location, the user’s location, and the user’s status, the robot
would decide whether to take an action to provide the user a reminder or to guide the user where
to move. By maintaining an accurate model of the user’s daily plans and tracking their execution
of the plans by observation, the robot could adapt to the user’s behaviour and make decisions
about whether and when it was most appropriate to issue reminders. For example, if the user
must be reminded to take their medication every three hours and the next reminder is scheduled
for 13:30 during the user’s favourite television program, the robot might schedule a reminder at
13:25 so as to not interrupt them during the show.
A POMDP model was also used in a guidance system to assist persons with dementia during the
handwashing task (Hoey, von Bertoldi, Poupart, & Mihailidis, 2007). By tracking the positions
of the user’s hands and towel with a camera mounted above the sink, the system could estimate
the progress of the user during the handwashing task and provide assistance with the next step, if
needed. Assistance was given in the form of verbal and/or visual prompts, or through the
enlistment of a human caregiver’s help. An important feature of this system is the ability to
estimate and adapt to user states such as awareness, responsiveness, and overall dementia level
which affect the amount of assistance given to the user during the handwashing activity.
Givon and Grosfeld-Nir (2008) developed a POMDP system that could optimally control the
running of television shows on a broadcasting network. In this application, the (uncertain) state
of a show, which could either be “good” (i.e. it should be continued) or “bad” (i.e. it should be
changed), was inferred from the partial observability of the show’s ratings (i.e. the sampled
proportion of households tuned in to their channel at their time slots). By knowing the transition
probabilities of the ratings and maximising the expected value of profits from selling advertising
time, the system could choose between the actions of continuing to run the show and changing to
a different show.
4.4 Justification for using a POMDP to Model Reaching Rehabilitation
Classical planning generally consists of agents which operate in environments that are fully
observable, deterministic, static, and discrete. Although these techniques can solve increasingly
large state-space problems, they are not suitable for most robotic applications, such as the
reaching task in upper-limb rehabilitation, as they usually have partial observability, stochastic
actions, and dynamic environments (Pineau, Gordon, & Thrun, 2006). Planning under
uncertainty aims to improve robustness by factoring in the types of uncertainties that can occur.
POMDPs are perhaps the most general representation for (single-agent) planning under
uncertainty. They surpass other techniques in representational power because they can
combine many important aspects of planning under uncertainty (Pineau et al., 2006), as
described below.
In reality, the state of the world cannot be known with certainty due to inaccurate measurements
of noisy and imperfect sensors, or instances where observations may be impossible and
inferences must be made. POMDPs can handle this uncertainty in state observability by
expressing the state of the world as a belief state – the probability distribution over all possible
states of the world – rather than actual world states. By capturing this uncertainty in the model,
the POMDP has the ability to make better decisions than fully observable techniques. For
example, the reaching rehabilitation system does not consist of physical sensors that can detect
user fatigue. By capturing observations in user compensation and control, POMDPs can use this
information to infer or estimate how fatigued the user is. Fully observable methods cannot
capture user fatigue in this way since it is impossible to observe fatigue. The only way for fully
observable techniques to work is to physically capture information about fatigue, such as using
electrical stimulation to measure muscle contractions (Dobkin, 2008). However, these
techniques are invasive and may not even guarantee full observability of the world state since
sensor measurements may be inaccurate.
The reaching exercise is a stochastic (dynamic) decision problem where there is uncertainty in
the outcome of actions and the environment is always changing. Thus, choosing a particular
action at a particular state does not always produce the same results. Instead, the action has a
random chance of producing a specific result with a known probability. POMDPs can account
for the realistic uncertainty of action effects into the decision process through its transition
probabilities and reward function. By knowing the probabilities and rewards of the outcomes of
taking an action in a specific state, the POMDP agent can estimate the likelihood of future
outcomes to determine the optimal course of action to take in the present. This ability to
consider the future effects of current actions or to “look ahead” allows the POMDP to trade off
between alternative ways to satisfy a goal and plan for multiple interacting goals. It also allows
the agent to build a policy (prescribing the choice of action for every possible belief state) that is
capable of handling unexpected outcomes far better than many classical planners.
Different stroke patients progress in different ways during rehabilitation depending on their
ability and state of health. It is imperative that the rehabilitation system is able to tailor and
adapt to each individual’s needs and abilities over time. POMDPs have the capability of
incorporating user abilities autonomously in real-time by keeping track of which actions have
been observed to be the most effective in the past. For example, the POMDP may decide to keep
the target distance at d1 for a longer period of time for patients who are progressing slowly, but
may increase it to d2 or d3 at a quicker rate for those who are progressing faster
(see Sections 5.2.1 and 5.3.1 for a description of the variables used in the reaching exercise).
Since one of the objectives of a rehabilitation robotic system is to reduce health care costs by
having one therapist supervise multiple stroke patients simultaneously, it is imperative to design
the system so that little or no explicit feedback from the therapist is required during the
therapy session. The system must be able to effectively guide the patient during the reaching
exercise without the need for explicit input (e.g. a button press to set a particular resistance
level), as any direct input from the therapist would be time consuming and prevent the user from
intensive repetition. POMDPs have this ability to operate autonomously: they estimate states and
then automatically make decisions. For eventual practice of therapy in the home
setting, it is especially important that the system does not require any explicit feedback since no
therapist will be present.
Chapter 5 Design of the POMDP Reaching Exercise Model
5 Design of the POMDP Reaching Exercise Model

The following section discusses the development of the POMDP model used in this thesis.
5.1 Requirements Specification
5.1.1 Definition of the Reaching Exercise
Discussions with a team of experienced OTs and PTs at the Toronto Rehabilitation Institute
(TRI) identified that early-to-moderate stage upper-limb exercise for stroke patients,
specifically the reaching motion, is an area of rehabilitation in need of more efficient
tools. Moreover, reaching is one of the most important abilities to possess, as it is the basic
motion used for many activities of daily living. Hence, this project focused on delivering
reaching motion therapy.
The specific reaching exercise chosen for this thesis was based on a common task delivered by
the OTs at TRI. This task involved a seated patient placing his/her hand (of the affected upper
limb) palm-down on the flat surface of a four-legged stool, and pushing into the surface of the
stool. For the duration of the exercise, the therapist ensured that the patient’s trunk was not
rotated, the shoulder was not elevated, and the elbow stayed in the sagittal plane (aligned with
the shoulder). The goal of this exercise was to rock the stool straight forward as far as possible
on its two front legs and bring it straight back in a controlled manner, ensuring that the patient’s
trunk, shoulder, and elbow stayed in the correct position. The therapist would also apply
increasing resistance as the patient showed signs of improvement (e.g. reached further).
The reaching exercise can be summarised as a targeted, load-bearing, forward-reaching motion
in which the patient must ensure proper posture and control at all times. Figure 5.1 provides a
basic overview of the reaching exercise. The reaching exercise begins with forward flexion of
the shoulder, and extension of the elbow and wrist (Figure 5.1a). Weight is translated through
the heel of the hand as it is pushed forward in the direction indicated by the arrow, until it
reaches the final position (Figure 5.1b). The reaching exercise occurs in the sagittal plane, and
the return path brings the arm back to the initial position. It is important to note that a proper
reaching exercise is performed with control and without compensation.
Figure 5.1: The reaching exercise from the initial position (a) to the final position (b) (© Lam, 2007 – use of picture is by permission of the copyright holder)
Adaptive or compensatory movements are usually present when persons with an affected upper-
limb attempt to carry out a voluntary reaching action. These patterns of adaptive movement are
due to muscle weakness, loss of interjoint coordination (between elbow and shoulder joints), and
lack of joint and muscle flexibility as a result of soft tissue length changes and increased muscle
stiffness (Carr & Shepherd, 2003). Typical examples of compensatory strategies during the
reaching motion are (Carr & Shepherd, 2003; Gillen & Burkhardt, 2004):
• flexion of the upper body at the hips instead of shoulder flexion
• lateral flexion and rotation of the trunk
• shoulder elevation
• abduction of the shoulder with elbow flexion
• internal rotation of the shoulder
• pronation of forearm
When a stroke patient encounters the motor deficits mentioned above, they have difficulty
making smooth, continuous, and accurate reaching movements (Cirstea & Levin, 2000). Stroke
patients tend to produce reaching trajectories that lack smoothness and continuity by constantly
changing the hand direction and reaching the target in a series of small sequential movements.
Stroke patients are also inclined to have higher variability along the trajectory (i.e. tend to
deviate from the straight path) and lower accuracy in reaching the target compared to able-
bodied persons. In addition, reaching movements are generally slower (Cirstea & Levin, 2000).
Therapists usually apply resistive forces (to emulate load- or weight-bearing) during the reaching
exercise to strengthen the triceps and scapula musculature, which will help to provide postural
support and anchoring for other body movements (Gillen & Burkhardt, 2004), such as pushing
down on a chair to stand up or using stair handrails for support.
There are two types of fatigue in the rehabilitation literature: 1) objective, which corresponds to
the observable and measurable decrement in performance during the repetition of a physical or
mental task; and 2) subjective, which refers to the feeling of exhaustion, weariness, and aversion
to effort caused by the neurological impairment (Dobkin, 2008). Fatigue is a very common
occurrence in post-stroke rehabilitation that can affect functional abilities and thus, the
rehabilitation progress (Barnes et al., 2005). Therefore, it is important for therapists to monitor
stroke patients for signs of fatigue during rehabilitation since the more fatigued the patient is, the
more slowly they will improve. Some signs of fatigue are an (unconscious) increase in
compensatory movements, lack of control, slowing in the pace of movement, or a disengagement
from the activity (Dobkin, 2008).
The general progression during conventional reaching rehabilitation is to gradually increase
target distance, and then to increase the resistance level (D. Hébert, personal communication,
August 10, 2007). If patients are showing signs of fatigue during the exercise, therapists will
typically let patients rest for a few minutes and then continue with the therapy session. The goal
is to have patients successfully reach the furthest target distance at the maximum resistance level,
while performing the exercise with control and proper posture.
This thesis solely focused on compensatory strategies, control, and time to provide clues about
fatigue as described above. The signs of compensation were defined as trunk rotation and
flexion, and shoulder abduction and internal rotation; control was defined as the amount of
deviation along the reaching trajectory; and time was used to calculate the duration of reaching
movements. Although motivation, or a lack thereof, can also be a sign of user fatigue, it was not
included in this thesis as it would have resulted in the addition of another unobservable variable,
further complicating the POMDP model.
In the following sections and in the rest of the thesis, a trial is defined as one repetition of the
reaching exercise from the initial position (Figure 5.1a) to the final position (Figure 5.1b), and
then back to the initial position.
5.1.2 Development of the Robotic System
A novel robotic system was designed to automate the reaching exercise as well as to capture any
compensatory events. The system is comprised of three main components: the robotic device,
which emulates the load-bearing reaching exercise with haptic feedback, the postural sensors,
which identify abnormalities in the upper extremities during the exercise, and the virtual
environment, which provides the user with visual feedback of the exercise on a computer
monitor.
Figure 5.2 shows the actual diagram of the rehabilitation device. It features a non-restraining
platform for better usability and freedom of movement, and has two active and two passive
DOFs, which allow the reaching exercise to be performed in 3D space (Lam et al., 2008). The
robotic device also incorporates haptic technology, which provides feedback through sense of
touch. Haptic refers to the modality of touch and the sensation of shape and texture of virtual
objects (McLaughlin, Hespanha, & Sukhatme, 2001). For the purpose of this thesis, the haptic
device provided resistance and boundary guidance for the user during the exercise, which was
performed only in 2D space (in the horizontal plane parallel to the floor).
Figure 5.2: Actual diagram of rehabilitation robot (A) with end-effector (B)
Rotational encoders in the end-effector of the robotic device provided data to indicate shoulder
abduction and internal rotation of the user during the exercise (Lam et al., 2008). These
compensatory strategies affect the rotational hand position on the end-effector. By monitoring
the rotational axis of the end-effector, the system could deduce when the user was exhibiting shoulder
abduction and internal rotation, as these two movements are usually coupled together (Lam et al.,
2008).
Unobtrusive trunk sensors, as seen in Figure 5.3, provided data to indicate trunk rotation and
flexion (Lam et al., 2008). The trunk sensors are comprised of three photo-sensitive resistors
that are taped to the back of a chair, each in one of three locations: the lower back, lower left
scapula, and lower right scapula (Figure 5.3a). They are placed in these locations to distinguish
between left and right rotation and more severe flexion when the lower back is displaced (Lam et
al., 2008). The detection of light during the exercise indicated trunk rotation and flexion, as it
meant a gap was present between the chair and the user (Figure 5.3b).
Figure 5.3: Trunk photoresistor sensors placed in three locations: lower back, lower left scapula, and lower right scapula (a) (© Lam, 2007 – use of picture is by permission of the copyright holder) and its detection of light (b)
Lastly, the virtual environment (VE) provided the user with visual feedback on target location and hand position during
the reaching exercise. Figure 5.4 shows a close up diagram of the VE. The reaching exercise
was disguised in the form of a 2D bull’s eye game. The goal of the game was for the user to
move the robot’s end-effector, which corresponds to the cross-tracker in the VE, to the bull’s eye
target. The rectangular box is the virtual (haptic) boundary, which kept the cross-tracker within
those walls during the exercise.
Figure 5.4: Virtual environment
For more details on the development of the robotic system, refer to Lam (2007).
5.1.3 Definition of the POMDP Model
In order for an AI system to operate effectively, the state of the world must be represented in a
way that accurately describes the task of interest to the system. The more accurately this
information is captured, the more successful the system will operate. The POMDP reaching
exercise model was constructed by:
• defining variables used to describe the state of the rehabilitation environment,
• determining appropriate actions for the system to take, and
• estimating the dynamics of the environment.
Two versions of the POMDP reaching exercise model were constructed during this thesis. The
first model, named STroke REhabilitatioN Guidance Tool in Haptic ENvironment
(STRENGTHEN), was performing fairly well, but made some decisions and estimated some
variables that were not aligned with those of conventional reaching therapy as described
previously in Section 5.1.1 (i.e. increase target distance first, then resistance level; and increase
the level of fatigue when signs were evident). A great deal of time was spent in modifying the
transition probabilities in an attempt to correct these problems. The second model, named
intelligent STroke Rehabilitation Exercise TeCHnology (iSTRETCH), attempted to improve the
performance of the first model as well as to find a more efficient way to modify the model and
its variables by eliminating the need of explicitly specifying every transition probability. This
was done by first identifying the underlying structure from the first model and then representing
it as a basic parametric form, the sigmoid function. In the end, iSTRETCH seemed to perform
more in line with conventional reaching rehabilitation than STRENGTHEN did.
The remainder of this section presents the methods used to define both models, followed by a
comparison of the two models' performance.
5.2 STRENGTHEN Model

The STRENGTHEN model was the first of two models developed during this thesis and can be
seen in Figure 5.5 as a dynamic Bayesian network (DBN). The model is described in further
detail below.
Figure 5.5: STRENGTHEN (POMDP) model as a DBN. It consists of the state, S, represented by a combination of ten variables; the actions, A; the observations, O; the reward function, R; and the dynamics, represented by the arrows. Variables at the next time step, t+1, are denoted with an apostrophe (e.g. hat’).
5.2.1 Definition of the Variables
The system was modelled as a discrete POMDP. Variables were chosen to meaningfully capture
the aspects of the reaching task that the system would require in order to effectively guide a
stroke patient during the exercise. A state is represented by a combination of instantiations of
each variable. All possible unique combinations make up the state space, which is every
possible state the system could be in. The following ten variables were chosen to appropriately
represent the exercise based on discussions with the OTs and PTs at TRI. The variable name is
shown in bold with its short form in bold parentheses. The different possible instantiations of
the variables are shown in braces, followed by a description of what the variable represents.
1. target distance (d) : {d1, d2, d3}
Denotes the locations of the targets, which are positioned only along the straight, sagittal
path. The targets are separated by equal distances from each other, where d=d1 is the
closest to the starting position and d=d3 is the furthest.
2. resistance level (r) : {none, min, max}
Denotes the level of resistance applied during the exercise, where r=none has a force of 0
Newtons (N), r=min has a force of 1 N, and r=max has a force of 3 N.
3. hand-at-target (hat) : {yes, no}
Indicates whether the user’s hand on the robot’s end-effector has reached the target or
not.
4. fatigue (fat) : {yes, no}
Indicates whether the user is fatigued or not.
5.-7. user’s range at a particular resistance level (n(r)) : {none, d1, d2, d3}
Denotes the range or ability of the user, which depends on the target distance and
resistance level as shown in the following:
• user’s range at zero resistance (n(none)) : {none, d1, d2, d3}
• user’s range at minimum resistance (n(min)) : {none, d1, d2, d3}
• user’s range at maximum resistance (n(max)) : {none, d1, d2, d3}
The range is determined by the furthest target distance the user is able to reach at a
particular resistance level. For example, if r=min and the furthest target distance the user
can reach is d=d1, then the user’s range is n(min)=d1.
8. time-to-target (ttt) : {none, slow, norm}
Denotes the time it takes the user to reach the target from the starting position. Note that
ttt=none indicates that the user has failed to reach the target.
9. control (ctrl) : {none, min, max}
Indicates the user’s control level by their ability to keep on the straight path, from the
starting position to the target.
10. compensation (comp) : {yes, no}
Indicates any compensatory action (improper posture) that the user performs during
the exercise. Signs of compensation are trunk rotation and flexion, and shoulder
abduction and internal rotation.
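The size of the state space these ten variables induce can be checked by multiplying their cardinalities. A quick sketch (the dictionary keys follow the short forms above; this is an illustration, not code from the thesis):

```python
# Cardinality of each STRENGTHEN state variable (short forms as above).
cardinalities = {
    "d": 3,        # {d1, d2, d3}
    "r": 3,        # {none, min, max}
    "hat": 2,      # {yes, no}
    "fat": 2,      # {yes, no}
    "n(none)": 4,  # {none, d1, d2, d3}
    "n(min)": 4,   # {none, d1, d2, d3}
    "n(max)": 4,   # {none, d1, d2, d3}
    "ttt": 3,      # {none, slow, norm}
    "ctrl": 3,     # {none, min, max}
    "comp": 2,     # {yes, no}
}

n_states = 1
for size in cardinalities.values():
    n_states *= size

print(n_states)  # 41472, matching the state-space size reported in Section 5.2.6.1
```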
5.2.2 Definition of the Actions
There are ten possible actions the system can take: nine actions, each a different combination
of a target distance (three values) and a resistance level (three values), plus one action to
stop the exercise when the user is fatigued. Specifically, these
actions are:
1. setd1resnone (sets d=d1 and r=none)
2. setd2resnone (sets d=d2 and r=none)
3. setd3resnone (sets d=d3 and r=none)
4. setd1resmin (sets d=d1 and r=min)
5. setd2resmin (sets d=d2 and r=min)
6. setd3resmin (sets d=d3 and r=min)
7. setd1resmax (sets d=d1 and r=max)
8. setd2resmax (sets d=d2 and r=max)
9. setd3resmax (sets d=d3 and r=max)
10. stop (terminates the exercise)
5.2.3 Definition of the Observation Variables and Observation Function
The system has four observation variables that are fully observable, making the observation
function deterministic (i.e. probabilities of 1.0). The observation variables OH={yes, no},
OT={none, slow, norm}, OCT={none, min, max}, and OCO={yes, no} correspond to the state
variables hat, ttt, ctrl, and comp, respectively. In other words, the state variables are actually the
observation variables. For example, if the system observed that the user reached the target
(OH=yes) in normal time (OT=norm) with maximum control (OCT=max) and no compensation
(OCO=no), then the state variables would be hat=yes, ttt=norm, ctrl=max, and comp=no with a
probability of 1.0. However, although the observations are fully observable, the states are still
not known with certainty since both the fatigue and user range variables are unobservable and
must be estimated.
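As a small illustration (not code from the thesis), a deterministic observation function of this kind can be written as an indicator: the probability is 1.0 exactly when each observation variable matches its corresponding state variable, and the hidden variables (fat and the three ranges) place no constraint on it:

```python
def observation_prob(state, obs):
    """Deterministic observation function: P(obs | state) is 1.0 exactly when
    each observation variable equals its corresponding state variable.
    The hidden variables (fat and the three ranges) do not appear here."""
    mapping = {"OH": "hat", "OT": "ttt", "OCT": "ctrl", "OCO": "comp"}
    match = all(obs[o] == state[s] for o, s in mapping.items())
    return 1.0 if match else 0.0

state = {"hat": "yes", "ttt": "norm", "ctrl": "max", "comp": "no",
         "fat": "no", "n(none)": "d3", "n(min)": "d2", "n(max)": "d1"}
obs = {"OH": "yes", "OT": "norm", "OCT": "max", "OCO": "no"}
print(observation_prob(state, obs))  # 1.0
```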
5.2.4 Definition of the Transition Function
The transition function of the system was determined by first defining the interaction between
the different variables, namely how each variable affects other variables at the next time step,
and then estimating the probability of those effects.
The general dynamics of the reaching exercise variables are described in Table 5.1. These
dynamics were discussed with the OTs at TRI.
Table 5.1: Description of the variable dynamics in the reaching exercise
Variable(s) Description of Dynamics
d and r • deterministically set by the action
fat • increasing the target distance and resistance level beyond the user’s current
range will increase the rate of the user becoming fatigued; in turn, setting d
and r at or below the current range will decrease the rate of the user becoming
fatigued
• the fatigue level of the user will slowly increase due to time passing (i.e.
repetition of the exercise)
hat • increasing the target distance and resistance level beyond the user’s current
range will increase the likelihood of the user failing to reach the target; in turn,
setting d and r at or below the current range will increase the chance of the
user successfully reaching the target
• the more fatigued the user is, the less likely the user will be able to reach the
target, and vice versa
n(r) • increasing the target distance and resistance level at or just beyond the user's
current range will cause their range to slowly increase; for example, if the
user's range is d=d3 at a particular resistance level, then practicing at that
distance and resistance will cause their range at the next higher resistance
level to increase from none to d1
• the range of the user will slowly increase due to time passing
ttt, ctrl, and
comp
• increasing the target distance and resistance level beyond the user’s current
range will increase the probability that the user will take longer to reach the
target, have less control, and display compensatory strategies; in turn, setting d
and r at or below the current range will increase the chance of the user
reaching the target in normal time, with maximum control, and no
compensation
• the more fatigued the user is, the less likely the user will be able to reach the
target in normal time, with maximum control, and without compensation, and
vice versa
The descriptive dynamic relationships listed above are represented as the arrows shown in Figure
5.5 for all actions except stop. The stop action simply resets the fat variable to its initial distribution
(fat=yes with probability 0.05 and fat=no with probability 0.95), and the ranges are carried
forward. It is assumed that after resting, the user is no longer fatigued and the ranges are kept
the same.
The transition probabilities for the interactions captured in the DBNs were estimated and then
specified in conditional probability tables (CPTs). A CPT describes the probability of each value
of a variable occurring at t+1 for a specific action, given the values of influencing variables at the
prior time step, t. It is not feasible to list the entire CPT for this POMDP model: it contains
approximately 678 probabilities per action, for a total of roughly 6,780 probabilities across the
ten actions. Instead of listing all the probabilities, an example of how a CPT is
constructed can be seen in Appendix I.
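To make the CPT idea concrete, a fragment of one such table might be stored and sampled as follows. The parent set and all probabilities here are invented for illustration only; the real values are those constructed in Appendix I:

```python
import random

# Illustrative (not the thesis's actual) CPT fragment for hat' under action
# setd2resmin: P(hat'=yes | n(min), fat). The numbers are made up purely to
# show the table structure.
cpt_hat = {
    # (n(min), fat) at time t  ->  P(hat'=yes) at t+1
    ("none", "no"): 0.10,
    ("none", "yes"): 0.05,
    ("d1", "no"): 0.40,
    ("d1", "yes"): 0.20,
    ("d2", "no"): 0.90,
    ("d2", "yes"): 0.60,
    ("d3", "no"): 0.95,
    ("d3", "yes"): 0.70,
}

def sample_hat(n_min, fat, rng=random):
    """Sample hat' given the parent values at the previous time step."""
    p_yes = cpt_hat[(n_min, fat)]
    return "yes" if rng.random() < p_yes else "no"

print(sample_hat("d2", "no"))  # "yes" with probability 0.9
```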
5.2.5 Definition of the Reward Function
The reward function was constructed to motivate the system to guide the user to exercise at the
maximum target distance and resistance level, with maximum control and no compensation. As
such, the system was given a large reward for getting the user to reach the furthest target distance
(d=d3) at maximum resistance (r=max). Rewards were also given when the user reached the
target in normal time, with maximum control, and without compensation. However, none was
given if ttt=none, ctrl=none, or comp=yes. The system also received little or no reward
for setting target distances and resistance levels below the user's range, as this would hinder
the system's progress towards its goal. A penalty was given when the user became fatigued.
This penalty was assigned so that the system would set the target and resistance at a level where
the user had a chance of reaching with little fatigue, rather than simply setting the target and
resistance at the maximum value where the likelihood of being fatigued would be high. Table
5.2 shows the reward function in the final version of the STRENGTHEN model, with positive
values considered a reward and negative values considered a cost.
Table 5.2: Reward function for STRENGTHEN model

Larger rewards were given for setting r higher:
• r=none → 1  • r=min → 15  • r=max → 250

Larger rewards were given for setting d higher:
• d=d1 → 1  • d=d2 → 5  • d=d3 → 10

Reward was given when user reached target:
• hat=yes → 1  • hat=no → 0

Larger rewards were given when user reached target in normal time:
• ttt=none → 0  • ttt=slow → 0.5  • ttt=norm → 1

Larger rewards were given when user had control:
• ctrl=none → 0  • ctrl=min → 1  • ctrl=max → 2

Reward was given when user did not compensate:
• comp=yes → 0  • comp=no → 2

Penalty was given for user being fatigued:
• fat=yes → -14  • fat=no → 0

Little or no reward was given for setting d and r less than n(r); the same values apply at
each resistance level r ∈ {none, min, max}, with n(r) the user's range at the chosen resistance:
• d=d1: n(r)=none → 1, n(r)=d1 → 0.4, n(r)=d2 → 0.1, n(r)=d3 → 0
• d=d2: n(r)=none → 1, n(r)=d1 → 1, n(r)=d2 → 0.4, n(r)=d3 → 0.1
• d=d3: n(r)=none → 1, n(r)=d1 → 1, n(r)=d2 → 1, n(r)=d3 → 0.4
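Because the reward in Table 5.2 decomposes into a sum of per-variable terms, it can be sketched as a lookup-and-sum. The state encoding below (in particular the `n_at_r` key for the value of n(r) at the chosen resistance) is my own shorthand, not notation from the thesis:

```python
# Per-variable reward terms from Table 5.2 (additively separable).
R_r = {"none": 1, "min": 15, "max": 250}
R_d = {"d1": 1, "d2": 5, "d3": 10}
R_hat = {"yes": 1, "no": 0}
R_ttt = {"none": 0, "slow": 0.5, "norm": 1}
R_ctrl = {"none": 0, "min": 1, "max": 2}
R_comp = {"yes": 0, "no": 2}
R_fat = {"yes": -14, "no": 0}

# Range-dependent term, indexed by (d, value of n(r) at the chosen r).
R_range = {
    "d1": {"none": 1, "d1": 0.4, "d2": 0.1, "d3": 0},
    "d2": {"none": 1, "d1": 1, "d2": 0.4, "d3": 0.1},
    "d3": {"none": 1, "d1": 1, "d2": 1, "d3": 0.4},
}

def reward(s):
    """Total reward for a state: the sum of the per-variable terms."""
    return (R_r[s["r"]] + R_d[s["d"]] + R_hat[s["hat"]] + R_ttt[s["ttt"]]
            + R_ctrl[s["ctrl"]] + R_comp[s["comp"]] + R_fat[s["fat"]]
            + R_range[s["d"]][s["n_at_r"]])

# Best case: furthest target at max resistance, reached well, no fatigue.
best = {"r": "max", "d": "d3", "hat": "yes", "ttt": "norm",
        "ctrl": "max", "comp": "no", "fat": "no", "n_at_r": "d2"}
print(reward(best))  # 250 + 10 + 1 + 1 + 2 + 2 + 0 + 1 = 267
```

This additive structure is what symbolic Perseus later exploits as "additive separability" (Section 5.2.6.1).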
5.2.6 Computation of the STRENGTHEN Model
After the dynamics of the POMDP model were defined, the model had to be solved. Solving the
model results in a fixed policy, which describes what action the agent should take in each belief state.
5.2.6.1 Selection of the Solution Method
Unique combinations of instantiations of the variables represent all the different possible states
of the rehabilitation exercise that the system can be in. For this model, there were 41,472
possible states.
There are several algorithmic methods for finding optimal POMDP policies as discussed in
Lovejoy (1991). However, the size of the rehabilitation exercise model renders exact optimal
solutions intractable. The two sources of intractability that plague classic algorithms are:
• complex value function and policy representations, where the number of α-vectors
(representing value functions) may grow exponentially with the observation space and
double exponentially with the horizon, and
• a large state space, where the number of states is exponential in the number of state
variables (recall that each state corresponds to a unique combination of variables)
(Poupart, 2005).
Hence, approximations had to be used to solve the model. There are several methods proposed
in the literature that compactly represent the complexity of policy and value function spaces with
a small number of α-vectors, and others that exploit the POMDP structure to mitigate the
complexity of state spaces (Poupart, 2005). However, none of these techniques address both
sources of intractability simultaneously and as a result, they cannot solve much larger or more
difficult POMDPs.
The method chosen to solve this POMDP model can effectively overcome both sources of
intractability concurrently and is based on an algorithm called symbolic Perseus. Developed by
Poupart et al. at the University of Toronto, this technique is able to efficiently exploit the
structure of large POMDPs by first representing the model as an algebraic decision diagram
(ADD), and then employing a randomised point-based value iteration algorithm using the ADDs
to solve the model (Poupart, 2005). This point-based value iteration is based on the Perseus
algorithm developed by Spaan et al. for flat POMDPs (Spaan & Vlassis, 2005).
ADDs are able to compactly represent the dynamics and rewards of the POMDP model by
exploiting their regularities. This can be leveraged by determining the conditional
independencies between variables in the DBN and the additive separability of the reward
function. Conditional independence refers to the fact that some variables are probabilistically
independent of each other when the values of other variables are held fixed. This feature
contributes to the reduction of the size of the CPT. Additive separability refers to the fact that
reward functions often decompose into the sum of smaller reward functions, resulting in further
space reduction (Poupart, 2005). ADDs can essentially group these regularities together, as
opposed to explicitly representing them, and hence, can translate into substantial savings in
computational time. A review of ADDs can be found in Hoey, St-Aubin, Hu, and Boutilier
(1999).
The model can now be solved by applying the Perseus method to the ADDs (Poupart, 2005).
However, the concept of α-vectors must be explained before describing the Perseus method.
The value iteration algorithm for solving optimal policies defined in Section 4.2.3 is intractable
for most application problems. Sondik (1971) showed that the value function at any finite
horizon, n, is piecewise linear and convex (PWLC) and can be expressed by a set of vectors:
Γn={α0, α1, …, αm}. Each α-vector represents an |S|-dimensional hyper-plane, and defines the
value function over a bounded region of the belief:
Vn(b) = max_{α∈Γn} Σ_{s∈S} α(s)·b(s) .    (5.1)
In addition, each α-vector maximises the value function in a certain region of the belief and has
an action associated with it, which is the optimal action to take at that particular belief region.
Thus, the optimal value function at horizon n, Vn*(b), is represented by the upper surface of the
α-vectors in Γn as shown in an illustrative example in Figure 5.6. For infinite horizon problems
such as the reaching exercise, V*(b) can be approximated well by bounding the number of α-
vectors as this only causes minimal decrease in the quality of the solution (Hoey et al., 2007).
Figure 5.6: Example of an optimal value function in a two-state POMDP. The belief space is a one-dimensional vector of two non-negative numbers that sum to 1 [b(s0) = P(s0) = 1-P(s1)]. The x-axis, therefore, represents the whole belief space on which the value function Vn(b) is defined. The upper surface of the three α-vectors is the optimal value function, Vn*(b), which defines the optimal action to take in a particular belief state. At the belief state, b, the action associated with α2 should be taken.
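Equation 5.1 and the associated action lookup can be sketched for a two-state example like the one in Figure 5.6. The α-vectors and action labels below are illustrative numbers only, not values from the solved model:

```python
import numpy as np

# Three α-vectors over a two-state POMDP (as in Figure 5.6), each paired
# with an action; the numbers are invented for illustration.
alphas = np.array([[4.0, 1.0],   # α0
                   [3.0, 3.5],   # α1
                   [0.5, 5.0]])  # α2
actions = ["setd1resnone", "setd2resnone", "stop"]

def value_and_action(b):
    """V(b) = max_alpha alpha·b (Equation 5.1); the maximising α-vector's
    associated action is the optimal action at belief b."""
    scores = alphas @ b
    best = int(np.argmax(scores))
    return scores[best], actions[best]

b = np.array([0.2, 0.8])  # b(s0) = 0.2, b(s1) = 0.8
v, a = value_and_action(b)
print(v, a)  # 4.1 stop  (α2 maximises the value at this belief)
```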
The Perseus algorithm starts by first collecting a set of reachable belief states (B) by performing
a forward search from the initial belief state. This is done by executing a random policy by
sampling the actions and observations at each step. Then, value iteration is performed on the set
of collected belief points ensuring that in each backup stage the value of each point in the belief
set is improved. A pictorial example of a backup stage is presented in Figure 5.7. The key
feature of this algorithm is that a single backup may improve the value of many points in the set,
allowing value functions to be computed with only a small number of α-vectors (relative to the
belief set size) and thus, leading to faster computation time.
Figure 5.7: Example of a Perseus backup stage in a two-state POMDP. The x-axis represents the belief space and the y-axis represents V(b). Solid lines are the α-vectors from the current stage and dashed lines are the α-vectors from the previous stage. There are seven belief states {b1,…,b7} which comprise the set of reachable belief points (B) indicated by the tick marks. The backup stage computing Vn+1 from Vn proceeds as follows: (a) the value function at stage n; (b) the computation of Vn+1 starts by sampling b6, which produces an α-vector that improves the values of b6 and b7; (c) b3 is then sampled, which produces an α-vector that improves the values of b1 through b5; and (d) the values of all b ∈ B have improved and thus, the backup stage at n+1 is complete (© AI Access Foundation, 2005 – use of picture is by permission of the copyright holder).
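The first stage of Perseus, collecting reachable beliefs under a random policy, can be sketched for a tiny two-state POMDP. The transition and observation tables below are invented, and the belief update is the standard Bayes filter b'(s') ∝ Z(o|a,s')·Σs T(s'|s,a)·b(s):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-state, two-action, two-observation POMDP with made-up dynamics.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[a, s, s']
              [[0.5, 0.5], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],    # Z[a, s', o]
              [[0.6, 0.4], [0.1, 0.9]]])

def belief_update(b, a, o):
    """b'(s') ∝ Z(o | a, s') * sum_s T(s' | s, a) * b(s)."""
    b_next = Z[a, :, o] * (b @ T[a])
    return b_next / b_next.sum()

def collect_beliefs(b0, n_steps):
    """Forward search under a random policy, as in the first stage of Perseus:
    sample an action and an observation at each step, record the updated belief."""
    beliefs, b = [b0], b0
    for _ in range(n_steps):
        a = rng.integers(2)
        s = rng.choice(2, p=b)            # sample a hidden state from the belief
        s_next = rng.choice(2, p=T[a, s])
        o = rng.choice(2, p=Z[a, s_next])
        b = belief_update(b, a, o)
        beliefs.append(b)
    return beliefs

B = collect_beliefs(np.array([0.5, 0.5]), n_steps=50)
print(len(B))  # 51 reachable belief points
```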
5.2.6.2 Iteration Process and Solving the Model
Solving the model created the policy, which maps belief states into actions. Akin to a lookup
table, the policy determines which action the agent should take next given its current belief
state. The POMDP designed for the reaching task was solved using symbolic Perseus as
described in Section 5.2.6.1.
The complete model was built up through many iterations. At each iteration, new variables were
added, variable dynamics were changed, transition probabilities were adjusted, or the reward
function was modified, and the resulting partial model was solved. This allowed for debugging
of the model dynamics through the analysis of the resulting policies of the partial models.
There were a total of 63 partial versions of the model before the complete model was
successfully solved.
The final model was sampled with a set of 3,000 belief points that was generated from 20
different initial belief states: one for every range possibility. It had a discount factor (β) of 0.95
and was solved with MATLAB® on a dual AMD Opteron™ (2.4GHz) CPU using 100 α-vectors
and 150 iterations in approximately 5.18 hours.
5.3 iSTRETCH Model

The iSTRETCH model was the second of two models developed during this thesis and can be
seen in Figure 5.8. It evolved from the STRENGTHEN model. A detailed description of
the iSTRETCH POMDP model is discussed below.
Figure 5.8: iSTRETCH (POMDP) model as a DBN. It consists of the state, S, represented by a combination of nine variables; the actions, A; the observations, O; the reward function, R; and the dynamics, represented by the arrows.
5.3.1 Definition of the Variables
From the STRENGTHEN model, it was realised that the variables d and r could be represented
by one variable, stretch, that captured the difference between the target distance set by the action
and the user’s range at the resistance set by the action.
1. stretch beyond user’s range (stretch) : {+9, +8, +7, +6, +5, +4, +3, +2, +1, 0, -1, -2}
Indicates the amount the system is asking the user to go beyond their current range. For
example, if the user’s range is n(min)=d1, then setting the target at d=d2 at resistance
r=min is a stretch of +1, while setting the target at d=d1 at resistance r=max is a stretch
of +3. Note that stretch is a direct function of both target distance, d={d1, d2, d3}, and
resistance level, r={none, min, max}: it is a joint measure of how much a particular
distance and resistance are going to push a user beyond their range.
The variables hat and ttt from the STRENGTHEN model were combined into just ttt since hat
was really subsumed by ttt. If ttt=slow or ttt=norm, this implies that the user has reached the
target (hat=yes), and ttt=none indicates that the user did not reach the target (hat=no). In the
STRENGTHEN model, the combination of hat=yes and ttt=none was not reachable, meaning
that these states were impossible to achieve. Recognising these unreachable states allowed for a
more compact representation of the model in the iSTRETCH version.
2. time-to-target (ttt) : {none, slow, norm}
Denotes the time it takes the user to reach the target from the starting position. Note that
ttt=none indicates that the user has failed to reach the target.
A new variable, learnrate, was added to the iSTRETCH model to estimate how quickly the user
progresses during the reaching exercise. In the STRENGTHEN model, this estimation was
hard-coded.
3. learning rate (learnrate) : {lo, med, hi}
Indicates how quickly the user is progressing.
The remaining six variables are the same as in the STRENGTHEN model.
4. fatigue (fat) : {yes, no}
Indicates whether the user is fatigued or not.
5.-7. user’s range at a particular resistance level (n(r)) : {none, d1, d2, d3}
Denotes the range or ability of the user, which depends on the target distance, d={d1, d2,
d3}, and resistance level, r={none, min, max}, as shown in the following:
• user’s range at zero resistance (n(none)) : {none, d1, d2, d3}
• user’s range at minimum resistance (n(min)) : {none, d1, d2, d3}
• user’s range at maximum resistance (n(max)) : {none, d1, d2, d3}
The range is determined by the furthest target distance the user is able to reach at a
particular resistance level. For example, if r=min and the furthest target distance the user
can reach is d=d1, then the user’s range is n(min)=d1.
8. control (ctrl) : {none, min, max}
Indicates the user’s control level by their ability to keep on the straight path, from the
starting position to the target.
9. compensation (comp) : {yes, no}
Indicates any compensatory action (improper posture) that the user performs during
the exercise. Signs of compensation are trunk rotation and flexion, and shoulder
abduction and internal rotation.
5.3.2 Definition of the Actions
The ten actions for iSTRETCH are the same as those in the STRENGTHEN model. Again, these
actions are:
1. setd1resnone (sets d=d1 and r=none)
2. setd2resnone (sets d=d2 and r=none)
3. setd3resnone (sets d=d3 and r=none)
4. setd1resmin (sets d=d1 and r=min)
5. setd2resmin (sets d=d2 and r=min)
6. setd3resmin (sets d=d3 and r=min)
7. setd1resmax (sets d=d1 and r=max)
8. setd2resmax (sets d=d2 and r=max)
9. setd3resmax (sets d=d3 and r=max)
10. stop (terminates the exercise)
5.3.3 Definition of the Observation Variables and Observation Function
This model has essentially the same observation variables as the previous model, except for the
OH variable since the hat variable was removed. Now there are three fully observable
observation variables, OT={none, slow, norm}, OCT={none, min, max}, and OCO={yes, no}, that
correspond to the state variables ttt, ctrl, and comp, respectively. Again, the observation function
is deterministic, thus the state variables ttt, ctrl, and comp, are actually the observation variables.
5.3.4 Definition of the Transition Function
Instead of explicitly using CPTs to describe the transition probabilities of the variables as
described in the STRENGTHEN model in Section 5.2.4, the transition probabilities in the
iSTRETCH model were automatically generated by using a simple parametric function. The
performance and fatigue variables in the iSTRETCH model were functions of stretch and fat.
For example, if the user is not fatigued and the system sets a target with a stretch of 0 (i.e. at the
user’s range), then the user might have a 90% chance of reaching the target at normal time
(ttt=norm). However, if the stretch is set to 1, then this chance might decrease to 50%. Even if
the stretch is 0, but the user is fatigued, the chance of reaching the target at ttt=norm will also
decrease. This idea was applied to the other variables modelling the user’s control and
compensation, and even their fatigue levels. Certainly, a larger stretch will increase the
probability of the user becoming fatigued.
The sigmoid function was used as the common parametric function, which relates stretch and
fatigue levels to user performance. This function, named the pace function, φ(st,f), is a function
of stretch, st, and fatigue level, f:
φ(st, f) = 1 / (1 + e^(−(st − m − m(f))/σst))    (5.2)
where m is the mean stretch (the value of stretch for which the function φ is 0.5 if the user is not
fatigued), m(f) is a shift dependent on the user’s fatigue level (e.g. 0 if the user is not fatigued),
and σst is the slope of the pace function. There was one sigmoid function for every value of each
variable.
For each pace function, there were three parameters that needed to be specified: m, σst, and m(f),
where the latter is technically a function, but since the fatigue variable is a binary value in this
model, it is a single real-valued parameter. However, it was simpler to specify the pace function
in terms of upper and lower pace limits: the values of stretch where a user’s performance will
vary by a certain probability when the user is not fatigued (m(f)=0). For example, the upper pace
limit for a user to compensate (comp=yes) when not fatigued is the stretch at which the user will
compensate with a probability of φ+. Similarly, the lower pace limit for comp=yes is the stretch
at which the user will compensate with a probability of φ- (so succeed in reaching the target with
comp=no with a probability of 1-φ-). Denoting the upper and lower pace limits by st+ and st-,
respectively, the following two equations were derived:
φ+ = 1 / (1 + e^(−(st+ − m)/σst))    (5.3)

φ− = 1 / (1 + e^(−(st− − m)/σst))    (5.4)

which could be solved for m and σst:

m = (st+·β− − st−·β+) / (β− − β+)    (5.5)

σst = (st+ − st−) / (β+ − β−)    (5.6)

where β+ = ln(φ+ / (1 − φ+)) and β− = ln(φ− / (1 − φ−)).
The fatigue effect, m(f), was the last parameter to specify, and is a negative number that shifts the
pace function downwards. The amount of shift indicates the amount the pace limits will be
shifted down when the user is fatigued. Figure 5.9 shows an example pace function for
comp=yes. Notice that both pace limits decrease when the user is fatigued (at the same
probability). In other words, the user is more likely to compensate when fatigued.
Figure 5.9: Example pace function for comp=yes, with φ+ = 0.9, φ- = 0.1, st+ = +3, st- = -1, m(f=yes) = 0.8, and m(f=no) = 0.0. Shown are the upper and lower pace limits, and the pace function for each condition of fat.
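The pace-function machinery of Equations 5.2–5.6 can be sketched directly. Using the Figure 5.9 parameters, the recovered m and σst reproduce the pace limits; the fatigue shift below follows Equation 5.2 as written, with m(f) taken from the figure:

```python
import math

def pace_params(st_plus, st_minus, phi_plus=0.9, phi_minus=0.1):
    """Recover m and sigma_st (Equations 5.5 and 5.6) from the upper and
    lower pace limits st+ and st-."""
    beta_plus = math.log(phi_plus / (1 - phi_plus))
    beta_minus = math.log(phi_minus / (1 - phi_minus))
    m = (st_plus * beta_minus - st_minus * beta_plus) / (beta_minus - beta_plus)
    sigma = (st_plus - st_minus) / (beta_plus - beta_minus)
    return m, sigma

def pace(st, fatigued, m, sigma, m_f=0.8):
    """Pace function of Equation 5.2; m_f shifts the curve when fatigued
    (m_f = 0.8 as in the Figure 5.9 example)."""
    shift = m_f if fatigued else 0.0
    return 1.0 / (1.0 + math.exp(-(st - m - shift) / sigma))

# Parameters from the comp=yes example in Figure 5.9.
m, sigma = pace_params(st_plus=3, st_minus=-1)
print(round(pace(3, False, m, sigma), 3))   # 0.9: upper pace limit recovered
print(round(pace(-1, False, m, sigma), 3))  # 0.1: lower pace limit recovered
```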
For the variables with three values, such as ttt and ctrl, two pace functions need to be specified,
one for the lowest value and one for the highest. The middle value gets what is left of the
probability mass. Figure 5.10 shows an example pace function for ttt.
Figure 5.10: Example pace function for ttt, with φ+ = 0.9, φ- = 0.1, and m(f=no) = 0.0. Shown are the upper (st+ = -3) and lower (st- = +2) pace limits for ttt=norm, and the upper (st+ = +4) and lower (st- = +1) pace limits for ttt=none. The pace function for ttt=slow gets what is left of the probability mass.
The ranges in the current model were modelled separately, although they could also use the
concept of pace functions. They were modelled such that the higher the learning rate, the faster
the user’s range increased, and vice versa. Setting target distances at or just above the user’s current
range will cause the learning rate to increase (i.e. progress towards learnrate=hi) and in turn,
cause the range to slowly increase. However, when the user is fatigued, their range does not
progress as well.
5.3.5 Definition of the Reward Function
The reward function in the iSTRETCH model was constructed in the same way as the
STRENGTHEN model – to encourage the system to guide the user to exercise at maximum
target distance and resistance level, while performing the task with maximum control and
without compensation. Therefore, the system was given a large reward for getting the user to
reach the furthest target at maximum resistance (similar to the previous model). Smaller rewards
were given when targets were set at or above the user’s current range (i.e. when stretch >= 0),
and when the user was performing well (i.e. ttt=norm, ctrl=max, comp=no, and fat=no).
However, no reward was given when the user was fatigued, failed to reach the target, had no
control, or showed signs of compensation during the exercise. The system also did not get
rewarded for negative stretches as this would delay the progress towards its goal. Table 5.3
shows the reward function in the final version of the iSTRETCH model.
Table 5.3: Reward function for iSTRETCH model

Larger rewards were given for setting r higher:
• r=none → 1  • r=min → 14  • r=max → 80

Larger rewards were given for setting d higher:
• d=d1 → 1  • d=d2 → 9  • d=d3 → 11

Larger rewards were given when user reached target in normal time:
• ttt=none → 0  • ttt=slow → 0.8  • ttt=norm → 1

Larger rewards were given when user had control:
• ctrl=none → 0  • ctrl=min → 0.3  • ctrl=max → 1

Reward was given when user did not compensate:
• comp=yes → 0  • comp=no → 1

Reward was given when user was not fatigued:
• fat=yes → 0  • fat=no → 1

Small rewards were given when d and r were set at or above n(r); none were given when they
were set below n(r):
• stretch=-2 → 0  • stretch=-1 → 0  • stretch=0 → 0.4  • stretch=+1 through +9 → 1
5.3.6 Computation of the iSTRETCH Model
5.3.6.1 Selection of the Solution Method
The model was solved using the same solution method as the previous model – symbolic
Perseus. This approximation method had to be used as iSTRETCH had 82,944 possible states,
which was double that of the STRENGTHEN model.
5.3.6.2 Iteration Process and Solving the Model
The full model was again built up through many iterations. This time at each iteration, either the
reward function was modified or the transition probabilities were changed through the
adjustment of the various pace limits. After each modification, the resulting partial model was
solved. There were a total of 64 partial versions of the model before the full model was
successfully solved.
The final model was sampled with a set of 3,000 belief points that was again generated from 20
different initial belief states (one for every range possibility). The iSTRETCH model also had a
discount factor (β) of 0.95 and was solved with MATLAB® on a dual AMD Opteron™
(2.4GHz) CPU using 150 α-vectors and 150 iterations in approximately 13.96 hours.
5.4 Comparison of STRENGTHEN and iSTRETCH Models

Once the policy of a model is solved, there needs to be a way to determine how well the model is
performing in real-time. To do this, a simulation program was developed in MATLAB® which
was based on the decision cycle of a POMDP agent described in Section 4.2.2. The simulation
starts with an initial belief state and then the POMDP decides on an action for the system to take
(predetermined by the policy). Observation data is manually entered since the POMDP model
had not yet been integrated with the robot (once integrated, the POMDP would automatically
receive the observation data from the robotic system). A new belief state is then computed, the
next action is determined, and the cycle repeats. If the action is to stop the exercise,
the simulation program resets the fatigue variable (i.e. user is un-fatigued after resting), carries
over the ranges, and the decision cycle starts once again. The following section discusses the
comparison of the STRENGTHEN and iSTRETCH models through a few simulation examples.
Note that although both models behave quite differently, every attempt was made to keep the
observation input the same for both models for direct comparison.
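The decision cycle of the simulation program can be sketched as a skeleton. The four callables stand in for the solved-model components (policy lookup, belief update, and observation entry), and the dummy components in the demo exist only to exercise the loop; none of this is the thesis's actual MATLAB code:

```python
def run_simulation(b0, select_action, update_belief, get_observation, max_steps=100):
    """Decision cycle of the simulation program: the policy picks an action
    for the current belief, observations are entered (manually, pre-robot
    integration), and the belief is updated. The stop action resets fatigue
    to its initial distribution (fat=yes: 0.05, fat=no: 0.95) and carries
    the ranges forward unchanged."""
    b, history = b0, []
    for _ in range(max_steps):
        a = select_action(b)
        history.append(a)
        if a == "stop":
            b = dict(b, fat={"yes": 0.05, "no": 0.95})  # user rests; ranges kept
            continue
        o = get_observation(a)          # entered manually in the thesis simulations
        b = update_belief(b, a, o)
    return history

# Dummy components just to exercise the loop.
acts = iter(["setd1resnone", "setd2resnone", "stop", "setd1resmin"])
history = run_simulation(
    b0={"fat": {"yes": 0.05, "no": 0.95}},
    select_action=lambda b: next(acts),
    update_belief=lambda b, a, o: b,
    get_observation=lambda a: {"OT": "norm", "OCT": "max", "OCO": "no"},
    max_steps=4,
)
print(history)  # ['setd1resnone', 'setd2resnone', 'stop', 'setd1resmin']
```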
The performance of both models was subjectively rated by the researcher, focusing on whether
the system behaved in line with conventional reaching rehabilitation by:
• gradually increasing target distance first, then resistance level as the user performed well
(i.e. reached target with normal time, maximum control, and no compensation), and
• increasing the rate of fatigue if the user was not performing well (i.e. failed to reach the
target, had no control, or compensated).
In the first simulation example (Figures 5.11 and 5.12 show a portion of the entire simulation),
the user is assumed to have trouble reaching the maximum target, d=d3, at zero resistance,
r=none. The simulation of both models starts with the same initial belief state, which assumes
that the user’s range at each resistance (i.e. n(none), n(min), and n(max)) is likely to be none, and
that the user is not fatigued with a 95% probability. There are two additional variables in the
iSTRETCH model, stretch and learnrate, that are also estimated. In both models, the POMDP
slowly increases the target distance from d1, to d2, and then to d3 while keeping at the same
resistance level (r=none) when the user successfully reaches the target in normal time, with
maximum control, and with no compensation. However, according to the initial user
assumption, at d=d3 the user fails to reach the target (i.e. OH=no and OT=none), has minimum
control (OCT=min), and does not compensate (OCO=no). The updated belief state is shown in
Figure 5.11a for the STRENGTHEN model and Figure 5.11b for the iSTRETCH model. In the
STRENGTHEN model, the POMDP decides to reduce the target to d1 since the user had trouble
reaching d3. Here, the user has no problem reaching d1 and the updated belief state is shown in
Figure 5.12a. In the iSTRETCH model, after the user failed to reach d3, the POMDP decides to
keep the same target at d3 since stretch is about 75% likely to be 0 (i.e. at the user’s range).
Again, based on the initial user assumption, the user fails to reach the target with minimum
control and no compensation. This updated belief state is shown in Figure 5.12b.
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.11: (a) Updated belief state of n(r) and fat after the user failed to reach d=d3, had minimum control and no compensation. The POMDP decides to set the next action at d=d1 at the same resistance (r=none); (b) Updated belief state of n(r), stretch, fat, and learnrate after the user failed to reach d=d3, had minimum control and no compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=none).
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.12: (a) Updated belief state after the user successfully reached d=d1, with maximum control and no compensation; (b) Updated belief state after the user failed to reach d=d3, with minimum control and no compensation.
Although the decisions of both models seem believable (i.e. gradually increased target distance
before resistance level), the belief state of the fatigue variable in the STRENGTHEN model does
not. After the user failed to reach d3 in Figure 5.11a, the user was about 30% likely to be
fatigued. However, as soon as the user reached d1 in the next time step, the percentage of the
user being fatigued was reduced to less than 10% as seen in Figure 5.12a. The dynamics of the
fatigue variable in the STRENGTHEN model do not resemble that of real-life situations. When
a user is fatigued in the real world, it is impossible for them to get un-fatigued in the next time
step. Because of these dynamics, the POMDP produces the same actions (i.e. increase target
distance from d1, to d2, to d3, and then back down to d1) over and over again if the user’s
observations are the same (i.e. successfully reaches the target set at d1 and d2, but fails to reach
the target at d3 with minimum control and no compensation). The only way the user will
become fatigued (and the action stop is determined) is if the user fails to reach the target at d2
instead of d3. The complete first simulation example of the STRENGTHEN model can be seen
in Appendix II. On the other hand, the fatigue variable in the iSTRETCH model behaved
quite well during this simulation example. Although the POMDP did not decrease the target as it
did in the other model, so a direct comparison of fatigue cannot be made this way, the
general behaviour of fatigue can still be commented on. After each time step in the beginning of the
simulation, the fatigue level increased slowly due to progression of time. When the user failed to
reach d=d3 the first time (Figure 5.11b), the level of fatigue jumped from about 10% to 25%.
After the second failure, (Figure 5.12b), the level of fatigue increased even more to about 40%.
Thus, iSTRETCH was performing quite well in terms of the performance criteria, as the rate of
fatigue increased when the user did not perform well. The complete first simulation example of
the iSTRETCH model is also shown in Appendix II.
In the second simulation example (Figures 5.13, 5.14, and 5.15 show a portion of the entire
simulation), the user is assumed to be able to reach the maximum target, d=d3, at maximum
resistance, r=max, in the beginning, but then slowly starts to compensate and lose control.
Again, the simulation of both models starts with the same initial belief state, this time assuming
that the user’s range at both zero and minimum resistance (i.e. n(none) and n(min)) is likely to be
d3, and the user’s range at maximum resistance, n(max), is likely to be d1. In addition, the initial
belief state assumes that the user is not fatigued with a 95% probability. From this initial belief
state, both models set the first action to be d=d1 and r=max. According to the initial user
assumption, the user successfully reaches this target in normal time, with maximum control, and
with no compensation. In the next time step, the STRENGTHEN model decides to set the target
at d=d3 at the same resistance, skipping d=d2, which does not follow the performance criteria of
gradually increasing the target distance. Conversely, at the next time step, the iSTRETCH model
decides to set the target at d=d2. The iSTRETCH model keeps the target at d2 for one more time
step before setting it at d=d3 (assuming the user reaches the targets at d2 successfully). Both
models decide to set the target at d=d3 for two more time steps, assuming the user successfully
reaches the target, with maximum control, and no compensation. The updated belief state is
shown in Figure 5.13a for the STRENGTHEN model and Figure 5.13b for the iSTRETCH
model. Now, during this time step when both models again decide to set the target at d=d3, the
user starts to compensate but still is able to reach the target with maximum control. The updated
belief state is shown in Figure 5.14a and Figure 5.14b for the STRENGTHEN and iSTRETCH
models, respectively. Again, both models set the same target and the user compensates once
more. Figure 5.15a and 5.15b show the updated belief state for STRENGTHEN and
iSTRETCH, respectively. This time, the iSTRETCH model decides to stop the exercise because
it believes the user is fatigued due to performing compensatory movements for two consecutive
times. However, the STRENGTHEN model continues and still decides to set the target at d=d3.
This time, in addition to reaching the target and compensating, the user starts to lose control
(OCT=min) and takes longer to reach the target (OT=slow). This combination is performed again
in the next time step and the updated belief state can be seen in Figure 5.16. The
STRENGTHEN model decides to set the same target, d=d3, at the same resistance level. The
user starts to lose even more control (OCT=none) at this time step and the updated belief state is
shown in Figure 5.17. Notice the transition in the fatigue variable from Figure 5.16 to Figure
5.17. Starting from the initial belief state until Figure 5.16, the fatigue level was gradually
increasing due to time passing, an increase in compensation, and a decrease in control and time
to target. However, as soon as the user showed no control between Figures 5.16 and 5.17, the
fatigue level decreased from a 65% to 55% chance of being fatigued. Again, this does not
follow what is typically seen during conventional therapy. A lack of control indicates the
presence of fatigue as described in Section 5.1.1. Figure 5.18 shows the next time step with the
fatigue level decreasing once again (assuming the same action and observations occurred). In
fact, this model keeps producing the same action over and over again (assuming the same
observations) since the fatigue variable will never be fat=yes. Thus, the exercise will never stop
to give the user a rest. The complete second simulation of both the STRENGTHEN and
iSTRETCH models can be seen in Appendix II.
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.13: (a) Updated belief state of n(r) and fat after the user successfully reached d=d3, with maximum control and no compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max); (b) Updated belief state of n(r), stretch, fat, and learnrate after the user successfully reached d=d3, with maximum control and no compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max).
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.14: (a) Updated belief state after the user successfully reached d=d3, with maximum control but this time with compensation. The POMDP decides to set the next action again at d=d3 at the same resistance (r=max); (b) Updated belief state after the user successfully reached d=d3, with minimum control but this time with compensation. The POMDP decides to set the next action again at d=d3 at the same resistance (r=max).
STRENGTHEN
(a)
iSTRETCH
(b)
Figure 5.15: (a) Updated belief state after the user again successfully reached d=d3, with maximum control and with compensation. The POMDP decides to set the next action again at d=d3 at the same resistance (r=max); (b) Updated belief state after the user again successfully reached d=d3, with minimum control and with compensation. The POMDP decides to stop the exercise.
Figure 5.16: STRENGTHEN model. Updated belief state of n(r) and fat after the user successfully reached d=d3 in slow time, with minimum control, and with compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max).
Figure 5.17: STRENGTHEN model. Updated belief state after the user successfully reached d=d3 in slow time, with compensation but this time with no control. The POMDP decides to set the next action at d=d3 at the same resistance (r=max). Notice the reverse in the fatigue level from the previous Figure 5.16.
Figure 5.18: STRENGTHEN model. Updated belief state after the user again successfully reached d=d3 in slow time, with no control, and with compensation. The POMDP decides to set the next action at d=d3 at the same resistance (r=max). Notice again the reverse in the fatigue level from the previous Figure 5.17.
Table 5.4 summarises the pros and cons for each model during both simulations. The
iSTRETCH model had more advantages compared to the STRENGTHEN model during
simulation.
Table 5.4: Summary of pros and cons of both models during simulation

STRENGTHEN
PROS:
• fatigue level slowly increases due to passing time
CONS:
• fatigue level decreases during simulation (i.e. user gets unfatigued), which does not model real-life situations
• rate of fatigue level does not increase faster when user does not perform well (i.e. fails to reach target, takes longer to reach target, has no control, compensates)
• target distance does not gradually increase

iSTRETCH
PROS:
• fatigue level slowly increases due to passing time
• fatigue level does NOT decrease during simulation
• rate of fatigue level increases faster when user does not perform well (i.e. fails to reach target, takes longer to reach target, has no control, compensates)
• target distance gradually increases
CONS:
• perhaps the exercise stops too fast when user is not performing well (i.e. fatigue level progresses too fast to fat=yes)
From a design perspective, the iSTRETCH model also had more advantages than the
STRENGTHEN model. Table 5.5 summarises the computational aspects of each model.
Although the number of partial versions (i.e. changes) for each model is the same, the
iSTRETCH model had fewer modifications relative to its number of states. Therefore, more
time was spent on developing the STRENGTHEN model. The iSTRETCH model took about
2.7 times as long as the STRENGTHEN model to compute its policy. However, as
the policy is computed offline, the difference in the number of computational hours was not
critical.
When a problem occurred during simulation, it was difficult to find where the problem lay in the
STRENGTHEN model. There were too many probabilities to work with, as they were all explicitly
defined in the CPTs, which made it difficult to pinpoint the source of the problem. iSTRETCH used
a pace function to automatically generate the transition probabilities. This made the model much
easier to modify since there were only a few parameters to change (i.e. pace limits).
Representing the model dynamics as a pace function made it easier to correct the problem.
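The idea of generating transition probabilities from a few pace parameters, rather than hand-filling every CPT entry, can be sketched as below. This is a hypothetical illustration only; the actual pace function and pace limits of the iSTRETCH model are not reproduced here.

```python
# Hypothetical sketch of generating a transition table from a single "pace"
# parameter instead of hand-authoring every CPT entry. The real pace
# function in the iSTRETCH model is not reproduced here.

def pace_transition(levels, p_up):
    """Build a transition table where each level advances to the next
    with probability p_up (the pace limit) and otherwise stays put."""
    n = len(levels)
    matrix = {}
    for i, level in enumerate(levels):
        row = {lv: 0.0 for lv in levels}
        if i < n - 1:
            row[levels[i + 1]] = p_up   # progress one level
            row[level] = 1.0 - p_up     # or stay at the current level
        else:
            row[level] = 1.0            # top level is absorbing
        matrix[level] = row
    return matrix

ranges = ["none", "d1", "d2", "d3"]
cpt = pace_transition(ranges, p_up=0.3)
```

Changing a single pace limit regenerates the entire table, which is the maintainability advantage over explicitly defined CPTs noted above.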
Table 5.5: Summary of computational aspects of each model
Computation Features STRENGTHEN iSTRETCH
Number of states 41,472 82,944
Hours to compute policy 5.18 13.96
Number of partial versions of the model 63 64
From the comparison of the two models explained above, iSTRETCH was chosen to be the final
POMDP as it seemed to be better than the STRENGTHEN model in terms of performance and
computation. However, this assumption can only be verified through a clinical evaluation
comparing the two models. Due to time constraints, this could not be performed for this
thesis – only the iSTRETCH model was tested.
Chapter 6 Integration of the POMDP Model with the Robotic System
6 Integration of the POMDP Model with the Robotic System
Figure 6.1 shows the diagram of the overall reaching rehabilitation system: the robotic system
(Figure 6.1a) and the POMDP agent (Figure 6.1b). As the user performs the reaching exercise,
data from the robotic system is used as observational input to the POMDP agent, where it
estimates the progress of the user and decides on an action for the system to take.
Figure 6.1: Diagram of the reaching rehabilitation system consisting of the robotic system (a) and the POMDP agent (b)
6.1 Acquisition of Data from the Robotic System

The controller and VE of the robotic device were developed by Quanser Inc. Both were written
in the Python programming language. The controller was responsible for providing feedback
control, rendering the VE on the computer monitor, and calculating performance statistics during
the exercise. The VE was designed to reflect the POMDP model such that it could incorporate
three linear targets and three resistance levels. In the end, the reaching exercise took the form of
a 2D, linear bull’s eye game as previously shown in Figure 5.4.
A micro-controller was used to establish the communication between the photoresistor sensors
described in Section 5.1.2 and the computer. The micro-controller (Figure 6.2), a Massachusetts
Institute of Technology’s Handyboard, was programmed in the C language. The sensitivity
threshold for light detection in the photoresistors was designed to detect a gap of approximately
2 cm (Lam et al., 2008). The output from the photoresistors was 4 bits in length and the transfer
to the computer was programmed to be bidirectional and asynchronous. The source code of the
micro-controller can be seen in Appendix III.
Figure 6.2: Massachusetts Institute of Technology’s Handyboard (micro-controller)
The following parameters from the robotic system were used as observation input to the POMDP
agent:
• Boolean flag to indicate whether target was reached: the robot tracks the position (x,y)
coordinates of the end-effector and if it gets within four millimetres (mm) of the target
position, the flag is set to 1 (i.e. target was reached)
• time: the robot keeps track of the time to determine how long it takes the user to reach the
target from the starting point
• deviation from straight path: is the average amount (in mm) the user strays from the zero
position on the y-axis, calculated from the starting point to the target
• rotation of the end-effector: is the average amount, in degrees, the user rotates the end-
effector from the zero position, calculated from the starting point to the target
• detection of light from the chair: a 4-bit data packet is continuously being sent to the
computer, with each value indicating which photoresistor sensors are detecting light (e.g.
0001 means light is detected from the right photoresistor, 0010 is from the lower back;
0011 is from the right and lower back; 0100 is from the left; 0101 is from the left and
right; 0110 is from the left and lower back; and 0111 is from the left, right, and lower
back)
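Decoding the 4-bit light-detection packet amounts to testing bit flags. The sketch below follows the bit assignments implied by the examples above (0001 = right, 0010 = lower back, 0100 = left); treating the top bit as unused is an assumption, and the function name is illustrative.

```python
# Sketch of decoding the 4-bit photoresistor packet. Bit positions follow
# the examples in the text (0001 = right, 0010 = lower back, 0100 = left);
# the top bit is assumed unused.

SENSOR_BITS = {0b0001: "right", 0b0010: "lower back", 0b0100: "left"}

def decode_packet(packet):
    """Return the list of sensors currently detecting light."""
    return [name for bit, name in SENSOR_BITS.items() if packet & bit]

# e.g. decode_packet(0b0111) lists the right, lower back, and left sensors,
# matching the 0111 example above.
```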
6.2 Setting the Value Ranges for the Observation Variables

Based on the parameters captured by the robotic system described above, the values for the
POMDP observation variables, OH (only in the STRENGTHEN model), OT, OCT, and OCO, were
first chosen by the researcher, and then evaluated by an OT at TRI to verify that the values were
suitable for moderate-level stroke patients. The following values were chosen as the final
observation variables.
The values of the OH variable were determined by the target’s Boolean flag. If the flag was set to
1, OH =yes. If not, OH =no.
The ranges for OT were determined from the time parameter as follows:
• OT = norm, if the time to reach the target was between 0 and 7.5 seconds
• OT = slow, if the time to reach the target was between 7.5 and 15 seconds
• OT = none, if the time to reach the target was longer than 15 seconds
Note that the user was given a maximum of 15 seconds to reach the target before a timeout
occurred, in which case the user failed to reach the target (OH =no).
The ranges for OCT were determined from the deviation parameter as follows:
• OCT = max, if the average deviation was between 0 and 7 mm
• OCT = min, if the average deviation was between 7 and 20 mm
• OCT = none, if the average deviation was greater than 20 mm
The values of the OCO variable were determined by both the rotation and light detection
parameters. If the end-effector was rotated by an average of greater than 45 degrees and/or any
of the photoresistors detected light, then OCO =yes. If neither occurred, OCO =no.
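The value ranges above can be collected into a single discretization step. The sketch below is a direct transcription of the listed thresholds; the function and field names are invented for illustration, and the behaviour exactly at a boundary (e.g. 7.5 seconds) is an assumption since the text gives overlapping ranges.

```python
# Sketch mapping raw robot parameters to the POMDP observation values,
# using the thresholds listed above. Names are illustrative; boundary
# handling (<=) is an assumption.

def discretize(reached, seconds, deviation_mm, rotation_deg, light_detected):
    o_h = "yes" if reached else "no"           # target's Boolean flag
    if seconds <= 7.5:                         # time to reach the target
        o_t = "norm"
    elif seconds <= 15:
        o_t = "slow"
    else:
        o_t = "none"                           # timeout
    if deviation_mm <= 7:                      # average deviation from path
        o_ct = "max"
    elif deviation_mm <= 20:
        o_ct = "min"
    else:
        o_ct = "none"
    # Compensation: end-effector rotation over 45 degrees and/or any
    # photoresistor detecting light.
    o_co = "yes" if rotation_deg > 45 or light_detected else "no"
    return {"OH": o_h, "OT": o_t, "OCT": o_ct, "OCO": o_co}
```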
6.3 Merging the POMDP Agent with the Robotic Device Controller
The integration of the POMDP agent with the robot’s controller (including the photoresistor
micro-controller) was written in MATLAB®. This MATLAB® program handled:
• the communication of data between the Python controller and the POMDP agent,
• the transfer of data from the micro-controller, and
• the decision cycle of the POMDP agent (belief state update).
The following diagram (Figure 6.3) shows the interaction between the POMDP agent and both
controllers of the robotic system. The steps performed during the reaching exercise are
described as follows:
1. From the initial belief state, the POMDP agent decides on an action for the robotic
system to take (determined by the policy computed previously offline).
2. If the action was to stop the exercise, the MATLAB® program would terminate the
exercise cleanly and start from Step 1 after the user rested. If the action was to set a
particular target distance and resistance level, the MATLAB® program would send both
those target coordinates and resistance level to the Python controller, where it would then
set the appropriate target and resistance in both the real (robotic device) and virtual
environments.
3. As the user performs the reaching exercise by trying to reach the target, parameters from
the robotic controller and micro-controller, as described above, are sent as observation
input to the POMDP agent.
4. The POMDP agent then takes this observation input and determines the values of OH
(only in the STRENGTHEN model), OT, OCT, and OCO from the prescribed values
described in Section 6.2.
5. Now, the POMDP agent has enough information (i.e. the current observation, the
previous action, and the previous belief state) to compute the new belief state (as
described in Section 4.2.2.1).
6. From the new belief state, the POMDP agent decides on the next action for the system to
take (again, determined by the policy), and steps 2-6 are repeated.
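Steps 1-6 amount to the standard POMDP belief update followed by a policy lookup. The generic sketch below shows the update of step 5 on a tiny two-state toy model; the dictionaries T and O are invented stand-ins, not the thesis model's actual tables (those are described in Section 4.2.2.1).

```python
# Generic POMDP belief update (step 5). T and O below are tiny invented
# stand-ins for the transition and observation tables of the real model.

def belief_update(belief, action, obs, T, O):
    """b'(s') is proportional to O(obs | s', action) * sum_s T(s' | s, action) * b(s)."""
    new_belief = {}
    for s2 in belief:
        prior = sum(T[(s, action)][s2] * belief[s] for s in belief)
        new_belief[s2] = O[(s2, action)].get(obs, 0.0) * prior
    total = sum(new_belief.values())            # normalise
    return {s: p / total for s, p in new_belief.items()}

# Toy model: the user is either "fresh" or "tired"; fatigue is absorbing.
T = {("fresh", "reach"): {"fresh": 0.8, "tired": 0.2},
     ("tired", "reach"): {"fresh": 0.0, "tired": 1.0}}
# A fresh user usually hits the target; a tired user usually misses.
O = {("fresh", "reach"): {"hit": 0.9, "miss": 0.1},
     ("tired", "reach"): {"hit": 0.3, "miss": 0.7}}

b = belief_update({"fresh": 0.95, "tired": 0.05}, "reach", "miss", T, O)
```

After observing a miss, the updated belief assigns more probability to the user being tired, which is the qualitative behaviour the decision cycle relies on.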
Figure 6.3: Interaction between POMDP agent and robotic controller
Chapter 7 Evaluation Study
7 Evaluation Study

Once the POMDP was successfully integrated with the robotic system, it was necessary to
develop a method for evaluating the decisions being made by the POMDP agent.
7.1 Questions to be Answered by the Study

The study was designed to provide insight into the following questions:
1. Does the POMDP make decisions comparable to those made by human therapists in
guiding stroke patients during a reaching exercise?
2. Are there any aspects of the system which seem to get more positive or negative feedback
from patients and therapists?
3. What do these results mean in terms of future development of the POMDP model and
overall system?
7.2 Participants

Twelve (12) stroke survivors (hereafter referred to as patient-participant(s) or patient(s)) and
twelve (12) occupational and/or physical therapists (hereafter referred to as therapist-
participant(s) or therapist(s)) were intended to be recruited for this study from the University
Centre of the Toronto Rehabilitation Institute, located in Toronto, Canada. The sample size was
chosen based on a statistical power analysis of a one sample, one tail t-test assuming a large
effect size of 0.8 (Cohen, 1992), with a significance level of 0.05 and meaningful power of at
least 0.8 (Faul, Erdfelder, Lang, & Buchner, 2007). The one sample t-test was chosen for this
study because it can test whether the observed mean (expressed as a percentage of agreement) is
distinctly different (large effect size) from a hypothetical value of 100%. To be included in this
study, therapists must have had at least one recent year of experience conducting reaching-motion
therapy for upper-limb stroke rehabilitation, been fluent in English, worked at TRI, and
been able to consent.
To be included in this study, patient-participants must:
• have been right-side hemiparetic resulting from unilateral stroke at least 6 months before
enrollment
• have scored between 3 to 5 (inclusive) on the arm section of the Chedoke-McMaster
Stroke Assessment (CMSA) Scale (Gowland et al., 1993).
• have been able to move to some degree, but still have impaired movements as determined
by their therapist
• have been fluent in English, such that they could understand and respond to simple
instructions
Patient-participants were excluded from the study if s/he:
• exhibited a hearing and/or visual impairment that may have interfered with their ability to
understand verbal instructions and/or observe graphics on the computer screen
• had a history of physical aggression, agitation and/or exit seeking behaviour
• experienced any upper-limb joint pain or range-of-motion limitations on the affected side
that may have interfered with the ability of the person to perform linear-reaching task
movements
Patient-participants were asked to continue any outpatient therapies in which they were enrolled
at the time of study acceptance.
7.3 Testing Methodology

Each patient-participant was randomly paired with one of the participating therapists,
creating a unique patient-therapist pair. Each session lasted for approximately 1 hour and
10 minutes and was intended to be completed once per day, three times a week during weekdays
(Monday through Friday) for four weeks.
Another therapist (hereafter referred to as the CMSA-therapist), who was not one of the
therapist-participants, was responsible for administering the CMSA on each stroke subject at
three time points in the study: at the start (week 0), in the middle (week 2), and at the end (week
4). The length of time for this assessment was approximately 20-30 minutes for the arm. While
it is to be noted that the goal of the study was not to improve upper-limb motor function in
patients but to assess the decision-making strategy used by the POMDP system, CMSA
measurements of upper-limb motor function in patients were performed throughout the study to
ensure no negative effect (decrease in motor function, occurrence of pain) occurred due to
exercising with the system. This CMSA-therapist had decision-making authority as to whether
the patient-participant should be withdrawn early from the study.
For each session, the therapist brought his/her assigned patient to the testing room. Patient
participants were seated on a regular, straight-back chair positioned to the left of the robotic
device. The therapist was responsible for adjusting the position of the chair, placing the trunk
sensors at the appropriate spots (lower back, lower left scapula, and lower right scapula), and
adjusting the height of the robot to ensure that the end-effector was correctly positioned in the
sagittal plane of the patient’s right shoulder.
At the start of the study, both participants were briefed by the researcher as to the purpose of
their participation and were encouraged to ask questions and voice any opinions at any time.
The participants were introduced to the rehabilitation system, which consisted of the haptic-
robotic device, unobtrusive trunk sensors, computer display, and POMDP. Once both
participants were familiar with the operation of the system, time was given to the patient to test
the equipment with the power off to familiarise themselves with pushing and pulling the end-
effector.
When both participants were comfortable with the device and ready to begin, the researcher
powered on the robotic device and started up the computer programs that controlled the POMDP
agent, robotic device, and virtual environment. The patient was asked to place their hand on the
end-effector, which was secured with a comfortable strap, and when ready, the researcher started
the exercise.
The exercise consisted of three parts:
Part A - after the POMDP made a decision (i.e. to set the target position and resistance
level, or to stop the exercise) the therapist either agreed or disagreed with the decision
made;
Part B - the researcher had the device either execute the decision made by the POMDP if
the therapist agreed or execute the decision made by the therapist if the therapist
disagreed; and
Part C - the patient then performed the reaching exercise by trying to reach the target on
the computer screen.
These parts were repeated in the order of A-B-C until the end of the session, which lasted for
approximately 45 minutes. Once the session had ended, each participant was asked to fill out a
questionnaire. The procedure of each session can be summarised in the following steps:
1. Introduction to study and system (5 minutes)
2. Familiarise patient participant with end-effector, if necessary (5 minutes)
3. Researcher to power on device and computer programs (5 minutes)
4. Perform exercise in order of A-B-C (45 minutes)
5. Fill out questionnaire (10 minutes)
Total duration: approximately 1 hour and 10 minutes (per session)
For the very first session of the study, all patients started with no resistance (r=none) at the
shortest target distance (d=d1). Depending on how they progressed, the POMDP agent adapted
to each patient differently and thus, each session thereafter started at a different resistance and
target distance for each patient.
7.4 Modification of Integrated System

To incorporate the therapists’ decisions in the study, the MATLAB® program explained in
Section 6.3, which integrated the POMDP agent with the robotic controller, had to be modified.
Instead of automatically sending the target coordinates and the resistance level to the Python
controller once the action was determined from the policy, an extra step had to be taken. This
step involved the therapist either agreeing or disagreeing with the decision made by the policy.
If the therapist agreed, the same target coordinates and resistance level were sent to the
controller. If the therapist disagreed with the decision made, the therapist would choose which
action they thought to be correct, and that target distance and resistance level would be sent to
the controller (if the therapist chose to stop the exercise, the program would terminate). Figure
7.1 shows the modified diagram of Figure 6.3, which includes the therapist’s decision.
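The extra agree/disagree step can be sketched as a small wrapper around the policy's action; the function names are illustrative, not the actual MATLAB® routines.

```python
# Sketch of the therapist-override step: the POMDP's action is sent to the
# controller only if the therapist agrees; otherwise the therapist's own
# choice (possibly a stop) is sent instead. Names are illustrative.

STOP = "stop"

def resolve_action(pomdp_action, therapist_agrees, therapist_action=None):
    """Return the action that should actually be executed."""
    if therapist_agrees:
        return pomdp_action
    return therapist_action  # may be STOP, which terminates the exercise

def execute(action, send_to_controller):
    """Dispatch the resolved action; return False if the exercise ends."""
    if action == STOP:
        return False                 # terminate the exercise cleanly
    send_to_controller(action)       # target distance and resistance level
    return True
```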
Figure 7.1: Interaction between POMDP agent and robotic controller, via the therapist
A graphical user interface (GUI) was developed for the therapist during the study to make the
decisions and choices clearer. Figure 7.2 shows an example GUI for the therapist, which was
also developed in MATLAB®. The GUI displays:
A. the decision made by the POMDP agent
B. a button to indicate whether the therapist agrees or disagrees with the decision made
C. a panel of buttons representing the ten actions to choose from (this is only displayed
when the therapist disagrees with the decision made)
D. the decision made by the therapist
E. a history of the previous actions and observations made during the session
F. an emergency stop button that immediately terminates the exercise if any dangerous event occurs
Figure 7.3 is a picture of the final rehabilitation system in use, with a monitor for the therapist.
Figure 7.2: Therapist GUI displaying: (A) decision from POMDP, (B) therapist agreement of decision made, (C) action choice if therapist disagrees, (D) decision from therapist, (E) history of actions and observations, and (F) emergency stop button.
Figure 7.3: Final rehabilitation system in use consisting of: (A) virtual environment on the computer monitor, (B) therapist GUI on another monitor, (C) end-effector with rotational encoder, (D) haptic-robotic device, (E) trunk photoresistor sensors (not seen – placed on chair), and (F) robotic controller and POMDP agent
7.5 Capturing Decisions Made by POMDP and Therapist

Throughout the duration of the study, every decision made by both the POMDP and the therapist
was saved to a local file for later analysis. Specifically, the following data were captured:
• the decision made by the POMDP, which consisted of setting a particular target distance
and resistance level, or of stopping the exercise
• the agreement or disagreement of the therapist to the decision made by the POMDP
• the decision made by the therapist, which consisted of setting a particular target distance
and resistance level, or of stopping the exercise
• a timestamp of when the therapist made the decision
• the observation data
• the initial belief state
• the final belief state after each session
• the final state of the user’s range after each session to be carried over to the next session
Note that the intermediate belief states of each session were not recorded as they could be re-
created later by simulation, starting from the initial belief state and entering the observations and
actions recorded from the study.
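Re-creating the intermediate belief states is then a matter of replaying the recorded action/observation pairs through the same belief update. The sketch below shows the replay loop with a trivial stand-in update function; the names are invented for illustration.

```python
# Sketch of re-creating intermediate belief states from a session log by
# replaying the recorded (action, observation) pairs through the belief
# update. The update function here is a trivial stand-in.

def replay(initial_belief, log, update):
    """Return every belief state, in order, starting from the initial one."""
    belief = initial_belief
    states = [belief]
    for action, observation in log:
        belief = update(belief, action, observation)
        states.append(belief)
    return states

# Stand-in update: count how many observations were failures.
toy_update = lambda b, a, o: {"failures": b["failures"] + (o == "miss")}
history = replay({"failures": 0},
                 [("reach", "hit"), ("reach", "miss")], toy_update)
```

Because the update is deterministic given the log, storing only the initial belief, actions, and observations is sufficient, as noted above.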
7.6 Questionnaire

Questions were asked at the end of each session and at the completion of the study for both
participants. The questionnaire for the therapist-participant was designed to focus on rating the
decision-making strategy of the POMDP system. For the patient-participant, the questionnaire
focused on gathering feedback with respect to their satisfaction in using such a robotic system.
Both questionnaires consisted of quantitative and qualitative questions for statistical analysis and
to provide insight into future design improvements, respectively.
7.6.1 Questionnaire for Therapists
The questionnaire for the therapist can be seen in Appendix IV. The first “Participant
Information” page was to gather basic personal information from the therapist. This was used as
background information only.
The next section, “Evaluation of Decisions Made by Control System”, was filled out at the end
of every session. The first two questions were rated by circling the appropriate response on a
four-point Likert scale. The four-point scale was chosen deliberately for simplicity and to discourage
neutral answers. The qualitative question encouraged elaboration on any aspects related to the
decisions made by the system.
The final “Overall Evaluation” was filled out once the study was completed. It consisted of five
qualitative questions that focused on the overall decisions made by the system and on the
potential of the system delivering upper-limb reaching rehabilitation.
7.6.2 Questionnaire for Patients
The questionnaire for the patient can be seen in Appendix V. Again, the “Participant
Information” page was to gather basic personal information from the patient.
The “Evaluation of System” section was to be filled out after every session. It consisted of five
four-point Likert scale questions. The qualitative question encouraged any other comments.
The “Overall Evaluation” form was filled out at the end of the study and consisted of eight
quantitative questions, followed by four qualitative questions. These questions focused on
different aspects of the physical system.
7.7 Ethics Approval
The study protocol described above received approval five months after the original application
from the Toronto Rehabilitation Institute Research Ethics Board and the University of Toronto
Office of Research Ethics. Unfortunately, because of this delay, performing the study as
originally intended would have postponed the completion of this project. As such, the
recruitment time was shortened and the study duration was reduced. In the
end, only one patient-participant and one therapist-participant were recruited for this study,
which lasted for six sessions (each completed once per day, three times a week for two weeks).
Furthermore, the patient recruited for this study was not fluent in English and thus, could not
answer the session questionnaires. However, a translator was hired to help the patient answer the
final questionnaire at the end of the study.
Chapter 8 Results
8 Results
The following section presents the quantitative and qualitative results from the study.
8.1 Subject Data
Due to the delay in receiving ethics approval, only one therapist and one patient were recruited
for this pilot study. The therapist recruited for this study was a physical therapist with more than
nine years of experience in post-acute upper-limb stroke rehabilitation. The patient had suffered
a stroke 227 days (7 months and 14 days) before enrollment with a Chedoke-McMaster Stroke
Assessment (CMSA) arm score of 4, indicating moderate function of the arm (i.e. was able to
perform elbow flexion and extension, and shoulder flexion) (Gowland et al., 1993). At the end
of the two-week study, the patient’s CMSA score did not change. Tables 8.1 and 8.2 show the
participant information of both therapist and patient, respectively.
Table 8.1: Therapist information
ID Gender Profession Years of Experience
T01 Female Physical Therapist 9+ in post-acute (3 in acute care)
Table 8.2: Patient information

ID: P01
Gender: Male
Age (years): 54
Days Since Stroke Occurrence: 227
Height (cm): 162
Weight (lbs): 155
Arm Length of Affected Arm – Shoulder to Wrist (inches): 19
Height of Robot – to Top of End-effector (inches): 28.5
CMSA of arm (week 0): 4
CMSA of arm (week 2): 4
8.2 Decisions from POMDP and Therapist
Every decision made by the POMDP or the therapist after a trial was broken down into three
separate decisions: 1) the distance at which to set the target, 2) the level at which to set the
resistance, and 3) whether or not to stop the exercise. If the target distance and resistance
level were set (by either the POMDP or therapist), then stopping the exercise would be set to
“no”. However, if the decision was to stop the exercise (“yes”), then this decision would only
count as one (not three) since setting the target distance and resistance level would not be
applicable in this case. Raw quantitative data on the decisions made by the POMDP and
therapist can be seen in Appendix VI.
The therapist's level of agreement with the decisions made by the POMDP was calculated
based on the three separate decisions described above. A point of agreement was given if the
therapist:
• set the same target distance as the POMDP, or
• set the same resistance level as the POMDP, or
• agreed with the POMDP to stop the exercise or not.
For example, if the POMDP decided to set the target at d1 with zero resistance (thus, not
stopping the exercise) but the therapist decided to set the target at d2 with zero resistance (again,
not stopping the exercise), then two points of agreement (out of three) would be given for setting
the same resistance level and not stopping the exercise. However, if the POMDP decided to stop
the exercise but the therapist decided to set the target and resistance at a particular level, then
zero points of agreement (out of only one) would be given in this case. The raw quantitative data
on the number of agreements between the POMDP and therapist can also be seen in Appendix
VI.
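The scoring rules above can be expressed as a short function. This is an illustrative sketch; the encoding of decisions as dictionaries is an assumption for the example, not part of the study software:

```python
def agreement_points(pomdp, therapist):
    """Score the therapist's agreement with one POMDP decision.

    Each decision is a dict with key 'stop' (bool) and, when the
    exercise continues, keys 'target' and 'resistance'. Returns a
    tuple (points_of_agreement, points_possible).
    """
    if pomdp["stop"] or therapist["stop"]:
        # A stop decision counts as a single decision point: one
        # point if both agree on stopping (or not), zero otherwise.
        return (int(pomdp["stop"] == therapist["stop"]), 1)
    # Otherwise there are three decisions: the implicit agreement
    # not to stop, the target distance, and the resistance level.
    points = 1  # both decided not to stop
    points += int(pomdp["target"] == therapist["target"])
    points += int(pomdp["resistance"] == therapist["resistance"])
    return (points, 3)
```

For the worked example in the text, the d1-versus-d2 case with equal resistance yields (2, 3), and the case where only the POMDP stops yields (0, 1).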
Figure 8.1 shows the percentage of the therapist's agreement with the decisions made by the
POMDP system on target distance, resistance level, and stopping the exercise, as well as the
overall performance of the system for each session. For each decision type, the percentage of
agreement generally improved over the course of the six sessions.
Percentage of Agreement Per Session (%):

Session   Target   Resistance   Stop    Overall
S01       59.09    59.09        30.49   40.48
S02       83.33    77.78        32.26   50.00
S03       100.00   92.00        28.72   52.08
S04       100.00   98.31        46.92   71.77
S05       100.00   96.36        54.29   76.74
S06       100.00   96.77        63.37   82.67

Figure 8.1: Percentage of agreement per session
The following table (Table 8.3) shows the percentage of agreement for all sessions. Note that
there were 636 state transitions or decision points (i.e. total number of trials) and 1,154 decisions
made during the study.
Table 8.3: Percentage of agreement over all sessions

                              Number of     Total Number    Percentage of
                              Agreements    of Decisions    Agreement (%)
Target Distance               244           259             94.208
Resistance Level              235           259             90.734
Stop the Exercise (or not)    274           636             43.082
Overall                       753           1154            65.251
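As an arithmetic check, the percentages in Table 8.3 follow directly from the counts of agreements and decisions:

```python
# (agreements, total decisions) per row of Table 8.3
rows = {
    "target": (244, 259),
    "resistance": (235, 259),
    "stop": (274, 636),
    "overall": (753, 1154),
}
percentages = {name: round(100 * agreed / total, 3)
               for name, (agreed, total) in rows.items()}
# percentages == {"target": 94.208, "resistance": 90.734,
#                 "stop": 43.082, "overall": 65.251}
```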
8.3 Questionnaire Data
Figure 8.2 summarises the therapist's session responses, in terms of mean and standard deviation
(SD), regarding the appropriateness of the decisions made during the exercise and whether the
patient was given enough time to complete each trial before the next decision was made. These
ratings corresponded to questions a) and b) in the “Evaluation of Decisions Made by Control
System” questionnaire in Appendix IV. A four-point Likert scale was used with one
representing complete disagreement and four representing complete agreement. The raw
quantitative data of the therapist’s ratings per session can be seen in Appendix VII.
Figure 8.2: Therapist's evaluation of the POMDP decisions on the four-point Likert scale. Question a), "The decisions made during the exercise were appropriate", received a mean of 2.833 and SD of 0.408; question b), "The patient was given an appropriate amount of time to complete each exercise before the next decision was made", received a mean of 3.167 and SD of 0.408.
In addition to the quantitative ratings in the session questionnaire, a qualitative question was
asked to encourage the therapist to elaborate on any aspects related to the decisions made by the
POMDP system. In general, the therapist liked how the system kept setting the target at d3 and
the resistance at the maximum level (once the patient was able to perform the exercise at these
settings) for the patient to work on strengthening. However, the therapist would have liked the
system to be able to randomise different targets and resistances for the patient to work on
control. The raw qualitative data can also be seen in Appendix VII.
The overall questionnaire was filled out by the therapist at the end of the study. Table 8.4 lists
the qualitative responses from the therapist.
Table 8.4: Qualitative response from therapist for overall questionnaire

Question: Do the decisions seem believable? If not, why?
Therapist's Response: Yes they did. Initially, it seemed to end early [i.e. stop the exercise] for the "high level" patient whereas it could have been used to strengthen perhaps.

Question: Are there any other decisions you feel the computer system should make? If so, what are they?
Therapist's Response: For this patient, the parameters of control could perhaps have been more stringent as his performance improved; i.e. "perfect" vs. min or max control.

Question: Can you envision using this system as a therapy tool? Please comment.
Therapist's Response: It would be great to use with people especially if there were other dimensions of freedom, perhaps some variety of targets just to keep interest for someone less motivated – although this client did very well and was well motivated. Independent use (in set up only) would be a helpful adjunct to therapies.

Question: Can you see this system being used in the clinic, home setting, or both? Please comment.
Therapist's Response: It could be integrated perhaps into a PS2 or Wii-type system eventually to be used at home, but it would be recommended to practice in clinic first -> Both.

Question: Please elaborate on other comments you might have.
Therapist's Response: Good for strengthening in one plane of movement. More directions would be better to have variety of movements instead of having a "stereotyped" movement.
As mentioned in Section 7.7, the patient recruited for this study was not fluent in English and
was not able to answer the session questionnaires. However, with the help of a translator, the
patient was able to answer the final questionnaire at the end of the study, which consisted of
eight quantitative four-point Likert scale questions and four qualitative questions. Tables 8.5 and
8.6 show the raw quantitative and qualitative data, respectively.
Table 8.5: Quantitative response from patient for overall questionnaire

Question (scale anchors: 1 / 4)                                         Response
How smooth do you find the quality of motion of the robotic device?
  (very jerky / very smooth)                                            4
How do you feel regarding how far the robot made you reach?
  (not far / very far)                                                  4
How do you feel regarding how much resistance the robot applied?
  (too little / too much)                                               1
How closely does the exercise resemble the reaching motion?
  (very different / very alike)                                         3
How closely does the exercise compare to regular upper-limb therapy?
  (very different / very alike)                                         3
Were you able to feel the chair (trunk) sensors during the exercise?
  (no / yes)                                                            1
How do you feel about the game display on the computer screen?
  (very boring / very interesting)                                      3
Would you use this robotic system as your primary therapy?
  (no / yes)                                                            4

(Responses as reported in Section 9.1.)
Table 8.6: Qualitative response from patient for overall questionnaire

Question                                               Patient's Response
What did you like about the system?                    Fairly good.
What did you not like about the system?                [Nothing].
Is there anything you would like to change             No.
about the system?
Please elaborate on other comments you might have.     (left blank)
Chapter 9 Discussion
9 Discussion
9.1 Study Analysis
The small sample size of the study limited the use of hypothesis testing to interpret the data.
Section 7.2 calculated that the necessary sample size was 12. Therefore, the data collected in the
study from one therapist and one patient can only provide insight into the performance of the
system.
The therapist agreed with both the target distance and resistance level decisions made by the
POMDP approximately 94% and 90% of the time, respectively, during the study (shown in Table
8.3). Most of this agreement was with the POMDP repeatedly setting the target distance at d3
and the resistance at max. Since the patient was able to reach this setting within the first session
with proper posture and control, the POMDP continued to make this decision as it was given
large rewards for getting the user to reach the furthest target at maximum resistance. The
therapist generally agreed with these decisions as she wanted the patient to work on
strengthening.
However, the therapist only agreed with the POMDP approximately 43% of the time for the stop
decision. The POMDP wanted to stop the exercise to let the user take a break far more often
than the therapist wanted. If the therapist did not see any signs of fatigue from the user, she
would have the patient continue practising the exercise for a longer period of time and not stop.
After about 50 repetitions, the therapist would stop the exercise to let the user take a break. The
dynamics of the fatigue variable in the POMDP model caused it to progress to fat=yes too
quickly. This progression could be slowed to match the therapist's stopping behaviour by
adjusting the fatigue effects in the iSTRETCH model. Since the percentage of agreement for the
stop decision was low, the overall therapist agreement with the POMDP decisions dropped to
approximately 65%.
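One way to see why the model fatigued too quickly: if fat=no progresses to fat=yes with a roughly constant per-trial probability p (Table I.1 in Appendix I lists p = 0.05 at the furthest range for one action), then the onset of fatigue is geometrically distributed with mean 1/p trials. This is a simplified sketch; the actual model conditions p on the action taken and the user's range:

```python
def expected_trials_to_fatigue(p_per_trial):
    """Mean number of trials until fat=yes, assuming a constant
    per-trial fatigue probability (geometric distribution)."""
    return 1.0 / p_per_trial

# With p = 0.05 (Table I.1, furthest range), fatigue is expected after
# about 20 trials, well short of the ~50 repetitions the therapist
# allowed before a break.
mean_trials_model = expected_trials_to_fatigue(0.05)
# Matching the therapist's behaviour would suggest lowering the
# per-trial probability to roughly 0.02.
mean_trials_therapist = expected_trials_to_fatigue(0.02)
```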
During each session, as soon as the POMDP estimated that the patient was fatigued, it
continually made the decision to stop the exercise no matter what decision the therapist entered
into the system. The therapist’s decisions alternated between having the patient work on muscle
strengthening (by repeatedly setting the distance and resistance at the highest level) and on
control (by randomising the target distance and resistance levels). However, repetition and
randomisation were not part of the POMDP’s initial objective and thus, the POMDP would never
make the decision to repeat or randomise the target distance and resistance levels. The low
percentage of agreement calculated for the stop decision may not have represented the POMDP’s
decision fairly as repetition and randomisation were not modelled. If the repeated stop decisions
were discarded, this percentage of agreement would have been approximately 94.167% (226 stop
agreements divided by 240 total stop decisions).
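The recomputed figure follows directly from the counts once the repeated stop decisions are discarded:

```python
# Stop-decision agreement with repeated stop decisions discarded
stop_agreements = 226
stop_decisions = 240
pct_stop_agreement = 100 * stop_agreements / stop_decisions  # ~94.167
```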
The therapist’s ratings on the appropriateness of the amount of time given to complete each trial
before the next decision was made were generally favourable with a mean score of more than 3.1
out of 4.0 on the Likert scale. However, the appropriateness of the decisions made by the
POMDP during the sessions was less favourable with a mean score of more than 2.8 out of 4.0.
Comments from the therapist suggested that randomising the target distance and resistance level
would be beneficial for the patient to work on control in addition to strengthening; the POMDP
already promoted strengthening by repeatedly setting the target distance at d3 and the resistance
level at max (once the patient was able to perform the exercise at these settings). These results imply
that perhaps in addition to the current model of gradually increasing target distance and then
resistance level, the rehabilitation exercise could include a timeframe where different target
distances and resistance levels would be randomly chosen.
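The randomisation phase suggested here could be as simple as uniform sampling over the exercise parameters. A sketch: the target labels d1–d3 follow the thesis notation, while the resistance labels and the uniform sampling scheme are assumptions for illustration, not part of the implemented system:

```python
import random

# Target labels follow the thesis notation; resistance labels are
# illustrative assumptions for this sketch.
TARGETS = ["d1", "d2", "d3"]
RESISTANCES = ["none", "low", "medium", "max"]

def random_trial_settings(rng=random):
    """Sample a target distance and resistance level uniformly at
    random for a control-focused (variety) trial."""
    return rng.choice(TARGETS), rng.choice(RESISTANCES)
```

A seeded `random.Random` instance can be passed in for reproducible trial sequences.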
The general qualitative results from the therapist for the final questionnaire can be summarised
as follows:
• the POMDP decisions were believable, except for the fact that the POMDP kept wanting
to stop the exercise too early
• the therapist could envision the rehabilitation system being used in both the clinic and
home setting, as long as the system could vary the target locations (rather than restricting
them to a straight path) to keep patients motivated, and was easy for therapists to set up
From the patient’s quantitative results for the final questionnaire (shown in Table 8.5), the
patient found the quality of motion of the robotic device to be very smooth with a score of 4.0
out of 4.0. The patient also felt that the robotic device made him reach very far during the
exercise with a score of 4.0 out of 4.0. However, the raw data in Appendix VI suggested that the
patient had no trouble reaching the furthest target distance at the maximum level of resistance
with proper posture, maximum control, and no compensation. This suggests that perhaps the
client did not fully understand the question. The patient also felt that the resistance applied by
the robotic device was too little with a score of 1.0 out of 4.0. Throughout the study, the patient
repeatedly commented that the exercise was “too easy”.
The patient was not able to feel the trunk sensors at all during the exercise, which suggests that
trunk compensatory movements can be captured unobtrusively. The patient also felt that the
bull’s eye game was somewhat interesting, scoring 3.0 out of 4.0.
The patient felt that the exercise closely resembled the reaching motion and conventional upper-
limb therapy, scoring 3.0 out of 4.0 for both. In addition, the patient believed he would use this
robotic system as his primary therapy, scoring 4.0 out of 4.0 on the Likert scale.
The patient did not elaborate on the qualitative questions, thus, feedback from this section of the
questionnaire was discarded.
9.2 Analysis of Other Upper Extremity Rehabilitation Robotic Systems
Compared with other rehabilitation robotic systems discussed in Section 3.1, this system was
able to operate autonomously without explicit feedback from a therapist (or user) by
automatically adjusting exercise parameters from one trial to the next. Through observation and
estimation of states, the POMDP would automatically decide which target distance and
resistance level to set next.
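The state estimation step the system performs between trials is the standard discrete POMDP belief update. A minimal sketch follows; the dictionary layout and the two-state fatigue example with its numbers are illustrative assumptions, not the thesis model:

```python
def update_belief(belief, action, observation, T, O):
    """One step of the standard discrete POMDP belief update:
    b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief: {state: probability}
    T[s][a][s2]: transition probability P(s2 | s, a)
    O[s2][a][o]: observation probability P(o | s2, a)
    """
    states = list(belief)
    unnorm = {}
    for s2 in states:
        predicted = sum(T[s][action][s2] * belief[s] for s in states)
        unnorm[s2] = O[s2][action][observation] * predicted
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

# Example with a hypothetical two-state fatigue model:
T = {"notfat": {"set_d1": {"fat": 0.2, "notfat": 0.8}},
     "fat": {"set_d1": {"fat": 1.0, "notfat": 0.0}}}
O = {"fat": {"set_d1": {"slow": 0.8, "fast": 0.2}},
     "notfat": {"set_d1": {"slow": 0.3, "fast": 0.7}}}
b = update_belief({"fat": 0.0, "notfat": 1.0}, "set_d1", "slow", T, O)
# After observing slow movement, belief in fatigue rises from 0.0 to 0.4.
```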
The results from the study cannot conclude that this rehabilitation system can tailor the exercise
to each individual differently since only one patient was recruited and a comparison between
individuals could not be made. However, the system seemed to adjust the exercise parameters
according to the progression of the patient. In the beginning of the study, the patient was able to
reach the targets at particular resistance levels with ease. Thus, within the first session, the
POMDP had already set the exercise parameters to the furthest target distance and maximum
resistance level for the patient to practice at.
The POMDP system was also able to estimate user fatigue and in turn, make decisions to stop
the exercise for the user to take a break. Although the progression of fatigue may have been too
fast, the ability to capture fatigue represents significant progress in the field of rehabilitation
robotics, as fatigue can have a significant effect on a patient's rehabilitation outcome.
9.3 Limitations
The delay in receiving ethics approval limited the time available for recruiting participants,
conducting the study, and comparing the performance of the two models, STRENGTHEN and
iSTRETCH, to determine which one was better.
Given more time, the sample size could have been increased and thus, more data from different
therapists and patients could have been collected. The results described above may be quite
different from those involving more therapists and patients. In addition, the study could have
been expanded to involve both POMDP models. A similar study could have been performed
with one group of therapists and patients performing the exercise on the STRENGTHEN model
and the other group of therapists and patients performing the exercise on the iSTRETCH model.
This would yield an evaluation of each model and a comparison between the two, with further
analysis determining which performed better.
9.4 Recommendations for Future Work
The recommended immediate next step for this project is to test this POMDP model with more
participants in order to obtain significant results. It is not recommended to change the POMDP
model before obtaining further results, as the opinions of other therapists may differ significantly
from those of this therapist.
If more time permits, it is recommended that a similar clinical evaluation be performed to
compare the performance of the STRENGTHEN and iSTRETCH models. The following
suggestions can also be considered given more time:
• enhance the user interface to provide feedback for the user such as a scoring system or
sounds to indicate that the user has reached the target
• include target distances that are not restricted to the linear path
• develop an easier way to initialise the exercise such that both Python and MATLAB®
programs start automatically
Chapter 10 Conclusion
10 Conclusion
Although no substantial conclusions can be drawn, the results from the study provided some
insight into the proposed questions.
1. Did the POMDP make decisions comparable to those made by human therapists in
guiding stroke patients during a reaching exercise? Based on one therapist's point of
view, the decisions made by the POMDP for setting the target distance and resistance
level seemed comparable to those made by the therapist. However, the therapist
disagreed with the POMDP’s decision to stop the exercise more than half the time.
Overall, the therapist agreed with the decisions made by the POMDP approximately 65%
of the time.
2. Are there any aspects of the system which seem to get more positive or negative feedback
from patients and therapists? Overall, the patient was content with the system and would
use this system as his primary therapy. The only negative feedback from the patient
was that the resistances applied by the robotic device during the exercise were too little.
The therapist thought the POMDP decisions were believable and could envision this
system being used first, in the clinic and then eventually in the home setting. The
suggestions for improvement received from the therapist were to randomise the target
distances and resistance levels during the exercise, include target distances that were not
located on the straight path, increase the amount of repetition before stopping the
exercise, and develop an easier method of initialising the exercise.
3. What do these results mean in terms of future development of the POMDP model and
overall system? These results suggest that the dynamics of the fatigue variable in the
POMDP model may need to be changed in order for the POMDP to stop the exercise less
often. In addition, the POMDP model may have to be expanded to include more targets
(not located on the linear path) and randomisation of different target distances and
resistance levels. In terms of the robotic system itself, future changes may include
increasing the resistance levels applied by the device and developing an easier way to set
up the exercise.
This research demonstrates that POMDPs have promising potential to provide autonomous
upper-limb rehabilitation for stroke patients, which may allow clients to perform guided
rehabilitation when and where they prefer and enable them to progress at the best possible pace.
References

Amirabdollahian, F., Loureiro, R., Gradwell, E., Collin, C., Harwin, W., and Johnson, G. (2007).
Multivariate analysis of the Fugl-Meyer outcome measures assessing the effectiveness of GENTLE/S robot-mediated stroke therapy. Journal of NeuroEngineering and Rehabilitation, 4(4), 1-16.
Barnes, M., Dobkin, B., and Bogousslavsky, J. (2005). Recovery after stroke. United Kingdom: Cambridge University Press.
Brewer, B. R., McDowell, S. K., and Worthen-Chaudhari, L. C. (2007). Poststroke upper extremity rehabilitation: A review of robotic systems and clinical results. Topics in Stroke Rehabilitation, 14(6), 22-44.
Buchanan, B. G. (2005). A (very) brief history of artificial intelligence. AI Magazine, 26(4), 53-60.
Canadian Stroke Network. (2007). Stroke 101. About Stroke. Retrieved July 21, 2008, from http://www.canadianstrokenetwork.ca/eng/about/stroke101.php
Caplan, L. R. (2000). Caplan’s stroke: A clinical approach, 3rd edition. Massachusetts: Butterworth-Heinemann.
Caplan, L. R. (2006). Stroke. New York: Demos Medical Publishing.
Carr, J. and Shepherd, R. (2003). Stroke rehabilitation: Guidelines for exercise and training to optimize motor skill. United Kingdom: Butterworth-Heinemann.
Cirstea, M. C. and Levin, M. F. (2000). Compensatory strategies for reaching in stroke. Brain, 123(5), 940-953.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Dobkin, B. H. (2008). Fatigue versus activity-dependent fatigability in patients with central or peripheral motor impairments. Neurorehabilitation and Neural Repair, 22(2), 105-110.
Erol, D., Mallapragada, V., Sarkar, N., Uswatte, G., and Taub, E. (2006). Autonomously adapting robotic assistance for rehabilitation therapy. Paper presented at the First IEEE/RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics, Pisa, Italy.
Fasoli, S. E., Krebs, H. I., and Hogan, N. (2004). Robotic technology and stroke rehabilitation: Translating research into practice. Topics in Stroke Rehabilitation, 11(4), 11-19.
Fasoli, S. E., Krebs, H. I., Stein, J., Frontera, W. R., Hughes, R., and Hogan, N. (2004). Robotic therapy for chronic motor impairments after stroke: Follow-up results. Archives of Physical Medicine and Rehabilitation, 85(7), 1106-1111.
Faul, F., Erdfelder, E., Lang, A.G., and Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
Ferraro, M., Palazzolo, J. J., Krol, J., Krebs, H. I., Hogan, N., and Volpe, B. T. (2003). Robot-aided sensorimotor arm training improves outcome in patients with chronic stroke. Neurology, 61(11), 1604-1607.
Friel, K. M. and Nudo, R. J. (1998). Restraint of the unimpaired hand is not sufficient to retain spared primary motor hand representation after focal cortical injury. Society for Neuroscience Abstracts, 24, 405.
Gillen, G. and Burkhardt, A. (2004). Stroke rehabilitation: A function-based approach, 2nd edition. Missouri: Mosby.
Givon, M. and Grosfeld-Nir, A. (2008). Using partially observed Markov processes to select optimal termination time of TV shows. Omega, 36(3), 477-485.
Gowland, C., Stratford, P., Ward, M., Moreland, J., Torresin, W., Van Hullenaar, S., et al. (1993). Measuring physical impairment and disability with the Chedoke-McMaster Stroke Assessment. Stroke, 24(1), 58-63.
Heart and Stroke Foundation of Canada. (2006, June 5). Tipping the scales of progress. Heart Disease and Stroke in Canada, 1-18.
Heart and Stroke Foundation of Canada. (2008). Stroke. Statistics. Retrieved July 21, 2008, from http://www.heartandstroke.com/site/c.ikIQLcMWJtE/b.3483991/k.34A8/ Statistics.htm#stroke
Hidler, J., Nichols, D., Pelliccio, M., and Brady, K. (2005). Advances in the understanding and treatment of stroke impairment using robotic devices. Topics in Stroke Rehabilitation, 12(2), 22-35.
Hoey, J., St-Aubin, R., Hu, A., and Boutilier, C. (1999, July). SPUDD: Stochastic planning using decision diagrams. Paper presented at the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden.
Ju, M. S., Lin, C. C. K., Lin, D. H., Hwang, I. S., and Chen, S. M. (2005). A rehabilitation robot with force-position hybrid fuzzy controller: Hybrid fuzzy control of rehabilitation robot. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 13(3), 349-358.
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.
Kahn, L. E., Zygman, M. L., Rymer, W. Z., and Reinkensmeyer, D. J. (2006). Robot-assisted reaching exercise promotes arm movement recovery in chronic hemiparetic stroke: A randomized controlled pilot study. Journal of NeuroEngineering and Rehabilitation, 3(12), 1-13.
Krebs, H. I., Ferraro, M., Buerger, S. P., Newbery, M. J., Makiyama, A., Sandmann, M., et al. (2004). Rehabilitation robotics: Pilot trial of a spatial extension for MIT-Manus. Journal of NeuroEngineering and Rehabilitation, 1(5), 1-15.
Krebs, H. I., Hogan, N., Aisen, M. L., and Volpe, B. T. (1998). Robot-aided neurorehabilitation. IEEE Transactions on Rehabilitation Engineering, 6(1), 75-87.
Lam, P. T. Y. (2007). Development of a haptic-robotic platform for moderate level upper-limb stroke rehabilitation. M.A.Sc. thesis, Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, Canada.
Lam, P., Hébert, D., Boger, J., Lacheray, H., Gardner, D., Apkarian, J., et al. (2008). A haptic-robotic platform for upper-limb reaching stroke therapy: Preliminary design and evaluation results. Journal of NeuroEngineering and Rehabilitation, 5(15), 1-13.
Liepert, J., Bauder, H., Miltner, W. H. R., Taub, E., and Weiller, C. (2000). Treatment-induced cortical reorganization after stroke in humans. Stroke, 31, 1210-1216.
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47-66.
Lum, P. S., Burgar, C. G., Shor, P. C., Majmundar, M., and Van der Loos, M. (2002). Robot-assisted movement training compared with conventional therapy techniques for the rehabilitation of upper-limb motor function after stroke. Archives of Physical Medicine and Rehabilitation, 83(7), 952-959.
Lum, P. S., Burgar, C. G., Van der Loos, M., Shor, P. C., Majmundar, M., and Yap, R. (2006). MIME robotic device for upper-limb neurorehabilitation in subacute stroke subjects: A follow-up study. Journal of Rehabilitation Research and Development, 43(5), 631-642.
MacClellan, L. R., Bradham, D. D., Whitall, J., Volpe, B., Wilson, P. D., Ohlhoff, J., et al. (2005). Robotic upper-limb neurorehabilitation in chronic stroke patients. Journal of Rehabilitation Research and Development, 42(6), 717-722.
McLaughlin, M. L., Hespanha, J. P., and Sukhatme, G. S. (2001). Touch in virtual environments: Haptics and the design of interactive systems. New Jersey: Prentice Hall.
Nudo, R. J. and Milliken, G. W. (1996). Reorganization of movement representations in primary motor cortex following focal ischemic infarcts in adult squirrel monkeys. Journal of Neurophysiology, 75(5), 2144-2149.
Nudo, R. J., Wise, B. M., SiFuentes, F., and Milliken, G. W. (1996). Neural substrates for the effects of rehabilitative training on motor recovery after ischemic infarct. Science, 272(5269), 1791-1794.
Pineau, J., Gordon G., and Thrun, S. (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27, 335-380.
Pineau, J., Montemerlo, M., Pollack, M., Roy, N., and Thrun, S. (2003). Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems, 42(3-4), 271-281.
Poupart, P. (2005). Exploiting structure to efficiently solve large scale partially observable Markov decision processes. Ph.D. thesis, Department of Computer Science, University of Toronto, Toronto, Canada.
Reinkensmeyer, D. J., Kahn, L. E., Averbuch, M., McKenna-Cole, A., Schmit, B. D., and Rymer, W. Z. (2000). Understanding and treating arm movement impairment after chronic brain injury: Progress with the ARM guide. Journal of Rehabilitation Research and Development, 37(6), 653-662.
Russell, S. and Norvig, P. (2003). Artificial intelligence: A modern approach, 2nd edition. New Jersey: Prentice Hall.
Sondik, E. J. (1971). The optimal control of partially observable Markov decision processes. Ph.D. thesis, Department of Electrical Engineering, Stanford University, Stanford, California.
Spaan, M. T. J. and Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24, 195-220.
Volpe, B. T., Krebs, H. I., Hogan, N., Edelstein, L., Diels, C., and Aisen, M. (2000). A novel approach to stroke rehabilitation: Robot-aided sensorimotor stimulation. Neurology, 54(10), 1938-1944.
Young, E. (2007, April 7). Tireless, reliable physio-robots take on stroke paralysis. New Scientist, 194(2598), 24-25.
Appendix I – Example Construction of a Conditional Probability Table
Conditional probability tables (CPTs) are a way of capturing the transition function of a POMDP model. A CPT describes the probability of each value of a variable occurring at time step t+1, for a specific action, given the values of its influencing variables at the prior time step t. An example of a CPT is illustrated below. Figure I.1 shows the dynamic Bayesian network (DBN) that describes the relationship of the fat [fat : {yes, no}] and n(none) [n(none) : {none, d1, d2, d3}] variables at time t to fat' at time t+1 (extracted from the POMDP model in Figure 5.5).
Figure I.1: Relationship of fat and n(none) to fat’
The dynamics for this DBN for the action setd1resnone (sets target distance at d=d1 and resistance level at r=none) are enumerated in the CPT of Table I.1.
Table I.1: CPT for fat' for action setd1resnone

fat    n(none)    fat'(yes)    fat'(no)
yes    none       1.0          0.0
yes    d1         1.0          0.0
yes    d2         1.0          0.0
yes    d3         1.0          0.0
no     none       0.2          0.8
no     d1         0.1          0.9
no     d2         0.067        0.933
no     d3         0.05         0.95
From Table I.1, it can be observed that the fat(yes) rows are deterministic: they transition to a specific future value with certainty, that is, with a probability of 1.0. Here, fat’(yes) always has a probability of 1.0 because a user who is fatigued at time t will, with certainty, still be fatigued at time t+1, regardless of the user’s range. The fat(no) rows in Table I.1 are probabilistic. The probability of fat’(yes) decreases as the user’s range increases when the action is to set d=d1 and r=none; in other words, as the user’s current range extends beyond the set target and resistance, the rate at which the user becomes fatigued decreases (i.e. the exercise becomes “easier”). It is also important to note that the probabilities for each combination of instantiations of the variables at time t sum to 1.0. The ideas presented in this example can be expanded to accommodate the interaction of the multiple variables found in the full reaching exercise model. CPTs can also be applied to the observation function in the same way.
Appendix II – Simulation Examples of the STRENGTHEN and iSTRETCH Models
Simulation 01 (STRENGTHEN Model)
Initial Belief State: [belief-state figure omitted]
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OH=no OT=none OCT=min OCO=no
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OH=no OT=none OCT=min OCO=no
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OH=no OT=none OCT=none OCO=yes
ACTION: d=d1; r=none OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OH=no OT=none OCT=none OCO=yes
ACTION: stop
Simulation 01 (iSTRETCH Model)
ACTION: d=d1; r=none OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d2; r=none OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=none OBSERVATION: OT=none OCT=min OCO=no
ACTION: d=d3; r=none OBSERVATION: OT=none OCT=min OCO=no
ACTION: d=d3; r=none OBSERVATION: OT=none OCT=none OCO=yes
ACTION: stop
Simulation 02 (STRENGTHEN Model)
Initial Belief State: [belief-state figure omitted]
ACTION: d=d1; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=norm OCT=max OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=min OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=min OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=none OCO=yes
(Note the decrease, i.e. reversal, in the fatigue level.)
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=none OCO=yes
ACTION: d=d3; r=max OBSERVATION: OH=yes OT=slow OCT=none OCO=yes
ACTION: d=d3; r=max
(Note: the model keeps producing the same action over and over again since the belief in fat=yes keeps decreasing…)
Simulation 02 (iSTRETCH Model)
ACTION: d=d1; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d2; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d2; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=no
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=yes
ACTION: d=d3; r=max OBSERVATION: OT=norm OCT=max OCO=yes
ACTION: stop
Appendix III – Micro-controller Software Code

/* serialPhotoSensor.ic
   Handyboard serial communication with Matlab
   compiled with Interactive C v4.3
   requires serial communication with PC running Matlab script
   sends one byte of photo-sensor status data to the PC
   this file is based on serialio.c, described below */

/* serialio.c
   low-level serial I/O for the Handy Board
   also works with the 6.270 board
   Dr. Fred G. Martin
   Learning and Epistemology Group
   Media Laboratory
   Massachusetts Institute of Technology
   [email protected] */

/*****************************************************************************/
/*                          function declarations                            */
/*****************************************************************************/

void disable_pcode_serial()     /* disable board handshaking with IC on the  */
{                               /* host computer, allowing user programs to  */
    poke(0x3c, 1);              /* receive serial data                       */
}

/*****************************************************************************/

void enable_pcode_serial()      /* enable board handshaking with IC on the   */
{                               /* host computer                             */
    poke(0x3c, 0);
}

/*****************************************************************************/

void serial_putchar(int c)      /* send a serial character; note: the        */
{                               /* program hangs until the character is      */
                                /* sent -- there is no timeout!              */
    while (!(peek(0x102e) & 0x80));     /* wait until it's okay to send */
    poke(0x102f, c);                    /* send the character */
}

/*****************************************************************************/

int serial_getchar()            /* read a serial character; note: the        */
{                               /* program hangs until a character arrives   */
                                /* -- there is no timeout!                   */
    while (!(peek(0x102e) & 0x20));     /* wait until a character arrives */
    return (peek(0x102f));              /* return it as an int */
}

/*****************************************************************************/

void main()
{
    int phThreshold = 130;
    int phThresholdRight = 50;
    int phRight, phLeft, phLow;
    int serialOut;
    /* int serialIn; */

    printf("Program Started\n");
    disable_pcode_serial();     /* disable handshaking to ENABLE serial use */

    while (!stop_button())
    {
        serialOut = 0;
        phRight = analog(2);    /* photo sensor behind right shoulder */
        phLeft  = analog(4);    /* photo sensor behind left shoulder */
        phLow   = analog(6);    /* photo sensor behind upper lumbar */

        if (phRight < phThresholdRight)             /* if right shoulder moved away */
        {
            serialOut = serialOut + 0b00000001;     /* right = first bit */
        }
        if (phLeft < phThreshold)                   /* if left shoulder moved away */
        {
            serialOut = serialOut + 0b00000100;     /* left = third bit */
        }
        if (phLow < phThreshold)                    /* if lower back moved away */
        {
            serialOut = serialOut + 0b00000010;     /* lower back = second bit */
        }

        serial_putchar(serialOut);  /* serial out the one byte of data */
        /* serial_putchar(10);  LF terminator NOT NEEDED */
        /* serialIn = serial_getchar(); */
        printf("%d\n", serialOut);
        sleep(0.05);                /* wait a while for next cycle */
    }

    /* when Stop is pressed */
    serialOut = 0b00001000;
    serial_putchar(serialOut);      /* serial out 8 */
    printf("%d\n", serialOut);
    sleep(1.0);
    enable_pcode_serial();
    printf("Program Terminated\n");
}
Appendix IV – Questionnaire for the Therapist

(Note: to be completed once at beginning of trials)

Date:                Participant ID: T##

Participant Information

Participant ID: T##
Gender: F / M
Profession: Occupational Therapist / Physical Therapist / Other: ______________________________
Years working with stroke patients in upper-limb rehabilitation: _______________
(Note: to be completed per session)

Start Time:          End Time:          Date:          Participant ID: T##

Evaluation of Decisions Made by Control System

Please circle one rating for each question. Given the decisions that the computer system can make:

a) The decision(s) made during the exercise(s) were appropriate.
   (1 = disagree, 4 = agree)      1   2   3   4

b) The patient was given an appropriate amount of time to complete each exercise before the next decision was made.
   (1 = disagree, 4 = agree)      1   2   3   4
Please elaborate if there were any aspects of the decisions made by the system that you felt were especially strong or weak.
__________________________________________________________________
(Note: to be completed once at end of trials)

Date:                Participant ID: T##

Overall Evaluation

Please answer the following:

Do the decisions seem believable? If not, why?
__________________________________________________________________

Are there any other decisions you feel the computer system should make? If so, what are they?
__________________________________________________________________

Can you envision using this system as a therapy tool? Please comment.
__________________________________________________________________
Can you see this system being used in the clinic, home setting, or both? Please comment.
__________________________________________________________________

Please elaborate on other comments you might have.
__________________________________________________________________
Appendix V – Questionnaire for the Patient

(Note: to be completed once at beginning of trials)

Date:                Participant ID: P##

Participant Information

Participant ID: P##
Gender: F / M
Age: _______________
Height: _______________
Weight: _______________
Arm length of affected arm (shoulder to wrist): _______________
Time since stroke occurrence: _________________
Height of robot: _________________

(To be filled in by the CMSA therapist:)
CMSA Score at Week 0: _________________
CMSA Score at Week 2: _________________
CMSA Score at Week 4: _________________
(Note: to be completed per session)

Start Time:          End Time:          Date:          Participant ID: P##

Evaluation of System

Please circle one rating for each question.

a) How comfortable did you feel during the exercise?
   (1 = very uncomfortable, 4 = very comfortable)      1   2   3   4

b) How safe did you feel during the exercise?
   (1 = very unsafe, 4 = very safe)      1   2   3   4

c) Did you feel you had a good arm workout during this session?
   (1 = not good, 4 = very good)      1   2   3   4

d) Does your arm feel tired after the exercise?
   (1 = not tired at all, 4 = very tired)      1   2   3   4

e) Do you feel pain in your arm after the exercise?
   (1 = no pain, 4 = a lot of pain)      1   2   3   4
Please elaborate on any other comments you might have.
__________________________________________________________________
(Note: to be completed once at end of trials)

Date:                Participant ID: P##

Overall Evaluation

Please circle one rating for each question.

a) How smooth do you find the quality of motion of the robotic device?
   (1 = very jerky, 4 = very smooth)      1   2   3   4

b) How do you feel regarding how far the robot made you reach?
   (1 = not far, 4 = very far)      1   2   3   4

c) How do you feel regarding how much resistance the robot applied?
   (1 = too little, 4 = too much)      1   2   3   4

d) How closely does the exercise resemble the reaching motion?
   (1 = very different, 4 = very alike)      1   2   3   4

e) How closely does the exercise compare to regular upper-limb therapy?
   (1 = very different, 4 = very alike)      1   2   3   4

f) Were you able to feel the chair (trunk) sensors during the exercise?
   (1 = no, 4 = yes)      1   2   3   4

g) How do you feel about the game display on the computer screen?
   (1 = very boring, 4 = very interesting)      1   2   3   4

h) Would you use this robotic system as your primary therapy?
   (1 = no, 4 = yes)      1   2   3   4
Please answer the following:

What did you like about the system?
__________________________________________________________________

What did you not like about the system?
__________________________________________________________________
Is there anything you would like to change about the system? If so, what would you change?
__________________________________________________________________

Please elaborate on other comments you might have.
__________________________________________________________________
Appendix VI – Raw Quantitative Data on Decisions Made by POMDP and Therapist
The following data are presented by session, exercise, and trial. An exercise is defined as a group of trials; the last trial of an exercise occurs when the therapist decides to stop the exercise to let the user take a break.

Session 01, Exercise 01

Trial #               POMDP   Therapist   Agree
1   Target            1       2           0
    Resistance        0       0           1
    Stop              No      No          1
2   Target            2       3           0
    Resistance        0       1           0
    Stop              No      No          1
3   Target            1       3           0
    Resistance        2       2           1
    Stop              No      No          1
4   Target            n/a     n/a
    Resistance        n/a     n/a
    Stop              Yes     Yes         1

Totals: 6/10 (Target 0/3, Resistance 2/3, Stop 4/4)
Session 01, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 1 3 0 Resistance 2 0 0 Stop No No 1 2 Target 1 3 0 Resistance 2 1 0 Stop No No 1 3 Target 1 3 0 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1
Stop No No 1 6 Target 3 3 1 Resistance 2 1 0 Stop No No 1 7 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 13 /19 Target 3 /6 Resistance 3 /6 Stop 7 /7
Session 01, Exercise 03
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 0 0 Stop No No 1 2 Target 3 3 1 Resistance 2 1 0 Stop No No 1 3 Target 3 3 1 Resistance 2 1 0 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 2 0 Resistance 2 2 1 Stop No No 1 6 Target n/a 2 Resistance n/a 2 Stop Yes No 0 7 Target n/a 2 Resistance n/a 2 Stop Yes No 0 8 Target n/a 2 Resistance n/a 2 Stop Yes No 0 9 Target n/a 2 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3
Resistance n/a 2 Stop Yes No 0
13 Target n/a 2 Resistance n/a 2 Stop Yes No 0
14 Target n/a 1 Resistance n/a 2 Stop Yes No 0
15 Target n/a 1 Resistance n/a 2 Stop Yes No 0
16 Target n/a 1 Resistance n/a 2 Stop Yes No 0
17 Target n/a 2 Resistance n/a 2 Stop Yes No 0
18 Target n/a 2 Resistance n/a 2 Stop Yes No 0
19 Target n/a 2 Resistance n/a 2 Stop Yes No 0
20 Target n/a 2 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 2 Resistance n/a 1 Stop Yes No 0
26 Target n/a 3 Resistance n/a 1 Stop Yes No 0
27 Target n/a 3 Resistance n/a 1 Stop Yes No 0
28 Target n/a 2 Resistance n/a 1 Stop Yes No 0
29 Target n/a 3 Resistance n/a 1 Stop Yes No 0
30 Target n/a 3 Resistance n/a 0 Stop Yes No 0
31 Target n/a 1 Resistance n/a 0 Stop Yes No 0
32 Target n/a 2 Resistance n/a 0 Stop Yes No 0
33 Target n/a 2 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 12 /50 Target 4 /5 Resistance 2 /5 Stop 6 /40
Session 01, Exercise 04
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1
Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 2 0 Resistance 2 0 0 Stop No No 1 8 Target 3 2 0 Resistance 2 0 0 Stop No No 1 9 Target n/a 2 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 0 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3 Resistance n/a 0 Stop Yes No 0
13 Target n/a 1 Resistance n/a 0 Stop Yes No 0
14 Target n/a 2 Resistance n/a 2 Stop Yes No 0
15 Target n/a 3 Resistance n/a 0 Stop Yes No 0
16 Target n/a 2 Resistance n/a 1 Stop Yes No 0
17 Target n/a 1 Resistance n/a 2 Stop Yes No 0
18 Target n/a 2 Resistance n/a 0 Stop Yes No 0
19 Target n/a 3
Resistance n/a 2 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a 3 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target 3 3 Resistance 2 2 Stop No No 0
30 Target 3 3 Resistance 2 2 Stop No No 0
31 Target 3 n/a Resistance 2 n/a Stop No Yes 0 20 /47 Target 6 /8 Resistance 6 /8 Stop 8 /31
Session 02, Exercise 01
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 2 0 Resistance 2 0 0 Stop No No 1
13 Target 3 1 0 Resistance 2 2 1 Stop No No 1
14 Target n/a 3 Resistance n/a 0 Stop Yes No 0
15 Target n/a 2 Resistance n/a 1 Stop Yes No 0
16 Target n/a 3 Resistance n/a 2 Stop Yes No 0
17 Target n/a 1 Resistance n/a 0 Stop Yes No 0
18 Target n/a 3 Resistance n/a 0
Stop Yes No 0 19 Target n/a 2 Resistance n/a 1 Stop Yes No 0
20 Target n/a 2 Resistance n/a 0 Stop Yes No 0
21 Target n/a 1 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 0 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a 3 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3
Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 37 /74 Target 11 /13 Resistance 12 /13 Stop 14 /48
Session 02, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 0 0 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target 3 3 1 Resistance 2 1 0 Stop No No 1 9 Target 3 1 0 Resistance 2 0 0 Stop No No 1
10 Target n/a 2 Resistance n/a 2 Stop Yes No 0
11 Target n/a 1 Resistance n/a 0 Stop Yes No 0
12 Target n/a 3 Resistance n/a 0 Stop Yes No 0
13 Target n/a 3 Resistance n/a 2 Stop Yes No 0
14 Target n/a 3 Resistance n/a 0 Stop Yes No 0
15 Target n/a 2 Resistance n/a 1 Stop Yes No 0
16 Target n/a 1 Resistance n/a 1 Stop Yes No 0
17 Target n/a 3 Resistance n/a 0
Stop Yes No 0 18 Target n/a 3 Resistance n/a 0 Stop Yes No 0
19 Target n/a 2 Resistance n/a 1 Stop Yes No 0
20 Target n/a 3 Resistance n/a 1 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a 3 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3
Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 0 Stop Yes No 0
43 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 24 /61 Target 8 /9 Resistance 6 /9 Stop 10 /43
Session 02, Exercise 03
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 16 /16 Target 5 /5 Resistance 5 /5 Stop 6 /6
Session 02, Exercise 04
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 0 0 Stop No No 1 6 Target 3 2 0 Resistance 2 0 0 Stop No No 1 7 Target 3 1 0 Resistance 2 0 0 Stop No No 1 8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target 3 1 0 Resistance 2 1 0 Stop No No 1
10 Target n/a 1 Resistance n/a 2 Stop Yes No 0
11 Target n/a 2 Resistance n/a 0 Stop Yes No 0
12 Target n/a 3 Resistance n/a 1
Stop Yes No 0 13 Target n/a 2 Resistance n/a 0 Stop Yes No 0
14 Target n/a 3 Resistance n/a 0 Stop Yes No 0
15 Target n/a 3 Resistance n/a 2 Stop Yes No 0
16 Target n/a 3 Resistance n/a 0 Stop Yes No 0
17 Target n/a 3 Resistance n/a 2 Stop Yes No 0
18 Target n/a 3 Resistance n/a 2 Stop Yes No 0
19 Target n/a 3 Resistance n/a 2 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 3 Resistance n/a 2 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 2 Stop Yes No 0
27 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 21 /45 Target 6 /9 Resistance 5 /9 Stop 10 /27
Session 03, Exercise 01
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target n/a 3 Resistance n/a 2 Stop Yes No 0 9 Target n/a 3 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3 Resistance n/a 2 Stop Yes No 0
13 Target n/a 3 Resistance n/a 0 Stop Yes No 0
14 Target n/a 1 Resistance n/a 1 Stop Yes No 0
15 Target n/a 2 Resistance n/a 0 Stop Yes No 0
16 Target n/a 2 Resistance n/a 2 Stop Yes No 0
17 Target n/a 1 Resistance n/a 1 Stop Yes No 0
18 Target n/a 3 Resistance n/a 0 Stop Yes No 0
19 Target n/a 3 Resistance n/a 1 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 1 Resistance n/a 0 Stop Yes No 0
22 Target n/a 3 Resistance n/a 2 Stop Yes No 0
23 Target n/a 2 Resistance n/a 0 Stop Yes No 0
24 Target n/a 1 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 2 Resistance n/a 1 Stop Yes No 0
27 Target n/a 1 Resistance n/a 2 Stop Yes No 0
28 Target n/a 3 Resistance n/a 0 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2
Stop Yes No 0 33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 0 Stop Yes No 0
40 Target n/a 1 Resistance n/a 0 Stop Yes No 0
41 Target n/a 1 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 1 Stop Yes No 0
43 Target n/a 2 Resistance n/a 0 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 1 Resistance n/a 0 Stop Yes No 0
46 Target n/a 2 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 0 Stop Yes No 0
49 Target n/a 1
Resistance n/a 0 Stop Yes No 0
50 Target n/a 3 Resistance n/a 1 Stop Yes No 0
51 Target n/a 2 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a 3 Resistance n/a 2 Stop Yes No 0
54 Target n/a 3 Resistance n/a 0 Stop Yes No 0
55 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 22 /69 Target 7 /7 Resistance 7 /7 Stop 8 /55
Session 03, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 0 0 Stop No No 1
18 Target n/a 3 Resistance n/a 2 Stop Yes No 0
19 Target n/a 3 Resistance n/a 1 Stop Yes No 0
20 Target n/a 3 Resistance n/a 2 Stop Yes No 0
21 Target n/a 3 Resistance n/a 2 Stop Yes No 0
22 Target 3 3 1 Resistance 2 0 0 Stop No No 1
23 Target n/a 3 Resistance n/a 0 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2
Stop Yes No 0 25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 3 Resistance n/a 0 Stop Yes No 0
27 Target n/a 2 Resistance n/a 0 Stop Yes No 0
28 Target n/a 2 Resistance n/a 2 Stop Yes No 0
29 Target n/a 3 Resistance n/a 2 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 53 /75 Target 18 /18 Resistance 16 /18 Stop 19 /39
Session 04, Exercise 01
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1 2 Target 3 3 1 Resistance 2 2 1 Stop No No 1 3 Target 3 3 1 Resistance 2 2 1 Stop No No 1 4 Target 3 3 1 Resistance 2 2 1 Stop No No 1 5 Target 3 3 1 Resistance 2 2 1 Stop No No 1 6 Target 3 3 1 Resistance 2 2 1 Stop No No 1 7 Target 3 3 1 Resistance 2 2 1 Stop No No 1 8 Target 3 3 1 Resistance 2 2 1 Stop No No 1 9 Target n/a 3 Resistance n/a 2 Stop Yes No 0
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 1 Resistance n/a 0 Stop Yes No 0
13 Target n/a 2 Resistance n/a 2 Stop Yes No 0
14 Target n/a 3 Resistance n/a 1 Stop Yes No 0
15 Target n/a 3 Resistance n/a 2 Stop Yes No 0
16 Target n/a 1 Resistance n/a 0 Stop Yes No 0
17 Target n/a 3 Resistance n/a 0 Stop Yes No 0
18 Target n/a 2 Resistance n/a 0 Stop Yes No 0
19 Target n/a 3 Resistance n/a 2 Stop Yes No 0
20 Target n/a 1 Resistance n/a 1 Stop Yes No 0
21 Target n/a 2 Resistance n/a 0 Stop Yes No 0
22 Target n/a 3 Resistance n/a 0 Stop Yes No 0
23 Target n/a 1 Resistance n/a 1 Stop Yes No 0
24 Target n/a 3 Resistance n/a 2 Stop Yes No 0
25 Target n/a 3 Resistance n/a 2 Stop Yes No 0
26 Target n/a 2 Resistance n/a 2 Stop Yes No 0
27 Target n/a 1 Resistance n/a 0 Stop Yes No 0
28 Target n/a 3 Resistance n/a 2 Stop Yes No 0
29 Target n/a 1 Resistance n/a 1 Stop Yes No 0
30 Target n/a 2 Resistance n/a 1 Stop Yes No 0
31 Target n/a 3 Resistance n/a 2 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2
Stop Yes No 0 33 Target n/a 3 Resistance n/a 2 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1 25 /57 Target 8 /8 Resistance 8 /8 Stop 9 /41
Session 04, Exercise 02
Trial #      POMDP   Therapist   Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 0 0 Stop No No 1
29 Target n/a 3 Resistance n/a 0 Stop Yes No 0
30 Target n/a 3 Resistance n/a 2 Stop Yes No 0
31 Target n/a 3 Resistance n/a 1 Stop Yes No 0
32 Target n/a 3 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 0 Stop Yes No 0
34 Target n/a 3 Resistance n/a 2 Stop Yes No 0
35 Target n/a 1 Resistance n/a 2 Stop Yes No 0
36 Target n/a 2 Resistance n/a 0 Stop Yes No 0
37 Target n/a 1 Resistance n/a 0 Stop Yes No 0
38 Target n/a 2 Resistance n/a 1 Stop Yes No 0
39 Target n/a 2 Resistance n/a 2 Stop Yes No 0
40 Target n/a 1 Resistance n/a 2 Stop Yes No 0
41 Target n/a 2 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 0 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 0 Stop Yes No 0
45 Target n/a 3 Resistance n/a 1 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a 3 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a 3 Resistance n/a 2 Stop Yes No 0
54 Target n/a 3 Resistance n/a 2 Stop Yes No 0
55 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 84/111 (Target 28/28, Resistance 27/28, Stop 29/55)
Session 04, Exercise 03
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target n/a 3 Resistance n/a 2 Stop Yes No 0
11 Target n/a 3 Resistance n/a 2 Stop Yes No 0
12 Target n/a 3 Resistance n/a 2 Stop Yes No 0
13 Target n/a 3 Resistance n/a 2 Stop Yes No 0
14 Target n/a 3 Resistance n/a 2 Stop Yes No 0
15 Target n/a 3 Resistance n/a 2 Stop Yes No 0
16 Target n/a 3 Resistance n/a 2 Stop Yes No 0
17 Target n/a 3 Resistance n/a 2 Stop Yes No 0
18 Target n/a 3 Resistance n/a 2 Stop Yes No 0
19 Target n/a 3 Resistance n/a 2 Stop Yes No 0
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 2 1 Stop No No 1
30 Target 3 3 1 Resistance 2 2 1 Stop No No 1
31 Target 3 3 1 Resistance 2 2 1 Stop No No 1
32 Target 3 3 1 Resistance 2 2 1 Stop No No 1
33 Target 3 3 1 Resistance 2 2 1 Stop No No 1
34 Target 3 n/a Resistance 2 n/a Stop No Yes 0
Total 69/80 (Target 23/23, Resistance 23/23, Stop 23/34)
Session 05, Exercise 01
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 0 0 Stop No No 1
30 Target n/a 3 Resistance n/a 1 Stop Yes No 0
31 Target n/a 1 Resistance n/a 1 Stop Yes No 0
32 Target n/a 2 Resistance n/a 0 Stop Yes No 0
33 Target n/a 2 Resistance n/a 2 Stop Yes No 0
34 Target n/a 1 Resistance n/a 2 Stop Yes No 0
35 Target n/a 3 Resistance n/a 0 Stop Yes No 0
36 Target n/a 3 Resistance n/a 0 Stop Yes No 0
37 Target n/a 2 Resistance n/a 2 Stop Yes No 0
38 Target n/a 1 Resistance n/a 1 Stop Yes No 0
39 Target n/a 1 Resistance n/a 2 Stop Yes No 0
40 Target n/a 1 Resistance n/a 0 Stop Yes No 0
41 Target n/a 3 Resistance n/a 0 Stop Yes No 0
42 Target n/a 2 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 1 Stop Yes No 0
44 Target n/a 2 Resistance n/a 0 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a 3 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a 3 Resistance n/a 2 Stop Yes No 0
54 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 87/112 (Target 29/29, Resistance 28/29, Stop 30/54)
Session 05, Exercise 02
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 0 0 Stop No No 1
27 Target n/a 1 Resistance n/a 2 Stop Yes No 0
28 Target n/a 2 Resistance n/a 1 Stop Yes No 0
29 Target n/a 3 Resistance n/a 0 Stop Yes No 0
30 Target n/a 2 Resistance n/a 0 Stop Yes No 0
31 Target n/a 2 Resistance n/a 2 Stop Yes No 0
32 Target n/a 1 Resistance n/a 2 Stop Yes No 0
33 Target n/a 3 Resistance n/a 1 Stop Yes No 0
34 Target n/a 1 Resistance n/a 0 Stop Yes No 0
35 Target n/a 2 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 0 Stop Yes No 0
37 Target n/a 3 Resistance n/a 2 Stop Yes No 0
38 Target n/a 3 Resistance n/a 0 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 1 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 78/103 (Target 26/26, Resistance 25/26, Stop 27/51)
Session 06, Exercise 01
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target 3 3 1 Resistance 2 2 1 Stop No No 1
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 2 1 Stop No No 1
30 Target 3 3 1 Resistance 2 2 1 Stop No No 1
31 Target 3 3 1 Resistance 2 2 1 Stop No No 1
32 Target 3 3 1 Resistance 2 2 1 Stop No No 1
33 Target 3 3 1 Resistance 2 0 0 Stop No No 1
34 Target n/a 2 Resistance n/a 0 Stop Yes No 0
35 Target n/a 1 Resistance n/a 2 Stop Yes No 0
36 Target n/a 3 Resistance n/a 0 Stop Yes No 0
37 Target n/a 2 Resistance n/a 1 Stop Yes No 0
38 Target n/a 1 Resistance n/a 0 Stop Yes No 0
39 Target n/a 1 Resistance n/a 1 Stop Yes No 0
40 Target n/a 1 Resistance n/a 0 Stop Yes No 0
41 Target n/a 1 Resistance n/a 0 Stop Yes No 0
42 Target n/a 3 Resistance n/a 0 Stop Yes No 0
43 Target n/a 2 Resistance n/a 1 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a 3 Resistance n/a 2 Stop Yes No 0
49 Target n/a 3 Resistance n/a 2 Stop Yes No 0
50 Target n/a 3 Resistance n/a 2 Stop Yes No 0
51 Target n/a 3 Resistance n/a 2 Stop Yes No 0
52 Target n/a 3 Resistance n/a 2 Stop Yes No 0
53 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 99/119 (Target 33/33, Resistance 32/33, Stop 34/53)
Session 06, Exercise 02
Trial #  POMDP  Therapist  Agree
1 Target 3 3 1 Resistance 2 2 1 Stop No No 1
2 Target 3 3 1 Resistance 2 2 1 Stop No No 1
3 Target 3 3 1 Resistance 2 2 1 Stop No No 1
4 Target 3 3 1 Resistance 2 2 1 Stop No No 1
5 Target 3 3 1 Resistance 2 2 1 Stop No No 1
6 Target 3 3 1 Resistance 2 2 1 Stop No No 1
7 Target 3 3 1 Resistance 2 2 1 Stop No No 1
8 Target 3 3 1 Resistance 2 2 1 Stop No No 1
9 Target n/a 3 Resistance n/a 2 Stop Yes No 0
10 Target 3 3 1 Resistance 2 2 1 Stop No No 1
11 Target 3 3 1 Resistance 2 2 1 Stop No No 1
12 Target 3 3 1 Resistance 2 2 1 Stop No No 1
13 Target 3 3 1 Resistance 2 2 1 Stop No No 1
14 Target 3 3 1 Resistance 2 2 1 Stop No No 1
15 Target 3 3 1 Resistance 2 2 1 Stop No No 1
16 Target 3 3 1 Resistance 2 2 1 Stop No No 1
17 Target 3 3 1 Resistance 2 2 1 Stop No No 1
18 Target 3 3 1 Resistance 2 2 1 Stop No No 1
19 Target 3 3 1 Resistance 2 2 1 Stop No No 1
20 Target 3 3 1 Resistance 2 2 1 Stop No No 1
21 Target 3 3 1 Resistance 2 2 1 Stop No No 1
22 Target 3 3 1 Resistance 2 2 1 Stop No No 1
23 Target 3 3 1 Resistance 2 2 1 Stop No No 1
24 Target 3 3 1 Resistance 2 2 1 Stop No No 1
25 Target 3 3 1 Resistance 2 2 1 Stop No No 1
26 Target 3 3 1 Resistance 2 2 1 Stop No No 1
27 Target 3 3 1 Resistance 2 2 1 Stop No No 1
28 Target 3 3 1 Resistance 2 2 1 Stop No No 1
29 Target 3 3 1 Resistance 2 2 1 Stop No No 1
30 Target 3 3 1 Resistance 2 0 0 Stop No No 1
31 Target n/a 3 Resistance n/a 0 Stop Yes No 0
32 Target n/a 3 Resistance n/a 0 Stop Yes No 0
33 Target n/a 1 Resistance n/a 0 Stop Yes No 0
34 Target n/a 2 Resistance n/a 2 Stop Yes No 0
35 Target n/a 2 Resistance n/a 0 Stop Yes No 0
36 Target n/a 3 Resistance n/a 2 Stop Yes No 0
37 Target n/a 3 Resistance n/a 0 Stop Yes No 0
38 Target n/a 3 Resistance n/a 2 Stop Yes No 0
39 Target n/a 3 Resistance n/a 2 Stop Yes No 0
40 Target n/a 3 Resistance n/a 2 Stop Yes No 0
41 Target n/a 3 Resistance n/a 2 Stop Yes No 0
42 Target n/a 3 Resistance n/a 2 Stop Yes No 0
43 Target n/a 3 Resistance n/a 2 Stop Yes No 0
44 Target n/a 3 Resistance n/a 2 Stop Yes No 0
45 Target n/a 3 Resistance n/a 2 Stop Yes No 0
46 Target n/a 3 Resistance n/a 2 Stop Yes No 0
47 Target n/a 3 Resistance n/a 2 Stop Yes No 0
48 Target n/a n/a Resistance n/a n/a Stop Yes Yes 1
Total 87/106 (Target 29/29, Resistance 28/29, Stop 30/48)
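The per-exercise totals above are the agreement counts summed over the Target, Resistance, and Stop decisions, with Target and Resistance rows excluded from the denominator once the POMDP has decided to stop (the "n/a" entries). A minimal sketch of that tally logic follows; the `tally` function and the dict-based row layout are illustrative assumptions, not code from the thesis.

```python
def tally(trials):
    """Tally therapist-POMDP agreement over a list of trial rows.

    Each trial is a dict mapping decision type to a
    (POMDP decision, therapist decision, agree-flag) tuple,
    mirroring the Target / Resistance / Stop rows in the tables.
    """
    totals = {"target": [0, 0], "resistance": [0, 0], "stop": [0, 0]}
    for trial in trials:
        for decision, (pomdp, _therapist, agree) in trial.items():
            # Once the POMDP has stopped, it issues no Target/Resistance
            # decision ("n/a"), so those rows drop out of the denominator.
            if decision != "stop" and pomdp == "n/a":
                continue
            totals[decision][0] += agree   # agreements
            totals[decision][1] += 1       # decisions compared
    agreed = sum(a for a, _ in totals.values())
    compared = sum(n for _, n in totals.values())
    return totals, agreed, compared


# Example mirroring the row format above: two fully agreeing trials,
# one trial after the POMDP decided to stop, then the final mutual stop.
rows = (
    [{"target": ("3", "3", 1), "resistance": ("2", "2", 1), "stop": ("No", "No", 1)}] * 2
    + [{"target": ("n/a", "3", 0), "resistance": ("n/a", "2", 0), "stop": ("Yes", "No", 0)}]
    + [{"target": ("n/a", "n/a", 0), "resistance": ("n/a", "n/a", 0), "stop": ("Yes", "Yes", 1)}]
)
totals, agreed, compared = tally(rows)
print(totals, agreed, compared)  # overall agreement here is 7/8
```

Each exercise's "Total a/b" line then follows directly from these per-decision counts.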
Appendix VII – Raw Quantitative and Qualitative Data on Therapist’s Ratings Per Session
Session Question a) Question b) Other Comments
01 2 4 High-level patient, therefore wanted to randomise some parts to work on increasing control with different targets and different resistances. Then wanted to work on strengthening, therefore furthest target, most resistance (this is what the computer wanted to do at the beginning, before the control part).
02 3 3 Decisions at beginning were good (target 3 and max resistance) but then I wanted to do some more random targets/resistances. As [the patient] was high level, would have liked perhaps to increase speed.
03 3 3 As per [last session]. Would like to be able to randomise. Perhaps increase options to be able to increase speed of performance, change of angle…
04 3 3 It seemed to ask for more reps at higher level before stopping today, which I thought was appropriate for this particular patient.
05 3 3 Zero change from yesterday.
06 3 3 (left blank)