The Effects of Interface Design on Telephone Dialing Performance
Master’s thesis in Computer Science
Andrew R. Freed
4/30/2003
The Effects of Interface Design on Telephone Dialing Performance
Towards automatic interface evaluation
Methods of evaluation
Experiment design
Three analyses
Comparison of analyses
Further work
Towards automatic interface evaluation
Why not test with actual users instead? It takes too much time and money!
Automatic evaluation has been useful in the past (Project Ernestine, Gray et al. 1992), to the tune of $2.4M savings per year
Several proposed tools will make this type of evaluation easier
Towards automatic interface evaluation
Motivation:
– Eye-tracking studies by Byrne (1999, 2001) and Hornof (1997)
– Cognitive models as surrogate users (Ritter 2001)
Towards automatic interface evaluation
100 phones to choose from
Selected 10 for analysis
Towards automatic interface evaluation
10 tasks (Ritter 2000)
– 1. Call home (*)
– 2. Call work (*)
– 3. Redial last number (*)
– 4. Call directory inquiries
– 5. Call mother (*)
– 6. Conference call work and home (*)
– 7. Conference call work (flash) then home
– 8. Forward call to another number (*)
– 9. Forward call (flash) to another number
– 10. Hang up
Towards automatic interface evaluation
10 telephone numbers
– 814-866-5000   215-654-5785
– 123-654-7890   814-234-9657
– 814-863-5000   740-611-9273
– 412-268-3000   101-010-1010
– 606-193-3012   103-273-1029
and 3 other tasks
– Forward, redial, conference call
Methods of evaluation
Possible tools
Cognitive architectures
ACT-R/PM
Generic Simulated Eyes and Hands
Focused analysis methods
Possible tools
Ivory’s tools to evaluate websites (2001)
Apex (M. Freed 1998) and iGen (Emmerson 2000) model complex tasks
Glean (Kieras et al. 1995) evaluates Lisp interfaces
Shortcomings: no learning, no visual search, tied to a specific interface format, no cognitive theory
Cognitive architectures
Unified theory of cognition (Newell 1990)
Simulate human behavior
Perceptual and motor capability (simulated eyes and hands)
Can do visual search, click buttons, sometimes learn
Cognitive architectures (examples)
EPIC (Kieras and Meyer 1997) - has visual search and perceptual/motor skills… but only evaluates Common Lisp interfaces
Soar (Newell 1990) - also has visual search and perceptual/motor skills, plus learning… but only evaluates Tcl/Tk interfaces (or requires a socket connection)
ACT-R/PM (Anderson & Lebiere 1998, Byrne 2001) - nearly the same benefits and limitations as EPIC, plus learning
ACT-R/PM
Why did we choose ACT-R/PM?
Well-accepted cognitive architecture
Used in the past to evaluate interfaces
Can overcome the “Lisp interface-only” problem with generic eyes and hands
Generic Simulated Eyes and Hands
Segman (St. Amant & Riedl 2001) can parse a Windows screen capture and determine the interface components
Can use interfaces written in Lisp, Tcl/Tk, HTML, Visual C++, ...
Segman can be connected to ACT-R/PM
Focus of analysis
A - Analytical model (Fitts’ Law)
B - Cognitive model (ACT-R/PM)
C - Human data
General experiment design
Analytical model, cognitive model, and human users interact with same interfaces
Analytical model: dialed each phone number once on each phone, did not perform other tasks
Cognitive model: dialed each phone number 50 times on each phone, performed other phone tasks 50 times on each phone
Human users (N=9): dialed each phone number on each phone, performed other phone tasks once on each phone
Experimental software
General experiment design
Cognitive model and users
– Timing and mouse-click logging
– Eye-tracking
– Users can control pace of trials, model does not “care”
Analytical model
– Does not need to “see” telephones
– Mathematical formula with pixel-level input yields “reaction times”
A. Fitts’ Law analysis
What is Fitts’ Law?
Numerical analysis
Simple conclusions and problems
What is Fitts’ Law?
Fitts’ Law (two possible forms):
– MT = a + b * log2(2 * D / W) (Fitts 1954)
– MT = max(tm, k * log2(D / W + 0.5)) (Card et al. 1983)
MT is mouse movement time
D is distance to target, W is target width
a, b, k are constants
tm is minimum movement time
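The two forms above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's code: the function names are hypothetical, and the constants shown (a, b, k, tm) are placeholder values chosen for the example, not fitted values from this study.

```python
import math


def fitts_1954(distance, width, a, b):
    """Movement time MT = a + b * log2(2D / W) (Fitts 1954 form)."""
    return a + b * math.log2(2.0 * distance / width)


def fitts_card(distance, width, k, t_min):
    """Movement time MT = max(tm, k * log2(D / W + 0.5)) (Card et al. 1983 form)."""
    return max(t_min, k * math.log2(distance / width + 0.5))


# Placeholder constants in seconds (assumptions for illustration only).
A, B = 0.0, 0.1
K, T_MIN = 0.1, 0.1

# A far, small target takes longer than the minimum movement time...
far = fitts_card(distance=200, width=20, k=K, t_min=T_MIN)
# ...while a very close, large target is clamped to the minimum.
near = fitts_card(distance=5, width=20, k=K, t_min=T_MIN)
```

The max() in the second form matters for adjacent buttons: when D is small relative to W the logarithm goes negative, and the minimum movement time takes over.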
Numerical analysis
Collected pixel-level input about telephones (size and location of buttons)
Dialing a phone requires 10 movements
Total the times from the 10 movements and a base dialing time is established (with no visual search!)
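The totaling step can be sketched as follows. Everything here is an assumption made for illustration: the keypad geometry (40-pixel buttons on a 50-pixel grid), the starting mouse position, the constants k and tm, and the function names. The thesis used measured button sizes and locations from each telephone interface.

```python
import math


def movement_time(distance, width, k=0.1, t_min=0.1):
    """Fitts' Law time for one movement (Card et al. form, placeholder constants)."""
    return max(t_min, k * math.log2(distance / width + 0.5))


def base_dial_time(number, buttons, start):
    """Sum the movement times for each digit of a number; no visual search."""
    x, y = start
    total = 0.0
    for digit in (ch for ch in number if ch.isdigit()):
        bx, by, width = buttons[digit]
        total += movement_time(math.hypot(bx - x, by - y), width)
        x, y = bx, by  # the next movement starts from this button
    return total


# Hypothetical keypad: button centers on a 50-pixel grid, 40 pixels wide.
LAYOUT = ["123", "456", "789", "*0#"]
BUTTONS = {
    key: (50 * col, 50 * row, 40)
    for row, line in enumerate(LAYOUT)
    for col, key in enumerate(line)
}

# Ten digits give ten movements; the start position is an assumption.
time_s = base_dial_time("814-866-5000", BUTTONS, start=(50, 250))
```

Because the input is only button size and spacing, two interfaces with the same geometry get the same base dialing time regardless of labels or layout order.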
Numerical analysis
Validating our choice of sample telephone numbers (R² = 0.96)
Simple conclusions and problems
Fitts’ Law analysis is fast (it is just an equation!)
Does not consider many factors
Not affected by any aspect of interface design other than button sizing and spacing
B. ACT-R/PM model analysis
Description of model
Visual search predictions
ACT-R/PM makes different reaction time conclusions
Description of ACT-R/PM model
Model has three main components that can operate in parallel:
– retrieve a phone digit from memory
– visually search for the digit
– move the mouse/click on a digit (governed by Fitts’ Law)
Composed of 71 production rules (mostly for visual search)
Description of ACT-R/PM model
Visual search strategy: random or systematic
One production for random search
Find-random-target
IF the goal is to find a phone target
THEN find a visual object of type text which has not been attended lately
Description of ACT-R/PM model
Sixty productions for systematic search
Systematic-search-from-target
IF a digit x is in the visual buffer
AND the goal is to find a target y
AND y is in direction z from x
THEN find a visual object of type text in direction z from target x which is within the bounds of the keypad
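The two search strategies can be mimicked outside ACT-R/PM with a small Python sketch. This is not the model's production code: the function names, the grid representation, and the assumption that "direction z" comes from a remembered standard keypad layout are all simplifications made for illustration.

```python
import random

STANDARD = ["123", "456", "789", "*0#"]  # assumed remembered keypad layout


def position(key, keypad):
    """Return (row, column) of a key in a keypad given as a list of row strings."""
    for row, line in enumerate(keypad):
        if key in line:
            return row, line.index(key)
    raise ValueError(key)


def find_random_target(objects, attended, rng):
    """Random search: pick any text object that has not been attended lately."""
    candidates = [obj for obj in objects if obj not in attended]
    return rng.choice(candidates) if candidates else None


def systematic_step(current, target, keypad):
    """Systematic search: step one key in the direction the target is expected."""
    (r0, c0), (r1, c1) = position(current, STANDARD), position(target, STANDARD)
    dr, dc = (r1 > r0) - (r1 < r0), (c1 > c0) - (c1 < c0)
    row, col = position(current, keypad)
    row, col = row + dr, col + dc
    if 0 <= row < len(keypad) and 0 <= col < len(keypad[row]):
        return keypad[row][col]
    return None  # stepped outside the bounds of the keypad


def fixations_to_find(start, target, keypad, limit=20):
    """Count fixations until the target is found; None if the search fails."""
    current, count = start, 1
    while current != target and count < limit:
        current = systematic_step(current, target, keypad)
        if current is None:
            return None
        count += 1
    return count if current == target else None


# Hypothetical keypad with 7-8-9 on the top row instead of 1-2-3.
FLIPPED = ["789", "456", "123", "*0#"]
```

On the standard layout, fixations_to_find("1", "9", STANDARD) reaches the target in three fixations. On the flipped layout the expected directions no longer match the screen, so the same call wanders until the limit and returns None.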
Visual search predictions
Count fixations and note fixation locations
Search for the keypad is random
Search within the keypad is systematic
The telephones generally do not differ significantly in the number of fixations required to dial (about 16)
(The telephone numbers are significantly different)
Visual search predictions
Model trace
Visual search predictions
Phone 4 Phone 9
What’s wrong with this picture?
Visual search predictions
Two phones are predicted to have abnormally long visual searches
These phones require approximately sixty fixations (average on others was sixteen)
Phone 4 has an upside-down keypad -- the systematic search fails!
Phone 9 contains extra information on the buttons… distracts the visual search
We will see the model takes much longer than humans to dial these phones
ACT-R/PM makes different reaction time conclusions
This is no surprise - more factors are being considered
Phones 4 and 9 pay a large visual search penalty
Fitts’ Law still a factor - phones with “Fitts’ Law violations” still perform worse
ACT-R/PM makes different reaction time conclusions
ACT-R/PM makes different reaction time conclusions
The phones are often shown to have different dialing times (t-test, p < .05)
The significance level of the differences depends on the telephone number being dialed
On average, approximately 8.7 seconds to dial a telephone
Never faster than six seconds
No errors!
ACT-R/PM makes different reaction time conclusions
Model is able to perform additional tasks (redial, forward, conference) with a random search
Model does not always succeed but never gives up
Will attend the same visual target several times
C. User data analysis
Where and how users look (eye-tracking)
Humans make errors
Summary of user reaction times
Where and how users look
Fast random search for keypad
Systematic search within keypad
Where and how users look
User trace
Where and how users look
Users require approximately the same number of fixations per telephone as the model did (also true for telephone numbers)
Users are able to cope with phones 4 and 9 by changing search strategy
– Phone 4: “Up is down, down is up”
– Phone 9: Ignore ABCs on the keypad
Where and how users look
Fixation comparison across numbers (R² = 0.11)
Where and how users look
Fixation comparison across 8 phones (R² = 0.34)
Humans make errors
Errors not predicted by the automatic analyses
Depend on several factors
– Number being dialed
– Dialing speed (weak correlation)
– Interface being used
Errors dependent on interface
Most errors on “Fitts’ Law violators”
Fewest errors with large, adjacent buttons
Users will move the mouse while clicking (ACT-R/PM will not); this can cause errors
Possible to estimate number of errors with Fitts’ “index of difficulty”?
Summary of reaction times
Users are on average more than one second faster than the model
This is probably due to efficient pipelining of motor tasks (room for ACT-R/PM improvement)
Users can dial as fast as 3.5 seconds (average is seven seconds)
Summary of reaction times
Model (R² = 0.41), Fitts’ (R² = 0.85), user dial time across phones
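For readers who want to reproduce this kind of comparison, here is a minimal sketch of an R² calculation. It assumes the reported values are squared Pearson correlations between per-phone predicted and observed dial times; the slides do not state the exact procedure, and the function name is hypothetical.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    return (s_po * s_po) / (s_pp * s_oo)
```

The inputs would be one predicted time per phone (from Fitts' Law or from the ACT-R/PM model) paired with the mean user time on the same phone.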
Summary of reaction times
Users can do other phone tasks faster than ACT-R/PM
Users can find the target under varied conditions
Users try more strategies to find target
Users will give up if they can’t succeed!
Summary of reaction times
Model vs user on extra tasks (R² = 0.60, 0.26, 0.11)
Summary of reaction times
User data also shows that the interfaces are often significantly different (p <.05), though less often than the model says
User time differences also depend on the number being dialed
Theory: users less affected by additional interface objects than ACT-R/PM
Comparison of analyses
Analytical model is not enough
Visual search differences between ACT-R/PM and users
ACT-R/PM and Segman need better representation of interfaces
Cognitive models can make more complicated predictions
ACT-R/PM model is generally slower than users
Further work
Cellular phones
– This analysis does not work “out of the box” for cellular phones
– These phones have different tasks! (Golightly 2003)
Hutchison 3G UK phone task (Golightly 2003)
– Analysis of menu controls for cellular phone menus, included analytical model
– Interface became easier to use when more directional controls were provided
Further work
Analyzing ten additional designs
– Easy if you use existing automatic models!
• Fifteen minutes for Fitts’ Law analysis
• Forty-five minutes for 500 model runs
– Hard if you test with actual users!
• Can take weeks to get scheduled
• Humans miss appointments
Further work
This analysis is generalizable
– The same procedures and techniques can be used with other types of interfaces
– Automatic models provide fast, easy analysis that mirrors human performance
– Must do task analysis first, otherwise you will test for wrong tasks
– The hard work (Fitts’ Law, ACT-R/PM, Segman) has already been done
– Cognitive models are available freely as open source
Thank you!
Any questions?
Why is this Computer Science?
Interfaces affect how computers are used (Project Ernestine)
Cognitive modeling is an inter-disciplinary effort
Automatic analysis similar to SPICE
Analysis of visual search algorithms
– Random search: O(10*n)
– Systematic search: O(10+n>0,<1)
References
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Lawrence Erlbaum.
Byrne, M. D. (1999). ACT-R Perceptual-Motor (ACT-R/PM) version 1.0b5: A users manual. Houston, TX: Psychology Department, Rice University.
Byrne, M. D. (2001). ACT-R/PM and menu selection: Applying a cognitive architecture to HCI. International Journal of Human-Computer Studies, 55, 41-84.
Card, S., Moran, T., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Emmerson, P. (2000). Review of iGEN software. Ergonomics in Design, 29-31.
Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47, 381-391.
Freed, M. A. (1998). Simulating performance in complex, dynamic environments. Northwestern, Evanston, IL.
Golightly, D. (2003). Personal communication.
Gray, W. D., John, B. E., & Atwood, M. E. (1992). The precis of Project Ernestine or An overview of a validation of GOMS. Proceedings of the CHI ’92 Conference on Human Factors in Computer Systems.
Hornof, A. J., & Kieras, D. E. (1997). Cognitive modeling reveals menu search is both random and systematic. Proceedings of the CHI ’97 Conference on Human Factors in Computer Systems, New York, NY.
Ivory, M. Y., & Hearst, M. A. (2001). The state of the art in automating usability evaluation of user interfaces. ACM Computing Surveys, 33(4), 470-516.
Kieras, D. E., & Meyer, D. E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391-438.
Kieras, D. E., Wood, S. D., Abotel, K., & Hornof, A. (1995). GLEAN: A computer-based tool for rapid GOMS model usability evaluation of user interface designs. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’95), New York, NY.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Ritter, F. E. (2000). A role for cognitive architectures: Guiding user interface design. Seventh Annual ACT-R Workshop, Department of Psychology, Carnegie-Mellon University.
Ritter, F. E., & Young, R. M. (2001). Embodied models as simulated users: Introduction to this special issue on using cognitive models to improve interface design. International Journal of Human-Computer Studies, 55, 1-14.
St. Amant, R., & Riedl, M. O. (2001). A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies, 55, 15-39.