IE 486 Work Analysis & Design II
Lecture 10: Research, Design & Evaluation
Instructor: Vincent Duffy, Ph.D., Associate Professor
Tues. Feb. 20, 2007


Page 1:

Instructor: Vincent Duffy, Ph.D.

Associate Professor

Lecture 10: Research, Design & Evaluation

Tues. Feb. 20, 2007

IE 486 Work Analysis & Design II

Page 2:

Administrative – revised & corrected

New material from last week and this week

• Lab meeting on Friday (week 2 of 5): lifelong learning, return exams, introduce Lab 3; Lab 2 due next week in class

• Thurs. in class: instructions on registering CPS Clickers

Page 3:

IE 486 Lecture 10 - QOTD

• Q.1. Give an example of applied research.
• Q.2. Give an example illustrating the goal of the research.
• Q.3. Validity is accuracy? Validity is consistency? Circle one. What is validity comprised of?
• Q.4. Is it difficult to get consent for human factors experiments? Why or why not?
• Q.5. What is a factorial experiment?
• Q.6. What is meant by between-subjects design? Within-subjects design? Mixed design? What are the advantages and disadvantages of each?
• Q.7. What are examples of ‘descriptive statistics’? ‘Inferential statistics’? What is an ‘interaction’?
• Q.8. What is meant by statistical significance? What is the difference between statistical and practical significance? Does a 3% difference justify equipment replacement costs even when statistically significant?

Page 4:

What do we measure?
• For instance, we may measure the influence of stimulus variables on behavioral variables
  – E.g., an alarm/buzzer is sounded and we observe the desired response by an air-traffic controller to complex auditory and visual messages received
• How does this relate to independent and dependent variables?

Page 5:

Independent and dependent variables

• An independent variable is?
  • Typically manipulated by the researcher
• A dependent variable is?
  • The effect
    – Such effects for behavioral variables can be objective (e.g., speed of response, heart rate) or subjective (e.g., responses to a Likert-scale questionnaire)

Page 6:

Why do we measure? Or, what is the goal of a science?
• To gather facts
  – True or false?
  • False, really.
  – More to understand, predict, and control
• How?
• Through theory.
• What is a theory?
  – A set of interrelated concepts, definitions, and propositions that presents a systematic view of a phenomenon
• What is F = ma?
• A theory that has been validated (verified).

Page 7:

What is a ‘science’ and what are the foundations?
• A ‘science’ is ...
• Not so much a ‘thing’ as much as it is a ‘process’
  – Proctor and Van Zandt, 1994
• A process for ...
• Gathering knowledge about the world
• What are empirical results?
• An important foundation of a ‘science’
  – efforts to verify the ‘knowledge’ by observation

Page 8:

Steps in the scientific process

– Observe some phenomenon
– State the problem
– Develop hypothesis(es)
– Conduct experiment
– Evaluate hypothesis(es)
– Possibly develop new hypothesis(es)
– Disseminate the information

Page 9:

Validity of a measure ...
• Construct validity, internal validity, external validity
• Construct validity
  • Did we manipulate what we wanted to properly?
    – E.g., did we set levels of certain variables during testing, and did we correctly measure the dependent variable (the effect)?
  • Different from accuracy (of the measure)
    – It asks ‘did we measure the right thing?’
• Internal validity
  – The independent variables caused a change in the effects being measured
  • Without confounds

Page 10:

Validity
• A confound?
  – E.g., improvements in the performance of the workers were believed to be related to a new TQM process recently implemented
  • However, it turned out that wages had recently been increased
• External validity
• The degree to which we can generalize the results.
• Which has higher external validity?
  – Lab study or field study?
  – Are lab studies generalizable?

Page 11:

Between-subjects & within-subjects tests

• Two or more groups are tested
• in the different conditions
  – E.g., speed of selection under 2 conditions: a 13" laptop computer using a touchpad and a 21" desktop computer using a mouse. One group uses the laptop and the other group uses the desktop.
• Good test?
  – Not really. Poor internal validity. There is a confound.
  – Can’t vary computer/monitor (at least 2 levels) and input device (2 levels) without considering at least 4 test conditions.
  – In the above case, that would require 4 different groups for a between-subjects test (a sketch of the four groups follows).
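
To make the "4 test conditions" concrete, here is a minimal sketch (not from the slides) that enumerates the four computer-by-device cells and randomly assigns a hypothetical pool of participants to the four between-subjects groups; the participant IDs and pool size are made up.

```python
import itertools
import random

# The two variables and their two levels from the example above
computers = ["laptop", "desktop"]
devices = ["touchpad", "mouse"]

# A 2x2 between-subjects design needs all four combinations as separate groups
conditions = list(itertools.product(computers, devices))
print(conditions)  # [('laptop', 'touchpad'), ('laptop', 'mouse'), ('desktop', 'touchpad'), ('desktop', 'mouse')]

# Randomly assign a hypothetical pool of 20 participants to the four groups
participants = [f"P{i:02d}" for i in range(1, 21)]
random.shuffle(participants)
groups = {cond: participants[i::len(conditions)] for i, cond in enumerate(conditions)}
for cond, members in groups.items():
    print(cond, members)
```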

Page 12:

Between-subjects & within-subjects tests

• How about the following example?
• Laptop computer using a touchpad and laptop computer with a trackball.
• Within-subjects testing ...
• Each person is tested in all conditions

Page 13:

Mixed design?
• E.g., suppose we want to test 2 variables, mobile phone (phone/no phone) and temperature (cold/warm day), and the impact on driving performance
  – Suppose 10 subjects were tested on a warm day and 10 subjects were tested on a cold day.
  – On the cold day, all 10 subjects were tested in both conditions (with and without using the phone)
  – On the warm day, all 10 subjects were also tested in both conditions (with and without using the phone)
• Therefore, temperature is a between-subjects variable and mobile phone is a within-subjects variable (see the sketch below).
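
As an illustration that goes beyond the slide, the sketch below lays this mixed design out in long format and runs a paired test on the within-subjects phone factor inside each temperature group, plus an independent test on the between-subjects temperature factor; the lane-departure numbers are simulated and purely hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical long-format data: one row per subject per phone condition.
# 'temperature' varies between subjects; 'phone' varies within subjects.
rows = []
for temp, subject_ids in [("cold", range(1, 11)), ("warm", range(11, 21))]:
    for s in subject_ids:
        baseline = rng.poisson(2)  # lane departures without the phone
        rows.append({"subject": s, "temperature": temp,
                     "phone": "no_phone", "lane_departures": baseline})
        rows.append({"subject": s, "temperature": temp,
                     "phone": "phone", "lane_departures": baseline + rng.poisson(2)})
df = pd.DataFrame(rows)

# Within-subjects factor (phone): paired comparison inside each temperature group
for temp, grp in df.groupby("temperature"):
    no_phone = grp[grp.phone == "no_phone"].sort_values("subject")["lane_departures"].to_numpy()
    with_phone = grp[grp.phone == "phone"].sort_values("subject")["lane_departures"].to_numpy()
    t, p = stats.ttest_rel(with_phone, no_phone)
    print(f"{temp}: paired t = {t:.2f}, p = {p:.3f}")

# Between-subjects factor (temperature): independent comparison of subject means
means = df.groupby(["temperature", "subject"])["lane_departures"].mean()
t, p = stats.ttest_ind(means["cold"], means["warm"])
print(f"temperature: independent t = {t:.2f}, p = {p:.3f}")
```

A full mixed ANOVA or linear mixed model would combine both factors in one analysis; the split t-tests above are just the simplest way to see the between/within structure.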

Page 14:

Advantages and disadvantages of within-subjects design
• Advantages
  – Fewer subjects are needed
  – Less likely to have concern about individual differences
• Disadvantages
  – More likely to have an order effect.
  – Some variables cannot be tested in multiple conditions (e.g., with and without sound).
• How to offset the order (statistical) effect that is likely to show up?
  – Counterbalance: randomize the order of presentation of the conditions (a small sketch follows).
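
A small sketch (not from the slides) of two ways to handle order: full counterbalancing by cycling through every possible presentation order, or simple per-participant randomization. The condition names and participant IDs are hypothetical.

```python
import itertools
import random

conditions = ["touchpad", "trackball"]  # within-subjects conditions from the earlier example
participants = [f"P{i:02d}" for i in range(1, 9)]  # hypothetical 8 participants

# Full counterbalancing: cycle through every possible presentation order
orders = list(itertools.permutations(conditions))
counterbalanced = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

# Alternatively, simple randomization of the order for each participant
randomized = {p: random.sample(conditions, k=len(conditions)) for p in participants}

for p in participants:
    print(p, "counterbalanced:", counterbalanced[p], "| randomized:", randomized[p])
```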

Page 15:

Hypothesis testing, Type I & Type II error

• We believe traffic conditions will influence driving performance as measured by ‘no. of times out of lane’.

• How do you state this for ‘statistical testing’?
• Null hypothesis:
  – The treatment had no effect.
• Alternate:
  – The treatment had an effect.

Page 16:

Hypothesis testing, Type I & Type II error

• Null hypothesis:
  – The traffic conditions had no effect on driving performance.
• Alternate:
  – Reject the null if the traffic conditions had an effect.
• Type I error (α): typically α < 0.05
  – Implies a 5% chance we rejected the null incorrectly
  • Or ... the chance we found a significant result that wasn’t there.
• Type II error (β)
  – The chance that we didn’t find an effect when it was really there
  • In other words, we didn’t reject the null when we should have.

Page 17:

Hypothesis testing, Type I & Type II error

• How do we interpret the results of the data?
  – Typically reject the null if p < α (usually α = 0.05)
  – The probability we reject the null incorrectly is less than 5%

• If we didn’t find an effect, does it mean our idea was incorrect?

• Maybe not, it just means you didn’t find a difference.
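
As a hypothetical illustration of this decision rule, the sketch below runs a two-sample t-test on simulated 'times out of lane' data and compares p to α = 0.05; the numbers and effect size are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05  # acceptable Type I error rate

# Simulated 'times out of lane' for two traffic conditions (hypothetical numbers)
light_traffic = rng.normal(loc=2.0, scale=1.0, size=15)
heavy_traffic = rng.normal(loc=3.0, scale=1.0, size=15)

t, p = stats.ttest_ind(heavy_traffic, light_traffic)
if p < alpha:
    print(f"p = {p:.3f} < {alpha}: reject the null (traffic condition had an effect)")
else:
    print(f"p = {p:.3f} >= {alpha}: fail to reject the null (no difference detected)")
```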

Page 18:

Why might you not find a difference if it is there?
• Could be insufficient sample size (see the power sketch below)
• Or bad measures: a bad measurement instrument, or something difficult to measure
• Some variables not controlled
• Some procedures of the experiment may not have been controlled
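
One way to check the "insufficient sample size" explanation is a power calculation. This sketch is not part of the lecture; it uses statsmodels' TTestIndPower with an assumed medium effect size (Cohen's d = 0.5), which is a made-up choice for illustration.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical question: how many subjects per group are needed to detect a
# medium effect (d = 0.5) with 80% power at alpha = 0.05?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"~{n_per_group:.0f} subjects per group needed")

# Conversely, the power achieved by a study that only ran 10 subjects per group
achieved_power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=10)
print(f"power with 10 per group: {achieved_power:.2f}")
```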

Page 19:

Factorial designs
• A 2x2 design?
  – Speed of selection under 2 conditions: laptop computer using a touchpad and desktop computer using a mouse. One group uses the laptop and the other group uses the desktop.
• A better example would show 2 variables x 2 levels
• Why was the previous example not good?
  – Due to the confound.
• Also, we hope the two groups (of participants) represent the same population
  – that they are randomly chosen and are similar ...
• However, we know there will be some differences; we hope they will not make the effect too difficult to detect (if it is there).

Page 20:

Factorial designs
• A 2x2 design?
• How about: computer type and input device?
  – Could be: 2 variables, each with 2 types (conditions or levels)
  • E.g., a 14" laptop using a touchpad, a 14" laptop with a trackball, a desktop with a 14" monitor using a touchpad, and a desktop with a 14" monitor using a trackball (a sketch of the analysis follows).
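
As a sketch not taken from the slides, here is how the resulting 2x2 between-subjects data could be analyzed with a two-way ANOVA (main effects plus interaction) in statsmodels; the selection times are simulated and the effect sizes are invented.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)

# Simulated selection times (seconds) for the 2x2 between-subjects design
rows = []
for computer in ["laptop", "desktop"]:
    for device in ["touchpad", "trackball"]:
        base = 1.5 + (0.3 if device == "trackball" else 0.0)  # invented device effect
        if computer == "laptop" and device == "touchpad":
            base += 0.2  # invented interaction: touchpad is extra slow on the laptop
        for _ in range(10):  # 10 participants per cell
            rows.append({"computer": computer, "device": device,
                         "time": base + rng.normal(0, 0.2)})
df = pd.DataFrame(rows)

# Two-way ANOVA: main effects of computer and device, plus their interaction
model = ols("time ~ C(computer) * C(device)", data=df).fit()
print(anova_lm(model, typ=2))

# Cell means: if the device effect differs between computer types, that's an interaction
print(df.groupby(["computer", "device"])["time"].mean().unstack())
```

A significant computer-by-device term in the ANOVA table means the effect of the input device depends on the computer type, which is the kind of result the "Interactions" slide below refers to.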

Page 21:

Factorial Designs

Page 22:

Evaluation of results: Interactions