Conducting a User Study

Page 1: Conducting a User Study

Conducting a User Study

Human-Computer Interaction

Page 2: Conducting a User Study

Overview

  • What is a study?
    • Empirically testing a hypothesis
    • Evaluate interfaces
  • Why run a study?
    • Determine ‘truth’
    • Evaluate if a statement is true

Page 3: Conducting a User Study

Example Overview

  • Ex. The more a person weighs, the higher their blood pressure
  • Many ways to do this:
    • Look at data from a doctor’s office
      • Descriptive design: What are the pros and cons?
    • Get a group of people to get weighed and measure their BP
      • Analytic design: What are the pros and cons?
    • Ideally?
  • Ideal solution: have everyone in the world get weighed and have their BP measured
  • Participants are a sample of the population
    • You should immediately question this!
    • Restrict population
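
Why a sample deserves to be questioned can be seen in a small simulation. This is only a sketch: the population below is synthetic (the mean of 120 and spread of 15 are invented, not clinical data), and `sample_mean` is a hypothetical helper. It draws many samples of two different sizes and compares how far the sample means wander from the population mean.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic blood-pressure readings.
# The numbers are invented for illustration, not real clinical data.
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(10_000)]

def sample_mean(values, n, rng):
    """Mean of n values drawn without replacement from values."""
    return statistics.mean(rng.sample(values, n))

population_mean = statistics.mean(population)
small = [sample_mean(population, 5, rng) for _ in range(200)]
large = [sample_mean(population, 100, rng) for _ in range(200)]

# Spread of the sample means: how much a single study could be off by chance.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"population mean: {population_mean:.1f}")
print(f"spread of sample means, n=5:   {spread_small:.2f}")
print(f"spread of sample means, n=100: {spread_large:.2f}")
```

Larger samples give means that cluster more tightly around the population value, which is one reason the number of participants matters.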

Page 4: Conducting a User Study

Study Components

  • Design
    • Hypothesis
    • Population
    • Task
    • Metrics
  • Procedure
  • Data Analysis
  • Conclusions
  • Confounds/Biases

Page 5: Conducting a User Study

Study Design

  • How are we going to evaluate the interface?
  • Hypothesis
    • What statement do you want to evaluate?
  • Population
    • Who?
  • Metrics
    • How will you measure?

Page 6: Conducting a User Study

Hypothesis

  • Statement that you want to evaluate
    • Ex. A mouse is faster than a keyboard for numeric entry
  • Create a hypothesis
    • Ex. Participants using a keyboard to enter a string of numbers will take less time than participants using a mouse.
  • Identify Independent and Dependent Variables
    • Independent Variable – the variable that is being manipulated by the experimenter (interaction method)
    • Dependent Variable – the variable that is measured and is expected to change in response to the independent variable (time)

Page 7: Conducting a User Study

Hypothesis Testing

  • Hypothesis:
    • People who use a mouse and keyboard will be faster to fill out a form than people who use a keyboard alone.
  • US court system: innocent until proven guilty
    • NULL Hypothesis: Assume people who use a mouse and keyboard will fill out a form in the same amount of time as people who use a keyboard alone
    • Your job is to prove that the NULL hypothesis isn’t true!
    • Alternate Hypothesis 1: People who use a mouse and keyboard will fill out a form either faster or slower than people who use a keyboard alone.
    • Alternate Hypothesis 2: People who use a mouse and keyboard will fill out a form faster than people who use a keyboard alone.
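
Written as statements about means, the null and alternate hypotheses look like this. The symbols are assumptions introduced here, not notation from the slides: $\mu_{MK}$ is the mean form-completion time with mouse and keyboard, and $\mu_{K}$ is the mean time with keyboard alone.

```latex
\begin{align*}
H_0 &: \mu_{MK} = \mu_{K} && \text{(NULL hypothesis: same time)} \\
H_1 &: \mu_{MK} \neq \mu_{K} && \text{(Alternate Hypothesis 1: faster or slower, two-sided)} \\
H_2 &: \mu_{MK} < \mu_{K} && \text{(Alternate Hypothesis 2: faster, one-sided)}
\end{align*}
```

"Faster" means less time, so the one-sided alternative uses $<$.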

Page 8: Conducting a User Study

Population

  • The people going through your study
  • Anonymity
  • Type - Two general approaches
    • Have lots of people from the general public
      • Results are generalizable
      • Logistically difficult
      • People will always surprise you with their variance
    • Select a niche population
      • Results more constrained
      • Lower variance
      • Logistically easier
  • Number
    • The more, the better
    • How many is enough?
    • Logistics
    • Recruiting (n>20 is pretty good)

Page 9: Conducting a User Study

Two Group Design

  • Design Study
    • Groups of participants are called conditions
    • How many participants?
    • Do the groups need the same # of participants?
  • Task
    • What is the task?
    • What are considerations for task?

Page 10: Conducting a User Study

Design

  • External validity – do your results mean anything?
    • Results should be similar to other similar studies
    • Use accepted questionnaires, methods
  • Power – how much meaning do your results have?
    • The more people, the more you can say that the participants are a sample of the population
    • Pilot your study
  • Generalization – how much do your results apply to the true state of things

Page 11: Conducting a User Study

Design

  • People who use a mouse and keyboard will be faster to fill out a form than keyboard alone.
  • Let’s create a study design
    • Hypothesis
    • Population
    • Procedure
  • Two types:
    • Between Subjects
    • Within Subjects
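
The two design types differ in how participants meet the conditions. In the sketch below, the participant IDs, function names and condition labels are hypothetical. Alternating the order in the within-subjects case (counterbalancing) is a common way to offset learning effects, but it is an assumption here rather than something the slides prescribe.

```python
import random

CONDITIONS = ["mouse+keyboard", "keyboard"]  # levels of the independent variable

def between_subjects(participants, rng):
    """Each participant is randomly placed in exactly one condition."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    # Deal participants out in turn so group sizes differ by at most one.
    return {cond: shuffled[i::len(CONDITIONS)] for i, cond in enumerate(CONDITIONS)}

def within_subjects(participants):
    """Every participant does both conditions; the order alternates."""
    orders = [CONDITIONS, CONDITIONS[::-1]]
    return {p: orders[i % 2] for i, p in enumerate(participants)}

participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical participant IDs
groups = between_subjects(participants, random.Random(0))
schedule = within_subjects(participants)

print({cond: len(members) for cond, members in groups.items()})
print(schedule["P01"], schedule["P02"])
```

Between subjects needs more people but avoids carry-over between conditions; within subjects needs fewer people but has to control for the order in which conditions are seen.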

Page 12: Conducting a User Study

Procedure

  • Formally have all participants sign up for a time slot (if individual testing is needed)
  • Informed Consent (let’s look at one)
  • Execute study
  • Questionnaires/Debriefing (let’s look at one)

Page 13: Conducting a User Study

IRB

  • http://irb.ufl.edu/irb02/index.html
  • Let’s look at a completed one
  • You MUST turn one in to the TA before you complete a study
  • Must have it OKed before running the study

Page 14: Conducting a User Study

Biases

  • Hypothesis Guessing
    • Participants guess the hypothesis you are testing
  • Learning Bias
    • Users get better as they become more familiar with the task
  • Experimenter Bias
    • Subconscious bias of data and evaluation to find what you want to find
  • Systematic Bias
    • Bias resulting from a flaw integral to the system
    • E.g. An incorrectly calibrated thermostat
  • List of biases
    • http://en.wikipedia.org/wiki/List_of_cognitive_biases

Page 15: Conducting a User Study

Confounds

  • Confounding factors – factors that affect outcomes, but are not related to the study
  • Population confounds
    • Who you get?
    • How you get them?
    • How you reimburse them?
    • How do you know groups are equivalent?
  • Design confounds
    • Unequal treatment of conditions
    • Learning
    • Time spent

Page 16: Conducting a User Study

Metrics

  • What you are measuring
  • Types of metrics
    • Objective
      • Time to complete task
      • Errors
      • Ordinal/Continuous
    • Subjective
      • Satisfaction
  • Pros/Cons of each type?

Page 17: Conducting a User Study

Analysis

  • Most of what we do involves:
    • Normally distributed results
    • Independent testing
    • Homogeneous population
  • Recall, we are testing the hypothesis by trying to prove the NULL hypothesis false

Page 18: Conducting a User Study

Raw Data

  • Keyboard times
    • What does mean mean?
    • What do variance and standard deviation mean?
    • E.g. 3.4, 4.4, 5.2, 4.8, 10.1, 1.1, 2.2
      • Mean = 4.46
      • Variance = 7.14 (Excel’s VARP)
      • Standard deviation = 2.67 (sqrt variance)
  • What do the different statistical data tell us?
  • User study.xls
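
The keyboard-time figures can be reproduced with Python's standard `statistics` module. This is a small sketch rather than the spreadsheet the slides refer to; the variable names are made up here. `pvariance` and `pstdev` are the population versions, which divide by n as Excel's VARP does.

```python
import statistics

keyboard_times = [3.4, 4.4, 5.2, 4.8, 10.1, 1.1, 2.2]

mean = statistics.mean(keyboard_times)
# pvariance divides by n, like Excel's VARP; statistics.variance divides by n - 1.
variance = statistics.pvariance(keyboard_times)
std_dev = statistics.pstdev(keyboard_times)

print(f"mean = {mean:.2f}")                   # mean = 4.46
print(f"variance = {variance:.2f}")           # variance = 7.14
print(f"standard deviation = {std_dev:.2f}")  # standard deviation = 2.67
```

Note how the single 10.1 reading pulls the mean up and accounts for most of the variance.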

Page 19: Conducting a User Study

What does Raw Data Mean?

Page 20: Conducting a User Study

Role of Chance

  • How do we know how much is the ‘truth’ and how much is ‘chance’?
  • How much confidence do we have in our answer?

Page 21: Conducting a User Study

Hypothesis

  • We assumed the means are “equal”
  • But are they?
  • Or is the difference due to chance?
    • Ex. A: μ₀ = 4, μ₁ = 4.1
    • Ex. B: μ₀ = 4, μ₁ = 6

Page 22: Conducting a User Study

T-test

  • T-test – statistical test used to determine whether two observed means are statistically different
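
A t statistic for two independent groups can be computed by hand as a sketch. The function name `welch_t` and both data sets are hypothetical (the times are invented for illustration). This is the unequal-variance (Welch) form of the statistic; the slides do not say which form the course uses.

```python
import math
import statistics

def welch_t(a, b):
    """t statistic for two independent samples, not assuming equal variances."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Hypothetical form-completion times in seconds, invented for illustration.
keyboard_only = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
mouse_and_keyboard = [10.2, 11.0, 10.8, 9.9, 10.5, 11.1]

t = welch_t(keyboard_only, mouse_and_keyboard)
print(f"t = {t:.2f}")
```

The statistic grows when the means are further apart, when the groups vary less, and when there are more participants.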

Page 23: Conducting a User Study

T-test Distributions

Page 24: Conducting a User Study

T-test

  • (rule of thumb) Good values of t > 1.96
  • Look at what contributes to t
  • http://socialresearchmethods.net/kb/stat_t.htm
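
The 1.96 rule of thumb matches the two-sided critical value of the standard normal distribution at α = 0.05. The sketch below assumes that α, which this slide does not state, and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

alpha = 0.05  # conventional significance level (an assumption; not fixed by the slides)
# Two-sided test: split alpha between both tails of the standard normal distribution.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
print(f"critical value = {critical_value:.2f}")  # critical value = 1.96
```

With small samples the t distribution has heavier tails than the normal, so the threshold is higher (about 2.23 with 10 degrees of freedom). The rule of thumb fits best when n is large.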

Page 25: Conducting a User Study

F statistic, p values

  • F statistic – assesses the extent to which the means of the experimental conditions differ more than would be expected by chance
  • t is related to F statistic
  • Look up a table, get the p value. Compare to α
  • α value – probability of making a Type I error (rejecting the null hypothesis when it is really true)
  • p value – statistical likelihood of an observed pattern of data, calculated on the basis of the sampling distribution of the statistic (the probability of seeing data at least this extreme if the null hypothesis were true)
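
One concrete form of the relation between t and F: with exactly two groups and a pooled (equal-variance) t statistic, the one-way F statistic equals t squared. The sketch below checks this numerically. The function names and data are hypothetical, and the identity does not hold for the unequal-variance t.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's t statistic for two groups, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def one_way_f(groups):
    """F statistic: between-group variance divided by within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical task times, invented for illustration.
group_a = [4.1, 3.8, 5.0, 4.6, 4.3]
group_b = [5.2, 5.9, 4.8, 6.1, 5.5]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(f"t^2 = {t * t:.4f}, F = {f:.4f}")
```

F also works with three or more conditions, where a single t statistic no longer applies.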

Page 26: Conducting a User Study

T and alpha values

Page 27: Conducting a User Study

  

                            Small Pattern                             Large Pattern
Comparison                  t-test with unequal variance  p-value     t-test with unequal variance  p-value
PVE – RSE vs. VFHE – RSE    3.32                          0.0026**    4.39                          0.00016***
PVE – RSE vs. HE – RSE      2.81                          0.0094**    2.45                          0.021*
VFHE – RSE vs. HE – RSE     1.02                          0.32        2.01                          0.055+

Page 28: Conducting a User Study

Significance

  • What does it mean to be significant?
    • You have some confidence it was not due to chance.
    • But there is a difference between statistical significance and meaningful significance
  • Always know:
    • samples (n)
    • p value
    • variance/standard deviation
    • means
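
The gap between statistical and meaningful significance shows up when a tiny difference is tested with more and more participants. The sketch below takes the 0.1 gap from Ex. A (4 vs. 4.1). The standard deviation of 1.0 and the group sizes are assumptions chosen for illustration, and `t_for_equal_groups` is a hypothetical helper.

```python
import math

def t_for_equal_groups(mean_difference, std_dev, n_per_group):
    """t statistic for two equal-sized groups sharing one standard deviation."""
    standard_error = std_dev * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error

# Assumed values: a 0.1 difference in means, standard deviation 1.0 in both groups.
for n in (20, 200, 2000):
    t = t_for_equal_groups(0.1, 1.0, n)
    print(f"n = {n:4d} per group -> t = {t:.2f}")
```

The same 0.1 difference stays below the 1.96 threshold at n = 20 and passes it at n = 2000. Whether a 0.1 difference matters in practice is a separate question, which is why n, the means and the variance should be reported along with the p value.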