User Study Evaluation Human-Computer Interaction


Hypothesis

• A statement of prediction
• Describes what you expect will happen in your study
• Alternative hypothesis (H1) – your prediction, i.e. a claim of a difference in the population
  • e.g. Participants will commit more errors with interface A than with interface B
• Null hypothesis (H0) – no difference or no effect
  • e.g. Participants will commit the same number of errors with interface A and interface B, or participants will commit more errors with interface B than with interface A (i.e. any outcome other than the one H1 predicts)

Hypothesis – one or two tailed?
• Alternative hypothesis
  • One-tailed: Participants will commit more errors with interface A than with interface B (i.e. directional)
  • Two-tailed: The number of errors participants commit will differ between interface A and interface B, but I don't know whether there will be more or fewer (i.e. non-directional)

• You can't prove the alternative hypothesis; you can only reject the null hypothesis
• If the data are unlikely enough under the null hypothesis (i.e. your prediction is supported), reject the null hypothesis
• Not rejecting the null hypothesis ≠ accepting it

Metrics

• What you are measuring
• Some types of metrics
  • Objective – facts of an event
    • Time to complete task (continuous)
    • Errors (discrete, i.e. distinct and separate, can be counted)
  • Subjective – a person's opinion
    • Satisfaction

Metrics

• Types of metrics
  • Objective – facts of an event
  • Subjective – a person's opinion
  • *Both* are important
• How to measure
  • Instrumentation – record data within your system
  • Questionnaires / Surveys
    • Scales
    • Free-response
• Let's discuss the appropriateness of each
• Let's look at a very popular survey: the System Usability Scale (SUS)

http://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html
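SUS has ten statements rated from 1 (strongly disagree) to 5 (strongly agree), with positively worded odd items and negatively worded even items. The standard scoring rule (odd items contribute rating − 1, even items contribute 5 − rating, and the sum is multiplied by 2.5 to give a 0 to 100 score) can be sketched as follows; the function name and the example responses are hypothetical.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant's ten ratings (1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS ratings must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: contribute (rating - 1).
            total += rating - 1
        else:
            # Even items are negatively worded: contribute (5 - rating).
            total += 5 - rating
    # Each item contributes 0-4, so the raw sum is 0-40; scale it to 0-100.
    return total * 2.5

# Hypothetical participants.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
print(sus_score([3] * 10))                        # 50.0
```

Note that the alternating wording means SUS already builds in the item reversal discussed under data preparation.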

Analysis
• Most of what we do assumes:
  • Normally distributed results
  • Independent observations
  • A homogeneous population (similar variances across groups)
• Recall: we test the hypothesis by trying to reject the NULL hypothesis, i.e. by showing the data would be unlikely if it were true

Analysis
• 3 main steps for analysis
  • Data preparation: cleaning and organizing the data for analysis
    • Checking the data for accuracy
    • Transforming data (e.g. reverse coding survey data)
  • Descriptive statistics: describing the data
    • Provide simple summaries about the sample and the measures
    • Simply describing what is, what the data show
  • Inferential statistics: testing hypotheses and models
    • Try to infer from the sample data what the population thinks
    • Make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance

Data preparation

• Checking data for accuracy
  • Are the responses legible/readable?
  • Are all important questions answered?
  • Are the responses complete?
  • Is all relevant contextual information included (e.g., date, time, place, researcher)?

Data preparation
• Data transformations
  • Missing values
    • Depending on the program, you may need to designate specific values to represent missing values, e.g. -99
  • Scale totals
    • Add or average across individual items
  • Item reversals
    • Likert scale – sometimes the ratings for some items need to be reversed
    • 1 (strongly disagree) – 5 (strongly agree)
      • “I generally feel good about myself.”
      • “Sometimes I feel like I'm not worth much as a person.”
      • What does a 5 mean in each case?
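A minimal sketch of these three transformations, assuming a 1 to 5 Likert scale, -99 as the missing-value code, and a hypothetical three-item scale whose second item is negatively worded. On a 1 to 5 scale an item is reversed as 6 − rating (in general, min + max − rating), so a 5 on the negatively worded item becomes a 1.

```python
MISSING = -99          # assumed missing-value code
SCALE_MIN, SCALE_MAX = 1, 5

def reverse(rating):
    """Reverse-code a Likert rating: 1<->5, 2<->4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - rating

def scale_mean(ratings, reversed_items):
    """Average one participant's items after reverse coding, skipping missing values.

    ratings: raw ratings, with MISSING marking an unanswered item.
    reversed_items: set of 0-based indexes of negatively worded items.
    Returns None if every item is missing.
    """
    cleaned = []
    for index, rating in enumerate(ratings):
        if rating == MISSING:
            continue  # leave missing values out of the scale score
        cleaned.append(reverse(rating) if index in reversed_items else rating)
    if not cleaned:
        return None
    return sum(cleaned) / len(cleaned)

# Hypothetical participant; item 1 is negatively worded.
print(reverse(5))                                        # 1
print(scale_mean([4, 2, 5], reversed_items={1}))         # (4 + 4 + 5) / 3
print(scale_mean([4, MISSING, 5], reversed_items={1}))   # 4.5
```

Averaging over the answered items is one design choice; a plain sum would lower the total for participants who skipped items.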

Descriptive statistics
• Simple summaries of the sample and measures, i.e. the data
• Describing what is, or what the data show
• Central tendency – estimate of the “center” of a distribution of values
  • Mean – average across a set of values
    • 15 + 15 + 18 + 25 + 33 = 106
    • x̄ = 106/5 = 21.2
  • Median – score found in the middle of a set of values
    • 15, 15, 18, 25, 33 → median = 18
  • Mode – most frequently occurring value
    • 15, 15, 18, 25, 33 → mode = 15
• Describe the data with a number and a graph
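Python's standard `statistics` module computes all three measures of central tendency; applied to the example values above:

```python
import statistics

values = [15, 15, 18, 25, 33]  # the example values from this slide

print(sum(values))                # 106
print(statistics.mean(values))    # 21.2
print(statistics.median(values))  # 18
print(statistics.mode(values))    # 15
```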

Inferential statistics
• Try to reach conclusions that go beyond the immediate data – draw inferences
  • e.g. want to compare the average performance of 2 groups to see if there's a difference

t-test: statistical test used to determine whether two observed means are statistically different

t-test
• What does it mean to say that the averages for two groups are statistically different?

t-test

• Variability is the noise that may make it harder to see the group difference

• Variance: measure of variability around the mean

• Standard deviation: square root of the variance
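A short sketch of both measures, reusing the values from the central-tendency example. It uses the sample variance (squared deviations divided by n − 1), which is the version the t-test relies on; dividing by n instead gives the population variance.

```python
import math
import statistics

values = [15, 15, 18, 25, 33]

mean = sum(values) / len(values)
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
# Standard deviation: square root of the variance, back in the original units.
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 2))  # 60.2
print(round(sample_sd, 2))        # 7.76
# The standard library's sample statistics agree.
print(statistics.variance(values), statistics.stdev(values))
```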

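t-test
• t is the ratio of the difference between the group means to the variability of the groups (the standard error of the difference), i.e. signal / noise
• (rule of thumb) |t| > 1.96 (standard errors from zero) is significant at α = .05, two-tailed, for large samples; smaller samples need a larger t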

t-test
• Once computed, look up the t-value to see whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding.

• To test the significance, you need to set a risk level (called the alpha level). The accepted standard is an alpha level of .05.
  • 5 times out of 100 you would find a statistically significant difference between the means even if there was none (i.e., by "chance").

• Degrees of freedom (df). For an independent-samples t-test, df = the total number of participants in both groups minus 2 (n1 + n2 - 2).

• Given the alpha level and the df, look up the critical t-value in a t table and check whether your computed t-value is larger (in absolute value).

• If yes, conclude that the means of the 2 groups differ significantly (even given the variability) and reject the null hypothesis.
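The procedure above can be sketched for an independent-samples t-test that pools the two variances (this assumes the groups have similar variances; Welch's t-test is the usual alternative when they don't). The error counts are hypothetical, and the critical value 2.306 is the standard two-tailed table value for α = .05 and df = 8 only; other df need their own table lookup.

```python
import math

def independent_t(group_a, group_b):
    """Pooled-variance independent-samples t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # df = persons in both groups minus 2; pool variances weighted by their df.
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the means (the "noise").
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # t = signal (difference between means) / noise.
    return (mean_a - mean_b) / std_error, df

# Hypothetical error counts for 5 participants per interface.
errors_a = [5, 7, 6, 8, 9]
errors_b = [3, 4, 2, 5, 4]

t, df = independent_t(errors_a, errors_b)
T_CRITICAL = 2.306  # two-tailed table value for alpha = .05, valid only for df = 8

print(f"t = {t:.2f}, df = {df}")  # t = 3.90, df = 8
if abs(t) > T_CRITICAL:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```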

α and p values
• α value – probability of making a Type I error (rejecting the null hypothesis when it is actually true)
• p value – probability of obtaining an effect at least as large as the one found if the null hypothesis were true (i.e., by chance alone). The lower the p value, the stronger the evidence against the null hypothesis
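To see what α = .05 means in practice, the simulation below (an illustration, not part of the original material) draws both groups from the same normal population, so the null hypothesis is true by construction, and counts how often a t-test still reports a significant difference. With 5 participants per group (df = 8) and the two-tailed critical value 2.306, roughly 5% of runs produce a Type I error. The population mean, SD, group size, number of runs, and seed are arbitrary assumptions.

```python
import math
import random

def independent_t(group_a, group_b):
    """Pooled-variance independent-samples t statistic."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

def type_i_error_rate(runs=20000, n=5, t_critical=2.306, seed=1):
    """Fraction of runs that reject H0 even though both groups share one population."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(runs):
        # Both groups come from the same population (mean 50, SD 10): H0 is true.
        a = [rng.gauss(50, 10) for _ in range(n)]
        b = [rng.gauss(50, 10) for _ in range(n)]
        if abs(independent_t(a, b)) > t_critical:
            false_alarms += 1
    return false_alarms / runs

print(type_i_error_rate())  # roughly 0.05
```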

Relationship between α and p values
• Once the alpha level has been set, a statistic (like t) is computed.

• Each statistic has an associated probability value called a p-value: the likelihood of observing a statistic at least this extreme due to chance alone, given the sampling distribution under the null hypothesis.

• Alpha sets the standard for how extreme the data must be before we can reject the null hypothesis. The p-value indicates how extreme the data are.

• Compare the p-value with alpha: if p ≤ α, the observed data are statistically significantly different from what the null hypothesis predicts, and we reject it.
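The p-value definition can be made concrete with an exact permutation test. This is a swapped-in technique, not the t-test: if the null hypothesis is true, the interface labels are interchangeable, so the sketch relabels the pooled data in every possible way and counts how often the difference between means is at least as large as the one observed. The error counts are hypothetical; with 10 participants there are only 252 relabelings, so the enumeration is exhaustive.

```python
from fractions import Fraction
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-tailed permutation p-value for a difference between two means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        # Exact arithmetic (Fractions) avoids floating-point ties at the boundary.
        return Fraction(sum_a, n_a) - Fraction(total - sum_a, n_b)

    observed = abs(mean_diff(sum(group_a)))
    at_least_as_extreme = 0
    relabelings = 0
    # Every way of choosing which n_a participants receive label "A".
    for indexes in combinations(range(len(pooled)), n_a):
        relabelings += 1
        sum_a = sum(pooled[i] for i in indexes)
        if abs(mean_diff(sum_a)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / relabelings

errors_a = [5, 7, 6, 8, 9]  # hypothetical
errors_b = [3, 4, 2, 5, 4]  # hypothetical
alpha = 0.05

p = permutation_p_value(errors_a, errors_b)
print(f"p = {p:.4f}")                                       # p = 0.0159
print("significant" if p <= alpha else "not significant")  # significant
```

Only 4 of the 252 relabelings give a difference as large as the observed one, so p = 4/252; since p ≤ α, the null hypothesis is rejected.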

Kinds of t-tests

The formula is slightly different for each:
• Single-sample:
  • tests whether a sample mean is significantly different from a pre-existing value (e.g. norms)
• Paired-samples:
  • tests whether the means of 2 linked samples differ, e.g. means obtained in 2 conditions by a single group of participants
• Independent-samples:
  • tests whether the means of 2 independent groups (samples from different populations) differ

• Which test fits your situation?
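The single-sample and paired-samples formulas are closely related: a paired t-test is a single-sample t-test on each participant's difference score, compared against 0. A minimal sketch, with hypothetical task times (in seconds) for five participants who each used both interfaces; the function names are illustrative.

```python
import math

def one_sample_t(values, mu0=0.0):
    """Single-sample t statistic comparing the sample mean with mu0; df = n - 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1

def paired_t(condition_1, condition_2):
    """Paired-samples t: a single-sample t on the per-participant differences."""
    if len(condition_1) != len(condition_2):
        raise ValueError("paired samples need one value per participant in each condition")
    differences = [a - b for a, b in zip(condition_1, condition_2)]
    return one_sample_t(differences, mu0=0.0)

# Hypothetical task times (seconds); the same 5 participants used both interfaces.
times_a = [12, 15, 11, 14, 13]
times_b = [10, 13, 11, 12, 10]

t, df = paired_t(times_a, times_b)
print(f"t = {t:.2f}, df = {df}")  # t = 3.67, df = 4
```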

t and alpha values

Independent samples t-test
• Example: social presence questionnaire
  • “I perceived I was in the presence of a patient in the room with me.”
• http://www.vassarstats.net/tu.html

Correlations

Correlations – relationship between two variables
Pearson’s product-moment correlation coefficient – r

http://bdaugherty.tripod.com/KeySkills/lineGraphs.html



Correlations

Pearson’s product-moment correlation coefficient – r

http://www.socscistatistics.com/tests/pearson/Default2.aspx

http://en.wikipedia.org/wiki/Correlation_and_dependence
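Pearson's r can be computed directly from its definition: the co-variation of the two variables around their means, divided by the product of their spreads, which keeps r between -1 and +1. A minimal sketch with hypothetical data; on Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient r for two equal-length lists.

    Raises ZeroDivisionError if either list is constant (r is undefined then).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation: do x and y sit above/below their means together?
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Normalize by the spread of each variable so r lies between -1 and +1.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)

# Hypothetical data: sessions of practice vs. satisfaction rating.
practice = [1, 2, 3, 4, 5]
rating = [2, 4, 5, 4, 5]
print(round(pearson_r(practice, rating), 3))  # 0.775
```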





