IS 4800 Empirical Research Methods for Information Science Class Notes Feb 8, 2012


  • IS 4800 Empirical Research Methods for Information Science
    Class Notes Feb 8, 2012
    Instructor: Prof. Carole Hafner, 446 WVH
    [email protected] Tel: 617-373-5116
    Course Web site: www.ccs.neu.edu/course/is4800sp12/

  • Outline
    Assignment 2: Relational Agents for Patient Education study
    Assignment 3: Descriptive Statistics Report
    Review for test
    Team project 1
    Survey research (cont.)
    Questionnaire construction
    Composite measures
    Validity and reliability

  • Assignment 2: points to mention
    Respect for persons:
      Subjects can opt out; verbal and written consent obtained
      Refusal will not impact medical care -- voluntary
      Study described in detail in the recruitment letter -- informed
      Gives procedures to ensure confidentiality
      Participants given a number to call if they have concerns
    Beneficence:
      Little or no risk
      Potential for significant public benefit -- what benefits?
        Benefit to all diabetes patients
        Use of relational agents for educating elderly/minority/low-literacy people
    Justice:
      Participants may benefit personally (health + $)
      Minority patients in urban areas have 3X higher rates of low health literacy and therefore represent a class that would benefit the most

  • Assignment 2: more points to mention
    Data safety & monitoring plan:
      Independent oversight ensures the plan is followed
      Provides extra protection for poor/minority patients (justice)
    Point of the Study Subjects section:
      Documents inclusion/exclusion criteria
      Demonstrates there is a sufficient sample size
      Shows the disabled are not over-burdened (justice)
    HIPAA issues:
      Use of data to pre-select without consent
      Opt-out initial consent process
      Use of a phone interview to collect more data

  • Assignment 3
    Results were disappointing.
    Frequency tables are only meaningful for categorical measures (gender and job category) unless you create intervals for numeric data.
    Histograms are meaningful for numeric measures (experience, call time, customer satisfaction).
    Crosstabs: apparently most could not figure out how to get percents.
    Most were able to get the scatter plot.
    About half did the Custom Tables.
    Grade of B for all the requested stats plus a minimal discussion.
    (A sketch of these statistics in code follows below.)
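
  • A minimal pandas/matplotlib sketch of the requested statistics (not the assignment solution); the data and column names (gender, job_category, experience, call_time, satisfaction) are illustrative assumptions standing in for the actual dataset:

        import pandas as pd
        import matplotlib.pyplot as plt

        # Illustrative stand-in for the assignment's dataset
        df = pd.DataFrame({
            "gender":       ["F", "M", "F", "M", "F", "M"],
            "job_category": ["rep", "rep", "lead", "rep", "lead", "rep"],
            "experience":   [2, 5, 7, 1, 9, 4],              # years
            "call_time":    [3.2, 4.1, 2.8, 5.0, 3.5, 4.4],  # minutes
            "satisfaction": [4, 3, 5, 2, 5, 3],              # 1-5 rating
        })

        # Frequency table: only meaningful for a categorical measure
        print(df["gender"].value_counts())

        # Histogram: meaningful for a numeric measure
        df["call_time"].plot.hist(bins=5)
        plt.xlabel("Call time (minutes)")
        plt.show()

        # Crosstab with percents: normalize="index" gives row percentages
        print(pd.crosstab(df["gender"], df["job_category"], normalize="index") * 100)

        # Scatter plot of two numeric measures
        df.plot.scatter(x="experience", y="satisfaction")
        plt.show()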

  • Types of Questionnaire Items
    Restricted (close-ended): respondents are given a list of alternatives and check the desired alternative.
    Open-ended: respondents are asked to answer a question in their own words.
    Partially open-ended: an "Other" alternative is added to a restricted item, allowing the respondent to write in an alternative.

  • Types of Questionnaire Items
    Rating scale: respondents circle a number on a scale (e.g., 0 to 10) or check a point on a line that best reflects their opinion.
      Two factors need to be considered: the number of points on the scale, and how to label (anchor) the scale (e.g., endpoints only or each point).
    Ranking question

  • Types of Questionnaire Items
    A Likert scale is a scale used to assess attitudes: respondents indicate their degree of agreement or disagreement with a series of statements.
      I am happy.   Disagree 1 2 3 4 5 6 7 Agree

    A semantic differential scale allows participants to provide a rating within a bipolar space.
      How are you feeling right now?   Sad 1 2 3 4 5 6 7 Happy

  • Sample Survey Questions
    http://www.custominsight.com/survey-question-types.asp

    Composite Measures

  • Psychological Concepts (aka Constructs)
    Constructs are general codifications of experience and observations:
      Observe differences in social standing -> concept of social status
      Observe differences in religious commitment -> concept of religiosity
    Most psychological constructs have no ultimate definitions; constructs are ad hoc summaries of experience and observations.

  • Composite Measures
    Indexes (aka scales) provide an ordinal ranking of respondents with respect to a construct of interest (e.g., liking of computers).
    Usually assessed through a series of related questions.

  • Composite Measures
    It is seldom possible to arrive at a single question that adequately represents a complex variable:
      Any single item is likely to misrepresent some respondents (e.g., church-going).
      A single item may not provide enough variation for your purposes.
      Single items give crude assessments; several items give a more comprehensive and accurate assessment.

  • Example Composite Measure
    Working Alliance Inventory (5 of 36 questions)

  • Operationalization
    The process of specifying empirical observations that are indicators of the concept of interest.
    Begin by enumerating all the subdimensions (factors) of the concept:
      Review previous research
      Use common sense

  • Example: Religiosity
    Subdimensions/indicators/factors:
      Ritual involvement: e.g., going to church
      Ideological involvement: acceptance of religious beliefs
      Intellectual involvement: extent of knowledge about religion
      Experiential involvement: range of religious experiences
      Consequential involvement: extent to which religion guides social decisions
      (there are many others)

  • Discriminant Indicators
    Also think about related measures that should not be indicators of your construct.
    In particular, if you will be measuring another related variable, make sure none of your indicators includes any attributes of it.
    Example: you want to study the relationship between religiosity and attitudes towards war => including a question about adherence to a "peace on earth" doctrine is not a good idea.

  • Picking Items for a Composite
    Face validity
    Unidimensionality: all items measure the same concept
    Items should provide variance in responses:
      Don't pick items that classify everyone one way.
      If you are interested in a binary classification (e.g., liberal vs. conservative), each item should split respondents roughly in half.
    Negate up to half of the items to avoid response bias (see the reverse-coding sketch below).
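
  • A minimal sketch of the reverse-coding step, assuming a hypothetical 5-item, 7-point scale in which q2 and q4 are the negated items:

        import pandas as pd

        # Hypothetical responses; q2 and q4 are negatively worded
        df = pd.DataFrame({
            "q1": [7, 2, 5], "q2": [1, 6, 3], "q3": [6, 1, 5],
            "q4": [2, 7, 4], "q5": [7, 2, 6],
        })

        REVERSED = ["q2", "q4"]
        SCALE_MAX, SCALE_MIN = 7, 1

        # Reverse-code: response r becomes (max + min - r), so a 1 on a
        # negated item counts like a 7 on a positively worded item
        df[REVERSED] = SCALE_MAX + SCALE_MIN - df[REVERSED]
        print(df)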

  • Picking Items: Bivariate Analysis
    Every pair of items should be related, but not too strongly:
      Scoring high on item A should increase the likelihood of scoring high on item B.
      But if two items are perfectly correlated (e.g., one logically implies the other), then one can be dropped.
    Should also look at combinations of more than two items to ensure that they all provide additional information.
    (A sketch of the pairwise check follows below.)
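
  • A minimal sketch of the bivariate check: compute the item correlation matrix and flag pairs that look unrelated or nearly redundant. The 0.2 and 0.9 cutoffs are illustrative choices, not fixed rules, and the responses are hypothetical:

        import pandas as pd

        # Hypothetical item responses (after any reverse-coding)
        df = pd.DataFrame({
            "q1": [7, 2, 5, 6, 3], "q2": [6, 1, 4, 6, 2],
            "q3": [7, 2, 5, 5, 3], "q4": [4, 4, 3, 4, 4],
        })
        corr = df.corr()
        print(corr.round(2))

        items = list(df.columns)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                r = corr.loc[a, b]
                if abs(r) < 0.2:
                    print(f"{a}-{b} (r={r:.2f}): may not tap the same concept")
                elif abs(r) > 0.9:
                    print(f"{a}-{b} (r={r:.2f}): nearly redundant, consider dropping one")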

  • Scoring a Composite Measure
    Average the item scores; weight items equally unless you have a compelling reason to do otherwise.
    Missing data:
      Omit the dataset
      Impute an average/intermediate score
      Last value carried forward, for repeated measures
      Many other strategies
    (A scoring sketch follows below.)
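
  • A minimal scoring sketch, assuming hypothetical reverse-coded responses: equal weights, with each respondent's own item mean imputed for a skipped question (one of the listed strategies):

        import numpy as np
        import pandas as pd

        # Respondent 0 skipped q3
        df = pd.DataFrame({
            "q1": [7, 2, 5], "q2": [6, 1, 4], "q3": [np.nan, 2, 5],
        })

        # Impute missing items with that respondent's mean response
        filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)

        # Composite score = unweighted average of the item scores
        print(filled.mean(axis=1))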

  • Example: NU Husky Fanatic

    What are some factors?
    What are some items per factor?

  • Designing a Composite Measure
    Literature review: previous measures, theoretical concepts
    Brainstorm on factors
    Brainstorm on items
    Preliminary validity/reliability testing (a factor-analysis sketch follows this list):

    Factor analysis

    Reliability testing

    Validity testing
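
  • A minimal factor-analysis sketch using scikit-learn on a simulated (respondents x items) response matrix; the loadings indicate which items group together on an underlying factor:

        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(0)
        trait = rng.normal(size=(100, 1))  # one simulated latent trait
        # Six items that all load on the trait, plus noise
        items = trait @ rng.normal(size=(1, 6)) + rng.normal(scale=0.5, size=(100, 6))

        fa = FactorAnalysis(n_components=2).fit(items)
        print(fa.components_.round(2))  # factor loadings per item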

  • Validity and Reliability
    Reliability of a measure
    Validity of a measure, especially composite measures of constructs
    Validity of claims about the association of IV and DV:
      Internal
      External

  • Internal Validity
    INTERNAL VALIDITY is the degree to which your design tests what it was intended to test.
    In an experiment, internal validity means showing that the observed difference in the dependent variable is truly caused by changes in the independent variable.
    In correlational research, internal validity means that observed differences in the value of the criterion variable are truly related to changes in the predictor variable.
    Internal validity is threatened by extraneous and confounding variables.
    Internal validity must be considered during the design phase of research.

  • External Validity
    EXTERNAL VALIDITY is the degree to which results generalize beyond your sample and research setting.
    External validity is threatened by the use of a highly controlled laboratory setting, restricted populations, pretests, demand characteristics, experimenter bias, and subject selection bias (such as volunteer bias).
    Steps taken to increase internal validity may decrease external validity, and vice versa.
    Internal validity may be more important in basic research; external validity, in applied research.

  • Factors Affecting External Validity
    Reactive testing: a pretest may affect reactions to an experimental variable.
    Interactions between selection biases and the independent variable: results may apply only to subjects representing a unique group.
    Reactive effects of experimental arrangements: artificial experimental manipulations, or the subject's knowledge that he or she is a research subject, may affect results.
    Multiple treatment interference: exposure to early treatments may affect responses to later treatments.

  • Internal vs. External Validity of a Study
    Internal:
      Appropriate methods (well designed)
      Conducted properly
      Data analyzed correctly
      Correct inference
      Replicability: could someone else conduct your study and get the same result?
    External:
      Generalizability

  • Extraneous and Confounding Variables (impact on internal validity)
    An extraneous variable influences the DV.
    A confounding variable influences BOTH the IV and the DV (e.g., ice cream sales and drowning deaths):
      The most dangerous type of extraneous variable
    Must be considered during the design of a study.

  • Examples
    Confounding variable (very difficult to address):
      A study of the effect of larger vs. smaller monitors on performance. Larger monitors have better speakers (correlation with the IV). Perhaps the performance difference is due to the speakers.
    Other extraneous variables (can be addressed by sample restriction, matched group assignment, or statistical methods):
      Task time on two word processors: typing skill. Can control by using only subjects with one skill level, matching skill levels among groups, or multivariate analysis. (A matched-assignment sketch follows below.)
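
  • A minimal sketch of matched group assignment on the extraneous variable, using hypothetical typing-skill scores: rank subjects by skill, then alternate assignment so both groups get a similar skill mix:

        import numpy as np

        rng = np.random.default_rng(1)
        skill = rng.integers(20, 90, size=10)        # words per minute
        order = np.argsort(skill)                    # rank by typing skill
        group_a, group_b = order[0::2], order[1::2]  # alternate down the ranking
        print("A:", skill[group_a], "B:", skill[group_b])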

  • Extraneous variables

  • Example
    You want to evaluate a new sensor that detects whether people are happy or not.
    You hire actors, randomly assign them to act happy or sad, and test your sensor on them.
    What kind of validity (internal/external) might be challenged?

  • Example
    You conduct the Conversational Agents to Promote Health Literacy study by assigning the first 30 patients who volunteer to the intervention group, and the next 30 to the control group.
    What kind of validity (internal/external) might be challenged?

  • Research Settings
    The laboratory setting: affords the greatest control over extraneous variables.
    Simulations: attempt to recreate the real world in the laboratory; realism is an issue.
    The field setting: study conducted in a real-world environment.
      Field experiment: manipulate variables in the field.
      High degree of external validity, but internal validity may be low.

  • Validating a Composite Measure

  • What is a validated measure?
    Has reliability
    Has validity

    For psychological measures, these are collectively referred to as a measure's psychometrics.

  • Measure Reliability
    A reliable measure produces similar results when repeated measurements are made under identical conditions.
    Reliability can be established in several ways:
      Test-retest reliability: administer the same test twice.
      Parallel-forms reliability: alternate forms of the same test are used.
      Split-half reliability: parallel forms are included on one test and later separated for comparison. (A split-half sketch follows below.)
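
  • A minimal split-half sketch, assuming hypothetical responses to a 6-item test: correlate each respondent's score on one half of the items with their score on the other half (an odd/even split is one common convention):

        import pandas as pd

        df = pd.DataFrame({
            "q1": [7, 2, 5, 6, 3], "q2": [6, 1, 4, 6, 2],
            "q3": [7, 2, 5, 5, 3], "q4": [6, 3, 4, 6, 2],
            "q5": [7, 1, 5, 6, 3], "q6": [6, 2, 4, 5, 2],
        })

        half_a = df[["q1", "q3", "q5"]].mean(axis=1)  # odd items
        half_b = df[["q2", "q4", "q6"]].mean(axis=1)  # even items
        print(f"split-half correlation: {half_a.corr(half_b):.2f}")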

  • Reliability
    For surveys, this also encompasses internal consistency: do all of the questions address the same underlying construct of interest? That is, do scores covary?
    A standard measure is Cronbach's alpha:
      0 = no correlation
      1 = scores always covary in the same way
      0.7 is used as the conventional threshold
    (A sketch computing alpha follows below.)
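
  • A minimal sketch of computing Cronbach's alpha directly from its definition, on a hypothetical 5-respondent x 4-item response matrix:

        import numpy as np

        def cronbach_alpha(items: np.ndarray) -> float:
            """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
            k = items.shape[1]
            item_vars = items.var(axis=0, ddof=1).sum()
            total_var = items.sum(axis=1).var(ddof=1)
            return k / (k - 1) * (1 - item_vars / total_var)

        # Hypothetical responses: rows are respondents, columns are items
        scores = np.array([
            [7, 6, 7, 6],
            [2, 1, 2, 3],
            [5, 4, 5, 4],
            [6, 6, 5, 6],
            [3, 2, 3, 2],
        ])
        print(f"alpha = {cronbach_alpha(scores):.2f}")  # 0.7 is the conventional threshold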

  • Increasing the Reliability of a Questionnaire
    Check that the items on your questionnaire are clearly written and appropriate for those who will complete it.
    Increase the number of items on your questionnaire.
    Standardize the conditions under which the test is administered (e.g., timing procedures, lighting, ventilation, instructions).
    Score your questionnaire carefully, eliminating scoring errors.

  • Volunteer Bias
    How can it affect external validity?

    Characteristics of volunteers?

    How do you address volunteer bias?

  • Characteristics of Individuals Who Volunteer for Research
    Maximum confidence -- volunteers:
      1. tend to be more highly educated than nonvolunteers
      2. tend to come from a higher social class than nonvolunteers
      3. are of higher intelligence in general, but not when volunteering for atypical research (such as hypnosis or sex research)
      4. have a higher need for approval than nonvolunteers
      5. are more social than nonvolunteers

  • Considerable confidence:
      Volunteers are more arousal-seeking than nonvolunteers (especially when the research involves stress).
      Individuals who volunteer for sex research are more unconventional than nonvolunteers.
      Females are more likely to volunteer than males, except when the research involves physical or emotional stress.
      Volunteers are less authoritarian than nonvolunteers.
      Jews are more likely to volunteer than Protestants; however, Protestants are more likely to volunteer than Catholics.
      Volunteers tend to be less conforming than nonvolunteers, except when the volunteers are female and the research is clinically oriented.
    Source: Adapted from Rosenthal & Rosnow, 1975.

  • Remedies for Volunteer Bias
    Make your appeal very interesting.
    Make your appeal as nonthreatening as possible.
    Explicitly state the theoretical and practical importance of your research.
    Explicitly state why the target population is relevant to your research.
    Offer a small reward for participation.
    Have a high-status person make the appeal for participants.
    Avoid research that is physically or psychologically stressful.
    Have someone known to participants make the appeal.
    Use public or private commitment to volunteering when appropriate.

  • Ecological Validity
    The degree to which a measure corresponds to what happens in the real world.
    Example: assessing productivity/day in the lab vs. assessing productivity/day in the office.

  • Concerns with Measures
    Sensitivity:
      Is a dependent measure sensitive enough to detect behavior change?
      An insensitive measure will not detect subtle behaviors.
    Range effects -- occur when a dependent measure has an upper or lower limit:
      Ceiling effect: when a dependent measure has an upper limit.
      Floor effect: when a dependent measure has a lower limit.

  • Example
    You want to assess the effect of TV viewing on whether people like large computer monitors or not (yes/no).
    You run an experiment in which participants are randomized to watch either 2 hrs or 0 hrs of TV per day for a week, then answer your question.

    What's going on?

      Participant   Condition   Likes Large Monitors
      1             TV          Yes
      2             No TV       Yes
      3             TV          Yes
      4             No TV       Yes

  • Developing a New Measure
    Say you decide you need a new survey measure, "attitude towards large computer monitors" (ATLCM):
      I like big monitors.
      Big monitors make me nervous.
      I prefer small monitors, even if they cost more.
    All on 7-pt Likert scales.

    How would you validate this measure?

  • Example
    You want to assess the effect of TV viewing on attitude towards large computer monitors (ATLCM).
    You run an experiment in which participants are randomized to watch either 2 hrs or 0 hrs of TV per day for a week, then fill out the ATLCM.

    What's going on? (See the detection sketch below.)

      Participant   Condition   ATLCM
      1             TV          7.0
      2             No TV       6.7
      3             TV          6.9
      4             No TV       7.0
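
  • In both examples the measure cannot register a difference between conditions: every response sits at the top of the scale. A minimal detection sketch, using hypothetical ATLCM score vectors:

        import numpy as np

        # Hypothetical ATLCM scores from the two conditions
        tv    = np.array([7.0, 6.9, 7.0, 6.8])
        no_tv = np.array([6.7, 7.0, 6.9, 7.0])
        scores = np.concatenate([tv, no_tv])
        SCALE_MAX = 7.0

        # Near-zero variance plus a pile-up at the scale maximum are
        # the telltale signs of a ceiling effect
        near_ceiling = np.mean(scores >= SCALE_MAX - 0.2)
        print(f"variance = {scores.var():.3f}, fraction near ceiling = {near_ceiling:.0%}")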

  • Measure Validity
    A valid measure measures what you intend it to measure.
    Very important when using psychological tests (e.g., intelligence, aptitude, (un)favorable attitude).
    Validity can be established in a variety of ways:
      Face validity: assessment of adequacy of content. The least powerful method.
      Content validity: how adequately does a variable sample the full range of behavior it is intended to measure?

  • Measure Validity (cont.)
    Criterion-related validity: how adequately does a test score match some criterion score? Takes two forms:
      Concurrent validity: does the test score correlate highly with the score from a measure of known validity?
      Predictive validity: does the test predict behavior known to be associated with the behavior being measured?

  • Measure Validity (cont.)
    Construct validity: do the results of a test correlate with what is theoretically known about the construct being evaluated?
      Convergent validity (subtype): measures of constructs that should be related to each other are.
      Discriminant validity (subtype): measures of constructs that should not be related are not.

  • Example
    [Diagram: a model of the world relating Seniority to other variables]
    Assume we have good evidence for this model of the world.

    We now propose a new measure for Productivity.
    What would be evidence for convergent validity?
    What would be evidence for discriminant validity?
    (A correlation sketch follows below.)
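
  • A minimal correlation sketch on hypothetical data: convergent validity would show up as a substantial correlation between the new Productivity measure and an established, theoretically related measure (supervisor ratings are an invented stand-in), and discriminant validity as a near-zero correlation with a theoretically unrelated one:

        import pandas as pd

        df = pd.DataFrame({
            "new_productivity":  [3.1, 4.5, 2.2, 5.0, 3.8],
            "supervisor_rating": [3.0, 4.8, 2.5, 4.6, 3.5],  # should correlate (convergent)
            "shoe_size":         [42, 44, 39, 41, 43],       # should not (discriminant)
        })
        print(df.corr()["new_productivity"].round(2))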

  • Validation -- Summary
    Reliability:
      Test-retest
      Internal consistency
    Validity:
      Face
      Content
      Criterion-related:
        Concurrent
        Predictive
      Construct:
        Convergent
        Discriminant

  • Sampling
    You should obtain a representative sample: one that closely matches the characteristics of the population.
    A biased sample occurs when your sample characteristics don't match population characteristics.
    Biased samples often produce misleading or inaccurate results, and usually stem from inadequate sampling procedures.
    (A random-sampling sketch follows below.)
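
  • A minimal sketch of simple random sampling from a population frame (the frame here is a hypothetical list of IDs), one standard way to avoid the selection biases described above:

        import numpy as np

        rng = np.random.default_rng(seed=42)
        population = np.arange(1000)  # e.g., employee IDs
        # Each member has an equal chance of selection, with no repeats
        sample = rng.choice(population, size=50, replace=False)
        print(sorted(sample)[:10])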

  • Speaker notes
    Rating scales: the magic number of scale points is 7-10, with 7 the most common. Anchoring the endpoints is sufficient.
    Composite measures: if someone hands you one to use, how do you code it? You want to wind up with a single measure/number per subject, but based on several questions. Why several questions? Each provides evidence for some underlying attitude that we cannot measure directly. The numeric value itself is meaningless; we just want to be able to rank-order subjects with respect to their attitude. More questions USUALLY provide better reliability (why? errors in interpretation, and errors in the association between a question and the construct for a given subject). Why reverse-code items? Response bias. In the next homework, I'm going to ask you to design one of these.
    Confounding variables are one kind of extraneous variable: not under your control, but possibly covarying systematically with the IV. E.g., suppose we don't randomize subjects and don't control for seniority in the Monitor -> Performance study; or the larger monitors in our study have high-quality speakers, which affects performance and satisfaction.
    Volunteer bias example: to compare two accounting systems, install the first for a 30-day trial, send an email to all admins in the company, and take the first 20 respondents; then install the second for a 30-day trial, send another email, and have a second group try it. Volunteers are more highly educated and more intelligent, so they may rate the more complex system higher, even though the average company admin would be completely lost.