
Notes from Fraenkel and Wallen


How to Design and Evaluate Research in Education

By Jack R. Fraenkel and Norman E. Wallen

Chapter 1

The Nature of Research Ways of knowing

Sensory experience (incomplete/undependable)
Agreement with others (common knowledge can be wrong)
Experts’ opinion (they can be mistaken)
Logic/reasoning things out (can be based on false premises)

Why research is of value
Scientific research (using scientific method) is more trustworthy than expert/colleague opinion, intuition, etc.

Chapter 1 - continued

The Nature of Research
Scientific Method (testing ideas in the public arena)
Put guesses (hypotheses) to tests and see how they hold up
All aspects of investigations are public and described in detail so anyone who questions results can repeat study for themselves
Replication is a key component of scientific method

Chapter 1 - continued

The Nature of Research
Scientific Method (requires freedom of thought and public procedures that can be replicated)
Identify the problem or question
Clarify the problem
Determine information needed and how to obtain it
Organize the information obtained
Interpret the results

All conclusions are tentative and subject to change as new evidence is uncovered (don’t PROVE things)

Chapter 1 - continued

The Nature of Research

Types of Research Experimental (most conclusive of methods)

Researcher tries different treatments (independent variable) to see their effects (dependent variable)

In simple experiments compare 2 methods and try to control all extraneous variables that might affect outcome

Need control over assignment to treatment and control groups (to make sure they are equivalent)

Sometimes use single subject research (intensive study of single individual or group over time)

Chapter 1 - continued

The Nature of Research (Types of Research continued)

Correlational Research Looks at existing relationships between 2 or more variables to make better predictions

Causal Comparative Research
Intended to establish cause and effect, but cannot assign subjects to treatment/control
Limited interpretations (there could be a common cause for both the presumed cause and the effect, e.g., stress causes both smoking and cancer)
Used for identifying possible causes; similar to correlation

Chapter 1 - continued

The Nature of Research (Types of Research continued)

Survey Research
Determine/describe characteristics of a group
Descriptive survey in writing or by interview
Provides lots of information from large samples
Three main problems: clarity of questions, honesty of respondents, return rates

Ethnographic research (qualitative)
In depth research to answer WHY questions
Some is historical (biography, phenomenology, case study, grounded theory)

Chapter 1 - continued

The Nature of Research (Types of Research continued)

Historical Research
Study past, often using existing documents, to reconstruct what happened
Establishing truth of documents is essential

Action Research (differs from above types)
Not concerned with generalizations to other settings
Focus on information to change conditions in a particular situation (may use all the above methods)

Each of these methods is valuable for a different purpose

Chapter 1 - continued

The Nature of Research
General Research Types
Descriptive (describe state of affairs using surveys, ethnography, etc.)
Associational (goes beyond description to see how things are related, so phenomena can be better understood, using correlational/causal-comparative research)
Intervention (try intervening to see effects using experiments)

Chapter 1 - continued

The Nature of Research Quantitative v. Qualitative

Quantitative (numbers)
Facts/feelings separate
World is single reality
Researcher removed
Established research design
Experiment prototype
Generalization emphasized

Chapter 1 - continued

The Nature of Research Meta-Analysis

Locate all the studies on a topic and synthesize results using statistical techniques (average the results)
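
As a rough illustration of "averaging the results", the sketch below computes a plain mean and a sample-size-weighted mean of effect sizes. The study values and the choice to weight by sample size are assumptions for illustration; the notes only say that results are averaged, and real meta-analyses often use other weighting schemes.

```python
def mean_effect(effects):
    """Unweighted average of effect sizes across studies."""
    return sum(effects) / len(effects)


def weighted_mean_effect(studies):
    """Average effect size weighted by sample size.

    studies: list of (effect_size, sample_size) tuples (hypothetical data).
    """
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n


# Hypothetical studies: (effect size, number of subjects)
studies = [(0.20, 50), (0.50, 150)]
print(mean_effect([e for e, _ in studies]))  # plain average
print(weighted_mean_effect(studies))         # the larger study counts more
```

Weighting by sample size pulls the combined estimate toward the larger study, which is one reason a weighted average is often preferred over a plain one.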

Critical Analysis of Research (some say all research is flawed)
Question of reality (there are only individual perceptions of it)
Question of communication (words are subjective)
Question of values (no objectivity, only social constructs)
Question of unstated assumptions (researchers don’t clarify assumptions that guide them)
Question of societal consequences (research serves political purposes that are conservative or oppressive; preserves status quo)

Chapter 1 - continued

The Nature of Research

Overview of the Research Process (Fig. 1.4)

Introduction chapter
Problem statement that includes some background info and justification for study
Exploratory question or hypothesis (relationship among variables clearly defined); goes last in Ch.
Definitions (in operational terms)
Review of related literature (other studies of the topic read and summarized to shed light on what is already known)

Chapter 1 - continued

The Nature of Research

Overview of the Research Process (Fig. 1.4)

Methods chapter
1. Subjects (sample, population, method to select sample)
2. Instruments (tests/measures described in detail and with rationale for their use)
3. Procedures (what, when, where, how, and with whom); give schedule/dates, describe materials used, design of study, and possible biases/threats to validity
4. Data analysis (how data will be analyzed to answer research questions or test hypothesis)

Chapter 2

The Research Problem
Statement of the Problem (identify a problem/area of concern to investigate)
Must be feasible, clear, significant, ethical

Research Questions (serve as focus of investigation, see p. 28 list)
Some info must be collected that answers them (must be researchable)
Cannot research “should” questions
See diagram, p. 29

Chapter 2 - Continued

The Research Problem
RQ should be feasible (can be investigated with available resources)
RQ should be clear (specifically define terms used; operational definitions are needed, but give both)
Constitutive definitions (dictionary meaning)
Operational definitions (specific actions/steps to measure term; e.g., IQ = time to solve puzzle, where <20 sec. is high; 20-40 is med.; 40+ is low)

RQ should be significant (worth investigating; how does it contribute to field and who can use info)

RQs often investigate relationships (two characteristics/qualities tied together)
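
The operational definition of IQ in the notes above (time to solve a puzzle) is concrete enough to write down as a rule. The sketch below is only an illustration: the function name is hypothetical, and the notes do not say which category exactly 20 or exactly 40 seconds falls in, so the boundary handling is an assumption.

```python
def iq_level_from_puzzle_time(seconds):
    """Classify IQ level using the operational definition from the notes.

    Assumed boundaries: exactly 20 s counts as medium, exactly 40 s as low.
    """
    if seconds < 0:
        raise ValueError("time cannot be negative")
    if seconds < 20:
        return "high"
    if seconds < 40:
        return "medium"
    return "low"


print(iq_level_from_puzzle_time(15))
print(iq_level_from_puzzle_time(30))
print(iq_level_from_puzzle_time(45))
```

Writing the definition as explicit steps is what makes it operational: two researchers applying it to the same timing data should reach the same classification.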

Chapter 3

Variables and Hypotheses Important to study relationships Sometimes just want to describe (use RQ)

Usually want to look for patterns/connections Hypothesis predicts the existence of a relationship

Variables (anything that can vary in measure; opposite of constant) Variables must be clearly defined Often investigate relationship between variables

Chapter 3 - Continued

Variables and Hypotheses Variable Classifications (Fig. 3.4, p. 42)

Quantitative (variables measured as a matter of degree, using real numbers; e.g., age, number of kids)

Categorical (no variation in degree: either in a category or not; e.g., gender, hair color)
Independent: the cause (aka the manipulated, treatment or experimental variable)
Dependent: the effect (aka outcome variable)
Extraneous: uncontrolled IVs (see Fig. 3.2, p. 46)

All extraneous variables must be accounted for in an experiment

Chapter 3 - Continued

Variables and Hypotheses
Hypotheses – predictions about possible outcome of a study; sometimes several hypotheses from one RQ (Fig 3.3)
RQ: Will athletes have a higher GPA than nonathletes?

H: Athletes will have higher GPAs than nonathletes

Advantages to stating a hypothesis as well as RQ Clarifies/focuses research to make prediction based on previous research/theory

Multiple supporting tests to confirm hypothesis strengthens it

Disadvantages Can lead to bias in methods (conscious or un) to try to support hypothesis

Sometimes miss other important info due to focus on hypothesis (peer review/replication is a check on this)

Chapter 3 - Continued

Variables and Hypotheses
Some hypotheses are more important than others
Directional v. nondirectional
Directional says which group will score higher/do better
Nondirectional just indicates there will be a difference, but not who will score higher/do better
Directional is more risky, so be careful/tentative in using directional ones

Chapter 4

Ethics and Research Examples of unethical practices Requiring participation from powerless (students) Using minors without parental permission

Deleting data that don’t agree w/ hypothesis Invading privacy of subjects Physically or psychologically harming subjects

APA statement of ethical principles in research Each student must sign one and have it signed by workplace supervisor

Chapter 4 - Continued

Ethics and Research Protecting participants from harm requires informed consent

Subjects must know the purpose of the study, possible benefits/harm; participation is voluntary and they can withdraw without penalty any time (Fig. 4.3, p. 59)

Researchers should ask: Could subjects be harmed? Is there another way to get the info? Is the info valuable enough to justify study?

Researchers must ensure confidentiality of data (limit access; no names if possible; tell subjects confidential or anonymous)

Deceiving subjects is sometimes necessary (Milgram study); ask if results justify the ethical lapse
When deception is used, subjects should be okay with it afterward (and they can refuse use of their data)

Chapter 4 - Continued

Ethics and Research Research with children

Parental consent required (signed permission from parents)
APA Ethics in Research Form addresses this also

Regulation of Research (National Research Act of 1974)
If federal funding received, must have an IRB to check: risks to subjects, informed consent guidelines met, debriefing plans for subjects
HHS made changes in 1981 so that educational research is exempt under certain conditions

Chapter 5

Review of the Literature Value of the Literature Review

Glean ideas from others interested in topic
See results of related studies (must be able to evaluate those objectively)

Types of sources
General References – indexes of primary sources and abstracts (ERIC, Psych Abstracts)
Primary Sources – publications where researchers report their results (peer reviewed/refereed journals)
Secondary Sources – publications where authors describe works of others (encyclopedias, trade books, textbooks)

Chapter 5 - Continued

Review of the Literature
Steps in the Literature Review (manual or electronic); see examples p. 74
Define problem as precisely as possible
Review some secondary sources*
Review some general reference works*
Formulate search terms (keywords/descriptors)
Search general references for primary sources
Obtain and read primary sources (make notes/summarize)

*May be based on existing knowledge or previous reading

Chapter 5 - Continued

Review of the Literature Making notes

Include problem/purpose; hypotheses/RQ; procedures w/ subjects/methods; findings/conclusions; citation!

Searching strategies…use Boolean operators (AND, OR, NOT) Searching www…be careful of reliability

Writing up the Literature Review Introduction - describes problem and justification for study;

Body – discuss related studies together (#2, p.88) Summary – ties literature together/give conclusions arising from literature

Reference list

Don’t replace a review of primary sources with meta-analysis (a combined review of all available research on a topic w/ results averaged)

End Part 1

Chapter 6

Sampling
Sample – any group on which info is obtained
Population – group that researcher is trying to represent
Population must be defined first; more closely defined, easier to do, but less generalizable
Study a subset of the population because it is cheaper, faster, easier, and if done right, get same results as a census (study of whole pop)
Accessible population – the group you are able to realistically generalize to…may differ from target population

Chapter 6 - Continued

Sampling (Random v. Nonrandom Sampling)

Random – every population element has an equal and independent chance to participate
Uses names in a hat or a table of random numbers
Elimination of bias in selecting the sample is most important (meaning the researcher does not influence who gets selected)
Ensuring sufficient sample size is second most important

Nonrandom/purposive – troubles with representativeness/generalizing

Chapter 6 - Continued

Sampling (Random Sampling Methods)

Simple random sampling
Names in a hat or table of random numbers (p. 99)
Larger samples more likely to represent pop.
Any difference between population and sample is random and small (called random sampling error)

Stratified random sampling
Ensures small subgroups (strata) are represented
Normally proportional to their part of pop.
Break pop into strata, then randomly select w/in strata
Multistage sampling (see p. 94)
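
The two random sampling methods above can be sketched with Python's standard `random` module standing in for names in a hat. The function names, the population, and the strata are hypothetical. Rounding each stratum's share is an assumption that can make the total differ slightly from the requested size.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Every element has an equal and independent chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


def proportional_stratified_sample(strata, sample_size, seed=None):
    """Randomly select within each stratum, proportional to its share of the population.

    strata: dict mapping stratum name -> list of members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n = round(sample_size * len(members) / total)  # rounded share (assumption)
        sample[name] = rng.sample(members, n)
    return sample


# Hypothetical population of 100 students: 80 in one stratum, 20 in another
strata = {
    "undergraduate": [f"U{i}" for i in range(80)],
    "graduate": [f"G{i}" for i in range(20)],
}
everyone = strata["undergraduate"] + strata["graduate"]
print(simple_random_sample(everyone, 10, seed=1))
print(proportional_stratified_sample(strata, 10, seed=1))
```

With a simple random sample of 10, the small stratum could be missed entirely by chance; the stratified version guarantees it contributes its proportional share (here 2 of 10).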

Chapter 6 - Continued

Sampling (Random Sampling Methods, cont.)

Cluster random sampling Select groups as sample units rather than individuals REQUIRES a large number of groups/clusters Multistage sampling (see p. 94)

Systematic (Nth) sampling
Considered random if list is randomly ordered, or nonrandom if systematic w/ random starting point
Divide pop size by sample size to get N (ps/ss=N)
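
A minimal sketch of Nth sampling, assuming the population is held in a list and the interval is the whole-number part of ps/ss. The function name and the example list are hypothetical.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick every Nth element after a random start, where N = ps // ss."""
    interval = len(population) // sample_size  # N = population size / sample size
    if interval < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random starting point within the first interval
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical list of 100 names, sampling 10, so every 10th name is chosen
names = [f"person_{i}" for i in range(100)]
print(systematic_sample(names, 10, seed=3))
```

If the list itself is ordered in some meaningful way (for example by grade), every Nth pick inherits that ordering, which is why the notes treat the ordering of the list as what decides whether the sample counts as random.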

Chapter 6 - Continued

Sampling (Non-Random Sampling Methods)

Systematic can be nonrandom if list is ordered

Convenience sampling
Using group that is handy/available (or volunteers)
Avoid, if possible, since tend not to be representative due to homogeneity of groups
Report large number of demographic factors to see likelihood of representativeness

Purposive sampling
Using personal judgment to select sample that should be representative (e.g., this faculty seems to represent all teachers) OR selecting those who are known to have needed info (interested in talking only to those in power)
Snowball is a type (used with hard to identify groups such as addicts)

Chapter 6 - Continued

Sampling
Sample size affects accuracy of representation
Larger sample means less chance of error
Minimum is 30; upper limit is 1,000 (see table)

External validity – how well sample generalizes to the population
Representative sample is required (not the same thing as variety in a sample)
High participation rate is needed
Multiple replications enhance generalization when nonrandom sampling is used
Ecological generalization (gen to other settings/conditions, such as using a method tested in math for English class)

Video 17 Chapter 7

Instrumentation (Measurement)

Data – information researchers obtain about subjects Demographic data are characteristics of subjects such as age, gender, education level, etc. Assessment data are scores on tests, observations, etc. (the device used to measure these is

called the measurement instrument)

Key questions in data measurement/ instrumentation Where and when will data be collected How often will data be collected Who will collect the data

Chapter 7 - Continued

Instrumentation
Validity – measures what it is supposed to (accurate)
Reliability – a measure that consistently gives same readings (repeatable)
Objectivity – absence of subjective judgments (need to eliminate subjectivity in measuring)
Usability of instruments
Consider ease of administration; time to administer; clarity of directions; ease of scoring; cost; reliability/validity data availability

Chapter 7 - Continued

Instrumentation (Classifying Data Collection Instruments)

By the group providing the data
Researcher instruments (researcher observes student performance and records)
Subject instruments (subjects record data about themselves, such as taking test)

Others/Informants (3rd party reports about subjects such as teacher rates students)

By where instrument came from
Preference is for existing ones (www.ericae.net, MMY)

Can develop your own (requires time, effort, skill, testing; see p. 125)

By response type Written response – preferred – objective tests, rating checklist

Performance instruments – measure procedure, product

Chapter 7 - Continued

Instrumentation (Examples of Data Collection Instruments)

Researcher Completed Instruments
Rating scales (mark a place on a continuum, for example numeric rating 1=poor to 5=excellent)
Interview schedules (complete scales as interview takes place; use precoding; beware of dishonesty)
Tally sheets (for counting/recording frequency of behavior, remarks, activities, etc.)
Flow charts (to record interactions in a room)
Anecdotal records (need to be specific and factual)
Time/Motion logs (record what took place and when)

Chapter 7 - Continued

Instrumentation (Examples of Data Collection Instruments)

Subject Completed Instruments
Questionnaires (question clarity to reader essential)
Self checklists
Attitude scales (Likert is one type; how much subject agrees/disagrees with descriptive statements about a topic indicates a positive/negative attitude toward topic)
Semantic differential (good/bad; poor/excellent ratings)
Personality profiles
Achievement/Aptitude tests
Performance tests
Projective devices (Rorschach Ink Blot Test)
Sociometric devices (peer ratings)

Chapter 7 - Continued

Instrumentation

Item Formats Selection items or closed response (T/F; Yes/No; Right/Wrong; Multiple choice) Supply items or open ended (short answer; essay) Unobtrusive measures (no intrusion into event… usually direct observation and recording)

Types of Scores Raw scores (initial score or count obtained…w/out context)

Derived scores (raw scores translated to meaningful usage with standardized process)
Age/Grade equivalence; Percentile ranks; Standard scores (how far a score is from a given reference point, e.g., z and T scores)

Which to use depends on the purpose; usually standard scores used
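
As an illustration of derived scores, the sketch below converts raw scores to z scores (distance from the mean in standard deviation units) and T scores (z rescaled to a mean of 50 and standard deviation of 10). The raw scores are hypothetical, and using the population standard deviation rather than the sample one is an assumption.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """z = (raw score - mean) / standard deviation."""
    m = mean(raw)
    sd = pstdev(raw)  # population SD (assumption; a sample SD is also common)
    return [(x - m) / sd for x in raw]


def t_scores(raw):
    """T = 50 + 10 * z, which avoids negative values and decimals."""
    return [50 + 10 * z for z in z_scores(raw)]


# Hypothetical raw test scores (mean 5, SD 2)
raw = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(raw))
print(t_scores(raw))
```

A raw score of 9 means little on its own; as z = 2.0 or T = 70 it says the score sits two standard deviations above the group mean.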

Chapter 7 - Continued

Instrumentation

Norm Referenced v. Criterion Referenced Tests
Norm referenced scores give a score relative to a reference group (the norm group)
Criterion referenced scores determine if a criterion has been mastered
These are used to improve instruction since they indicate what students can or cannot do, or do or do not know

Chapter 7 - Continued

Instrumentation (Measurement Scales)

Nominal (in name only) Numbers are only name tags, they have no mathematical value (gender: 1=male and 2= female OR race: 1=

Blk, 2=Wht, 3=other)

Ordinal (in name, plus relative order) Numbers show relative position, but not quantity (grade level, finishing place in a race)

Interval (in name w/ order AND equal distance) Numbers show quantity in equal intervals, but an arbitrary zero (can have negative numbers; degrees C or F)

Ratio (in name, w/ order, eq. distance AND absolute zero) Numbers show quantity with base of zero where zero means the construct is absent

Higher levels more precise…collect data at highest level possible; some statistics only work with higher level data

Chapter 7 - Continued

Instrumentation (Preparing for Data Analysis)

Scoring data – use exact same format for each test and describe scoring method in text

Tabulating and Coding – carefully transfer data from source documents to computer Give each test an ID number Any words must be coded with numerical values Report codes in text of research report
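
A small sketch of the coding step: each record gets an ID number and each word is replaced by a numerical value from a codebook. The gender codes follow the nominal-scale example earlier in these notes (1=male, 2=female); the function name and the sample responses are hypothetical.

```python
# Codebook taken from the nominal-scale example in these notes
GENDER_CODES = {"male": 1, "female": 2}


def code_records(responses, codebook):
    """Return (ID number, numeric code) pairs for a list of word responses."""
    coded = []
    for id_number, word in enumerate(responses, start=1):
        key = word.strip().lower()  # tolerate stray spaces and capitalization
        coded.append((id_number, codebook[key]))  # unknown words raise KeyError
    return coded


# Hypothetical responses copied from source documents
print(code_records(["Male", "female ", "Female"], GENDER_CODES))
```

Letting unknown words raise an error, rather than silently assigning a code, is a deliberate choice here so that transcription mistakes surface during tabulation.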

Video 18 Chapter 8

Validity and Reliability

(Quality of instruments is important)

Validity is most important aspect of measures
Means accuracy, correctness, usefulness of instrument
Validation is the process of collecting and analyzing evidence to support inferences based on an instrument
Test publishers usually give a statement of intended use as well as evidence to support validity

Reliability (consistency in scoring) is part of validity

Chapter 8 - Continued

Validity and Reliability (Three ways to establish validity)

Content validity – is entire content of construct covered by test, are important parts emphasized?
Established by expert judgment
Face validity is part of this

Criterion validity – is there consistency between the instrument and some predicted or concurrent criterion?
Established by empirical evidence using validity coefficient (-1 to +1 scores)
Correlate scores of the test with the criterion (SAT and GPA in college)

Chapter 8 - Continued

Validity and Reliability (Three ways to establish validity)

Construct validity – Does the measure correctly identify those with different levels of the construct?
Established with empirical evidence
Correlate scores on test with known indicator of the construct (prisoners score low on test of ethics)

Validity problems come from systematic error (also known as bias…something the researcher did wrong)

Chapter 8 - Continued

Validity and Reliability

Reliability means that scores are consistent from one time measuring to the next

Can have a reliable measure that may not be valid Must be reliable to be valid See p. 166, target shooting

Errors of measurement – there is always some variation from measure to measure
Look at reliability coefficient to determine reliability

Chapter 8 - Continued

Validity and Reliability (Three ways to establish reliability)

Test/Retest – give the same test (of enduring trait) to the same people at two times and correlate the scores

Equivalent forms – give two parallel forms of a test to the same people and correlate scores

Internal consistency – several methods
Split halves (score two halves of test and correlate scores)
KR-21 and Cronbach Alpha – Correlate each item to overall score
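
The notes summarize Cronbach's alpha as relating each item to the overall score. One common way to compute it, sketched here under the assumption of complete item-level data, uses the item variances and the variance of the total scores: alpha = k/(k-1) * (1 - sum of item variances / variance of totals), where k is the number of items. The response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    item_variances = [variance([person[i] for person in responses]) for i in range(k)]
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 3-item test answered by 4 people
responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
]
print(cronbach_alpha(responses))  # items agree perfectly, so alpha is 1.0
```

When items move together across respondents, the variance of the totals is large relative to the item variances and alpha approaches 1; unrelated items push it toward 0.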

Chapter 8 - Continued

Validity and Reliability

Standard Error of Measurement – variations in measurement result in some error which is reported

Scoring Agreement – for subjective tests or direct observations (check of internal reliability)

Validity and Reliability should be addressed in all research (including qualitative)
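
The standard error of measurement mentioned above is commonly estimated from the standard deviation of the scores and the reliability coefficient as SEM = SD * sqrt(1 - reliability). The sketch below applies that formula; the function name and the example values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: SD of 10 points and reliability of .91
print(standard_error_of_measurement(10, 0.91))
```

The more reliable the instrument, the smaller the band of error around any single observed score; a perfectly reliable test would have an SEM of zero.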

Chapter 9

Internal Validity

(The IV really caused a change in the DV)
Threats
Subject characteristics/selection bias – when subjects in study or in trmt/cont groups differ from each other (on age, gender, ability, etc.)
Loss of subj/Mortality – must address question of whether those dropping out are different from those not
Location/Experiment variables – characteristics of the school, classroom, etc. may interfere with the cause/effect relationship (keep constant for both groups)

Chapter 9 - Continued

Internal Validity

(The IV really caused a change in the DV)
Threats (continued)
Instrumentation – need constant application and scoring of instruments
Instrument decay – when scoring varies due to fatigue
Data collector characteristics (age, gender, etc.) influence results…use same collector or randomly assign
Data collector bias – unconscious or conscious distortion of data (use single or double blind technique)

Testing – pretest sensitization can occur or subjects can figure out acceptable answers

Chapter 9 - Continued

Internal Validity

(The IV really caused a change in the DV)
Threats (continued)
History – an external occurrence that interferes with relationship between IV and DV
Maturation – changes in relationship between IV and DV due to passage of time/growth of subj
Attitudes of Subjects – Hawthorne or guinea pig effects, novelty effects and demoralization may occur
Regression (toward the mean) – Low scorers do better in subsequent tests; high scorers do worse
Implementation – experiment differs for groups

Chapter 9 - Continued

Internal Validity

(The IV really caused a change in the DV)
How to minimize threats:
Standardized conditions
Collect and report demogr characteristics of subj
Identify/report details of study
Select a design to minimize effects (true randomized experimental designs are best)

See page 189, Fig. 9.10 for threats summary

End Part 2

Chapter 13

Experimental Research

Most powerful design
Used to establish cause and effect by manipulating (influencing) an IV (independent variable, aka treatment or experimental variable) to see its effect on a DV (dependent variable, aka criterion or outcome variable)

Goes beyond description and prediction

Chapter 13 - Continued

Experimental Research (Characteristics of Experimental Research)

Comparison of groups (at least two groups of subjects, called treatment and control groups)
Manipulation of the IV (experimenter changes something for the treatment group that’s different than the control group)
Randomization (true experiments require random assignment into treatment/control conditions…after random selection of subjects to participate in study)
Assignment takes place at start of experiment
Do not use already formed groups
Groups should be equivalent (any differences due to chance)
Randomization eliminates threats from extraneous variables
Groups must be sufficiently large to be equivalent
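
Random assignment can be sketched as shuffling the list of selected subjects and splitting it in half. This is an illustration only; the function name and subject IDs are hypothetical, and an odd number of subjects leaves the control group one larger.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject IDs for 20 participants
treatment, control = randomly_assign(range(1, 21), seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides group membership, any remaining difference between the groups is due to chance, which is the sense in which the notes call the groups equivalent.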

Chapter 13 - Continued

Experimental Research (Control of Extraneous Variables)

All extraneous variables must be controlled to eliminate threats to validity/rival hypotheses
Ensure groups are equivalent to begin with, using randomization
Hold certain variables constant (e.g., age, IQ) or build them into the design
Use matching when necessary
Use subjects as their own controls (treat same group first in control condition then in treatment OR use pre-test/posttest on same group)
Use analysis of covariance to statistically equate nonequivalent groups

Chapter 13 - Continued

Experimental Research (Group Designs)

Weak Designs
One Shot Case Study (X O)
One group exposed to treatment then DV is measured
No controls
Example: Try new teaching method then see how students do on post test

One Group Pretest-Posttest Design (O X O)
Adds a pretest but no control group

Static-Group Comparison Design
X1 O
X2 O
Need control for diff subj characteristics

Static Group Pretest/Posttest Design (adds a pretest)

Chapter 13 - Continued

Experimental Research (Group Designs)

True Experimental Designs
Randomized Posttest Only Design (random assign to trtmt/cntrl, then posttest)
R X1 O
R O
Randomized Pretest/Posttest Control Group (controls history, maturation, etc.)
R O X1 O
R O X2 O
Randomized Solomon 4-Group Design combines the above two (eliminates testing threat; problem is number of subjects needed)
Random Assignment w/ Matching
Match pairs on factors that influence DV then randomly assign to treatment or control (subjects limited by no match elimination)

Statistical matching can be done using predicted scores
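
Random assignment with matching can be sketched as follows: sort subjects on the matching variable, pair neighbors whose scores are close enough, then flip a coin within each pair. The greedy pairing rule, the `tolerance` parameter, and the subject data are all assumptions for illustration. Subjects with no close match are set aside, which is the "no match elimination" the notes mention.

```python
import random


def matched_random_assignment(subjects, tolerance, seed=None):
    """subjects: list of (subject_id, matching_score) tuples.

    Returns (treatment, control, unmatched).
    """
    rng = random.Random(seed)
    ranked = sorted(subjects, key=lambda s: s[1])
    treatment, control, unmatched = [], [], []
    i = 0
    while i < len(ranked):
        has_next = i + 1 < len(ranked)
        if has_next and ranked[i + 1][1] - ranked[i][1] <= tolerance:
            pair = [ranked[i], ranked[i + 1]]
            rng.shuffle(pair)  # random assignment within the matched pair
            treatment.append(pair[0])
            control.append(pair[1])
            i += 2
        else:
            unmatched.append(ranked[i])  # eliminated: no close match
            i += 1
    return treatment, control, unmatched


# Hypothetical subjects matched on a pretest score, within 3 points
subjects = [("a", 90), ("b", 70), ("c", 91), ("d", 72), ("e", 50)]
print(matched_random_assignment(subjects, tolerance=3, seed=0))
```

Tightening the tolerance produces better-matched pairs but eliminates more subjects, which is the trade-off behind the sample-size limitation noted above.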

Chapter 13 - Continued

Experimental Research (Group Designs)

Quasi Experimental Designs
Matching only – different from random assignment w/ matching (uses existing groups)
Match subjects in trmt and cntrl groups on known extraneous variables
If possible, use multiple groups, and randomly assign them
Counterbalanced – Each group exposed to all the same treatments but in different order
Time series – Repeated treatments and observations over a period of time (both before and after treatment)
Factorial designs – Multiple IVs or DVs investigated simultaneously (e.g., look for interactions between 2 IVs)

Chapter 13 - Continued

Experimental Research (Controlling Threats to Internal Validity)

See Table 13.1, p. 284 for advantage/disadv. of each design To evaluate the likelihood of a threat to internal validity in experiments ask:

What are the known extraneous factors? Do the groups differ on them? How were they controlled?

Researchers need tight control for experiments to be successful See pp. 288-289 questions to evaluate published article See evaluation of selected article on pp. 290-299

Chapter 15

Correlation Research (Predicting Outcomes Through Association)

Correlational research involves study of existing relationships between two variables Descriptive in nature Often a precursor to experimental research Positive correlation is Hi/Hi and Lo/Lo (coeff. +r) Negative correlation is Hi/Lo and Lo/Hi (-r)

Purpose is to explain relationships or to predict outcomes

Chapter 15 - continued

Correlation Research (Predicting Outcomes Through Association)

Explanatory studies examine relationship to identify possible cause/effect
Relationship might or MIGHT NOT mean causation
For causation: 1) A before B; 2) A and B related; 3) Rule out other causes of B (need experiment)

Prediction studies identify predictors of criteria (e.g., HS GPA and College GPA)
Scatterplots with regression line/equation predict scores numerically
The stronger the correlation the better the prediction
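
A prediction study of this kind can be sketched with an ordinary least-squares regression line: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The GPA values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_regression_line(x, y):
    """Least-squares slope and intercept for predicting y (criterion) from x (predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Predicted criterion score for a given predictor score."""
    return intercept + slope * x_value


# Hypothetical HS GPA (predictor) and college GPA (criterion)
hs_gpa = [2.0, 3.0, 4.0]
college_gpa = [1.5, 2.5, 3.5]
slope, intercept = fit_regression_line(hs_gpa, college_gpa)
print(slope, intercept)
print(predict(3.5, slope, intercept))
```

In real data the points scatter around the line, and the weaker the correlation, the wider that scatter and the less trustworthy any single prediction.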

Chapter 15 – continued

Correlation Research (Predicting Outcomes Through Association)

Complex Correlation Techniques, such as multiple regression, allow use of several predictors for one criterion
Coefficient of multiple correlation (R) gives strength of correlation between predictors and criterion
Coefficient of determination (r²) is amount x and y vary together
Discriminant function analysis is for non-quantitative criterion (predict which group someone will be in)
Other techniques also used (factor analysis, path analysis, structural modeling)

Chapter 15 - continued

Correlation Research (Steps in the process)

Problem selection – usually “are x and y related?” or “how well does p predict c?”
Sample – random selection of at least 30
Measurement – need quantitative data
Design/Procedures – need two measures on each subject
Data collection – usually both measures close in time
Data analysis – correlation coefficient, r, and plot (r is -1 to +1, and the closer to plus or minus 1, the stronger the relationship)
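
The data analysis step can be sketched by computing the Pearson correlation coefficient from two measures on each subject. The function name and the data are hypothetical; squaring the result gives the coefficient of determination.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired measures x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measures on three subjects
x = [1, 2, 3]
y = [1, 3, 2]
r = pearson_r(x, y)
print(r)       # strength and direction of the relationship
print(r ** 2)  # coefficient of determination
```

A real study would use at least 30 subjects, as the notes say; three points are used here only so the calculation can be checked by hand.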

Chapter 15 - continued

Correlation Research (Interpreting Correlation Coefficients)

General guidelines:
+.75 to +1.0 Very strong relationship
+.50 to +.75 Moderately strong relationship
+.25 to +.50 Weak relationship
+.00 to +.25 Low to no relationship

Need .5 or better for prediction of any use, and .65 for accurate predictions

Reliability coefficients should be .7 or higher
Validity coefficients should be .5 or higher
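
The guidelines above can be written as a small lookup. Two assumptions are made: the bands are applied to the absolute value of r so that negative correlations are judged by size alone, and a value on a shared boundary (such as .75) is placed in the stronger band. The function names are hypothetical.

```python
def describe_correlation(r):
    """Label a correlation coefficient using the guideline bands from the notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # assumption: sign ignored when judging strength
    if size >= 0.75:
        return "very strong"
    if size >= 0.50:
        return "moderately strong"
    if size >= 0.25:
        return "weak"
    return "low to none"


def prediction_usefulness(r):
    """Apply the .5 and .65 thresholds the notes give for prediction."""
    size = abs(r)
    if size >= 0.65:
        return "accurate predictions"
    if size >= 0.50:
        return "of some use"
    return "not useful"


for r in (0.80, -0.60, 0.30, 0.10):
    print(r, describe_correlation(r), "/", prediction_usefulness(r))
```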

Chapter 15 - continued

Correlation Research (Threats to Internal Validity in Correlation Research)

Remember correlation is not causation (lurking variables)
Subject characteristics – may get different correl w/ different ability levels, gender, etc. (can control with partial correlation)
Location – testing conditions can impact results
Instrumentation problems – helps to standardize instrument and data collection for both groups
Testing – pretest interference and sensitization possible
Mortality – be careful if have large loss from one group being tested
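
The partial correlation mentioned above removes the influence of a third variable z from the relationship between x and y. For one control variable the standard first-order formula is r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below applies it to hypothetical correlations.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z statistically held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical: x and y correlate .50, and each correlates .50 with ability (z)
print(partial_correlation(0.50, 0.50, 0.50))
```

If the x-y correlation shrinks sharply once z is held constant, z is a plausible lurking variable behind the original relationship.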

Chapter 15 - continued

Correlation Research (Questions to ask to avoid threats to internal validity)

What factors could affect the variables being studied? Does any factor affect BOTH variables? (this is where threats occur) Figure a way to control any lurking variables

Chapter 16

Causal Comparative Research (Ex Post Facto)

Determines cause (or effect) that has occurred and looks for effect (or cause) from it
Start w/ differences in groups and examine them
Example: Difference in math abilities of male/female students
No random assignment to treatment (it already occurred)
Associational like correlation but primarily interested in cause/effect
IV either cannot (ethnicity) or should not (smoking) be manipulated

Chapter 16 - continued

Causal Comparative Research (Ex Post Facto)

Often an alternative to experimental (faster and cheaper) Serious limitation is lack of control over threats to internal validity Need to remember the cause may be the effect; they may only be

related and there is some other variable that is the cause (lurker) Remember three canons of causation

Chapter 16 - continued

Causal Comparative (CC) Research (CC versus Correlational Research)

Both are associational (looking for relationship) Both are often prelude to experiments Neither involves manipulation of variables CC works with different groups; correl examines one group on different

variables Correlation is measured w/ coefficient while CC compares

means/medians/percents of group members

Chapter 16 - continued

Causal Comparative (CC) Research (CC versus Experimental Research)

Both compare group scores of some type In experimental the IV is manipulated, but not in CC (already took place) CC does not provide as strong evidence as experimental for cause and

effect

Chapter 16 - continued

Causal Comparative (CC) Research (Steps in CC Research)

Problem formation – identify phenomena and look for causes or consequences of it Sometimes several alternate hypotheses investigated

Sample – define (operationally) characteristics of study carefully, then select individuals who possess Groups should be homogeneous in regard to several important variables (to control for them as causes) then

match control/exp groups on one or more variables (smoking study matched on 19 variables)

Instruments – use any type to compare the groups
Design – basic CC involves 2 or more grps that differ on variable of interest (basic design: one group possesses the trait (athlete), the other doesn’t; compare the DV (GPA))

Chapter 16 - continued

Causal Comparative (CC) Research (Threats to Internal Validity in CC Research)

Subject characteristics – since don’t select subjects and form groups, there may be unidentified lurking variables Can use matching to control for any identified differences, but limits sample size Can find or create homogeneous groups (for example compare only high GPA students to

other high GPA students) on attitudes toward x Statistical matching – adjusts posttest scores based on some initial difference

Other threats – location, instrument, history, maturation, loss of subjects can be concerns Need to control as many as possible to eliminate alternate hypotheses

Chapter 16 - continued

Causal Comparative (CC) Research (Evaluating threats to Internal Validity in CC Research)

Questions to ask What factors are known to affect the variable being studied? What is the likelihood the comparison groups differ on these factors? How well did the design identify and control for these?

For example consider subject characteristics such as socioeconomic status, gender, ethnicity, job skills; mortality rates in groups; location (schools differ); instrument (different data collectors and/or biases)

Data Analysis in CC – often compare means of groups; with 2 categorical use crosstabs (crossbreak tables) to compare percents by groups

Text gives example study

Chapter 17

Survey Research (Used to describe what people think/do/believe)

Types Cross sectional provide a snapshot in time Longitudinal collect data at different points in time to study changes over time

Trend study - random sample each year on same topic Cohort study - sample from same cohort members year after year Panel study - same individuals surveyed year after year (mortality a problem over long time

periods)

Often surveys are the data collection instrument in correlation (or cc/exp’l) studies

Chapter 17 - Continued

Survey Research (Steps to conduct survey research)

Define the problem Needs to be important enough respondents will invest their time to complete it Must be based on clear objectives

Identify the target population Defined by sample unit or unit of analysis (unit can be a person, school, classroom, district, etc.) Survey a sample or do a census of the population

Chapter 17 - Continued

Survey Research (Steps to conduct survey research)

Methods of data collection Direct administration to a group (such as at a meeting) - good response rate, limited

generaliz. Mail survey (inexpensive way to get large amount of data from widespread pop) - lower

response rates, not in-depth info, illiterate missed Telephone survey (cheap/fast) - response rates higher due to encouragement (“I’m not

selling…”); miss some pop members, interviewer bias possible Personal interviews (face-to-face has good response rate but time and cost high) - lack

anonymity, interviewer bias

Chapter 17 - Continued

Survey Research (Steps to conduct survey research)

Select the sample (randomly, but check to see respondents are qualified to answer) Pilot test can indicate likely response rate and problems with data collection or

sample

Prepare instrument (questionnaire and interview schedule) Appearance important - look short and easy Clarity in questions is essential

Chapter 17 - Continued

Survey Research (Steps to conduct survey research)

Question types (same questions need to be asked of all respondents) Closed ended (multiple choice) - easier to complete, score, analyze

Categories must be all inclusive, mutually exclusive

Open ended - easy to write, hard to analyze and hard on respondents See examples p. 403

Chapter 10

Descriptive Statistics (Tools to summarize data)

Descriptive statistics describe many scores with just one or two indices (such as mean or median) Sample of a pop is described w/ indices called statistics Entire pop is described w/ indices called parameters

Types of data (words or numbers) Quantitative data – scales measure how much (test scores, amount of money spent, etc.)

Interval, Ratio, and sometimes Ordinal, variables

Categorical data – total number of objects in a category (ethnicity, gender, etc.) Nominal and sometimes Ordinal, variables

Chapter 10 - Continued

Descriptive Statistics (Summarizing Quantitative Data)

Frequency distributions or tables show the layout of the data (see text example p. 201) Frequency polygons – shows where most scores are and how spread out data

are Pay attention to shape (positive, negative skews) Normal curves – smoothed polygons – most scores in the center, fewer in the tails – many

variables follow a normal shape (height, weight, age, etc.) Normal curves are the foundation for inferential statistics

Chapter 10 - Continued

Descriptive Statistics (Summarizing Quantitative Data)

Averages – measures of central tendency Three indices tell what is a typical score

Mode – most frequent score Median – middle score (50th percentile)

Mean – takes into account all scores

Which to use depends on what you are trying to show See example pp. 205/206

Spreads – measures of variation or dispersion Three indices tell how closely scores cluster together

Range (highest – lowest); a crude indicator of spread

Standard deviation (average distance of each point from the mean) Smaller SD means less spread out, larger one means more spread out

Quartiles, percents, IQR, boxplots

SD and normal curves…68/95/99.7 rule
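The averages and spreads listed above can all be computed with Python's standard `statistics` module. The scores below are hypothetical. One caution: the notes gloss SD as the "average distance from the mean", while the usual calculation (used here via `pstdev`) is the square root of the average squared distance.

```python
import statistics

scores = [60, 70, 80, 80, 90, 100]  # hypothetical test scores

# Averages (central tendency)
mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score
mean = statistics.mean(scores)      # uses every score

# Spreads (variation)
score_range = max(scores) - min(scores)         # crude indicator
sd = statistics.pstdev(scores)                  # population SD; use stdev() for a sample estimate
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                   # interquartile range

print(mode, median, mean, score_range, round(sd, 2), iqr)
```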

Chapter 10 - Continued

Descriptive Statistics (Summarizing Quantitative Data)

Standard scores and the normal curve Standard scores use a common scale for all scores z scores are simplest – tell how far from the mean in SD units

Score on mean then z=0; score 1 SD above then z=1.0; 1 SD below then z=-1.0, etc. Use mean and SD to calculate z scores so you can compare apples/oranges (p. 210)

z = (any score – mean) / standard deviation
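A short sketch of the z-score calculation shows the "apples/oranges" comparison in practice. The two tests and their means and SDs are invented for illustration, and the function name `z_score` is hypothetical.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


# Hypothetical: the same student on two tests with different scales.
z_reading = z_score(80, mean=70, sd=10)  # raw score 80
z_math = z_score(65, mean=50, sd=5)      # raw score 65

# The lower raw score is actually the stronger performance
# once both are expressed in SD units.
print(z_reading, z_math)
```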

Chapter 10 - Continued

Descriptive Statistics (Summarizing Quantitative Data)

Probability based on z scores The whole area under a normal distribution equals 100% of the scores A z-table gives percent of scores from any score to the mean (Appendix, pp. A-4/5) The probability for getting higher or lower than any given score can then be calculated

T-scores are often used because negative z scores awkward (all T-scores are positive) Multiply z times 10, then add 50 (p. 212 Table 10.15) Standard test scores often given with T-scores and percents above/below the given score

Note…use z and T scores only with NORMAL distributions!
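The z-table lookup and the T-score conversion can be sketched with `statistics.NormalDist` from the Python standard library, which stands in for the printed table. The function names are hypothetical, and, as the note above says, the percentages only make sense when the scores are normally distributed.

```python
from statistics import NormalDist


def t_score(z):
    """Convert a z score to a T-score: multiply by 10, then add 50."""
    return 10 * z + 50


def percent_below(z):
    """Percent of scores in a normal distribution falling below a given z."""
    return 100 * NormalDist().cdf(z)


for z in (-1.5, 0.0, 1.0):
    print(z, t_score(z), round(percent_below(z), 1))
```

The percent above a score is 100 minus `percent_below(z)`.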

Chapter 10 - Continued

Descriptive Statistics (Summarizing Quantitative Data)

Correlation examines relationships between two quantitative variables (interval/ratio data) Scatterplot shows the relationship visually

Use it to check for pattern in data (hi/hi or hi/lo?) If linear pattern, can use Pearson’s r coefficient

Use it to look for strength (scatteredness) Pay attention to outliers (p. 215/216 examples)

Correlation coefficient is a numerical indicator of the strength of the relationship Pearson’s product-moment coefficient (r) is for linear data (-1 to +1) Eta is for curved data

Chapter 10 - Continued

Descriptive Statistics (Summarizing Categorical Data)

Frequency tables Give percents for ease in interpreting

Crossbreak or crosstabulations for relationships (IV goes on the side, then give row percents)

Bar charts and pie charts used Bars for ordered categories Pies for unordered categories
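A crossbreak table with row percents, as described above, can be built from raw records in a few lines. This is a sketch with invented data; the function name `crossbreak_row_percents` and the athlete/pass categories are hypothetical.

```python
from collections import Counter


def crossbreak_row_percents(pairs):
    """Build {group: {outcome: row percent}} from (group, outcome) records."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)                       # count per (group, outcome) cell
    row_totals = Counter(group for group, _ in pairs)  # count per group (the IV, on the side)
    table = {}
    for (group, outcome), count in cell_counts.items():
        table.setdefault(group, {})[outcome] = 100 * count / row_totals[group]
    return table


# Hypothetical records: independent variable first, outcome second.
records = (
    [("athlete", "pass")] * 8
    + [("athlete", "fail")] * 2
    + [("non-athlete", "pass")] * 6
    + [("non-athlete", "fail")] * 4
)
table = crossbreak_row_percents(records)
for group, row in table.items():
    print(group, row)
```

Row percents make the groups comparable even when they differ in size, which is why the independent variable goes on the side.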

Chapter 11

Inferential Statistics

Inferences about a population based on data from a sample Answers questions about how likely a sample is to represent some

parameter about a population Inferential test used depends on the level of data (quantitative or

categorical)

Chapter 11 - Continued

Inferential Statistics (The logic of inferential statistics)

Sampling error Samples differ from their parent populations (no two samples are the same) Difference is called sampling error

Distribution of sampling means (the sampling distribution) Large collections of random samples of at least 30 follow a normal curve pattern Its mean (mean of means) is the mean of the population Its SD (SD of means) is the standard error of the mean (SEM)
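The sampling distribution can be seen directly with a small simulation. Everything below is an illustrative assumption: a normal population with mean 100 and SD 15, 2,000 random samples of 30, and a fixed seed so the run is repeatable.

```python
import math
import random
import statistics

rng = random.Random(0)     # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100, 15  # hypothetical population
SAMPLE_SIZE, N_SAMPLES = 30, 2000

sample_means = []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)  # should sit close to the population mean
sd_of_means = statistics.pstdev(sample_means)  # this is the standard error of the mean
theoretical_sem = POP_SD / math.sqrt(SAMPLE_SIZE)

print(round(mean_of_means, 2), round(sd_of_means, 2), round(theoretical_sem, 2))
```

No two sample means are the same (sampling error), yet together they cluster around the population mean with a spread close to the population SD divided by the square root of the sample size.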

Chapter 11 - Continued

Inferential Statistics (The logic of inferential statistics)

Standard error of the mean (SEM) It’s the SD of the sampling distribution Since distribution is normal, then ±1 SEM has 68% of cases; ±2 SEM has 95%; ±3 SEM has 99.7%
Once we can estimate the mean and SD of the sampling distribution, we can determine how likely it is that a particular sample mean came from that population
e.g., mean of pop=100 and SD of the sampling distribution (SEM)=10: a sample with a mean of 110 could be from that pop, but a sample with a mean of 140 is most likely NOT from that pop, since it is +4 SEM from the mean (almost zero probability)
Express means as z scores; a z score more than 2 SEM from the mean is going to occur less than 5% of the time (2.5% each side)
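The example above can be checked numerically. The sketch treats the 10 in the example as the SD of the sampling distribution (an assumption, made because the notes describe 140 as 4 SEM from the mean); the function names are hypothetical.

```python
from statistics import NormalDist


def z_for_sample_mean(sample_mean, pop_mean, sem):
    """Express a sample mean as a z score, in SEM units."""
    return (sample_mean - pop_mean) / sem


def two_sided_probability(z):
    """Chance of a sample mean at least this far from the population mean, in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_110 = z_for_sample_mean(110, pop_mean=100, sem=10)
z_140 = z_for_sample_mean(140, pop_mean=100, sem=10)

print(z_110, two_sided_probability(z_110))  # plausible under this population
print(z_140, two_sided_probability(z_140))  # almost zero probability
```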

Chapter 11 - Continued

Inferential Statistics (The logic of inferential statistics)

Estimating the SEM It is estimated from the SD of the sample, adjusted for sample size: SEM = SD/√(n-1)

Confidence Intervals (CI) Use the SEM to indicate boundaries
95% of the time a pop mean will be within ±2 SEM of the sample mean (actually ±1.96 SEM)
If sample mean IQ=85 (& SEM=2) then the 95% CI is 85 ± 1.96(2) or 85 ± 3.92, which is 81.08 to 88.92; 99% CI=79.84 to 90.16
Can be 95% confident that true pop mean is 81.08-88.92
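The SEM estimate and the IQ confidence intervals above can be reproduced as follows. The function names are hypothetical, and the 2.58 multiplier for the 99% interval is the conventional value; the notes do not state it, but it is the one that yields their 79.84 to 90.16.

```python
import math


def estimate_sem(sample_sd, n):
    """SEM estimated from the sample SD, as in the notes: SD / sqrt(n - 1)."""
    return sample_sd / math.sqrt(n - 1)


def confidence_interval(sample_mean, sem, multiplier=1.96):
    """Boundaries at sample_mean plus or minus multiplier * SEM."""
    margin = multiplier * sem
    return sample_mean - margin, sample_mean + margin


low95, high95 = confidence_interval(85, sem=2)                   # 95% CI
low99, high99 = confidence_interval(85, sem=2, multiplier=2.58)  # 99% CI

print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
print(estimate_sem(sample_sd=10, n=26))  # hypothetical sample: SD 10, n 26
```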

Chapter 11 - Continued

Inferential Statistics (The logic of inferential statistics)

Probability is a predicted occurrence such as 5 in 100 times (5% or .05) In previous example, the probability of the population mean being outside the 95% CI (of

81.08 to 88.92) is 5%

Usually comparing more than one mean Examine difference in 2 sample means to see how likely the difference in the sample is to represent a true difference in the population…is it due to a true difference in the pop or only due to sampling error

The SEM of the difference between sample means, called the SED or standard error of the difference, is used: within ±1 SED is 68%; ±2 SED is 95%; ±3 SED is 99.7%
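The notes name the SED but do not show how to get it. A common formula for two independent samples, used here as an assumption, is the square root of the sum of the two squared SEMs. The group means and SEMs below are invented.

```python
import math


def standard_error_of_difference(sem_1, sem_2):
    """SED for two independent samples: sqrt(SEM1^2 + SEM2^2)."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical groups: means 82 and 70, with SEMs of 3 and 4.
sed = standard_error_of_difference(3, 4)
difference_in_sed_units = (82 - 70) / sed

print(sed, difference_in_sed_units)
```

A difference more than about 2 SED from zero falls outside the middle 95% of the distribution, so it is unlikely to be due to sampling error alone.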

Chapter 11 - Continued

Inferential Statistics (Hypothesis Testing)

A hypothesis is a predicted relationship Usually comparing means, proportions, or looking for correlations between

groups The heart of infer. stats…is the relationship found in the sample most likely due

to a relationship in the pop, or just due to random sampling error?

The null hypothesis is stated and tested THE NULL ALWAYS SAYS THERE IS NO RELATIONSHIP OR DIFFERENCE!!!

Chapter 11 - Continued

Inferential Statistics (Hypothesis Testing)

Research hypothesis is what you really think is going on; opposite of the null Example of hypothesis test

H0 (null) is that mean1=mean2, meaning the mean scores are equal OR the difference between the mean scores is 0

The distribution for a difference of zero between the means is a normal curve centered on zero

As diff between means gets larger, meaning further from the center (in SEM units), the more likely it is to represent a true diff in the pop means

If the prob is .05 or less, reject null…called a statistically significant difference (some fields use .01 or .001)

Chapter 11 - Continued

Inferential Statistics (Hypothesis Testing Process)

State the research hypothesis (Ha or Hr) State the null (H0) (Remember NO) Obtain the sample statistics (means, proportions, correlations) Determine the probability of getting the sample results just by chance if the null is true Small probability (p<.05) means reject null; there is a significant difference (or correlation) in

pop. Large probability (p>.05) means do not reject; there is no significant difference (or correl) in

pop. Note: Just because finding is statistically significant does not mean it is a practical difference

(given a large enough sample most are significant)
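The testing process above can be sketched for a difference between two means. This is a simplified illustration, not the text's procedure: it assumes samples large enough to use the normal curve (small samples would call for a t distribution), takes the standard error of the difference as a given input, and uses the hypothetical function name `compare_means`.

```python
from statistics import NormalDist


def compare_means(mean_1, mean_2, sed, alpha=0.05):
    """Two-tailed test of the null hypothesis that the two population means are equal."""
    z = (mean_1 - mean_2) / sed             # difference in SED units
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # chance of a result this extreme if the null is true
    decision = "reject the null" if p <= alpha else "do not reject the null"
    return z, p, decision


# Hypothetical sample statistics, both with a SED of 5.
print(compare_means(82, 70, sed=5))
print(compare_means(75, 70, sed=5))
```

Rejecting the null only says the difference is statistically significant; whether a difference of that size matters in practice is a separate judgment.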

Chapter 11 - Continued

Inferential Statistics (Hypothesis Testing)

One tailed versus two tailed tests When literature strongly indicates the need for directional hypothesis then do a

one-tail In a one tail all 5% is on one side (2-tailed cutoff is 1.96SD while 1 tailed cutoff

is 1.65)

Type I (alpha) versus Type II error See Figure 11.16, p. 240 Type I – reject a true null; Type II – fail to reject (accept) a false null Inversely related errors
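The one-tailed and two-tailed cutoffs quoted above come from the standard normal curve and can be recovered with `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, SD 1

# Two-tailed: alpha is split, 2.5% in each tail.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)
# One-tailed: all 5% sits in one tail, so the cutoff is closer to the mean.
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

print(round(two_tailed_cutoff, 3), round(one_tailed_cutoff, 3))
```

Because the one-tailed cutoff is lower, a directional hypothesis is easier to support, which is why it should be backed by strong prior evidence.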

Chapter 11 - Continued

Inferential Statistics (Inference Techniques)

Parametric tests (for quantitative I/R data from normal distributions of sample size 30+) t-tests compare means of two groups (can be independent or correlated/paired samples) ANOVA tests compare means of two or more groups (use post hoc)

Correlations t-test (with computers just use significance of r)

Nonparametric tests (for categorical data and I/R from non-normal pops or small samples) Mann Whitney U compares ranks of two groups

Kruskal Wallis Oneway ANOVA compares ranks of two plus groups Chi-square test (compares proportions)

Power of tests – use parametrics and increase sample size
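Of the techniques listed, the chi-square test is simple enough to compute by hand. The sketch below works out the statistic for a table of counts; the observed counts are invented, and 3.84 is the conventional critical value for 1 degree of freedom at the .05 level, which the notes do not give.

```python
def chi_square(observed):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows are groups, columns are pass/fail counts.
observed = [[8, 2], [6, 4]]
chi2 = chi_square(observed)

CRITICAL_05_DF1 = 3.84  # conventional cutoff for a 2x2 table (df = 1) at .05
significant = chi2 > CRITICAL_05_DF1
print(round(chi2, 3), significant)
```

With only 20 cases the 80% versus 60% difference is not significant, which ties back to the point about power: larger samples make real differences easier to detect.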

Chapter 12

Statistics in Perspective

Approaches to research Either 2 or more groups compared OR variables in 1 group studied AND data are either

categorical or quantitative

Comparing groups on quantitative data Can compare freq distributions (histograms), m. of center, and m. of spread OR all three Interpretation – improves with experience…need to know when something statistically

significant is not practically significant Calculate effect size - look at size of difference or delta Δ…if it is greater than .5, practically

significant Use infer. stats judiciously, paying attention to size of diff. and sample size and method it is

based on
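The effect size check can be sketched as below. The notes give only the .5 threshold; the formula used here (difference between the group means divided by the SD of the comparison group) is a common definition of delta and should be read as an assumption, as should the function name and the sample values.

```python
def effect_size(mean_experimental, mean_comparison, sd_comparison):
    """Delta: difference between group means in comparison-group SD units."""
    if sd_comparison <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_experimental - mean_comparison) / sd_comparison


# Hypothetical groups: means of 78 and 72, comparison-group SD of 10.
delta = effect_size(78, 72, 10)
practically_significant = abs(delta) > 0.5  # rule of thumb from the notes

print(delta, practically_significant)
```

Unlike a p value, delta does not grow with sample size, so it helps separate a practically meaningful difference from one that is merely statistically significant.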

Chapter 12 - continued

Statistics in Perspective

Relating variables within group w/ quant data Scatterplot and correl coeff – examine plot carefully Beyond significance pay attn to size of r and especially to r-squared Examine how sample data collected

Comparing groups w/ categorical data Use freq and percent in crossbreak tables Look at summary stats carefully and pay attn to sample size

Relating variables within a group with categorical data – use one sample chi-square

Chapter 12 - continued

Statistics in Perspective

Recap Use graphics and numbers Pay attention to outliers Pay attention to magnitude of differences Use inference tests for generalizing purposes and examine sampling Use multiple techniques and CIs