Warsaw Summer School 2011, OSU Study Abroad Program Fundamentals of Research Design Data organization


Warsaw Summer School 2011, OSU Study Abroad Program

Fundamentals of Research Design

Data organization

Why the Social Researcher Uses Statistics

• The Nature of Social Research
• Why Test Hypotheses?
• The Stages of Social Research
• Using Series of Numbers to Do Social Research
• Functions of Statistics

The Nature of Social Research

• Social researchers test hypotheses.

• A hypothesis is a prediction about the relationship between variables. It is usually based upon theoretical expectations about how things work.

• At minimum any hypothesis involves two variables: an independent variable (IV) and a dependent variable (DV).

• X → Y

• What are independent and dependent variables? The independent variable is the presumed cause; the dependent variable is the presumed effect.

Why Do We Test Hypotheses?

• Hypothesis testing is a foundation of science.

• In statistical inference, hypotheses generally take one of the two forms: substantive and null.

• A substantive hypothesis represents an actual expectation. (E.g.: higher education increases the likelihood of upward mobility.)

• To decide whether a substantive hypothesis is supported by the evidence it is necessary to test a related hypothesis called the null hypothesis. (E.g.: education has no effect on upward mobility.)

Hypothesis Testing

• Testing hypotheses involving:

• (a) Standard-comparisons: Is the mean value in our sample the same as in the population? Is the observed distribution of X a normal distribution? IV: standard/non-standard

• (b) Group-comparisons: Do men earn (on the average) more than women? Is the variance in the GPA greater among sociology students than among mathematics students? IV: one group vs. another

• (c) Assessment of correlation: Are diet preferences (X1) associated with political preferences (X2)? IV: X1 or X2, undetermined

• (d) Impact assessment: To what extent does education (X) influence earnings (Y)? IV: X
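To make the group-comparison case (b) concrete, here is a minimal sketch of testing a null hypothesis of "no difference in mean earnings between two groups". It uses an exact permutation test, which is a technique chosen here for illustration and is not named in the slides. The earnings figures are made-up numbers, and all variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical monthly earnings (made-up numbers, for illustration only).
men = [3200, 4100, 3900, 4500]
women = [3000, 3400, 3100, 3800]

# Substantive hypothesis: men earn more on average.
# Null hypothesis: group membership has no effect on earnings.
observed = mean(men) - mean(women)

pooled = men + women
n_men = len(men)

# Under the null hypothesis the group labels are interchangeable, so every
# way of splitting the pooled values into two groups is equally likely.
count_extreme = 0
total = 0
for idx in combinations(range(len(pooled)), n_men):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if mean(group_a) - mean(group_b) >= observed:
        count_extreme += 1

# One-sided p-value: share of splits at least as extreme as the observed one.
p_value = count_extreme / total
print(f"observed difference: {observed:.1f}, one-sided p-value: {p_value:.3f}")
```

A small p-value would count as evidence against the null hypothesis; with samples this small the result is only a demonstration of the logic.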

The Stages of Social Research

• 1) Specify research goals. What do you want to investigate, and why?

• 2) Review the literature. Place your question in context.

• 3) Formulate hypotheses. Provide a theoretical model (a set of propositions). Choose variables and specify hypotheses.

• 4) Measure and record. (A) Define the population and select a sample. (B) Develop instruments. (C) Collect data.

• 5) Analyze the data. Test hypotheses. Draw conclusions.

• 6) Invite scrutiny. Make decisions about the fit of data and theory. Communicate results to an audience. (Confirm or reject your initial theory.)

Statistics

• Quantitative methods in sociology involve statistics - a set of mathematical techniques used to organize, analyze, and interpret information that takes the form of numbers.

• A full measurement procedure specifies values for each variable across all observation units.

• Data (plural) are specific measurements across all observation units.

• A datum (singular) is a single measurement (= variable value).

A Framework for Statistical Work

Units of observation/analysis (cases):

- individuals, households, families, groups

- communities, cities

- countries, regions

Variables: data characterizing units of observation

Matrix form of data, Xij, i = unit of analysis (1,…,N), j = variable (1,…,K)

Cases / Variables      Age      Gender   Education   …   PoliticalParty
                       j = 1    j = 2    j = 3       …   j = K
i = 1 (john)           21       0        15          …   1
i = 2 (jil)            27       1        16          …   2
i = 3 (beth)           18       1        12          …   0
i = 4 (alicia)         23       1        16          …   1
i = 5 (josh)           34       0        21          …   2
…                      …        …        …           …   …
i = N - 1              17       0        11          …   2
i = N (last person)    36       1        17          …   3
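The matrix layout can be sketched in code as a list of rows, one per case. This is only an illustration: it holds the four variables and the first five cases that are visible on the slide (the elided columns and rows are left out), and the helper names `value` and `column` are hypothetical.

```python
# Variable names, j = 1..K (only the columns visible on the slide).
variables = ["Age", "Gender", "Education", "PoliticalParty"]

# Case identifiers, i = 1..N (only the first five cases from the slide).
cases = ["john", "jil", "beth", "alicia", "josh"]

# X[i][j]: one row per case, one column per variable.
X = [
    [21, 0, 15, 1],
    [27, 1, 16, 2],
    [18, 1, 12, 0],
    [23, 1, 16, 1],
    [34, 0, 21, 2],
]


def value(i, j):
    """Return X_ij using the slide's 1-based indices."""
    return X[i - 1][j - 1]


def column(name):
    """Return the values of one variable across all listed cases."""
    j = variables.index(name)
    return [row[j] for row in X]


print(value(3, 1))          # Age (j = 1) of case i = 3 (beth)
print(column("Education"))  # Education for every listed case
```

Reading along a row gives everything known about one case; reading down a column gives one variable across all cases.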

Labels

• Variable names vs. variable labels (values)

• The variable name is a mnemonic. The variable label is a descriptive phrase, usually only a few words long, that captures the essence of what the variable is about.

Labels

Assigning value labels

• Continuous variables usually do not need value labels. Examples: income, results of complicated tests, age, years in the labor force.

STATA researchers' advice:

• If a variable has a limited number of values k (k < 10), it is better to label them all, or at least a subset, independently of the level of measurement.
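The distinction between names, variable labels, and value labels can be sketched with plain dictionaries. This is an illustration rather than STATA syntax: the variable names `relig` and `gender` and the label texts are hypothetical, while the codes follow the religious affiliation and gender examples used elsewhere in these slides.

```python
# Variable name (short mnemonic) -> variable label (descriptive phrase).
# Names and label texts are hypothetical.
variable_labels = {
    "relig": "Religious affiliation of respondent",
    "gender": "Gender of respondent",
}

# Value labels for variables with only a few distinct values.
value_labels = {
    "relig": {1: "Catholic", 2: "Protestant", 3: "Jewish", 4: "Muslim", 5: "Other"},
    "gender": {0: "female", 1: "male"},
}


def describe(name, code):
    """Render a stored code in readable form, flagging unlabeled codes."""
    label = value_labels[name].get(code, "unlabeled")
    return f"{variable_labels[name]}: {label}"


print(describe("gender", 1))
print(describe("relig", 3))
```

Keeping the codes numeric and the labels separate lets the same data serve both computation and readable output.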

Missing Values

Meaning:

• Lack of information.

• Erroneous information.

• Non-interpretable information.

"Don't know" as a special category: is this a missing datum?
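The "don't know" question is a coding decision, and a short sketch can show both options side by side. The answer scale, the codes 8 and 9, and the response values below are all assumptions made up for this illustration.

```python
from statistics import mean

# Hypothetical coding: answers on a 1-5 scale, 8 = "don't know", 9 = no answer.
DONT_KNOW = 8
NO_ANSWER = 9
responses = [4, 2, 8, 5, 9, 3, 8, 1]


def to_missing(values, missing_codes):
    """Replace every code listed in missing_codes with None."""
    return [None if v in missing_codes else v for v in values]


# Option 1: treat "don't know" as a missing datum.
strict = to_missing(responses, {DONT_KNOW, NO_ANSWER})

# Option 2: keep "don't know" as a category of its own.
lenient = to_missing(responses, {NO_ANSWER})

# Means are computed on valid substantive answers only; never average the codes.
valid = [v for v in strict if v is not None]
print(len(valid), mean(valid))
print(lenient.count(DONT_KNOW))
```

Under option 2 the "don't know" answers can be counted and analyzed as a category, but they still must be excluded before computing a mean of the scale.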

Surveys as Data Sources for Hypothesis Testing

A survey is a method of gathering information from a sample of a population, through:

- written questionnaires that respondents fill out themselves,

- face-to-face interviews,

- telephone interviews,

- e-mail.

Functions of Statistics

• to describe the data

• to infer from a sample about the population
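The two functions can be contrasted in a few lines. The sketch below uses the five ages visible in the data matrix and treats them as if they were a simple random sample, which is an assumption made only for illustration. The interval uses the normal value 1.96; with so few cases this is a rough approximation.

```python
from math import sqrt
from statistics import mean, stdev

# Ages of the five cases shown in the data matrix (treated here, purely for
# illustration, as a simple random sample from some population).
ages = [21, 27, 18, 23, 34]

# Description: summarize the data at hand.
sample_mean = mean(ages)
sample_sd = stdev(ages)  # sample standard deviation (n - 1 in the denominator)

# Inference: use the sample to say something about the population mean.
standard_error = sample_sd / sqrt(len(ages))
low = sample_mean - 1.96 * standard_error   # rough 95% interval,
high = sample_mean + 1.96 * standard_error  # normal approximation

print(f"mean = {sample_mean:.1f}, sd = {sample_sd:.2f}")
print(f"approximate 95% interval for the population mean: {low:.1f} to {high:.1f}")
```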

Levels of Measurement

The level of measurement of a variable refers to the type of information that the numbers assigned to units of observation contain.

Four levels of measurement:
• nominal (categorical; discrete)
• ordinal (rank-order)
• interval (distance)
• ratio (zero-reference)

Level of Measurement: Nominal

Nominal
• Names are assigned to units of observation as labels. The name in the set is the "value".
• For practical data processing the names are numerals, but in that case the numerical value is irrelevant.
• Nominal variables (qualitative): observations consist of separate categories that are labeled (we cannot order them).

Level of Measurement: Nominal

Examples:
• "Dummy" (dichotomous) variables
• Religious affiliation (1, 2, 3, 4, 5, where 1 = Catholic, 2 = Protestant, 3 = Jewish, 4 = Muslim, 5 = Other)
• Party affiliation
• Social class
• Gender (0, 1, where 0 = female; 1 = male)

Level of Measurement: Ordinal

• The numbers assigned to objects represent the rank order (1st, 2nd, 3rd, n-th) of the entities measured. The numbers are called ordinals. The variables are called ordinal variables or rank variables. Comparisons of greater and less can be made, in addition to equality and inequality. However, operations such as conventional addition and subtraction are still meaningless.

Level of Measurement: Interval

• The numbers assigned to objects have all the features of ordinal measurements, and the differences between arbitrary pairs of values can be meaningfully compared. Operations such as addition and subtraction are therefore meaningful.

• The zero point on the scale is arbitrary; negative values can be used.

• Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly.

Level of Measurement: Ratio

• The numbers assigned to objects have all the features of interval measurement and also have meaningful ratios between arbitrary pairs of numbers. Operations such as multiplication and division are therefore meaningful. The zero value on a ratio scale is non-arbitrary. Variables measured at the ratio level are called ratio variables.
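The four definitions can be summarized as a cumulative list of meaningful operations. The sketch below encodes that summary and then illustrates the interval versus ratio distinction with temperature, an example that is not taken from the slides: Celsius has an arbitrary zero, Kelvin a true one. All names are hypothetical.

```python
# Operations that are meaningful at each level, following the definitions above.
MEANINGFUL = {
    "nominal": {"equality"},
    "ordinal": {"equality", "order"},
    "interval": {"equality", "order", "difference"},
    "ratio": {"equality", "order", "difference", "ratio"},
}


def is_meaningful(level, operation):
    """Return True if the operation is meaningful at this level of measurement."""
    return operation in MEANINGFUL[level]


# Temperature illustration (an assumption of this sketch, not from the slides).
celsius = (10.0, 20.0)
kelvin = tuple(c + 273.15 for c in celsius)

# Differences agree across the two scales, so they are meaningful ...
diff_c = celsius[1] - celsius[0]
diff_k = kelvin[1] - kelvin[0]

# ... but ratios depend on where zero sits, so only Kelvin ratios are meaningful.
ratio_c = celsius[1] / celsius[0]
ratio_k = kelvin[1] / kelvin[0]

print(is_meaningful("interval", "ratio"))
print(round(diff_c, 6), round(diff_k, 6))
print(round(ratio_c, 3), round(ratio_k, 3))
```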

Level of measurement and the purpose of the study

• Example 2: Education
1. Elementary, 2. Some High School, 3. High School Completed, 4. Community College, 5. Liberal Arts College Incomplete, 6. College Completed, 7. Above College

• Nominal scale? (Labels? See 4, 5)
• Ordinal scale?
• Interval? After recoding into years of schooling: 1=8, 2=10, 3=12, 4=14, 5=14, 6=16, 7=18
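The recoding into years of schooling can be sketched as a lookup table. The mapping is the one given above; the list of respondent codes is hypothetical.

```python
# Category code -> years of schooling, as given in the example above.
YEARS_OF_SCHOOLING = {1: 8, 2: 10, 3: 12, 4: 14, 5: 14, 6: 16, 7: 18}

# Hypothetical education codes for six respondents.
education_codes = [3, 6, 1, 5, 4, 7]

years = [YEARS_OF_SCHOOLING[code] for code in education_codes]
print(years)

# Categories 4 and 5 both become 14 years, so the recode cannot be reversed:
# the distinction between the two categories is lost.
print(YEARS_OF_SCHOOLING[4] == YEARS_OF_SCHOOLING[5])
```

Because two categories collapse into one value, it is worth keeping the original variable alongside the recoded one.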

Specifying Levels of Measurement

• STATA distinguishes the scale level (that is, numerical: interval and ratio) from the ordinal and nominal levels.

Recoding

Recoding into “metric variables”

• Any nominal variable can be recoded into a set of 0,1 variables, also called dummies.

• Ordinal variables can be recoded into interval variables, if

(a) ranks are interpretable as having a property of equal distances between them (e.g. Likert scale);

(b) we can assign some known values to the ranks (e.g. years of schooling);

(c) we can derive the values from the distribution properties (e.g. mid-points of the cumulative distribution).
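Recoding a nominal variable into dummies can be sketched as follows. The category codes follow the religious affiliation example from the nominal-level slides; the respondent codes and the function name `to_dummies` are hypothetical.

```python
# Codes follow the religious affiliation example from the nominal-level slides.
RELIGION = {1: "Catholic", 2: "Protestant", 3: "Jewish", 4: "Muslim", 5: "Other"}


def to_dummies(codes, categories):
    """Return one 0/1 variable per category, keyed by the category label."""
    return {
        label: [1 if c == code else 0 for c in codes]
        for code, label in categories.items()
    }


# Hypothetical codes for five respondents.
codes = [1, 3, 2, 1, 5]
dummies = to_dummies(codes, RELIGION)

for label, column in dummies.items():
    print(f"{label:<11}{column}")
```

Each respondent has a 1 in exactly one dummy. When dummies enter a regression with an intercept, one category is usually left out as the reference group.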

Measurement

The measure should be:
• Valid
• Reliable
• Exhaustive
• Mutually exclusive

Validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration.

Reliability refers to the likelihood that a given measurement procedure will yield the same description of a given phenomenon if that measurement is repeated. Reliability is the consistency of measurement.
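One common way to quantify consistency is the correlation between two administrations of the same measure (test-retest). The slides do not name a specific reliability statistic, so this is an assumption of the sketch, and the scores below are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five respondents measured twice with the same instrument.
first_round = [12, 15, 9, 20, 17]
second_round = [13, 14, 10, 19, 18]

r = pearson_r(first_round, second_round)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 indicates that repeating the measurement yields nearly the same ordering and spacing of respondents.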

Measurement

A good measuring device must meet the condition of exhausting the possibilities of what it is intended to measure.

Mutually exclusive means that each observation fits one and only one of the scale values (categories).
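Both conditions can be checked mechanically for a set of categories. The sketch below does so for age brackets; the brackets and helper names are hypothetical, and the ages are the five visible in the data matrix.

```python
# Hypothetical age brackets, each a half-open interval [low, high).
brackets = {"18-29": (18, 30), "30-49": (30, 50), "50+": (50, 200)}


def matching_categories(age, scheme):
    """Return the names of all categories that contain this age."""
    return [name for name, (low, high) in scheme.items() if low <= age < high]


def check_scheme(ages, scheme):
    """Return (exhaustive, mutually_exclusive) for the given observations."""
    counts = [len(matching_categories(age, scheme)) for age in ages]
    exhaustive = all(count >= 1 for count in counts)  # nobody is left out
    exclusive = all(count <= 1 for count in counts)   # nobody fits twice
    return exhaustive, exclusive


ages = [21, 27, 18, 23, 34]
print(check_scheme(ages, brackets))

# A scheme with overlapping brackets fails mutual exclusivity at age 30.
overlapping = {"18-30": (18, 31), "30-49": (30, 50)}
print(check_scheme([30], overlapping))
```

The check only covers the observations it is given; a scheme that passes for one sample can still fail for values outside it, such as an age below 18 here.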

Missing values: lack of information, erroneous information, non-interpretable information.