INFO 515 Lecture #2 1 Action Research Measurement Scales and Descriptive Statistics INFO 515 Glenn Booker

Action Research Measurement Scales and Descriptive Statistics


Page 1: Action Research Measurement Scales and Descriptive Statistics

INFO 515 Lecture #2 1

Action Research
Measurement Scales and Descriptive Statistics

INFO 515
Glenn Booker

Page 2: Action Research Measurement Scales and Descriptive Statistics

Measurement Needs

  • Need a long set of measurements for one project, and/or many projects, to examine statistical trends
  • Could use measurements to test specific hypotheses
  • Other realistic uses of measurement are to help make decisions and track progress
  • Need scales to make measurements!

Page 3: Action Research Measurement Scales and Descriptive Statistics

Measurement Scales

  • There are four types of measurement scales
      • Nominal
      • Ordinal
      • Interval
      • Ratio
  • Completely optional mnemonic: to remember the sequence, I think of ‘NOIR’ like in the expression ‘film noir’ (‘noir’ is French for ‘black’)

Page 4: Action Research Measurement Scales and Descriptive Statistics

Nominal Scale

  • A nominal (“name”) scale groups or classifies things into categories, which:
      • Must be jointly exhaustive (cover everything)
      • Must be mutually exclusive (one thing can’t be in two categories at once)
      • Can be listed in any sequence (none is better or worse)
  • So a nominal variable puts things into buckets which have no inherent order to them

Page 5: Action Research Measurement Scales and Descriptive Statistics

Nominal Scale

  • Examples include
      • Gender (though some would dispute limitations of only male/female categories)
      • Dewey decimal system
      • The Library of Congress system
      • Academic majors
      • Makes of stuff (cars, computers, etc.)
      • Parts of a system

Page 6: Action Research Measurement Scales and Descriptive Statistics

Ordinal Scale

  • This measurement ranks things in order
  • Sequence is important, but the intervals between ranks are not defined numerically
  • Rank is relative, such as “greater than” or “less than”
  • E.g. letter grades, urgency of problems, class rank, inspection ratings
  • So now the buckets we’re using have some sense of order or direction

Page 7: Action Research Measurement Scales and Descriptive Statistics

Interval Scale

  • An interval scale measures quantitative differences, not just relative rank
  • Addition and subtraction are allowed
  • E.g. common temperature scales (°F or °C), a single date (Feb 15, 1999), maybe IQ scores
      • Let me know if you find any more examples
  • A zero point, if any, is arbitrary (90 °F is *not* six times hotter than 15 °F!)

Page 8: Action Research Measurement Scales and Descriptive Statistics

Ratio Scale

  • A ratio scale is an interval scale with a non-arbitrary zero point
  • Allows division and multiplication
  • The “best” type of scale to use, if possible
  • E.g. defect rates for software, test scores, absolute temperature (Kelvin or Rankine), the number or count of almost anything, size, speed, length, …

Page 9: Action Research Measurement Scales and Descriptive Statistics

Summary of Scales

  • Nominal
      • Names different categories, not ordered, not ranked: Male, Female, Republican, Catholic, ...
  • Ordinal
      • Categories are ordered: Low, High, Sometimes, Never, ...
  • Interval
      • Fixed intervals, no absolute zero: IQ, Temperature
  • Ratio
      • Fixed intervals with an absolute zero point: Age, Income, Years of Schooling, Hours/Week, Weight
  • Age could be measured as ratio (years), ordinal (young, middle, old), or nominal (baby boomer, gen X)
  • Scale of measurement affects (may determine) the type of statistics that you can use to analyze the data

Page 10: Action Research Measurement Scales and Descriptive Statistics

Scale Hierarchy

  • Measurement scales are hierarchical: ratio (best) / interval / ordinal / nominal
  • Lower level scales can always be derived from data which uses a higher scale
  • E.g. defect rates (a ratio scale) could be converted to {High, Medium, Low} or {Acceptable, Not Acceptable} (ordinal scales)
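As a rough illustration of deriving a lower-level scale from a higher one, the sketch below maps ratio-scale defect rates onto an ordinal {Low, Medium, High} scale. The function name and the cutoff values (1.0 and 5.0) are assumptions made up for this example; they do not come from the lecture.

```python
def to_ordinal(defect_rate, low_cutoff=1.0, high_cutoff=5.0):
    """Map a ratio-scale defect rate onto an ordinal scale.

    The cutoffs are hypothetical; a real project would choose its own.
    """
    if defect_rate < low_cutoff:
        return "Low"
    if defect_rate < high_cutoff:
        return "Medium"
    return "High"


# Hypothetical defect rates (ratio scale)
rates = [0.4, 2.7, 9.1, 4.9]
levels = [to_ordinal(r) for r in rates]
print(levels)
```

Note that the conversion only works in one direction: once the rates are reduced to Low/Medium/High, the original numbers cannot be recovered.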

Page 11: Action Research Measurement Scales and Descriptive Statistics

Reexamine Central Tendencies

  • If data are nominal, only the mode is meaningful
  • If data are ordinal, both median and mode may be used
  • If data are ratio or interval (called “scale” in SPSS), you may use mean, median, and mode
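A small sketch of which measure fits which scale, using Python's standard `statistics` module. All of the data values and the grade ordering below are hypothetical, chosen only to show the idea.

```python
import statistics

# Nominal data: only the mode is meaningful
makes = ["Ford", "Honda", "Ford", "Toyota", "Ford"]
nominal_mode = statistics.mode(makes)

# Ordinal data: rank the categories, then take the median rank
grade_order = ["F", "D", "C", "B", "A"]   # assumed ordering, worst to best
grades = ["C", "B", "A", "B", "B"]
ranks = [grade_order.index(g) for g in grades]
ordinal_median = grade_order[statistics.median_low(ranks)]

# Interval or ratio ("scale") data: mean, median, and mode all apply
hours = [10, 20, 60]
scale_mean = statistics.mean(hours)

print(nominal_mode, ordinal_median, scale_mean)
```

`median_low` is used for the ordinal case so the result is always an actual category rather than a value halfway between two ranks.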

Page 12: Action Research Measurement Scales and Descriptive Statistics

Reexamine Variables

  • Discrete variables use counting units or specific categories
      • Example: makes of cars, grades, …
      • Use Nominal or Ordinal scales
  • Continuous = Integer or Real Measurements
      • Example: IQ Test scores, length of a table, your weight, etc.
      • Use Ratio or Interval scales

Page 13: Action Research Measurement Scales and Descriptive Statistics

Refine Research Types

  • Qualitative Research tends to use Nominal and/or Ordinal scale variables
  • Quantitative Research tends to use Interval and/or Ratio scale variables

Page 14: Action Research Measurement Scales and Descriptive Statistics

Frequency Distributions

  • Frequency distributions describe how many times each value occurs in a data set
  • They are useful for understanding the characteristics of a data set
  • Frequencies are the count of how many times each possible value appears for a variable (gender = male, or operating system = Windows 2000)

Page 15: Action Research Measurement Scales and Descriptive Statistics

Frequency Distributions

  • They are most useful when there is a fixed and relatively small number of options for that variable
  • They’re harder to use for variables which are numbers (either real or integer) unless there are only a few specific options allowed (e.g. test responses 1 to 5 for a multiple choice question)

Page 16: Action Research Measurement Scales and Descriptive Statistics

Generating Frequency Distributions

  • Select the command Analyze / Descriptive Statistics / Frequencies…
  • Select one or more “Variable(s):”
  • Note that the Frequency (count) and percent are included by default; other outputs may be selected under the “Statistics...” button
  • A bar chart can be generated as well using the “Charts…” button; see another way later

Page 17: Action Research Measurement Scales and Descriptive Statistics

Sample Frequency Output

EDUCATIONAL LEVEL

  Valid    Frequency   Percent   Valid Percent   Cumulative Percent
    8          53        11.2        11.2              11.2
   12         190        40.1        40.1              51.3
   14           6         1.3         1.3              52.5
   15         116        24.5        24.5              77.0
   16          59        12.4        12.4              89.5
   17          11         2.3         2.3              91.8
   18           9         1.9         1.9              93.7
   19          27         5.7         5.7              99.4
   20           2          .4          .4              99.8
   21           1          .2          .2             100.0
  Total       474       100.0       100.0

Page 18: Action Research Measurement Scales and Descriptive Statistics

Analysis of Frequency Output

  • The first, unlabeled column has the values of data; here, it first lists all Valid values (there are no Invalid ones, or it would show those too)
  • The Frequency column is how many times that value appears in the data set
  • The Percent column is the percent of cases with that value; in the fourth row, the value 15 appears 116 times, which is 24.5% of the 474 total cases (116/474*100 = 24.5%)

Page 19: Action Research Measurement Scales and Descriptive Statistics

Analysis of Frequency Output

  • The Valid Percent column divides each Frequency by the total number of Valid cases (= Percent column if all cases valid)
  • The Cumulative Percent adds up the Valid Percent values going down the rows; so the first entry is the Valid Percent for the first row, the second entry is from 11.2 + 40.1 = 51.3%, next is 51.3 + 1.3 = 52.5% and so on
      • Round-off error: adding the rounded values would give 52.6, but the table accumulates the unrounded percentages and shows 52.5
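The Frequency, Percent, and Cumulative Percent columns can be reproduced outside SPSS. The sketch below is not how SPSS computes its output internally; it simply recalculates the columns from the Educational Level counts in the sample output, assuming all 474 cases are valid.

```python
from itertools import accumulate

# Educational level values and their frequencies, from the sample output
values = [8, 12, 14, 15, 16, 17, 18, 19, 20, 21]
freqs = [53, 190, 6, 116, 59, 11, 9, 27, 2, 1]

n = sum(freqs)                                   # total number of valid cases
percents = [100 * f / n for f in freqs]          # Percent (= Valid Percent here)
cum_percents = [100 * c / n for c in accumulate(freqs)]

print(f"{'Value':>5} {'Frequency':>9} {'Percent':>7} {'Cum. Pct':>8}")
for v, f, p, c in zip(values, freqs, percents, cum_percents):
    print(f"{v:>5} {f:>9} {p:>7.1f} {c:>8.1f}")
print(f"{'Total':>5} {n:>9} {100.0:>7.1f}")
```

Because the cumulative column is built from the running count rather than from the rounded percentages, the third row comes out as 52.5, which is the round-off effect noted above.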

Page 20: Action Research Measurement Scales and Descriptive Statistics

Generating Frequency Graphs

  • Frequency is often shown using a bar graph
  • Bar graphs help make small amounts of data more visible
  • To generate a frequency graph alone
      • Click on the Charts menu and select “Bar…”
      • Leave the “Simple” graph selected, and leave “Summaries are for groups of cases” selected; click the “Define” button

Page 21: Action Research Measurement Scales and Descriptive Statistics

Generating Frequency Graphs

      • Let the Bars Represent remain “N of cases”
      • Click on variable “Educational Level (years)” and move it into the Category Axis field
      • Click “OK”
  • You should get the graph on the next slide. Notice that the text below the X axis is the Label for the Category Axis.

Page 22: Action Research Measurement Scales and Descriptive Statistics

Sample Frequency Output

  • Notice that the exact same graph can be generated from Frequencies, or just as a bar graph

Page 23: Action Research Measurement Scales and Descriptive Statistics

Frequency Distributions

  • A frequency distribution is a tabulation that indicates the number of times a score or group of scores occurs
  • Bar charts are best used to graph the frequency of nominal & ordinal data
  • Histograms are best used to display the shape of interval & ratio data

Page 24: Action Research Measurement Scales and Descriptive Statistics

Frequency Distribution Example

[Bar chart: Employment Category on the X axis (Clerical, Custodial, Manager), Frequency on the Y axis (0 to 400). SPSS for Windows, Student Version]

Employment Category

  Valid       Frequency   Percent   Valid Percent   Cumulative Percent
  Clerical       363        76.6        76.6              76.6
  Custodial       27         5.7         5.7              82.3
  Manager         84        17.7        17.7             100.0
  Total          474       100.0       100.0

Page 25: Action Research Measurement Scales and Descriptive Statistics

Basic Measures - Ratio

  • Used for two exclusive populations (every case fits into one OR the other)
      Ratio = (# of testers) / (# of developers)
  • E.g. tester to developer ratio is 1:4

Page 26: Action Research Measurement Scales and Descriptive Statistics

Proportions and Fractions

  • Used for multiple (> 2) populations
      Proportion = (Number of this population) / (Total number of all populations)
  • Sum of all proportions equals unity (one)
  • E.g. survey results
  • Proportions are based on integer units
  • Fractions are based on real numbered units

Page 27: Action Research Measurement Scales and Descriptive Statistics

Percentage

  • A proportion or fraction multiplied by 100 becomes a percentage
  • Only report percentages when N (total population measured) is above ~30 to 50; and always provide N for completeness
      • Why? Otherwise a percentage will imply more accuracy than the data supports
      • If 2 out of 3 people like something, it’s misleading to report that 66.667% favor it
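To make the relationship between counts, proportions, and percentages concrete, here is a short sketch that reuses the Employment Category counts from the frequency distribution example. The variable names are only illustrative.

```python
# Employment Category counts from the frequency distribution example
counts = {"Clerical": 363, "Custodial": 27, "Manager": 84}

n = sum(counts.values())                          # N, always report this
proportions = {k: c / n for k, c in counts.items()}
percentages = {k: round(100 * p, 1) for k, p in proportions.items()}

print(f"N = {n}")
for category, pct in percentages.items():
    print(f"{category}: {pct}%")
```

With N = 474, one decimal place is a defensible level of precision; with N = 3, it would not be.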

Page 28: Action Research Measurement Scales and Descriptive Statistics

Percents

  • Percent = the percentage of cases having a particular value
  • Raw percent = divide the frequency of the value by the total number of cases (including missing values)
  • Valid percent = calculated as above but excluding missing values

Page 29: Action Research Measurement Scales and Descriptive Statistics

Percent Change

  • The percent increase in a measurement is the new value, minus the old one, divided by the old value, expressed as a percentage; negative means decrease:
      % increase = (new - old) / old * 100%
  • The percent change is the absolute value of the percent increase or decrease:
      % change = | % increase |

Page 30: Action Research Measurement Scales and Descriptive Statistics

Percent Increase

      Percent increase = (Later Value - Earlier Value) / Earlier Value

  • So if a collection goes from 50,000 volumes in 1965 to 150,000 in 1975, the percent increase is:
      (150,000 - 50,000) / 50,000 = 2 = 200%
  • Always divide by where you started

Carpenter and Vasu, (1978)
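The percent increase calculation is easy to get backwards, so here is a minimal sketch using the library collection figures above. The function name is an assumption for illustration.

```python
def percent_increase(earlier, later):
    """Percent increase from earlier to later; negative means a decrease."""
    return 100 * (later - earlier) / earlier


growth = percent_increase(50_000, 150_000)     # collection example above
shrink = percent_increase(150_000, 50_000)     # same numbers, reversed

print(f"{growth:.0f}%")
print(f"{shrink:.1f}%")
```

Reversing the two values does not give -200%; it gives about -66.7%, because the divisor is always the starting value.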

Page 31: Action Research Measurement Scales and Descriptive Statistics

Percentiles

  • A percentile is the point in a distribution at or below which a given percentage of scores falls
  • The median is the 50th percentile
  • Think of the SAT scores: what percentile you were for verbal, math, etc. means, roughly, what percent of people did worse than you

Page 32: Action Research Measurement Scales and Descriptive Statistics

Rate

  • Rate conveys the change in a measurement, such as over time, dx/dt
      Rate = (# observed events) / (# of opportunities) * constant
  • Rate requires exposure to the risk being measured
  • E.g. defects per KSLOC (1000 source lines of code) = (# defects) / (# of SLOC) * 1000
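A quick sketch of the rate formula, where the number of opportunities is the count of source lines and the constant is 1000. The function name and the sample figures (30 defects in 12,000 lines) are hypothetical.

```python
def defects_per_ksloc(defects, sloc):
    """Defect rate per 1000 source lines of code (KSLOC)."""
    if sloc <= 0:
        raise ValueError("rate is undefined without exposure (sloc must be > 0)")
    return defects * 1000 / sloc


# Hypothetical project: 30 defects found in 12,000 lines of code
rate = defects_per_ksloc(30, 12_000)
print(rate)
```

The guard on `sloc` reflects the point that a rate requires exposure: with no code, there is no opportunity for a defect and no meaningful rate.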

Page 33: Action Research Measurement Scales and Descriptive Statistics

Exponential Notation

  • You might see output of the form +2.78E-12
  • The ‘E’ means ‘times ten to the power of’
  • This is +2.78 * 10^-12 (+2.78*10**-12)
  • A negative exponent, e.g. -12, makes it a very small number
      • 10^-12 = 0.000000000001
      • 10^+12 = 1,000,000,000,000
  • The leading number, here +2.78, controls whether it is a positive or negative number
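Most programming languages read and write the same E notation. As a small sketch, Python's built-in `float` parses it directly:

```python
import math

x = float("+2.78E-12")          # parse the E notation shown above
same = math.isclose(x, 2.78 * 10**-12)

tiny_negative = float("-5E-12")  # negative sign, tiny magnitude
huge_negative = float("-5E+12")  # negative sign, huge magnitude

print(same)
print(f"{x:.2e}")                # format back into E notation
print(tiny_negative < 0, abs(tiny_negative) < 1)
print(huge_negative < 0, abs(huge_negative) > 1)
```

The sign of the leading number decides whether the value is positive or negative; the sign of the exponent only decides whether its magnitude is large or small.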

Page 34: Action Research Measurement Scales and Descriptive Statistics

Exponential Notation

[Number line running from Neg. through 0 to Pos., with these points marked in order:]

  • -5*10**+12 (a negative number, magnitude >>1)
  • -5*10**-12 (a negative number, magnitude <<1)
  • 0
  • +5*10**-12 (a positive number <<1)
  • +5*10**+12 (a positive number >>1)

Page 35: Action Research Measurement Scales and Descriptive Statistics

Precision

  • Keep your final output to a consistent level of precision (significant digits)
  • Don’t report one value as “12” and another as “11.86257523454574123”
  • Pick a level of precision to match the accuracy of your inputs (or one digit more), and make sure everything is reported that way consistently (e.g. 12.0 and 11.9)
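In code, consistent precision is usually a formatting decision made once, at reporting time. A minimal sketch using the two values from the slide:

```python
values = [12, 11.86257523454574123]

# Report every value with the same number of decimal places
report = [f"{v:.1f}" for v in values]
print(report)
```

The underlying numbers keep their full precision for further calculation; only the reported form is rounded.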

Page 36: Action Research Measurement Scales and Descriptive Statistics

Data Analysis

  • Raw data is collected, such as the dates a particular problem was reported and closed
  • Refined data is extracted from raw data, e.g. the time it took a problem to be resolved
  • Derived data is produced by analyzing refined data, such as the average time to resolve problems
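The raw, refined, and derived stages can be sketched in a few lines. The problem reports below are invented purely for illustration.

```python
from datetime import date
from statistics import mean

# Raw data: (reported, closed) dates for three hypothetical problem reports
raw = [
    (date(1999, 2, 1), date(1999, 2, 4)),
    (date(1999, 2, 3), date(1999, 2, 10)),
    (date(1999, 2, 15), date(1999, 2, 17)),
]

# Refined data: days taken to resolve each problem
refined = [(closed - reported).days for reported, closed in raw]

# Derived data: average time to resolve a problem
derived = mean(refined)

print(refined, derived)
```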

Page 37: Action Research Measurement Scales and Descriptive Statistics

Descriptive Statistics

  • Descriptive statistics describe the key characteristics of one set of data (univariate)
      • Mean, median, mode, range (see also last week)
      • Standard deviation, variance
      • Skewness
      • Kurtosis
      • Coefficient of variation

Page 38: Action Research Measurement Scales and Descriptive Statistics

Mean

  • A.k.a.: Average Score
  • The mean is the arithmetic average of the scores in a distribution
      • Add all of the scores
      • Divide by the total number of scores
  • The mean is greatly influenced by extreme scores; they pull it off center

Page 39: Action Research Measurement Scales and Descriptive Statistics

Mean Calculation

HOLDINGS IN 7 DIFFERENT LIBRARIES

       X
    7400
    6500
    6200
    5900
    5100
    4300
    3800
  ------
  ∑X = 39200   (here, sum every data value)

  Mean = ∑X / N = 39200 / 7 = 5600

Page 40: Action Research Measurement Scales and Descriptive Statistics

Mean with a Frequency Distribution

  X (IQ)   F = Freq   FX = F*X
   140        2         280
   135        1         135
   132        2         264
   130        1         130
   128        1         128
   126        1         126
   125        4         500
   123        1         123
   120        4         480
   110        3         330
   101        1         101
  ------------------------------
  Total      21        2597

  N = ∑F = 21
  Mean = ∑FX / N = 2597 / 21 = 123.67 = 124 (round off)
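The frequency-weighted mean above can be checked with a short sketch. The dictionary simply restates the IQ table (score mapped to frequency); the variable names are illustrative.

```python
# IQ score -> frequency, from the table above
iq_freq = {140: 2, 135: 1, 132: 2, 130: 1, 128: 1, 126: 1,
           125: 4, 123: 1, 120: 4, 110: 3, 101: 1}

n = sum(iq_freq.values())                         # N = sum of F
sum_fx = sum(x * f for x, f in iq_freq.items())   # sum of F*X
mean_iq = sum_fx / n

print(n, sum_fx, round(mean_iq, 2))
```

Multiplying each score by its frequency gives the same total as listing all 21 scores individually and adding them up.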

Page 41: Action Research Measurement Scales and Descriptive Statistics

Central Tendency Example

  Staff Salaries
    $4100
     6000
     6000
     6000
     8000
     9000
    10000
    11000
    20000

  Mode = $6000
  Median = (9 + 1) / 2 = 5th value = $8000
  Mean = ∑X / N = 80100 / 9 = $8900

Carpenter and Vasu, (1978)
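The three measures for the staff salary example can be reproduced with Python's standard `statistics` module; this is only a check of the arithmetic, not part of the original example.

```python
import statistics

# Staff salaries from the central tendency example
salaries = [4100, 6000, 6000, 6000, 8000, 9000, 10000, 11000, 20000]

mode = statistics.mode(salaries)      # most frequent value
median = statistics.median(salaries)  # middle (5th of 9) value
mean = statistics.mean(salaries)      # sum / N

print(mode, median, mean)
```

The single $20,000 salary pulls the mean above the median, which is the effect of extreme scores on the mean.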

Page 42: Action Research Measurement Scales and Descriptive Statistics

Handling Extreme Values

  • In cases where you have an extreme value (high or low) in a distribution, it is helpful to report both the median and the mean
  • Reporting both values gives some indication (through comparison) of a skewed distribution

Page 43: Action Research Measurement Scales and Descriptive Statistics

Measures of Variation

  • Measures which indicate the variation, or spread of scores in a distribution
      • Range (see last week)
      • Variance
      • Standard Deviation

Page 44: Action Research Measurement Scales and Descriptive Statistics

Standard Deviation, Variance

  • Standard deviation is, roughly, the average amount the data differs from the mean (average)
      SD = √( ∑(Xi - Mean)**2 / (N-1) )
      SD = √( Variance )
  • Variance is the standard deviation squared
      Variance = ∑(Xi - Mean)**2 / (N-1)

[per ISO 3534-1, para 2.33 and 2.34]

Page 45: Action Research Measurement Scales and Descriptive Statistics

Standard Deviation

  • The standard deviation is the square root of the variance. It is expressed in the same units as the original data.
  • Since the variance is expressed in “squared units,” it doesn’t make much practical sense. For example, what are “squared books” or “squared man-hours?”

Page 46: Action Research Measurement Scales and Descriptive Statistics

Computing the Variance

      S^2 = ∑(X - Mean)^2 / N

  1. Subtract the mean from each score
  2. Square the result
  3. Sum the squares for all data points
  4. Divide by the N of cases
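The four steps can be followed literally in code. This sketch applies them to the library holdings data from the mean calculation slide, dividing by N as in the formula above.

```python
# Library holdings from the mean calculation example
holdings = [7400, 6500, 6200, 5900, 5100, 4300, 3800]

n = len(holdings)
mean = sum(holdings) / n

deviations = [x - mean for x in holdings]   # 1. subtract the mean from each score
squares = [d ** 2 for d in deviations]      # 2. square the result
sum_of_squares = sum(squares)               # 3. sum the squares
variance = sum_of_squares / n               # 4. divide by the N of cases

print(mean, sum_of_squares, round(variance, 2))
```

The deviations themselves always sum to zero, which is why they must be squared before adding.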

Page 47: Action Research Measurement Scales and Descriptive Statistics

Divide by N or N-1???

  • You’ll see different formulas for variance and standard deviation; some divide by N, some by N-1 (e.g. slides 44 and 46); why?
      • If your data covers the entire population (you have all of the possible data to analyze), then divide by N
      • If your data covers a sample from the population, divide by N-1
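Python's standard `statistics` module offers both versions, which makes the difference easy to see. The sketch below uses the 17 bookmobile stop counts that appear later in this lecture; `pstdev` divides by N and `stdev` divides by N-1.

```python
import statistics

# Number of stops for each of the 17 bookmobiles (cases A through Q)
stops = [6, 9, 10, 14, 16, 17, 14, 16, 14, 10, 9, 14, 14, 16, 9, 17, 16]

population_sd = statistics.pstdev(stops)   # divides by N
sample_sd = statistics.stdev(stops)        # divides by N-1

print(round(population_sd, 2), round(sample_sd, 2))
```

The N-1 version is always a little larger, and the gap shrinks as N grows.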

Page 48: Action Research Measurement Scales and Descriptive Statistics

Standard Deviation for Freq Dist.

Standard Deviation of Bookmobile Distribution

    X     F     FX    X^2    FX^2
   17     2     34    289     578
   16     4     64    256    1024
   14     5     70    196     980
   10     2     20    100     200
    9     3     27     81     243
    6     1      6     36      36
  ---------------------------------
  Total  17    221           3061

  σ = √( (∑FX^2 - (∑FX)^2 / N) / N )
    = √( (3061 - (221)^2 / 17) / 17 )
    = √( (3061 - 2873) / 17 )
    = 3.3

  Notice that FX^2 is F*(X^2), not (F*X)^2
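The computational formula for the bookmobile distribution can be verified with a short sketch. The dictionary restates the X and F columns; note that the squared term is F*(X^2), as the slide points out.

```python
import math

# Number of stops (X) -> number of bookmobiles (F)
table = {17: 2, 16: 4, 14: 5, 10: 2, 9: 3, 6: 1}

n = sum(table.values())                              # N = sum of F
sum_fx = sum(f * x for x, f in table.items())        # sum of F*X
sum_fx2 = sum(f * x ** 2 for x, f in table.items())  # sum of F*(X^2), not (F*X)^2

sigma = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / n)   # population form (divide by N)

print(n, sum_fx, sum_fx2, round(sigma, 1))
```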

Page 49: Action Research Measurement Scales and Descriptive Statistics

Std Dev Reflects Consistency

  Distance from Target          Frequency
  (in meters)             Battery A   Battery B
       200                    2           0
       150                    4           1
       100                    5           5
        50                    7          10
         0                    9          13
       -50                    7          10
      -100                    5           5
      -150                    4           1
      -200                    2           0

  Mean                        0           0
  Standard D.              102.74       65.83

Runyon and Haber (1984)

Page 50: Action Research Measurement Scales and Descriptive Statistics

Standard Deviation vs. Std. Error

  • To be precise, the standard error is the standard deviation of a statistic used to estimate a population parameter [per ISO 3534-1, para 2.56 and 2.50]
  • So standard error pertains to sample data, while standard deviation should describe the entire population
  • We often use them interchangeably

Page 51: Action Research Measurement Scales and Descriptive Statistics

Skewness

  • Skewness is a measure of the asymmetry of a distribution
  • The normal distribution is symmetric, and has a skewness value of zero
  • A distribution with a significant positive skewness has a long right tail
  • Positive skewness means the mean and median are more positive than the mode (the peak of the distribution)
  • A distribution with negative skewness has a long left tail

Page 52: Action Research Measurement Scales and Descriptive Statistics

Skewness

  • As a rough guide, a skewness magnitude more than two (>2 or <-2) is taken to indicate a significant departure from symmetry

[Figure: two curves, one labeled “Positive skewness” and one labeled “Negative skewness.” Both curves have same mean and standard deviation. From www.riskglossary.com]

Page 53: Action Research Measurement Scales and Descriptive Statistics

Kurtosis

  • Kurtosis is a measure of the extent to which data clusters around a central point
  • For a normal distribution, the value of the kurtosis is 3
  • The kurtosis excess (= kurtosis-3) is zero for a normal distribution
  • Positive kurtosis excess indicates that the data have longer tails than “normal”
  • Negative kurtosis excess indicates the data have shorter tails
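For readers who want to see where skewness and kurtosis numbers come from, here is a minimal sketch based on the plain population moments of the data. This is an assumption about which formula to show: statistical packages often apply small-sample corrections, so their output may differ somewhat from these values. The data sets are hypothetical.

```python
def shape_measures(data):
    """Return (skewness, kurtosis, kurtosis excess) from population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_tailed = [1, 1, 1, 2, 10]    # hypothetical data with a long right tail

sym_skew, sym_kurt, sym_excess = shape_measures(symmetric)
tail_skew, tail_kurt, tail_excess = shape_measures(right_tailed)

print(sym_skew, sym_excess)
print(tail_skew > 0)
```

The symmetric data set has zero skewness and a negative kurtosis excess (shorter tails than normal), while the second data set has positive skewness from its long right tail.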

Page 54: Action Research Measurement Scales and Descriptive Statistics

Kurtosis

  • The curve on the right has higher kurtosis than the curve on the left. It is more peaked at the center, and it has fatter tails.
  • If a distribution’s kurtosis is greater than 3, it is said to be leptokurtic (sharp peak). If its kurtosis is less than 3, it is said to be platykurtic (flat peak). They might have equal standard deviation.
  • Mesokurtic is the “normal” curve, which has kurtosis = 3.

[Figure: two curves, labeled “Platykurtic” (left) and “Leptokurtic” (right), with the tail marked. From www.riskglossary.com]

Page 55: Action Research Measurement Scales and Descriptive Statistics

Skewness & Kurtosis Example

  • From the Employee data set, use Analyze / Descriptive Statistics / Descriptives, select the ‘salary’ variable; under Options…, select Skewness and Kurtosis
  • Skewness is 2.125, so there is significant positive skewness to the data
  • Kurtosis is 5.378, so the data is leptokurtic

Page 56: Action Research Measurement Scales and Descriptive Statistics

Coefficient of Variation

  • The coefficient of variation (CV) is the ratio of the standard deviation to the mean:
      CV = (standard deviation) / (mean)
    [per ISO 3534-1, para 2.35]
  • A smaller CV means the mean is more representative of the total distribution
  • Can compare means and standard deviations of two different populations
  • Higher CV means more variability

Page 57: Action Research Measurement Scales and Descriptive Statistics

Coefficient of Variation

  • Divide the standard deviation by the mean to get CV:
      CV = (standard deviation) / (mean)
  • The smaller the decimal fraction this produces, the more representative the mean is for the total distribution
  • The larger the decimal fraction, the worse job the mean does of giving us a true picture of the distribution
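As a brief sketch, the coefficient of variation can be computed for two data sets that appear later in this lecture: current salary (Std. Dev = 6830.26, Mean = 13767.8) and bookmobile stops (Std. Dev = 3.43, Mean = 13.0). The function name is an assumption for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean (only sensible when the mean is not zero)."""
    return std_dev / mean


cv_salary = coefficient_of_variation(6830.26, 13767.8)   # histogram of salary
cv_stops = coefficient_of_variation(3.43, 13.0)          # histogram of bookmobile stops

print(round(cv_salary, 3), round(cv_stops, 3))
```

Even though salaries and stop counts are in completely different units, the CV lets the two be compared: the salary data is relatively more variable, so its mean is less representative.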

Page 58: Action Research Measurement Scales and Descriptive Statistics

Generating a Histogram

  • Frequency graphs can be generated for variables which have many integer or real values (e.g. salary), by using a histogram
  • A histogram shows how many data points fall into various ranges of values
  • The closest “normal” curve can be shown for comparison

Page 59: Action Research Measurement Scales and Descriptive Statistics

Generating a Histogram

  • The “¾ rule” is helpful for histograms
      • The tallest bar should be ¾ of the height of the Y axis
  • Be sure to label X and Y axes appropriately
  • Each bar shows how many data points fall within a range of X axis values
  • See How to Lie with Statistics, by Darrell Huff
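To show what a histogram does behind the scenes, the sketch below counts how many data points fall into each range of values and prints a crude text bar for each. The salary values, the starting point, and the bar width are all hypothetical; this is not how SPSS chooses its bars.

```python
from collections import Counter


def histogram(data, start, width):
    """Count data points per bin; each bin is keyed by its lower edge."""
    return Counter(start + width * ((x - start) // width) for x in data)


# Hypothetical salaries, binned in ranges of 4000 starting at 6000
salaries = [6100, 7000, 9500, 10200, 13900]
bins = histogram(salaries, start=6000, width=4000)

for lower_edge in sorted(bins):
    print(f"{lower_edge:>6} to {lower_edge + 4000 - 1:>6}: {'#' * bins[lower_edge]}")
```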

Page 60: Action Research Measurement Scales and Descriptive Statistics

Histogram of Salary

[Histogram: CURRENT SALARY on the X axis (bar labels 6000.0 to 54000.0 in steps of 4000.0), Frequency on the Y axis (0 to 140). Std. Dev = 6830.26, Mean = 13767.8, N = 474.00]

Page 61: Action Research Measurement Scales and Descriptive Statistics

Another Note on Histograms

  • SPSS will define its own bar widths for a histogram, e.g. how wide the range of salary values is for each bar
  • Later in the course, we’ll look at how you can define your own variables to make predefined histogram bars

Page 62: Action Research Measurement Scales and Descriptive Statistics

Pie Chart Histogram

  • A histogram can also be made in the shape of a pie
  • This should be limited to variables with a small number of possible values

Page 63: Action Research Measurement Scales and Descriptive Statistics

A *bad* pie chart histogram

[Pie chart: CURRENT SALARY, with a separate slice for every distinct salary value; the legend lists dozens of values, e.g. 9180 through 15660]

(I had to include this one just because it’s colorful)

Page 64: Action Research Measurement Scales and Descriptive Statistics

This is a better example:

[Pie chart: EDUCATIONAL LEVEL, one slice per value: 8, 12, 14, 15, 16, 17, 18, 19, 20, 21]

This visually implies the percentages of data in each value.

Page 65: Action Research Measurement Scales and Descriptive Statistics

Bookmobile Data

  Raw data

  Case/Bookmobile   Value of Var. (No. of Stops)
        A                  6
        B                  9
        C                 10
        D                 14
        E                 16
        F                 17
        G                 14
        H                 16
        I                 14
        J                 10
        K                  9
        L                 14
        M                 14
        N                 16
        O                  9
        P                 17
        Q                 16

  Frequency distribution

  X (No. of Stops)   F (No. of Bookmobiles)
        17                  2
        16                  4
        14                  5
        10                  2
         9                  3
         6                  1
                         N = 17

Bookmobile examples taken from Carpenter and Vasu, (1978). Same data as used on slides 48 & 66.
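Going from the raw case-by-case data to the frequency distribution is a simple counting exercise. A minimal sketch with Python's standard `collections.Counter`, using the 17 bookmobile cases (A through Q):

```python
from collections import Counter

# Number of stops for bookmobiles A through Q, in case order
stops = [6, 9, 10, 14, 16, 17, 14, 16, 14, 10, 9, 14, 14, 16, 9, 17, 16]

freq = Counter(stops)   # X (no. of stops) -> F (no. of bookmobiles)

print("X   F")
for x in sorted(freq, reverse=True):
    print(f"{x:<3} {freq[x]}")
print("N =", sum(freq.values()))
```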

Page 66: Action Research Measurement Scales and Descriptive Statistics

Bookmobile Distributions

  Stops    f      %     CF (adding up)   CF (adding down)   C%
   17      2    11.8         17                 2           100
   16      4    23.5         15                 6            88
   14      5    29.4         11                11            64
   10      2    11.8          6                13            35
    9      3    17.6          4                16            23
    6      1     5.8          1                17             6

  • CF (adding down): cumulative freq adding down the rows
  • CF (adding up): cumulative freq adding up from the bottom row
  • C%: percent cumulative freq counting down
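Both cumulative frequency columns can be generated with a running total, taken in opposite directions. This sketch uses `itertools.accumulate` on the bookmobile frequencies; the percentages it computes are unrounded, so they may differ slightly from the hand-rounded figures in the table.

```python
from itertools import accumulate

stops = [17, 16, 14, 10, 9, 6]
f = [2, 4, 5, 2, 3, 1]
n = sum(f)

cf_down = list(accumulate(f))                 # running total from the top row down
cf_up = list(accumulate(reversed(f)))[::-1]   # running total from the bottom row up
c_pct = [100 * c / n for c in cf_up]          # cumulative percent based on cf_up

for row in zip(stops, f, cf_up, cf_down, c_pct):
    print("{:>3} {:>3} {:>3} {:>3} {:>6.1f}".format(*row))
```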

Page 67: Action Research Measurement Scales and Descriptive Statistics

HISTOGRAM OF BOOKMOBILE STOPS

[Histogram: Number of Bookmobile Stops on the X axis (5.0 to 17.5 in steps of 2.5), frequency F on the Y axis (0 to 10). Std. Dev = 3.43, Mean = 13.0, N = 17.00]

Page 68: Action Research Measurement Scales and Descriptive Statistics

Normalizing Data

  • Some data sets are not very close to a normal distribution
  • Sometimes it helps to transform the independent variable by applying a math function to it, such as looking at log(x) (the logarithm of each x value) instead of just x

Page 69: Action Research Measurement Scales and Descriptive Statistics

Normalizing Data

  • In SPSS this can be done by defining a new variable, such as “log_x”
  • Then use Transform / Compute to calculate
      log_x = LG10(x)
    assuming that ‘x’ is the original variable
  • Then generate a histogram showing the normal curve, to see if log_x is closer to a normal distribution
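The same transformation can be sketched outside SPSS. The example below uses made-up, strongly right-skewed values and compares the mean with the median before and after taking the base-10 logarithm; a mean close to the median is only a rough hint of symmetry, not a test of normality.

```python
import math
import statistics

# Hypothetical, strongly right-skewed data
x = [1, 10, 100, 1000, 10000]
log_x = [math.log10(v) for v in x]   # same idea as log_x = LG10(x)

print(statistics.mean(x), statistics.median(x))          # mean far above median
print(statistics.mean(log_x), statistics.median(log_x))  # mean close to median
```

Note that the logarithm is only defined for positive values, so this particular transformation cannot be applied to data containing zeros or negative numbers.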

Page 70: Action Research Measurement Scales and Descriptive Statistics

Normalizing Data

  • Who cares if we have a normal distribution?
  • Many tests in statistics assume that a variable has a normal distribution, so it’s worth our while to transform the variable