*Computing in Archaeology Session 11. Correlation and regression analysis © Richard Haddlesey*

- Slide 1

Computing in Archaeology, Session 11: Correlation and regression analysis. Richard Haddlesey, www.medievalarchitecture.net

- Slide 2

Lecture aims: to introduce correlation and regression techniques.

- Slide 3

The scattergram. In correlation, we are always dealing with paired scores, so the values of the two variables taken together are used to make a scattergram.

- Slide 4

Example: quantities of New Forest pottery recovered from sites at varying distances from the kilns.

| Site | Distance (km) | Quantity |
|------|---------------|----------|
| 1    | 4             | 98       |
| 2    | 20            | 60       |
| 3    | 32            | 41       |
| 4    | 34            | 47       |
| 5    | 24            | 62       |

- Slide 5

Negative correlation. Here we can see that the quantity of pottery decreases as distance from the source increases.

- Slide 6

Positive correlation. Here we see that the taller a pot, the wider the rim.

- Slide 7

Curvilinear monotonic relation. Again, the further from the source, the smaller the quantity of artefacts.

- Slide 8

Arched relationship (non-monotonic). Here we see that the first molar increases with age and is then worn down as the animal gets older.

- Slide 9

- Slide 10

Scattergram. This shows us that scattergrams are the most important means of studying relationships between two variables.

- Slide 11

REGRESSION. Regression differs from the other techniques we have looked at so far in that it is concerned not just with whether or not a relationship exists, or with the strength of that relationship, but with its nature. In regression analysis we use an independent variable to estimate (or predict) the values of a dependent variable.

- Slide 12

Regression equation: $y = f(x)$, where $y$ is the y axis (in this case the dependent variable), $f$ is a function (of $x$), and $x$ is the x axis.

- Slide 13

$y = f(x)$: examples are $y = x$, $y = 2x$ and $y = x^2$.

- Slide 14

- Slide 15

General linear equations: $y = a + bx$, where $y$ is the dependent variable, $x$ is the independent variable, and the coefficients $a$ and $b$ are constants, i.e. they are fixed for a given data set.

- Slide 16

Therefore: if $x = 0$ the equation reduces to $y = a$, so $a$ represents the point where the regression line crosses the y axis (the intercept). The constant $b$ defines the slope, or gradient, of the regression line. Thus, for pottery quantity in relation to distance from the source, $b$ represents the decrease in pottery quantity for each unit increase in distance from the source.

- Slide 17

$y = a + bx$

- Slides 18 and 19

- Slide 20

Least-squares

- Slides 21 to 23

- Slide 24

$y = a + bx$

- Slide 25

- Slide 26

$y = 102.64 - 1.8x$

- Slides 27 and 28

- Slides 29 to 33

CORRELATION. 1: the correlation coefficient, $r$, which ranges from -1 to +1. 2: significance.

- Slide 34

- Slides 35 and 36

Levels of measurement: nominal (in name only); ordinal (forming a sequence); interval (a sequence with fixed distances); ratio (fixed distances with a datum point).

- Slide 37

Levels of measurement: interval and ratio data use the Product-Moment Correlation Coefficient.

- Slide 38

Levels of measurement: ordinal data use Spearman's Rank Correlation Coefficient.

- Slide 39

- Slide 40

The Product-Moment Correlation Coefficient

- Slide 41

Sample: 20 bronze spearheads ($n = 20$). Variables: length (cm) and width (cm).

- Slides 42 to 45

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

With $n = 20$, this gives $r = +0.67$.

- Slides 46 to 52

Test of product moment correlation coefficient

$H_0$: true correlation coefficient $= 0$

$H_1$: true correlation coefficient $\neq 0$

Assumptions: both variables approximately random

Sample statistics needed: $n$ and $r$

Test statistic: $TS = r$

Table: product moment correlation coefficient table.

- Slide 53

- Slide 54

n = 20

- Slide 55

n = 20, r = 0.67, p
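
The least-squares line of Slides 20 to 26 and the product-moment formula of Slides 42 to 45 can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's own working: the function names (`sums`, `least_squares`, `pearson_r`, `t_from_r`) are hypothetical, and the data are the five sites as read from the table on Slide 4. On these figures the slope comes out at about -1.80 and the intercept at about 102.69; the 102.64 on Slide 26 appears consistent with rounding the slope to -1.8 before computing the intercept. The last function is an assumption added for convenience: it converts r to the equivalent t statistic on n - 2 degrees of freedom, a standard alternative to looking r up directly in a product-moment table as the slides do.

```python
from math import sqrt

# Pottery data as read from the table on Slide 4 (distance in km, quantity).
distance = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]


def sums(x, y):
    """Return n and the raw sums shared by the regression and correlation formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return n, sx, sy, sxy, sxx, syy


def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + bx."""
    n, sx, sy, sxy, sxx, _ = sums(x, y)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n  # the fitted line passes through (mean x, mean y)
    return a, b


def pearson_r(x, y):
    """Product-moment correlation coefficient r."""
    n, sx, sy, sxy, sxx, syy = sums(x, y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def t_from_r(r, n):
    """Equivalent t statistic (n - 2 degrees of freedom) for H0: true correlation = 0.

    Assumption: a substitute for the r table lookup described on the slides.
    """
    return r * sqrt((n - 2) / (1 - r ** 2))


a, b = least_squares(distance, quantity)
r = pearson_r(distance, quantity)
print(f"pottery: y = {a:.2f} + ({b:.2f})x, r = {r:.3f}")
print(f"spearheads: t = {t_from_r(0.67, 20):.2f} on 18 degrees of freedom")
```

For the spearhead sample (r = 0.67, n = 20), the resulting t can be compared with a two-tailed critical value from a t table with 18 degrees of freedom (2.101 at the 5% level), which leads to the same decision as consulting the product-moment table with n and r.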