
Page 1: Mahrita.Harahap@uts.edu.au Maths Study Centre CB01.16.11 Open 11am – 5pm Semester Weekdays

Regression Analysis: Lab 1
Introduction on SPSS

Scatterplots, Correlation and Regression Significance

Maths Study Centre CB01.16.11 Open 11am – 5pm Semester Weekdays
https://www.uts.edu.au/future-students/science/student-experience/maths-study-centre

www.khanacademy.org
www.mahritaharahap.wordpress.com/teachingareas/regression

START>ALL PROGRAMS>IBM SPSS>IBM SPSS STATISTICS 19


SPSS

Statistics involves a large amount of computational work, and statistical software packages are used to do the ‘grunt’ work. Finding the mean of 200 observations by hand is very boring!

In Regression Analysis, we will be using SPSS, a computer software package used for analysing statistics and data. You will need to use SPSS for labs, and be able to interpret SPSS output for exams. There are other programs we can use, such as SAS and Minitab.


Lab 0 – Introduction to SPSS

The Data Editor has two sheets. Data View is used for entering and editing data. Variable View is used for information about the variables.

Summary Statistics of a numerical variable: ANALYZE>DESCRIPTIVE STATISTICS>DESCRIPTIVES

Extended Summary Statistics of a numerical variable: ANALYZE>DESCRIPTIVE STATISTICS>EXPLORE

Information on a categorical variable: ANALYZE>DESCRIPTIVE STATISTICS>FREQUENCIES>CHARTS

Comparing Means: ANALYZE>COMPARE MEANS>INDEPENDENT-SAMPLES T TEST>DEFINE GROUPS
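As a rough cross-check of what the descriptive menus report, the same kinds of summaries can be sketched in Python using only the standard library. The data values and variable names below are made up for illustration and are not from the lab; SPSS output contains more detail than this.

```python
import statistics
from collections import Counter

# Hypothetical numerical variable (e.g. test scores); not lab data.
scores = [4, 8, 6, 5, 7]

# Roughly the kind of summary DESCRIPTIVES gives for a numerical variable.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "std_dev": statistics.stdev(scores),  # sample standard deviation (divides by n - 1)
    "min": min(scores),
    "max": max(scores),
}

# Hypothetical categorical variable; FREQUENCIES counts each category.
groups = ["A", "B", "A", "C", "A"]
counts = Counter(groups)

print(summary)
print(counts)
```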


Hypothesis Testing

In statistics, a hypothesis is a statement about a population parameter.

The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. It is always an equality. (E.g. H0: ρ=0)

The alternative hypothesis, denoted H1, is the competing claim. (E.g. H1: ρ≠0)

Test Statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.

Decision Criteria: determine whether the test statistic falls in the rejection region. The p-value is the probability of obtaining a test statistic as extreme as or more extreme than ( < and/or > ) the observed sample value, given H0 is true.

If p-value≤α, reject H0.

If p-value>α, do not reject H0.

Or just compare your test statistic with the critical value.

Conclusion: Make your conclusion in the context of the problem.
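The two decision rules can be written down as a minimal Python sketch. The function names are hypothetical, the default α = 0.05 is an assumption, and the critical-value rule shown is the two-sided version.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Apply the p-value rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


def decide_by_critical_value(test_statistic, critical_value):
    """Two-sided rule: reject H0 when |test statistic| exceeds the critical value."""
    return "reject H0" if abs(test_statistic) > critical_value else "do not reject H0"


print(decide_by_p_value(0.03))               # 0.03 <= 0.05
print(decide_by_p_value(0.20))               # 0.20 > 0.05
print(decide_by_critical_value(3.0, 2.306))  # |3.0| > 2.306
```

Both rules lead to the same decision when the critical value and the p-value come from the same distribution and the same α.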


Comparing two means: Two Independent Samples T-test

We use this test when we want to see whether there is evidence that the unknown population means are significantly different from each other.

If p-value≤α, reject H0. Therefore we can conclude that the population means are significantly different from each other.

If p-value>α, do not reject H0. Therefore we cannot conclude that the population means are significantly different from each other.
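To see where the test statistic behind this decision comes from, here is a small sketch of the pooled two-sample t statistic. It assumes equal population variances (SPSS also reports a version without that assumption), the data are made up, and the decision uses a critical value read from a t table instead of a p-value.

```python
import math
import statistics

# Hypothetical samples from two independent groups; not lab data.
group1 = [5, 7, 8, 6, 9]
group2 = [3, 4, 6, 5, 2]

n1, n2 = len(group1), len(group2)
mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
var1, var2 = statistics.variance(group1), statistics.variance(group2)

# Pooled variance (assumes the two populations have equal variances).
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Test statistic for H0: the two population means are equal.
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / std_error

# Two-sided critical value for alpha = 0.05 and 8 degrees of freedom,
# read from a t table.
critical_value = 2.306
decision = "reject H0" if abs(t_stat) > critical_value else "do not reject H0"

print(f"t = {t_stat:.3f}, df = {df}, decision: {decision}")
```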


Lab 1: Linear Regression

The least squares line, also called the line of best fit, is the line which minimises the sum of the squared residuals. It characterises the relationship between two numerical variables. The linear regression line equation is based on the equation of a line in mathematics.

ŷ = a + bX


X: Predictor Variable / Explanatory Variable / Independent Variable. The variable one can control.

Y: Response Variable / Dependent Variable. The outcome to be measured.

ŷ = a + bX
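As an illustration of how the intercept a and slope b of a least squares line are obtained, the sketch below uses the standard formulas b = Sxy / Sxx and a = ȳ − b·x̄. The data and variable names are made up for this example; in the lab, SPSS reports these coefficients for you.

```python
import statistics

# Hypothetical paired observations; not lab data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Sums of squares and cross-products about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b = s_xy / s_xx        # slope
a = y_bar - b * x_bar  # intercept

print(f"y-hat = {a:.2f} + {b:.2f}X")
```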


Hypothesis Test for Correlation Coefficient

H0: ρ=0

H1: ρ≠0 (or ρ≠ρ*, ρ<ρ* or ρ>ρ*)

Test Statistic: t = r√(n−2) / √(1−r²) ~ t_{n−2}, where r is the sample correlation coefficient and n is the sample size.

If p-value≤α, reject H0. We conclude that the correlation is significantly different from zero. Correlation is significant.

If p-value>α, do not reject H0. We cannot conclude that the correlation is significantly different from zero. Correlation is not significant.

Correlation measures the strength of the linear association between two variables.
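A minimal sketch of the correlation test, assuming the usual statistic t = r√(n−2)/√(1−r²) with n−2 degrees of freedom. The data are made up, and the decision compares the statistic with a critical value from a t table instead of computing a p-value.

```python
import math
import statistics

# Hypothetical paired observations; not lab data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided critical value for alpha = 0.05 and 3 degrees of freedom,
# read from a t table.
critical_value = 3.182
decision = "reject H0" if abs(t_stat) > critical_value else "do not reject H0"

print(f"r = {r:.3f}, t = {t_stat:.3f}, decision: {decision}")
```

With only five observations, even a fairly large sample correlation may not be significantly different from zero.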


Interpretations of slope and intercept

INTERPRETATION OF THE SLOPE

The slope ‘b’ represents the predicted change in the response variable y given a one unit increase in the explanatory variable x.

INTERPRETATION OF THE INTERCEPT

The intercept ‘a’ represents the predicted value of the response variable y if the explanatory variable x is zero: “When x is zero, the predicted response variable is …”. The interpretation may be nonsensical, since it is often not reasonable for the explanatory variable to be zero. If zero is not within the range of x values in the sample, the intercept cannot be interpreted. Avoid trying to apply a regression line to predict values far from those that were used to create it. Do we know the relationship between the two variables at this level of temperature?
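The two interpretations can be illustrated with a short sketch. The coefficients, the sample range and the `predict` helper are all hypothetical; the range check is just one simple way to guard against extrapolation.

```python
# Hypothetical fitted line and sample range of x; not lab values.
a, b = 2.2, 0.6
x_min, x_max = 1, 5


def predict(x_value):
    """Predict y from the fitted line, refusing to extrapolate."""
    if not x_min <= x_value <= x_max:
        raise ValueError(
            f"x = {x_value} is outside the sample range [{x_min}, {x_max}]"
        )
    return a + b * x_value


# Slope: the predicted change in y for a one unit increase in x.
slope_effect = predict(4) - predict(3)
print(f"predicted change for a one unit increase in x: {slope_effect:.2f}")

# Intercept: x = 0 lies outside the sample range, so it is not interpreted.
try:
    predict(0)
except ValueError as error:
    print(f"intercept not interpretable: {error}")
```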


Confidence Interval


Hypothesis Test for Regression Significance (Simple Linear Regression +)

H0: β=0. There is no association between the response variable and the independent variable. (Regression is insignificant) E[y|x] = α + 0·X

H1: β≠0. The independent variable affects the response variable. (Regression is significant) E[y|x] = α + βX

Test Statistic: t = b / s_b, where b is the estimated slope and s_b is its standard error.
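A sketch of the slope test for simple linear regression, assuming the usual standard error s_b = s / √Sxx with s² = SSE / (n−2). The data are made up, and the decision uses a critical value from a t table instead of a p-value.

```python
import math
import statistics

# Hypothetical paired observations; not lab data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares estimates of the slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual standard error, using n - 2 degrees of freedom.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the test statistic for H0: beta = 0.
s_b = s / math.sqrt(s_xx)
t_stat = b / s_b

# Two-sided critical value for alpha = 0.05 and 3 degrees of freedom,
# read from a t table.
critical_value = 3.182
decision = "reject H0" if abs(t_stat) > critical_value else "do not reject H0"

print(f"b = {b:.2f}, s_b = {s_b:.4f}, t = {t_stat:.3f}, decision: {decision}")
```

In simple linear regression this t statistic equals the one from the correlation test on the same data, so the two tests always agree.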


Residual

e_i = y_i − ŷ_i
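A residual is the observed value minus the value predicted by the fitted line. The sketch below computes residuals for made-up data and a hypothetical fitted line; for a least squares line with an intercept, the residuals sum to zero.

```python
# Hypothetical data and fitted least squares line; not lab values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Fitted values and residuals (observed minus predicted).
fitted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, fitted)]

# The least squares line minimises this sum of squared residuals.
sse = sum(e ** 2 for e in residuals)

for xi, yi, e in zip(x, y, residuals):
    print(f"x = {xi}, y = {yi}, residual = {e:+.2f}")
print(f"sum of squared residuals = {sse:.2f}")
```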