
Analysis of Individual Variables

• Descriptive
  – Measures of Central Tendency

• Mean – Average score of the distribution (1st moment)
• Median – Middle score (50th percentile) of the distribution

– Measures of Variation (used to measure the spread of the distribution relative to the measures of central tendency)

• Range – Distance between the lowest and highest data points
• Mean Deviation – Average distance between the mean and the data points
• Variance – Average squared distance from the mean (2nd moment)
• Standard Deviation – Square root of the variance

Analysis of Individual Variables

Obs   Income        Obs   Income
  1    20.50         11    55.80
  2    31.50         12    25.20
  3    47.70         13    29.00
  4    26.20         14    85.50
  5    44.00         15    15.10
  6     8.28         16    28.50
  7    30.80         17    21.40
  8    17.20         18    17.70
  9    19.90         19     6.42
 10     9.96         20    84.90

Mean       31.28
Median     25.70
Variance  500.68
Stdev      22.38
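As a cross-check, the same statistics can be computed with Python's standard `statistics` module (a sketch; the slide's own numbers come from a spreadsheet):

```python
# Descriptive statistics for the 20 income observations above.
import statistics

income = [20.50, 31.50, 47.70, 26.20, 44.00, 8.28, 30.80, 17.20,
          19.90, 9.96, 55.80, 25.20, 29.00, 85.50, 15.10, 28.50,
          21.40, 17.70, 6.42, 84.90]

mean = statistics.mean(income)          # 1st moment (average)
median = statistics.median(income)      # 50th percentile
variance = statistics.variance(income)  # sample variance, divides by (n - 1)
stdev = statistics.stdev(income)        # square root of the variance

print(round(mean, 2), round(median, 2), round(variance, 2), round(stdev, 2))
```

Note that `statistics.variance` uses the sample (n − 1) denominator, which is what the slide's 500.68 reflects.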

Analysis of Relationship among Variables

• Correlation
• Regression
  – Two Variable Models
  – Multiple Variable Models
  – Discrete Dependent Variable Models

Scatter Plot of Money Supply Growth and Inflation

Correlation

• A scatter plot is a graph that shows the relationship between the observations for two data series in two dimensions

• Correlation analysis expresses this relationship numerically
  – In contrast to a scatter plot, which graphically depicts the relationship between two data series, correlation analysis expresses this same relationship using a single number

– The correlation coefficient is a measure of how closely related two data series are

– The correlation coefficient measures the linear association between two variables

Correlation

• Determines the association between 2 variables
• Measured on a scale from +1 to −1
  – Values close to +1.0 indicate a strong positive relationship
  – Values close to −1.0 indicate a strong negative relationship
  – Values close to 0 indicate little or no relationship


Variables with Perfect Positive Correlation

Variables with Perfect Negative Correlation

Variables with a Correlation of 0

Variables with a Non-Linear Association

Calculating correlations

• The sample correlation coefficient ‘r’ is,

$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i - \bar X)(Y_i - \bar Y)}{(n-1)}$$

$$s_X = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{(n-1)}}, \qquad s_Y = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \bar Y)^2}{(n-1)}}$$

$$r = \frac{\mathrm{Cov}(X,Y)}{s_X\, s_Y}$$

Calculating correlations

• E.g.: Is it true that higher education leads to higher compensation?
  – To answer this question, we need to look at the data and calculate the correlation

Years of Education   Compensation (000)
17.97                163.30
22.86                142.05
17.25                100.00
13.35                103.55
14.97                 90.00
15.87                 97.50
13.17                 90.00
11.10                 80.00
13.86                 90.25
 8.97                 49.50

Calculating correlations

• The sample correlation coefficient ‘r’ is,

$$r = \frac{\mathrm{Cov}(X,Y)}{s_X\, s_Y}, \quad \text{where } \mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i - \bar X)(Y_i - \bar Y)}{(n-1)}$$

$$s_X = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{(n-1)}}, \qquad s_Y = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \bar Y)^2}{(n-1)}}$$

so we need to calculate $\bar X$, the average of $X$, and $\bar Y$, the average of $Y$.

Calculating correlations

Years of Ed.  Comp (000)  (X-XBar)^2  (Y-YBar)^2  (X-XBar)(Y-YBar)
17.97         163.30        9.20       3929.41       190.12
22.86         142.05       62.77       1716.86       328.29
17.25         100.00        5.35          0.38        -1.42
13.35         103.55        2.52          8.61        -4.66
14.97          90.00        0.00        112.68        -0.35
15.87          97.50        0.87          9.70        -2.91
13.17          90.00        3.12        112.68        18.76
11.10          80.00       14.72        424.98        79.10
13.86          90.25        1.16        107.43        11.16
 8.97          49.50       35.61       2612.74       305.00

Sums:                     135.32       9035.48       923.10

XBar         14.94
YBar        100.62
n - 1         9.00
Covariance  102.57
SX            3.88
SY           31.69
r             0.83
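The worked table can be reproduced programmatically. A minimal Python sketch using only the standard library (the variable names are mine, not the slide's):

```python
# Sample correlation of years of education vs. compensation,
# following the formulas r = Cov(X,Y) / (sX * sY) with (n - 1) denominators.
from math import sqrt

years_ed = [17.97, 22.86, 17.25, 13.35, 14.97, 15.87, 13.17, 11.10, 13.86, 8.97]
comp = [163.30, 142.05, 100.00, 103.55, 90.00, 97.50, 90.00, 80.00, 90.25, 49.50]

n = len(years_ed)
x_bar = sum(years_ed) / n  # XBar, ~14.94
y_bar = sum(comp) / n      # YBar, ~100.62

# Sample covariance and standard deviations (divide by n - 1).
cov = sum((x - x_bar) * (y - y_bar)
          for x, y in zip(years_ed, comp)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in years_ed) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in comp) / (n - 1))

r = cov / (s_x * s_y)  # ~0.83, matching the slide
print(round(cov, 2), round(s_x, 2), round(s_y, 2), round(r, 2))
```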

Calculating correlations (EXCEL)

Years of Ed.  Comp (000)
17.97         163.30
22.86         142.05
17.25         100.00
13.35         103.55
14.97          90.00
15.87          97.50
13.17          90.00
11.10          80.00
13.86          90.25
 8.97          49.50

Correlation   =CORREL(array1, array2)
Correlation   0.83

Correlation Matrix

        US Eqt    UK   US FI  Japan  Korea  Mexico  China    HK   S'pore  India
US Eqt   1.00
UK       0.27   1.00
US FI   -0.13  -0.27   1.00
Japan    0.20  -0.15   0.08   1.00
Korea   -0.13  -0.17   0.28  -0.01   1.00
Mexico  -0.10   0.28  -0.35  -0.38  -0.01   1.00
China    0.17  -0.12   0.29   0.09   0.19   0.00   1.00
HK       0.22   0.24  -0.38  -0.23  -0.55   0.32  -0.08   1.00
S'pore   0.52   0.24   0.00   0.08  -0.02   0.30   0.35  -0.01   1.00
India    0.30   0.57   0.17  -0.12  -0.11  -0.17   0.24   0.01   0.35   1.00

Correlations Among Stock Return Series

Regression

• Most times it's not enough to just say whether 2 variables are correlated
  – We would like to define a relationship between the two variables
  – E.g., when the economy grows 1%, how much will the S&P 500 increase?

• To do this, we use the technique of regression

Regression

• How did the term "regression" come to be applied to statistical models?

• The 19th-century scientist Sir Francis Galton, studying human subjects, found in all things "regression toward mediocrity"
  – E.g., if your parents are very smart, you are likely to be significantly less smart, so it's really not your fault!

Regression

• In modern times, when we talk of Regression analysis, we make an implicit assumption of a ‘mean’ relationship between variables and we try to determine that relationship.

• Regression analysis is concerned with:
  – the study of the dependence of one variable (the dependent variable)
  – on one or more other variables (the explanatory variables)
  – with a view to estimating and/or predicting the mean or average value of the former
  – in terms of the fixed values of the latter.

Two Variable Regression Model

• Regression analysis is concerned with the relationship of 2 variables, say 'y' and 'x', which can be written as:

  – All this means is that the value of 'y' is a function of the value of 'x'
  – Another way of saying it is that 'y' doesn't independently get its value, but somehow depends on 'x' to get its value
  – Thus 'y' can somehow be derived from 'x'
  – Thus 'y' is a dependent variable and 'x' is an independent variable

• Regression is thus, the study of a relationship between the dependent and independent variables

$y_i = f(x_i)$

Regression

  x     y
 1.0   2.0
 1.5   3.0
 2.0   4.0
 2.5   5.0
 3.0   6.0
 3.5   7.0
 4.0   8.0
 4.5   9.0
 5.0  10.0
 5.5  11.0
 6.0  12.0
 6.5  13.0
 7.0  14.0
 7.5  15.0
25.0     ?

Q: What is y when x = 25?
A: $y = f(x) = \beta x$, where $\beta = 2$; so if $x = 25$, $y = 2 \times 25 = 50$

[Figure: plot of y against x, a straight line through the origin with slope 2]

Regression

Q: What are y1, y2, y3 when x = 10?
A: $y_1 = 1 \cdot x = 10, \quad y_2 = 2 \cdot x = 20, \quad y_3 = 3 \cdot x = 30$

  x    y1    y2    y3
 0.0   0.0   0.0   0.0
 0.5   0.5   1.0   1.5
 1.0   1.0   2.0   3.0
 1.5   1.5   3.0   4.5
 2.0   2.0   4.0   6.0
 2.5   2.5   5.0   7.5
 3.0   3.0   6.0   9.0
 3.5   3.5   7.0  10.5
 4.0   4.0   8.0  12.0
 4.5   4.5   9.0  13.5
 5.0   5.0  10.0  15.0
 5.5   5.5  11.0  16.5
 6.0   6.0  12.0  18.0
 6.5   6.5  13.0  19.5
 7.0   7.0  14.0  21.0
 7.5   7.5  15.0  22.5

[Figure: the three lines y1 = x, y2 = 2x, y3 = 3x plotted against x]

Regression

Q: What is y when x = 10?
A: $y = f(x) = \alpha + \beta x$, where $\alpha = 1,\ \beta = 2$; so if $x = 10$, $y = 1 + 2(10) = 21$

  x    y1
 0.0   1.0
 0.5   2.0
 1.0   3.0
 1.5   4.0
 2.0   5.0
 2.5   6.0
 3.0   7.0
 3.5   8.0
 4.0   9.0
 4.5  10.0
 5.0  11.0
 5.5  12.0
 6.0  13.0
 6.5  14.0
 7.0  15.0
 7.5  16.0
10.0     ?

[Figure: the line y = 1 + 2x plotted against x]
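The rule on this slide can be written as a one-line function. An illustrative Python sketch (the function name `f` simply mirrors the slide's notation):

```python
# The slide's rule y = f(x) = alpha + beta*x, with alpha = 1 and beta = 2.
def f(x, alpha=1.0, beta=2.0):
    """Linear rule: intercept alpha plus slope beta times x."""
    return alpha + beta * x

# Reproduce the table's first and last rows, and answer the question:
print(f(0.0))   # 1.0
print(f(7.5))   # 16.0
print(f(10.0))  # 21.0 -- y when x = 10
```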

Two Variable Regression Model

• Regression analysis is concerned with:
  – the study of a relationship between the dependent and independent variables

$y_i = f(x_i)$

  – In reality, we are estimating a relationship, so we can calculate the value of a random variable

$y_i = \alpha + \beta x_i$

Two Variable Regression Model

• Real data from which we estimate the relationship is never very good, because we deal with random variables
  – What we end up having is something like this:

$y_i = \alpha + \beta x_i + \text{error}$

  – What we try to do in regression is estimate the "Line of Best Fit", so that we can come up with this equation:

$y_i = \alpha + \beta x_i$

  – This is also the equation of a line, so this form of regression is called "linear regression"

Two Variable Regression Model

[Figure: scatter plot with fitted line, y = 0.841 + 0.3909x, R² = 0.7247]

Two Variable Regression Model

• Regression Model – Equation of a Line

$y_i = \alpha + \beta x_i + \varepsilon_i$

• Terminology – 'y'
  – Dependent Variable, or
  – Left-Hand Side Variable, or
  – Explained Variable

• Terminology – 'x'
  – Independent Variable, or
  – Right-Hand Side Variable, or
  – Explanatory Variable, or
  – Regressor, Covariate, Control Variable

• Terminology – 'ε'
  – Error
  – Disturbance


Two Variable Regression Model

$y_i = \alpha + \beta x_i + \varepsilon_i$

• Terminology –
  – 'α' – Intercept
  – 'β' – Slope
  – 'ε' – Error

Assumptions of the Linear Regression Model

• The relationship between the dependent variable, Y, and the independent variable, X, is linear
• The independent variable, X, is not random
• About the error term:
  – The expected value (remember: average) of the error term is 0
  – The error term is normally distributed
  – The variance of the error term is the same for all observations
  – The error term is uncorrelated across observations

Regression Relationship estimation

• The model is estimated by the “Least Squares Estimation” method

Two Variable Regression Model

$y_i = \alpha + \beta x_i$

$$\beta = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}, \qquad \alpha = \bar Y - \beta \bar X$$
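A minimal sketch of least-squares estimation using the formulas β = Cov(X, Y)/Var(X) and α = Ȳ − βX̄, run on made-up data that roughly follows y = 1 + 2x (the data and the helper name `ols` are my own illustration, not from the slides):

```python
# Two-variable least-squares estimates via sample covariance and variance.
def ols(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance of x and y, and sample variance of x (n - 1 denominators).
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    beta = cov / var           # slope
    alpha = y_bar - beta * x_bar  # intercept
    return alpha, beta

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # roughly 1 + 2x with small errors
alpha, beta = ols(xs, ys)
print(round(alpha, 2), round(beta, 2))  # 1.09 1.97, close to the true 1 and 2
```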

• Inferences from Regression can be made about:
  – Model – how well does the specified model perform, i.e., are the specified independent variables, taken together, a good predictor of the dependent variable? (R²)
  – Independent Variables – the contribution of each independent variable in predicting the dependent variable (hypothesis test)

Inferences from Regression

$y_i = \alpha + \beta_1 x_{1i} + \varepsilon_i$

Model power

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}} = 1 - \frac{\text{Unexplained variation}}{\text{Total variation}}$$

Inference about Model

• Coeff. of Determination (R2)

• So, the higher the R², the better the model (Yes? That would be too easy!)

[Figure: at a point $(x_1, y_1)$, the deviation $y_1 - \bar y$ is split into an explained part $y_p - \bar y$ (fitted value minus mean) and an unexplained part $y_1 - y_p$]

$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$
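The decomposition SST = SSR + SSE, and R² = 1 − SSE/SST, can be verified numerically. A small Python sketch on toy data (the observed and fitted values are my own illustration, from a line fitted as y = 1.09 + 1.97x):

```python
# Sum-of-squares decomposition and R^2 for a fitted two-variable regression.
y = [3.1, 4.9, 7.2, 8.8, 11.0]           # observed values
y_hat = [3.06, 5.03, 7.00, 8.97, 10.94]  # fitted values from alpha=1.09, beta=1.97
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation

r2 = 1 - sse / sst
print(round(r2, 4))
```

Because the fitted values come from the least-squares line (with an intercept), SSR + SSE reproduces SST.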

Inference about Model

• If the model is correctly specified, R2 is an ideal measure

• Adding a variable to a regression can only increase R² (by construction)

• This fact can be exploited to get regressions with R² ≈ 100% by adding variables, but this doesn't mean that the model is any good

• The adjusted R² (Adj-R²), which penalizes additional variables, should be reported

Inference about Parameters

• Coefficients are estimated with a confidence interval
• To know if a specific independent variable (xᵢ) is influential in predicting the dependent variable (y), we test whether the corresponding coefficient is statistically different from 0 (i.e., βᵢ = 0)

• We do so by calculating the t-statistic for the coefficient

• If the t-stat is sufficiently large, it indicates that bᵢ is significantly different from 0, indicating that βᵢ·xᵢ plays a role in determining y

Inference about parameters

• We can test to see if the slope coefficient is significant by using a t-test.

$$t = \frac{\hat b_1 - 0}{s_{\hat b_1}}$$
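As a sketch, the t-statistic is just the coefficient estimate divided by its standard error; the numbers below are taken from the Excel SUMMARY OUTPUT at the end of this deck:

```python
# t-statistic for the slope coefficient, testing H0: beta1 = 0.
b1 = 0.939398568     # estimated slope (Excel "X Variable 1" coefficient)
se_b1 = 0.278341127  # standard error of the slope

t_stat = (b1 - 0) / se_b1
print(round(t_stat, 2))  # 3.37, matching Excel's t Stat of ~3.375
```

A t-stat this large (well beyond ~2 in absolute value) indicates the slope is significantly different from 0.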

In Excel


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.405156042
R Square            0.164151419
Adjusted R Square   0.149740236
Standard Error      0.05350165
Observations        60

ANOVA
              df   SS           MS           F            Significance F
Regression     1   0.032604637  0.032604637  11.39055864  0.001321732
Residual      58   0.166020739  0.002862427
Total         59   0.198625377

              Coefficients   Standard Error  t Stat       P-value      Lower 95%     Upper 95%
Intercept     -9.72076E-05   0.007438982     -0.01306732  0.98961893   -0.014987948  0.014793533
X Variable 1   0.939398568   0.278341127      3.37499017  0.001321732   0.382238272  1.496558865
