
A’ LEVEL GEOGRAPHY

GEO-STATISTICAL ANALYSIS AND PRESENTATION

Prepared for MEGTA 2017 Annual Conference

By Amos Tendai Munzara (0773 245 970)

1.0 Levels of Measurement

Measurement is done on variables. A variable is a characteristic under study which assumes

different values for different elements of a population or sample.

The levels of measurement are:

Nominal data consist solely of names or labels, e.g. farming regions, or the classification of settlements as rural or urban.

Ordinal data consist of observations that are ranked in terms of size or importance, e.g. the classification of settlements as A1 or A2, or of suburbs as high, medium or low density. The magnitudes of the differences between ranks are not clear-cut.

Interval data are measured on a scale that does not have a meaningful zero; the zero is assigned arbitrarily, hence ratios of interval-scale values have no meaning, e.g. temperature measured in degrees Celsius. A temperature of 0 °C does not imply the absence of heat, and a temperature of 60 °C is not twice as hot as a temperature of 30 °C.

Ratio data have a meaningful zero, and the ratios of measurements are consistent irrespective of the units of measurement. The absolute zero is uniquely defined. For example, if the distance between two points is 0 cm, the two points are not separated from each other. A distance of 20 km is twice as long as a distance of 10 km. Other good examples are amount of rainfall, population data and migration data.

Cyclic data consist of directions or times for which the measurement scale is cyclic, e.g. after 359° comes 0°, and after 31 December comes 1 January.

Range of statistical analyses possible on each data type:

Nominal and ordinal data: the arithmetic operations +, −, × and ÷ are not meaningful.

Interval and ratio data: arithmetic operations and measures of central tendency and dispersion are possible and meaningful (although, as noted above, ratios of interval-scale values have no meaning).
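The difference between the interval and ratio scales can be checked numerically. The Python sketch below is only an illustration (the function and variable names are not from the handout): it re-expresses the temperatures and distances quoted above in other units and compares the ratios.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def km_to_m(km):
    """Convert a distance from kilometres to metres."""
    return km * 1000


# Interval scale: the ratio of two temperatures depends on the unit used.
ratio_celsius = 60 / 30
ratio_fahrenheit = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)

# Ratio scale: the ratio of two distances is the same in any unit.
ratio_km = 20 / 10
ratio_m = km_to_m(20) / km_to_m(10)

print("Temperature ratios:", ratio_celsius, round(ratio_fahrenheit, 3))
print("Distance ratios:", ratio_km, ratio_m)
```

The temperature ratio changes when the unit changes, so "twice as hot" has no meaning on the Celsius scale, whereas the distance ratio stays at 2 in both kilometres and metres.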


2.0 Univariate Statistics

Univariate Statistics refers to statistics pertaining to one variable.

2.1 Measures of central tendency

These are averages (mean, mode, median) which locate the centre of a data set and are known

collectively as measures of central tendency.

The mode of a data set is the value with the highest frequency, that is, the value that occurs the greatest number of times.

The median is the middle observation in an ordered data set. The data are usually ordered by arranging the values in ascending order. Let the ordered values of a data set be $y_1, y_2, y_3, \ldots, y_n$, where $n$ is the number of observations. The median is given by

a. $y_{(n+1)/2}$ if $n$ is odd

b. $\frac{1}{2}\left(y_{n/2} + y_{(n+2)/2}\right)$ if $n$ is even.

To get the mean, find the sum of all the observations and then divide the sum by the number of observations.

The population mean of a set of N measurements is given by the formula:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where $x_1, x_2, \ldots, x_N$ are the observations and $N$ is the population size.

The symbol $\sum$ is called sigma and it means add.

The sample mean is obtained using the formula:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $x_1, x_2, \ldots, x_n$ are the observations and $n$ is the sample size.

Example 1

The amounts of rainfall (in mm) recorded at 15 different locations in the country on a particular day are:

44 59 35 41 46 25 47 60 54 46 35 46 41 34 19

Find the (a) mode, (b) median and (c) mean of the amount of rainfall received on that particular day.

Solution 1

(a) Mode = 46 mm

(b) Median

Rearranging: 19 25 34 35 35 41 41 44 46 46 46 47 54 59 60

Since n = 15 is odd, the median corresponds to

$$y_{(n+1)/2} = y_{(15+1)/2} = y_{16/2} = y_8 = 44 \text{ mm}$$

(c) mean = $\frac{1}{N}\sum_{i=1}^{N} x_i$

= $\frac{1}{15}(632)$

= 42.13333333

= 42.13 mm
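The three averages in Solution 1 can be checked with a short Python sketch. The helper functions below are illustrative (they are not part of the handout); `median` follows the odd/even rule stated earlier, with the position shifted by one because Python lists are indexed from 0.

```python
from collections import Counter

rainfall_mm = [44, 59, 35, 41, 46, 25, 47, 60, 54, 46, 35, 46, 41, 34, 19]


def mode(values):
    """Return the value with the highest frequency (first one met if tied)."""
    return Counter(values).most_common(1)[0][0]


def median(values):
    """Return the middle value of the ordered data (odd/even rule)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # y_((n+1)/2), shifted by one for 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # average of y_(n/2) and y_((n+2)/2)
    return (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2


def mean(values):
    """Return the sum of the observations divided by their number."""
    return sum(values) / len(values)


print("Mode:", mode(rainfall_mm))
print("Median:", median(rainfall_mm))
print("Mean:", round(mean(rainfall_mm), 2))
```

Python's standard `statistics` module offers `mode`, `median` and `mean` functions that could be used instead of these hand-written helpers.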

A choice often has to be made on which measure of location to use between the mode,

median and mean. When choosing a suitable average to use, the following factors are

considered:

The type of data being dealt with. The mode is suitable when you are dealing with discrete data made up of relatively few different values. It is not suitable for continuous data.

The shape of the distribution. The mean is affected by outliers (extremely high or low values) while the median is not. The median is therefore appropriate if you have a skewed distribution, whilst the mean is suitable for a near-symmetrical distribution.

Use in further statistical analysis. The mean has a more extensive role within Statistics than the median and the mode, and it is therefore preferred where further statistical analysis is needed, e.g. confidence interval estimation and hypothesis testing.
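The effect of an outlier on the mean and the median can be seen with a small experiment. In the sketch below, the Example 1 rainfall data are altered by a hypothetical recording error (60 mm entered as 600 mm); this altered data set is an assumption made purely for illustration.

```python
import statistics

rainfall_mm = [44, 59, 35, 41, 46, 25, 47, 60, 54, 46, 35, 46, 41, 34, 19]

# Hypothetical recording error: the 60 mm reading is entered as 600 mm.
with_outlier = [600 if value == 60 else value for value in rainfall_mm]

mean_before = statistics.mean(rainfall_mm)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(rainfall_mm)
median_after = statistics.median(with_outlier)

print("Mean:  ", round(mean_before, 2), "->", round(mean_after, 2))
print("Median:", median_before, "->", median_after)
```

The single extreme value pulls the mean upwards considerably, while the median does not move, which is why the median is preferred for skewed data.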


2.2 Measures of dispersion

The range gives a simple indicator of the variability of a set of observations. The range of a

set of observations is the difference between the largest observation and the smallest

observation.

Range = highest observed value – lowest observed value

Although it is very easy to use and understand, the range is not a reliable way of measuring the spread of data because it is based on only two observations, the highest and the lowest values. If one of these two values is an outlier, the spread of the data is exaggerated.

The variance and standard deviation allow us to avoid the shortcomings of the range as a

measure of dispersion because they take into account all the observations in the data set as

opposed to just selecting a few.

The variance of a set of data is the average squared deviation of the data points from their mean. Computationally, the variance ($s^2$) of a sample of n observations $x_1, x_2, \ldots, x_n$ is obtained by the formula:

$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right)$$

The formula for the population variance ($\sigma^2$) is given by:

$$\sigma^2 = \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right)$$

The standard deviation of a set of observations is the positive square root of the variance of the set.

Example 2

Find the (a) range and (b) standard deviation for the data in Example 1.

Solution 2

(a) Range = 60 − 19 = 41 mm

(b) Standard deviation

$$\sigma^2 = \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right)$$

$$= \frac{1}{15}\left[28\,444 - \frac{(632)^2}{15}\right]$$

$$= \frac{1}{15}\left[28\,444 - 26\,628.26667\right]$$

$$= \frac{1}{15}\left[1815.733333\right]$$

$$= 121.0488889$$

Standard deviation = $\sqrt{121.0488889}$

= 11.00 mm
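Solution 2 can be reproduced in a few lines of Python. The sketch below implements the computational formula for the population variance given above; the function name `population_variance` is illustrative. Like Solution 2, it treats the 15 values as a population; `statistics.stdev` would apply the n − 1 divisor of the sample formula instead.

```python
import math
import statistics

rainfall_mm = [44, 59, 35, 41, 46, 25, 47, 60, 54, 46, 35, 46, 41, 34, 19]


def population_variance(values):
    """Computational formula: (sum of x^2 - (sum of x)^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n


data_range = max(rainfall_mm) - min(rainfall_mm)
variance = population_variance(rainfall_mm)
std_dev = math.sqrt(variance)

print("Range:", data_range)
print("Variance:", round(variance, 4))
print("Standard deviation:", round(std_dev, 2))

# Cross-check against the standard library's population variance.
print("Matches statistics.pvariance:",
      math.isclose(variance, statistics.pvariance(rainfall_mm)))
```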

2.3 Hypothesis testing

A hypothesis is a claim or assertion made concerning a population parameter. A hypothesis

test is a test used to verify whether the claim is likely to be true or false. In this section, you

will be introduced to tests concerning the mean of a single population.

Hypothesis testing involves gathering evidence from a random sample drawn from the

population of interest in order to decide whether the null hypothesis is likely to be true or

false. The hypothesis is rejected if evidence from the sample is not consistent with the stated

hypothesis, otherwise it is accepted. However, the acceptance of the stated hypothesis does

not necessarily imply that it is true, rather it is a result of insufficient evidence to reject it.

2.3.1 Types of hypotheses

There are two types of hypotheses, which are called the:

null hypothesis, and

alternative hypothesis.

A null hypothesis is an assertion about the value of a population parameter. It is a formal

statement of the claim being made concerning a population measure. The null hypothesis is

denoted by H0.

The alternative hypothesis, denoted by $H_1$, is the negation of the null hypothesis. For example, a null hypothesis might assert that the population mean is equal to a specified value $\mu_0$. We write this as $H_0: \mu = \mu_0$. The alternative hypothesis opposes this assertion and is written as $H_1: \mu \neq \mu_0$. In this case, the alternative hypothesis suggests that the mean takes values that are either below $\mu_0$ or above it. Therefore, to investigate $H_0$ we conduct a non-directional test, which is known as a two-tailed test.

A null hypothesis might assert that the population mean is at least equal to a certain specified value $\mu_0$. We write $H_0: \mu \geq \mu_0$. In this case, the alternative hypothesis would consist of values below $\mu_0$, that is, $H_1: \mu < \mu_0$. Similarly, if a null hypothesis asserts that the population mean is less than or equal to a specified value $\mu_0$, that is, $H_0: \mu \leq \mu_0$, the alternative will be $H_1: \mu > \mu_0$. In both these cases, since the alternative hypothesis consists of values on only one side of the specified value $\mu_0$ (either below or above it), we conduct a one-sided test, also called a one-tailed test.

2.3.2 Type I and Type II Errors

In deciding to reject or accept a null hypothesis, there will be chances for erroneously

rejecting or accepting it. Such errors may be due to faulty sampling procedures.

A type I error is committed when a true null hypothesis is rejected. The probability of committing a type I error is called the level of significance and it is denoted by $\alpha$. It is common to use a 1%, 5% or 10% level of significance in calculations.

A type II error is committed if we accept the null hypothesis when it is false. The probability

that the test will be able to detect a false null hypothesis is called the power of a test. In other

words, the power of a test is the probability of rejecting H0 when indeed H0 is false.
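The meaning of the level of significance can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the handout: it repeatedly draws samples from a normal population for which $H_0: \mu = \mu_0$ is true (the values of `mu0`, `sigma`, `n` and the seed are arbitrary choices), carries out a two-tailed z-test at the 5% level using the standard critical value 1.96, and records how often the true null hypothesis is wrongly rejected.

```python
import math
import random
import statistics


def z_statistic(sample, mu0, sigma):
    """z = (sample mean - mu0) / (sigma / sqrt(n)), with sigma known."""
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))


def type_one_error_rate(trials=2000, n=25, mu0=40.0, sigma=10.0,
                        z_crit=1.96, seed=1):
    """Proportion of samples from a true-H0 population that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(z_statistic(sample, mu0, sigma)) > z_crit:
            rejections += 1  # a type I error: H0 is true but was rejected
    return rejections / trials


rate = type_one_error_rate()
print("Observed type I error rate:", rate)
```

The observed proportion should lie close to 0.05, the chosen level of significance, although it will vary a little with the seed and the number of trials.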


2.3.3 Steps Followed in Hypothesis Testing

The following steps should be followed when conducting a hypothesis test:

Step 1: State the Null and Alternative Hypothesis

The null and alternative hypotheses are specified at this initial stage, before gathering any evidence. It would be unethical and rather manipulative to formulate H0 and H1 at one’s convenience after gathering evidence, a practice that we refer to as data snooping.

Step 2: Identify the Distribution

Choose between two distributions namely the z-distribution and the t-distribution.

When testing for the population mean μ we use:

a) The z-distribution when:

the population standard deviation $\sigma$ is known, or

the population standard deviation is unknown and n is large ($n \geq 30$).

b) The t-distribution when the population standard deviation $\sigma$ is unknown and the sample size n is small ($n < 30$).

Step 3: Determine the Rejection and Acceptance Region

Depending on the distribution identified and the level of significance desired, you find a

value from statistical tables which we call a critical value. The critical value separates the

acceptance region from the rejection region. The rejection region is made up of a range of

values such that if a test statistic calculated from sample data falls in it the null hypothesis

would be rejected. The rejection region also depends on the nature of the alternative

hypothesis as shown in the following figures.

[Figure: Rejection region for $H_1: \mu \neq \mu_0$. An area of rejection of $\alpha/2$ lies in each tail, beyond the critical values on either side of 0.]

[Figure: Rejection region for $H_1: \mu > \mu_0$. The area of rejection lies in the upper tail, to the right of the critical value.]

[Figure: Rejection region for $H_1: \mu < \mu_0$. The area of rejection ($\alpha$) lies in the lower tail, to the left of the critical value.]

Step 4: Calculate the Test Statistic

A test statistic is a value calculated from sample data that is used to decide whether or not to

reject H0. Once the test statistic falls within the rejection region, H0 is rejected.

The calculation of the test statistic depends on whether the population standard deviation σ is

known or unknown and also on the sample size as summarised in Table 1 below.

Table 1: Test Statistic for Testing μ

When σ is known:

Case I (n is large or small): $Z_{cal} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$

When σ is unknown:

Case II (n is large): $Z_{cal} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim N(0,1)$

Case III (n is small): $T_{cal} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t(n-1)$

Step 5: Decide Whether or not to Reject H0

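The decision is made on the basis of a comparison between the value of the test statistic and the critical value. For a two-tailed test, if the test statistic is greater than the critical value in absolute terms, it falls in the rejection region, leading to the rejection of H0. For a one-tailed test, H0 is rejected only if the test statistic lies beyond the critical value in the tail specified by H1.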

Step 6: Make a Conclusion

If H0 is rejected, we conclude that H1 is probably true. If we fail to reject H0, we conclude that

the evidence gathered is insufficient to warrant the rejection of H0.
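Steps 2 and 5 amount to two simple decision rules, which can be sketched in Python as follows. The function names and the labels `"two-sided"`, `"less"` and `"greater"` are illustrative choices, not notation from the handout; the critical value is assumed to be supplied as a positive number read from statistical tables.

```python
def choose_distribution(sigma_known, n):
    """Step 2: z if sigma is known or n >= 30, otherwise t."""
    if sigma_known or n >= 30:
        return "z"
    return "t"


def reject_null(test_statistic, critical_value, alternative):
    """Step 5: decide whether the test statistic is in the rejection region.

    critical_value is the positive table value; alternative is one of
    'two-sided' (H1: mu != mu0), 'less' (H1: mu < mu0) or
    'greater' (H1: mu > mu0).
    """
    if alternative == "two-sided":
        return abs(test_statistic) > critical_value
    if alternative == "less":
        return test_statistic < -critical_value
    if alternative == "greater":
        return test_statistic > critical_value
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


print(choose_distribution(sigma_known=False, n=12))
print(reject_null(2.5, 1.96, "two-sided"))
```

Passing the direction of the alternative hypothesis explicitly keeps the one-tailed and two-tailed rules separate, so a large statistic in the wrong tail is never mistaken for evidence against H0.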

Example 3

An immigration officer claims that at least 40 people cross the border illegally into South Africa every day. To test the claim, the number of people crossing illegally into South Africa was noted on 12 randomly selected days as follows:

28 41 36 50 17 39 21 64 26 30 42 12

Test the claim made by the immigration officer at a 5% level of significance.

Solution 3

1. $H_0: \mu \geq 40$

$H_1: \mu < 40$

2. The population standard deviation σ is unknown and the sample size n = 12 is small, so we use the t-distribution.

3. $\alpha = 0.05$ and it is a one-tailed test. The critical value is $-t_{\alpha}(n-1) = -t_{0.05}(11) = -1.80$.

[Figure: Rejection region in the lower tail, to the left of the critical value −1.80.]

We would reject $H_0$ if $T_{cal} < -1.80$.

4. You should verify that $\bar{x} = 33.8333$ and $s = 14.6339$.

$$T_{cal} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{33.8333 - 40}{14.6339/\sqrt{12}} = -1.4598$$

5. Since $T_{cal} = -1.4598 > -1.80$ (equivalently, $|T_{cal}| = 1.4598 < 1.80$), we fail to reject $H_0$.

6. The data do not provide sufficient evidence at the 5% level of significance to reject $H_0$; therefore the immigration officer's claim cannot be rejected.
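The arithmetic in Solution 3 can be verified with the short Python sketch below. The variable names are illustrative, and the critical value −1.80 is simply the table value quoted in the solution rather than something the code computes.

```python
import math
import statistics

crossings = [28, 41, 36, 50, 17, 39, 21, 64, 26, 30, 42, 12]
mu_0 = 40
critical_value = -1.80  # -t_0.05(11), taken from t-tables as in Solution 3

n = len(crossings)
x_bar = statistics.mean(crossings)
s = statistics.stdev(crossings)  # sample standard deviation (n - 1 divisor)

# Case III test statistic: unknown sigma, small n
t_cal = (x_bar - mu_0) / (s / math.sqrt(n))

# Left-tailed test: reject H0 only if t_cal falls below the critical value
reject_h0 = t_cal < critical_value

print("Sample mean:", round(x_bar, 4))
print("Sample standard deviation:", round(s, 4))
print("T_cal:", round(t_cal, 4))
print("Reject H0:", reject_h0)
```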

3.0 Bivariate Statistics


Bivariate Statistics refers to statistics pertaining to two variables.

3.1 Correlation analysis

Correlation analysis is a statistical procedure that is used to measure the extent to which two

variables are related. Two variables are highly correlated if they move well together such that

it is possible to predict the movement of one variable using knowledge about the movement

of the other variable. For example, good rains are associated with good harvests (other

conditions held constant).

The correlation coefficient, denoted by r, is a numerical measure of the strength of the linear relationship between two variables X and Y. A correlation coefficient takes values between −1 and +1 inclusive, that is, $-1 \leq r \leq +1$.

Possible values of r are interpreted as follows:

r = 0 indicates that there is no linear correlation between the variables.

r = +1 indicates a perfect positive correlation between the variables.

r = −1 indicates a perfect negative correlation between the variables.

r close to +1 indicates a strong positive correlation.

r close to −1 indicates a strong negative correlation.

r close to zero indicates a weak correlation between the variables.

The following adjectives may help you to describe the degree of linear association between

two variables:

Values of r        Suitable adjectives

+0.7 to +1.0       Strong, positive

+0.4 to +0.69      Fair/moderate, positive

+0.3 to +0.39      Weak, positive

0.0 to +0.29       Negligible/scant, positive

0.0 to −0.29       Negligible/scant, negative

−0.3 to −0.39      Weak, negative

−0.4 to −0.69      Fair/moderate, negative

−0.7 to −1.0       Strong, negative


There are two commonly used correlation coefficients which are:

a) Pearson’s product moment correlation coefficient ($r$)

b) Spearman’s rank correlation coefficient ($r_s$)

3.1.1 Pearson’s product moment correlation coefficient

Pearson’s correlation coefficient is calculated for ratio data or interval data. It is based on the

mean and standard deviation and therefore can be affected by extreme values. Pearson’s

coefficient is obtained using the following computational formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$

Example 4

The following data are a random sample of indexed prices of gold and platinum over a six

year period:

Gold (X)        12  10  14  11  12   9

Platinum (Y)    18  17  23  19  20  15

Calculate and interpret the correlation coefficient for the data.

Solution 4

$n = 6$, $\sum x = 68$, $\sum x^2 = 786$, $\sum y = 112$, $\sum y^2 = 2128$, $\sum xy = 1292$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$

$$= \frac{6(1292) - (68)(112)}{\sqrt{\left[6(786) - (68)^2\right]\left[6(2128) - (112)^2\right]}}$$

$$= \frac{136}{\sqrt{(92)(224)}}$$

$$= 0.9474$$

A correlation coefficient of 0.9474 is close to +1 and it indicates a strong positive correlation

between the prices of gold and platinum. When platinum prices are going up, gold prices are

also expected to go up, other things being equal.
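The computation in Solution 4 can be checked with a direct Python translation of the computational formula. The function name `pearson_r` is an illustrative choice; the data are those of Example 4.

```python
import math

gold = [12, 10, 14, 11, 12, 9]
platinum = [18, 17, 23, 19, 20, 15]


def pearson_r(xs, ys):
    """Pearson's r from the computational formula using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(gold, platinum)
print("r =", round(r, 4))
```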

3.1.2 Spearman’s rank correlation coefficient

Spearman’s correlation coefficient, denoted by $r_s$, is calculated for ranked data, that is, data measured on the ordinal scale. The correlation coefficient $r_s$ is interpreted in the same way as Pearson’s correlation coefficient $r$. The computational formula for $r_s$ is given by

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference in ranks obtained by subtracting the rank of the y value from the rank of the x value for each pair of observations.

Example 5

A panel of two judges ranked the quality of potatoes harvested from 5 A1 farms (A, B, C, D

and E) as follows:

A1 farm A B C D E

Judge 1 4 5 1 3 2

Judge 2 5 4 2 3 1

Is there agreement in the manner in which the judges perceive the quality of potatoes from

the five farms?

Solution 5

Judge 1    Judge 2    $d_i$    $d_i^2$

4          5          −1       1

5          4          1        1

1          2          −1       1

3          3          0        0

2          1          1        1

$\sum d_i^2 = 4$

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(4)}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.8$$

The correlation coefficient is fairly high and positive showing that the judges do not differ

much in the way they perceive the quality of potatoes from the 5 farms.
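Solution 5 can be reproduced with the following Python sketch of the rank-difference formula. The function name `spearman_rs` is illustrative, and the code assumes the inputs are already ranks with no ties, as in Example 5.

```python
judge_1 = [4, 5, 1, 3, 2]  # ranks given to farms A, B, C, D, E
judge_2 = [5, 4, 2, 3, 1]


def spearman_rs(ranks_x, ranks_y):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


r_s = spearman_rs(judge_1, judge_2)
print("r_s =", round(r_s, 2))
```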

3.2 Regression Analysis

In real life situations we are usually interested in relationships between variables. Simple

linear regression is used to establish the functional form of a relationship between two

variables.

3.2.1 Types of Variables

There are two types of variables that we deal with in simple linear regression analysis. The

variable that has a dependence relationship on the other variable is called the response

variable or dependent variable, denoted by Y. The other variable is called the explanatory or

independent variable denoted by X.

3.2.2 Scatter Plots

Random pairs of observations for the independent and dependent variables are plotted on a

graph to show the kind of relationship that exists between the variables. From the scatter plot


you can tell if the relationship can be modelled by a straight line equation before proceeding

to do linear regression analysis.

Example 6

A researcher would like to establish the impact of population growth on economic development. He collected the following cross-sectional data from six SADC countries.

Population growth rate (%)    Annual GDP (US$ billion)

7                             18

9                             35

5                             12

15                            50

12                            36

6                             24

Draw a scatter plot to represent the data. Comment on the kind of relationship between population growth and economic development.

Solution 6

[Figure: Scatter plot of annual GDP (US$ billion) on the vertical axis, scaled from 0 to 60, against population growth rate on the horizontal axis, scaled from 4 to 16.]

The points seem to be following a line with positive gradient. If you insert a line of best fit

through the points, you will see that the points do not deviate much from the line. We can

therefore conclude that there is a strong positive linear relationship between population

growth and economic development.

3.2.3 The Simple Linear Regression Model

The simple linear regression model is given by

$$Y = \beta_0 + \beta_1 X + e$$

where Y is the dependent variable, X is the independent variable, e is the error term, β0 and

β1 are called population regression coefficients representing the intercept and slope

respectively. The model represents the unknown population relationship between the two

variables. The error term is included in the model to capture the effect of other important

variables that are omitted in the model.

The regression equation is estimated by:

$$Y = a + bX$$

where

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{and} \quad a = \frac{\sum y - b\sum x}{n}$$

$Y = a + bX$ is the estimated regression equation connecting variables X and Y, where a and b are estimates of the population intercept $\beta_0$ and the population slope $\beta_1$ of the line respectively.

Example 7

Estimate the regression equation for the data of Example 6.

Solution 7

Using the two-variable statistical mode on your calculator, you will obtain the following results:

$n = 6$, $\sum x = 54$, $\sum x^2 = 560$, $\sum y = 175$, $\sum xy = 1827$

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{6(1827) - 54(175)}{6(560) - (54)^2} = \frac{1512}{444} = 3.405405405$$

$$a = \frac{\sum y - b\sum x}{n} = \frac{175 - 3.405405405(54)}{6} = -1.481981978$$

The regression equation is $Y = -1.481981978 + 3.405405405X$.
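For readers without a calculator that has a two-variable statistical mode, the same estimates can be obtained with a short Python sketch of the formulas for b and a. The function name `fit_line` is illustrative; the data are those of Example 6.

```python
growth_rate = [7, 9, 5, 15, 12, 6]   # X: population growth rate (%)
gdp = [18, 35, 12, 50, 36, 24]       # Y: annual GDP (US$ billion)


def fit_line(xs, ys):
    """Return (a, b) for the estimated regression line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_line(growth_rate, gdp)
print("a =", round(a, 4))
print("b =", round(b, 4))
```

Because the code carries b at full precision, its value of a can differ from the hand calculation in the last few decimal places.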

3.2.4 Interpretation of a and b

The parameter a is the intercept on the dependent variable axis. It is the value that the variable Y is predicted to assume if the variable X has a zero value. You should guard against interpreting the intercept in terms of the dependent variable if the range of X-values used to construct the model does not include zero.

The parameter b represents the rate of change of Y with respect to X. Thus the value of b shows the corresponding change in the value of Y for every unit change in the value of X.

In the equation obtained above, the value a = −1.481981978 cannot be interpreted in terms of economic development since the X-values used to construct the equation did not include zero.

The value b = 3.405405405 represents the estimated increase in annual GDP, about US$3.41 billion, for every one percentage point increase in the population growth rate.


3.2.5 Estimating Values of the Dependent Variable

The regression model can be used to estimate values of the dependent variable given values of the independent variable. A given value of X is substituted into the regression equation to obtain the corresponding value of Y. The model can give reliable estimates of Y for X-values within the range of values used to construct the model. Outside that range, the model's predictions are unreliable and may give misleading results. For this reason, you should guard against extrapolation in regression analysis.

Example 8

Use the model obtained in Example 7 to estimate the level of economic development if the

population of a country grows by 10%.

Solution 8

$X = 10 \Rightarrow Y = -1.481981978 + 3.405405405(10) = 32.57207207$, that is, about US$ 32.57 billion.
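The warning against extrapolation can be built into the prediction itself. The sketch below is one possible way of doing this, not a prescribed method: it uses the coefficients from Solution 7 and refuses to predict outside the range of growth rates in Example 6 (5% to 15%). The function name `predict_gdp` and the choice to raise an error are illustrative assumptions.

```python
A = -1.481981978   # intercept from Solution 7
B = 3.405405405    # slope from Solution 7
X_MIN, X_MAX = 5, 15  # smallest and largest growth rates in Example 6


def predict_gdp(growth_rate):
    """Estimate annual GDP (US$ billion) for a growth rate within range."""
    if not X_MIN <= growth_rate <= X_MAX:
        raise ValueError(
            f"growth rate {growth_rate} is outside {X_MIN} to {X_MAX}; "
            "predicting here would be extrapolation"
        )
    return A + B * growth_rate


print("Estimated GDP at 10% growth:", round(predict_gdp(10), 2))
```

A call such as `predict_gdp(20)` raises a `ValueError` instead of returning a figure, which makes the limits of the model explicit to the user.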
