Research methods and Statistics_Ismail Khater

  • 8/2/2019 Research methods and Statistics_Ismail Khater

    Urban Communication

    Table of Contents

    Introduction
    Data and Methodology
    Regression
    Conclusion
    MatLab Commands
    Bibliography

    Table of Figures

    Fig. 1: collected data (world bank data)
    Fig. 2: internet use and GDP
    Fig. 3: cell phone use and GDP
    Fig. 4: urban population and GDP
    Fig. 5: individual line fitting (with prediction bands)
    Fig. 6: regression analysis

    Introduction

    In the past two decades, and especially since the beginning of the 21st century, new
    technological means of visual, oral and written communication have become the norm,
    enabling communities to exchange information on a global level. The internet, as the
    virtual part of urbanism, has had a huge impact on communication at many levels, such
    as education, health care and business. It also reduces the need to transport
    information physically and delivers it almost instantly. With the growth of internet
    availability and its integration with government services, marketing opportunities
    have increased, as have job opportunities. [1] Secondly, cell phone communication also
    has a large impact on the economy and well-being of a country, especially in remote
    areas, where the substitute would have been physical transport. [2] Finally, it could
    also be argued that an increase in the urban population, treated here as an increase
    in physical communication due to proximity, has the potential to reduce communication
    burdens and therefore to speed progress towards a wealthier nation. All of these
    influence the economy and standards of living, as well as education and many other
    areas.

    This paper examines the effect of different variables related to virtual and physical
    communication tools on the overall living standard by running and analyzing a multiple
    linear regression. The analysis uses Matlab, a high-level technical computing language
    and interactive environment for algorithm development, data visualization, data
    analysis and numeric computation. [3] For the regression I explore the relationships
    between the urban population percentage (X1), the internet users per hundred people
    (X2) and the cell phone subscriptions (X3) in a set of 23 countries, and regress the
    most widely used indicator of a nation's wealth and standard of living, the GDP
    (gross domestic product) (Y), on them.

    The data sets used in the regression cover the following countries: Ethiopia, Kenya,
    India, Sudan, Iraq, Egypt, Indonesia, South Africa, Mexico, Turkey, Brazil, Israel,
    Spain, United Kingdom, Singapore, Canada, Germany, France, Australia, Sweden, United
    States, United Arab Emirates and Switzerland. The source is World Bank data. [4] All
    data are for the year 2009.

    Data and Methodology

    The following data sample (Fig. 1) was collected from the World Bank to perform the
    analysis. The urban population data refers to people living in urban areas as defined
    by national statistical offices. It is calculated using World Bank population
    estimates and urban ratios from the United Nations World Urbanization Prospects. The
    internet users data refers to the number of users per 100 people with access to the
    World Wide Web. Mobile cellular telephone subscriptions are subscriptions to a public
    mobile telephone service using cellular technology, which provides access to the
    public switched telephone network. Post-paid and prepaid subscriptions are included.
    Finally, the GDP is calculated by the following equation: $Y = C + I + G + (X - M)$,
    where GDP ($Y$) is the sum of consumption ($C$), investment ($I$), government spending
    ($G$) and net exports (exports $X$ minus imports $M$).

    Fig. 1: collected data (world bank data)

    The first step is to run a simple regression of the dependent variable (Y) on each of
    the three explanatory variables (Xi), where the Xi represent the urban population, the
    internet users and the cell phone subscriptions, and Y corresponds to the GDP.

    The linear function is $Y = \beta_0 + \beta_1 X_1 + \varepsilon$, where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon$ is the error term.
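    The simple regression described above can be sketched in a few lines of Matlab. This
    is only an illustration, not the script used for the paper: the vectors x and y are
    made-up values (not the World Bank sample), chosen so that the fit can be checked by
    hand, and the backslash operator stands in for regstats so that no toolbox is needed.

```matlab
% Hypothetical toy data (NOT the World Bank sample): y = 2 + 3*x exactly
x = [1; 2; 3; 4; 5];
y = [5; 8; 11; 14; 17];

% Design matrix with a column of ones for the intercept beta0
X = [ones(size(x)), x];

% Ordinary least squares estimate of [beta0; beta1]
beta = X \ y;

% Residuals: the estimated error term for each observation
residuals = y - X * beta;

fprintf('beta0 = %.4f, beta1 = %.4f\n', beta(1), beta(2));
```

    Because the toy data lie exactly on a line, the residuals are zero up to rounding;
    with real data they carry the part of Y that the line does not explain.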

    The next step is to test the null hypothesis ($H_0: \beta_i = 0$) for each
    coefficient. The null hypothesis is rejected when zero lies outside the 95%
    confidence interval of $\beta_i$.
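    For reference, the test can be written out as follows. This is the standard
    textbook form for an OLS coefficient, given here as a sketch; the symbols $n$ (number
    of countries, 23 in this sample), $k$ (number of explanatory variables in the
    regression) and $\mathrm{SE}$ (standard error) are notation introduced for this
    illustration.

```latex
t_i = \frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)},
\qquad
\hat{\beta}_i \;\pm\; t_{0.975,\,n-k-1}\,\mathrm{SE}(\hat{\beta}_i)
```

    If the interval on the right does not contain zero, the p-value of the coefficient
    is below 0.05 and the null hypothesis is rejected at the 5% level.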

    The third step is a multiple linear regression. The dependent variable is regressed
    on each pair of explanatory variables. Finally, all three explanatory variables are
    combined in one multiple regression.

    R-squared gives the percentage of the variability of Y that is explained by the Xi;
    simply put, it measures how well the model explains the behavior of Y. The p-value
    indicates the significance of the findings: the smaller it is, the stronger the
    evidence against the null hypothesis. [5]
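    As a small illustration of how R-squared is obtained, the sketch below computes it by
    hand for made-up data (not the World Bank sample). The variable names SSE and SST are
    labels chosen for this example; regstats reports the same quantity as rsquare.

```matlab
% Hypothetical toy data (NOT the World Bank sample)
x = [1; 2; 3; 4; 5];
y = [2; 4; 5; 4; 5];

X = [ones(size(x)), x];       % intercept column plus one regressor
beta = X \ y;                 % OLS coefficients [beta0; beta1]
yhat = X * beta;              % fitted values

SSE = sum((y - yhat).^2);     % residual (unexplained) sum of squares
SST = sum((y - mean(y)).^2);  % total sum of squares around the mean
rsquare = 1 - SSE / SST;      % share of the variability of y explained by x

fprintf('R-squared = %.2f\n', rsquare);
```

    For these numbers the line explains 60% of the variability of y, so the remaining 40%
    is left in the residuals.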

    Regression

    First, I plotted each of the three explanatory variables separately against the
    dependent variable. From the graphs we can tell that the relationship switches
    direction for all three Xi. This means that we cannot simply say that the larger the
    number of internet users, the number of cell phone users or the urban population,
    the higher the GDP.

    Fig. 2: internet use and GDP
    Fig. 3: cell phone use and GDP
    Fig. 4: urban population and GDP

    Then I used OLS (ordinary least squares) to fit a line for each of the explanatory
    variables. The dotted lines in the following graph mark the 95% confidence level;
    here they are called the prediction bands.

    Fig. 5: individual line fitting (with prediction bands)
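    Assuming the bands in Fig. 5 are the usual 95% prediction intervals for a new
    observation in a simple regression (the paper does not state how they were
    produced), they would take the following textbook form. The symbols $x_0$ (the point
    at which the band is evaluated), $s$ (residual standard deviation) and $n$ (sample
    size) are notation introduced for this sketch.

```latex
\hat{y}_0 \;\pm\; t_{0.975,\,n-2}\; s\,
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}},
\qquad
s^2 = \frac{\sum_{j=1}^{n}(y_j - \hat{y}_j)^2}{n-2}
```

    The bands are narrowest at the mean of the explanatory variable and widen towards
    both ends of its range.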

    The following tables include the findings of the three single and four multiple linear
    regressions, with the GDP as the dependent variable, which will be analyzed afterwards:

    Fig. 6: regression analysis

    Looking at single regression 2, we can see that internet use has an R² of 87.1%,
    which means that this share of the variability of the dependent variable (GDP) is
    explained by the variability of the independent variable (internet). The p-value of
    the independent variable (internet) is near (but not exactly) zero. This shows a high
    level of confidence and rejects the null hypothesis.

    The function for the single regression is:

    $Y = -3739 + 590.9\,X_2 + \varepsilon$
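    To make the fitted line concrete, the sketch below evaluates it for a few internet
    usage rates. The coefficients are the ones reported above; the three input rates are
    hypothetical values picked for illustration and do not refer to particular countries.

```matlab
% Coefficients reported for single regression 2 (GDP on internet users)
beta0 = -3739;
beta1 = 590.9;

% Hypothetical internet users per 100 people (illustrative inputs only)
x2 = [10; 50; 80];

% Fitted GDP for each rate; a point prediction leaves out the error term
gdp_hat = beta0 + beta1 * x2;

disp([x2, gdp_hat])
```

    Each additional internet user per 100 people raises the fitted GDP by 590.9. Because
    of the negative intercept, the fitted value drops below zero for rates under roughly
    6.3 users per 100, so the line should not be extrapolated to very low usage.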

    Adding a second variable, for example the urban population in multiple regression 1,
    produces a slight and insignificant increase in R² of only 0.4%. As the p-value of
    the added variable (urban population) is 0.4, it does not have a significant
    relationship with the dependent variable (GDP).

    The last multiple regression takes all three explanatory variables into the
    calculation. The addition of the cell phone users reduced the p-value of the urban
    population variable and raised the overall share of the variability of the dependent
    variable Y that is explained to 88.04%.

    From all the regressions we can see that the more related variables are added (even
    if they are not significantly related), the larger the share of the variability of the
    dependent variable that is explained. This holds only when the null hypothesis is
    rejected.

    The function for the last multiple regression is:

    $Y = -611.5 - 124.86\,X_1 + 608.47\,X_2 + 48.89\,X_3 + \varepsilon$
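    The fitted multiple regression can be evaluated in the same way, written here as an
    inner product. The coefficients are the ones reported above; the country profile is
    hypothetical, and the sketch assumes that cell phone subscriptions (X3) are expressed
    per 100 people, as the internet users are.

```matlab
% Coefficients reported for the last multiple regression:
% [intercept; urban population; internet users; cell subscriptions]
beta = [-611.5; -124.86; 608.47; 48.89];

% Hypothetical country profile (illustrative inputs only):
% 80% urban, 70 internet users and 100 cell subscriptions per 100 people
x = [80, 70, 100];

% Prepend a 1 so that the intercept is picked up by the inner product
gdp_hat = [1, x] * beta;

fprintf('Fitted GDP: %.1f\n', gdp_hat);
```

    Note the negative sign of the urban population coefficient: in this fit a higher
    urban share lowers the fitted GDP, holding the other two variables fixed.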

    The huge difference in $\beta_0$ between the regressions shows that it was estimated
    too low. The enormous fluctuation of the urban population coefficient, even between
    positive and negative values, is another signal, next to its p-value, that the
    variable is either irrelevant or that the data set is not normally distributed. The
    high p-value of 0.915 for the intercept shows that it is not reliable and may be
    random. Therefore, rejecting this regression is reasonable, and the multiple
    regression with cell phone and internet usage only would be more accurate. High GDP
    countries can exist with a medium urban population.

    Conclusion

    The indicators for GDP cannot be looked at in isolation and cannot be separated from
    broader demographic, economic and social influences, such as natural resources. In
    order to use this model, certain assumptions have to be met. Since the sample is not
    normally distributed (from the central limit theorem) and was not randomly selected,
    we can regard the test with the three explanatory variables as invalid, and from the
    plots we can see that the behavior is homoscedastic. This could be because the
    selection of countries was biased, as I chose the countries of my fellow students and
    other countries of interest. The standard deviation of the regression model shows
    that there is about 30% of error, calculated by dividing the error by the mean value
    of the GDP (the dependent variable). This means that the model is not suitable for
    predicting the GDP.
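    The 30% figure can be read as a relative standard error. A sketch of the calculation
    is given below, assuming "the error" is the residual standard deviation of the
    regression; $s$, $n$, $k$ and $\bar{y}$ are notation introduced for this illustration.

```latex
s = \sqrt{\frac{\sum_{j=1}^{n}(y_j - \hat{y}_j)^2}{n-k-1}},
\qquad
\text{relative error} = \frac{s}{\bar{y}} \approx 0.30
```

    Here $n$ is the number of countries, $k$ the number of explanatory variables and
    $\bar{y}$ the mean GDP of the sample.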

    Even though GDP is widely used by economists, it has many limitations. It ignores
    externalities, such as damage to the environment, and says little about quality, for
    example the distribution of wealth. However, I found no other indicator of well-being
    that would fit this exercise.

    We have seen that low density, or in other words urban sprawl, does not directly
    relate to the GDP of a country. Nevertheless, it is clear that the immense
    infrastructure and resource use needed to create and connect these unsustainable,
    car-driven suburbs has a direct negative effect on the economy and on the environment,
    which will be multiplied when fossil fuels run out. [6]

    On the other hand, the relationship between internet usage and GDP is significant and
    proved to be consistent. It can be stated confidently that the use of the internet
    contributes to economic growth through the spillover effect of knowledge.

    Finally, the variables internet and cell phone usage play a significant but not
    complete role in explaining the GDP, as many other variables account for the
    unexplained part of it.

    MatLab Commands

    % Load the 2009 World Bank series (each .mat file holds one variable of the same name)
    load('x1_urbanpop')
    load('x2_internetusers')
    load('x3_cellusers')
    load('y_gdp')

    % Force every series into a column vector (one row per country)
    x1_urbanpop = x1_urbanpop(:);
    x2_internetusers = x2_internetusers(:);
    x3_cellusers = x3_cellusers(:);
    y_gdp = y_gdp(:);

    % Design matrices for the four multiple regressions (one column per variable)
    X_multiple1_internet_urban = [x2_internetusers, x1_urbanpop];
    X_multiple2_cell_urban = [x3_cellusers, x1_urbanpop];
    X_multiple3_internet_cell = [x2_internetusers, x3_cellusers];
    X_multiple4_urban_internet_cell = [x1_urbanpop, x2_internetusers, x3_cellusers];

    % The seven regressions, in the order in which they are reported
    models = {x1_urbanpop, x2_internetusers, x3_cellusers, ...
              X_multiple1_internet_urban, X_multiple2_cell_urban, ...
              X_multiple3_internet_cell, X_multiple4_urban_internet_cell};
    labels = {'Single Regression 1 (with urbanpop)', ...
              'Single Regression 2 (with internetusers)', ...
              'Single Regression 3 (with cellusers)', ...
              'Multiple Regression 1 (with internetusers, urbanpop)', ...
              'Multiple Regression 2 (with cellusers, urbanpop)', ...
              'Multiple Regression 3 (with internetusers, cellusers)', ...
              'Multiple Regression 4 (with urbanpop, internetusers, cellusers)'};

    whichstats = {'beta', 'yhat', 'r', 'rsquare', 'tstat', 'fstat'};

    % Run each regression and print its coefficients, t-statistics, p-values and R-squared
    for k = 1:numel(models)
        stats = regstats(y_gdp, models{k}, 'linear', whichstats);
        tstat = stats.tstat;

        disp(' ')
        disp(['the ' labels{k} ':'])
        disp(' ')
        disp('the estimated coefficients (beta0 for the intercept first) ...')
        disp(tstat.beta)
        disp('the t-statistics ...')
        disp(tstat.t)
        disp('p-values')
        disp(tstat.pval)
        disp('R-squared:')
        disp(stats.rsquare)
        disp('Press any key to continue ...')
        pause
    end

    Bibliography

    [1] The Internet and its Effect on the Economy and Government. Martine Kalaw.
    people.hamilton.edu. [Online].
    http://people.hamilton.edu/bhouse/EconAndGov/EconAndGov.html

    [2] The Impact of Telecoms on Economic Growth in Developing Countries. Leonard
    Waverman, Meloria Meschi and Melvyn Fuss. web.si.umich.edu. [Online].
    http://web.si.umich.edu/tprc/papers/2005/450/L%20Waverman-%20Telecoms%20Growth%20in%20Dev.%20Countries.pdf

    [3] Matlab Product Description. mathworks.com. [Online].
    http://www.mathworks.com/products/matlab/description1.html

    [4] data.worldbank.org. [Online]. http://data.worldbank.org/

    [5] Introductory Statistics for Business and Economics, 4th ed. Thomas H. Wonnacott
    and Ronald J. Wonnacott. John Wiley and Sons.

    [6] Urban Density and Climate Change. David Dodman. unfpa.org. [Online].
    http://www.unfpa.org/webdav/site/global/users/schensul/public/CCPD/papers/Dodman%20Paper.pdf
