8/2/2019 Research methods and Statistics_Ismail Khater
Urban Communication
Table of Contents
Introduction
Data and Methodology
Regression
Conclusion
MatLab Commands
Bibliography
Table of Figures
Fig. 1: collected data (world bank data)
Fig. 2: internet use and GDP
Fig. 3: cell phone use and GDP
Fig. 4: urban population and GDP
Fig. 5: individual line fitting (with prediction bands)
Fig. 6: regression analysis
Introduction
In the past two decades, and especially since the beginning of the 21st century, new
technological methods of visual, oral and written communication have become the norm,
enabling communities to exchange information on a global level. The internet, as the
virtual part of urbanism, has had a huge impact on communication in a wide variety of
fields, such as education, health care and business. It also reduces the need to transport
information physically and delivers it almost instantly. With the growth of internet
availability and its integration with government services, marketing and job opportunities
have increased. [1] Secondly, cell phone communication also has a large impact on the
economy and well-being of a country, especially in remote areas, where the substitute
would have been physical transport. [2] Finally, it could also be argued that an increase in
the urban population, regarded here as an increase in physical communication due to
proximity, has the potential to reduce communication burdens and therefore to speed
progress towards a wealthier nation. All of these factors influence the economy and
standards of living, as well as education and many other areas.
This paper examines the effect of different variables related to virtual and physical
communication tools on the overall standard of living, by running and analyzing a multiple
linear regression. The analysis uses Matlab, a high-level technical computing language and
interactive environment for algorithm development, data visualization, data analysis and
numeric computation. [3] For the regression I will explore the relationships between the
urban population percentage (X1), the internet users per hundred people (X2) and the cell
phone subscriptions (X3) in a set of 23 countries, and regress the most widely used
indicator of a nation's wealth and standard of living, the GDP (gross domestic product) (Y),
on them.
The data sets used in the regression cover the following countries: Ethiopia, Kenya, India,
Sudan, Iraq, Egypt, Indonesia, South Africa, Mexico, Turkey, Brazil, Israel, Spain, United
Kingdom, Singapore, Canada, Germany, France, Australia, Sweden, United States, United Arab
Emirates and Switzerland. The data come from the World Bank [4], and all values are for the
year 2009.
Data and Methodology
The following data sample has been collected from the World Bank to perform the analysis:
The urban population data refer to people living in urban areas as defined by national
statistical offices. They are calculated using World Bank population estimates and urban
ratios from the United Nations World Urbanization Prospects. The internet users data refer
to the number of users per 100 people with access to the World Wide Web. Mobile cellular
telephone subscriptions are subscriptions to a public mobile telephone service using cellular
technology, which provides access to the public switched telephone network. Post-paid and
prepaid subscriptions are included. Finally, the GDP is calculated by the following equation:
$Y = C + I + G + (X - M)$, where GDP (Y) is the sum of Consumption (C), Investment (I),
Government Spending (G) and Net Exports (exports X minus imports M).
Fig. 1: collected data (world bank data)
The first step will be to run simple regressions of the dependent variable (Y) on each of the
three explanatory variables ($X_i$), where the $X_i$ represent the urban population, the
internet users and the cell phone subscriptions, and Y corresponds to the GDP.
The linear function is $Y = \beta_0 + \beta_1 X_1 + \varepsilon$, where $\beta_0$ is the
intercept, $\beta_1$ is the slope and $\varepsilon$ is an error term.
The next step will be to test the null hypothesis $H_0: \beta_i = 0$, which is rejected when
zero lies outside the 95% confidence interval of the coefficient.
The third step will be a multiple linear regression. The dependent variable will be regressed
on each pair of explanatory variables. At the end, all explanatory variables will be included
in one multiple regression.
The R-squared gives the percentage of the variability of Y that is explained by the $X_i$;
simply put, it measures how well the model explains the behavior of Y. The P-value indicates
the significance of the findings: the smaller it is, the stronger the evidence against the
null hypothesis. [5]
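The first two steps can be illustrated with a minimal sketch. It is written in Python rather than Matlab, purely for illustration, and the five data points are invented and are not the World Bank sample. The function name `simple_ols` and the variables `x_demo` and `y_demo` are assumptions of this sketch. It fits one line by ordinary least squares, reports the R-squared, and computes the t-statistic of the slope, which is compared with the two-sided 95% critical value of Student's t for n - 2 degrees of freedom (3.182 for the five points used here).

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns (b0, b1, r_squared, t_slope).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_mean - b1 * x_mean           # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst         # share of the variability of y explained
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    t_slope = b1 / se_b1 if se_b1 > 0 else math.inf
    return b0, b1, r_squared, t_slope

# Invented illustration data (NOT the data of this paper)
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r2, t = simple_ols(x_demo, y_demo)
T_CRIT_3_DF = 3.182   # two-sided 95% critical value, 3 degrees of freedom
reject_h0 = abs(t) > T_CRIT_3_DF   # True means H0: beta1 = 0 is rejected
print(b0, b1, r2, t, reject_h0)
```

Rejecting the null hypothesis when |t| exceeds the critical value is equivalent to zero lying outside the 95% confidence interval of the slope.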
Regression
First, I plotted each of the three explanatory variables separately against the dependent
variable:
From the graphs we can tell that the relationship changes direction for all three $X_i$. This
means that we cannot simply say that the larger the number of internet users, cell phone
users or urban population, the higher the GDP is.
Fig. 2: internet use and GDP
Fig. 3: cell phone use and GDP
Fig. 4: urban population and GDP
Then, I used OLS (ordinary least squares) to fit a line for each of the explanatory
variables. The dotted lines in the following graph mark the 95% confidence level around the
fitted line; here they are called the prediction bands.
Fig. 5: individual line fitting (with prediction bands)
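Two different 95% bands can be drawn around a fitted line, and the text above does not say which one Matlab plotted. The following Python sketch, given only as an illustration under that caveat, computes both: the narrower confidence band for the mean of Y at a given x, and the wider prediction band for a single new observation. The function name `line_bands` and the data are assumptions of this sketch. The critical value is passed in by hand: 3.182 for the five invented points, and 2.080 for the 21 degrees of freedom of a 23-country simple regression.

```python
import math

def line_bands(x, y, x0, t_crit):
    """Return (fitted value, confidence half-width, prediction half-width) at x0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))             # standard error of the regression
    leverage = 1.0 / n + (x0 - x_mean) ** 2 / sxx
    half_conf = t_crit * s * math.sqrt(leverage)         # band for the mean of Y
    half_pred = t_crit * s * math.sqrt(1.0 + leverage)   # band for a new observation
    return b0 + b1 * x0, half_conf, half_pred

# Invented illustration data (NOT the data of this paper)
x_band = [1.0, 2.0, 3.0, 4.0, 5.0]
y_band = [2.1, 3.9, 6.2, 7.8, 10.1]

at_mean = line_bands(x_band, y_band, 3.0, 3.182)
at_edge = line_bands(x_band, y_band, 5.0, 3.182)
print(at_mean)
print(at_edge)
```

Both bands are narrowest at the mean of x and widen towards the edges of the data, which gives the dotted lines their curved shape.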
The following tables include the findings of the three single and four multiple linear
regressions, with the GDP as the dependent variable, which will be analyzed afterwards:
Fig. 6: regression analysis
When looking at single regression 2 we can see that internet use has an $R^2$ of 87.1%,
which means that this share of the variability of the dependent variable (GDP) is
explained by the variability of the explanatory variable (internet). The P value of the
explanatory variable (internet) is near (but not exactly) zero. This shows a high level of
confidence and leads us to reject the null hypothesis.
The function for the single regression is:
$Y = -3739 + 590.9 X_2 + \varepsilon$
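To make the fitted function concrete, the short Python sketch below evaluates it for a given number of internet users per hundred people. Only the two coefficients come from the regression above; the function name `predict_gdp_from_internet` and the input values are chosen for illustration.

```python
def predict_gdp_from_internet(internet_users_per_100):
    """Point prediction from the fitted line Y = -3739 + 590.9 * X2 (error term omitted)."""
    return -3739.0 + 590.9 * internet_users_per_100

# Each additional internet user per hundred people raises the predicted GDP by 590.9
step = predict_gdp_from_internet(51) - predict_gdp_from_internet(50)

# Illustrative inputs only, not taken from the data sample
for users in (5, 50, 80):
    print(users, round(predict_gdp_from_internet(users), 1))
```

Below roughly 6.3 users per hundred people the line predicts a negative GDP, which shows that the linear fit should not be read literally at the low end of the range.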
The effect of adding a second variable, for example the urban population variable in
multiple regression 1, is a slight and insignificant increase in the $R^2$ of only 0.4%. As
the P value of the added variable (urban population) is 0.4, it does not have a significant
relationship with the dependent variable (GDP).
The last multiple regression includes all three explanatory variables. The addition of the
cell phone users reduced the P value of the urban population variable, and raised the
overall share of the variability of the dependent variable Y that is explained to 88.04%.
From all the regressions we can see that the more related (even if not significantly related)
variables are added, the larger the share of the dependent variable that is explained. This
holds only if the null hypothesis is rejected.
The function for the last multiple regression is:
$Y = -611.5 - 124.86 X_1 + 608.47 X_2 + 48.89 X_3 + \varepsilon$
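Coefficients of this kind can be obtained by solving the normal equations $(X^T X) b = X^T y$. The Python sketch below is a minimal illustration of that textbook technique and is not a description of what Matlab's `regstats` does internally. The function name `multiple_ols` and the small data set are invented for the sketch; the data are constructed so that the true coefficients are known.

```python
def multiple_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp.

    rows: list of observations, each a list of p explanatory values.
    """
    design = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(design[0])
    # Augmented matrix of the normal equations: [X'X | X'y]
    aug = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           + [sum(d[i] * yi for d, yi in zip(design, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Invented data generated exactly from y = 2 + 3*x1 - 1*x2 (NOT the data of this paper)
rows_demo = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_multi = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows_demo]

coefficients = multiple_ols(rows_demo, y_multi)
print([round(c, 6) for c in coefficients])
```

Solving the normal equations directly is adequate for a handful of well-scaled variables; for nearly collinear variables a QR-based solver is numerically safer.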
The huge difference in $\beta_0$ between the regressions (-3739 in the single regression
against -611.5 here) shows us that it was estimated too low. The enormous fluctuation, even
between positive and negative, of the urban population beta coefficient is another signal,
next to the p value, that the variable is either irrelevant or the data set is not normally
distributed. The high p value of 0.915 for the intercept shows that it is not reliable and
may be due to chance. Therefore, rejecting this regression is reasonable, and the multiple
regression with cell phone and internet usage only would be more accurate. High-GDP
countries can exist with a medium urban population.
Conclusion
The indicators for GDP cannot be looked at in isolation, and cannot be separated from
broader demographic, economic and social influences, such as natural resources. In order to
use this model, certain assumptions have to be met. Since the sample is not normally
distributed (an assumption that rests on the central limit theorem) and not randomly
selected, we can regard the test (with the three explanatory variables) as invalid, and from
the plots we can see that it shows a homoscedastic behavior. This could be due to the fact
that the selection of countries was biased, as I chose the countries of my fellow students
and other countries of interest. The standard deviation of the regression model shows us
that there is an error of about 30%. This means that the model is not suitable for
predicting the GDP. The figure is calculated by dividing the standard error by the mean
value of the GDP (the dependent variable).
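The ratio described above can be sketched as follows. This is a Python illustration with invented numbers, and the function name `relative_standard_error` is an assumption of the sketch. The standard error of the regression is the square root of the residual sum of squares divided by the degrees of freedom (observations minus estimated coefficients); dividing it by the mean of the dependent variable gives a unit-free error share comparable to the 30% quoted above.

```python
import math

def relative_standard_error(y, y_hat, n_params):
    """Standard error of the regression divided by the mean of y.

    n_params counts all estimated coefficients, including the intercept.
    """
    n = len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    s = math.sqrt(sse / (n - n_params))
    return s / (sum(y) / n)

# Invented observed and fitted values (NOT the data of this paper)
observed = [10.0, 20.0, 30.0, 40.0]
fitted = [12.0, 18.0, 33.0, 37.0]

ratio = relative_standard_error(observed, fitted, n_params=2)
print(round(100 * ratio, 1), "% of the mean")
```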
Even though GDP is widely used by economists, it has many limitations. These include its
disregard of externalities, such as damage to the environment, and its lack of qualitative
information, for example in not showing the distribution of wealth. However, I found no
other indicator that would fit the calculation of well-being in this exercise.
We have seen that low density or rural settlement, in other words urban sprawl, does not
directly relate to the GDP of a country. Nevertheless, it is certain that the immense
infrastructure and resource use needed to create and connect these unsustainable car-driven
suburbs has a direct negative relationship with the economy and the environment, which will
be multiplied when we run out of fossil fuels. [6]
On the other hand, the relationship between internet usage and GDP is significant and
proved to be consistent. It can be stated confidently that the use of the internet
contributes to economic growth through the spillover effect of knowledge.
Finally, the variables internet and cell phone usage play a significant but incomplete role
in explaining the GDP, as many other variables account for the unexplained part of it.
MatLab Commands
% Load the data. Each .mat file is assumed to hold one row vector
% (one value per country) with the same name as the file.
load('x1_urbanpop')
load('x2_internetusers')
load('x3_cellusers')
load('y_gdp')
y_gdp = y_gdp';   % response as a column vector

% Predictor matrices for the multiple regressions (one column per variable)
X_multiple1_internet_urban = vertcat(x2_internetusers, x1_urbanpop)';
X_multiple2_cell_urban = vertcat(x3_cellusers, x1_urbanpop)';
X_multiple3_internet_cell = vertcat(x2_internetusers, x3_cellusers)';
X_multiple4_urban_internet_cell = vertcat(x1_urbanpop, x2_internetusers, x3_cellusers)';

whichstats = {'beta', 'yhat', 'r', 'rsquare', 'tstat', 'fstat'};

% Title and predictor matrix of each regression. The loop below replaces the
% repeated display block that the original listing abbreviated as *C1.
% (:) forces the single predictors into column vectors to match y_gdp.
models = { ...
    'Single Regression 1 (with urbanpop)', x1_urbanpop(:); ...
    'Single Regression 2 (with internetusers)', x2_internetusers(:); ...
    'Single Regression 3 (with cellusers)', x3_cellusers(:); ...
    'Multiple Regression 1 (with internetusers, urbanpop)', X_multiple1_internet_urban; ...
    'Multiple Regression 2 (with cellusers, urbanpop)', X_multiple2_cell_urban; ...
    'Multiple Regression 3 (with internetusers, cellusers)', X_multiple3_internet_cell; ...
    'Multiple Regression 4 (with urbanpop, internetusers, cellusers)', X_multiple4_urban_internet_cell};

for k = 1:size(models, 1)
    stats = regstats(y_gdp, models{k, 2}, 'linear', whichstats);
    beta = stats.beta;
    yhat = stats.yhat;
    r = stats.r;
    rsquare = stats.rsquare;
    tstat = stats.tstat;
    fstat = stats.fstat;
    disp ' '
    disp(['the ' models{k, 1} ':'])
    disp ' '
    disp 'the estimated coefficients ...'
    disp 'beta0 (for the Intercept), then one beta per explanatory variable'
    disp(tstat.beta)
    disp 'Press any key to continue ...'
    pause
    disp ' '
    disp 'the t-statistics ...'
    disp(tstat.t)
    disp 'Press any key to continue ...'
    pause
    disp ' '
    disp 'p-values'
    disp(tstat.pval)
    disp 'Press any key to continue ...'
    pause
    disp ' '
    disp 'R-squared:'
    disp(rsquare)
    disp ' '
    pause
end
% end of script
Bibliography
[1] The Internet and its Effect on the Economy and Government. Martine Kalaw.
people.hamilton.edu. [Online].
http://people.hamilton.edu/bhouse/EconAndGov/EconAndGov.html
[2] The Impact of Telecoms on Economic Growth in Developing Countries. Leonard Waverman,
Meloria Meschi and Melvyn Fuss. web.si.umich.edu. [Online].
http://web.si.umich.edu/tprc/papers/2005/450/L%20Waverman-%20Telecoms%20Growth%20in%20Dev.%20Countries.pdf
[3] Matlab Product Description. mathworks.com. [Online].
http://www.mathworks.com/products/matlab/description1.html
[4] data.worldbank.org. [Online]. http://data.worldbank.org/
[5] Introductory Statistics for Business and Economics, 4th ed. Thomas H. Wonnacott and
Ronald J. Wonnacott. John Wiley and Sons.
[6] Urban Density and Climate Change. David Dodman. unfpa.org. [Online].
http://www.unfpa.org/webdav/site/global/users/schensul/public/CCPD/papers/Dodman%20Paper.pdf