Upload
cristina-focsa
View
231
Download
0
Embed Size (px)
Citation preview
8/3/2019 Case Study Focsa Cristina
1/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
1
CASE STUDY
Record 4060 data for two statistical variables (X and Y) at your choice:
Admision - sesion July 2011
"Alexandru Ioan Cuza" Police Academy
List of admitted candidates
Police - Law
X -
independent
variable
Y -
dependen
t variable
Selection
unit
(county)
NUME PRENUME GenderBaccalaureate
Mark
Admision
Mark
AG GHINESCU ANDREEA-FLORENTINA F 6,50 8,98
AG MORARU ANCA-NICOLETA F 6,50 8,90
AG ALEXE SIDONIA-CRESCENZIA F 6,50 8,75
AR RADU LUIZA-CLAUDIA F 6,80 8,75
AG ARSENE STEFAN-ALEXANDRU M 7,00 8,80
AG FLORESCU IONUT-ANDREI M 7,00 8,80
DJ MARCULESCU ROXANA F 7,50 8,85
AG TRACHE CAMELIA-ELENA F 7,50 8,85
AG CALUGAROIU LIVIU-MARIAN M 7,80 8,93
CS TEODORESCU SILVIU-PETRU M 7,80 8,93
BR STOILESCU FLORIAN M 7,90 8,95
AG ANITA ALEXANDRU M 7,90 8,95
SV DUMITRAS CORINA-LAVINIA F 8,00 8,90BT ISTRATE DANIELA-ANDREEA F 8,00 8,90
AG CIOBANU DANIEL-VALENTIN M 8,00 8,90
BT GABOR ANDREEA-CATALINA F 8,30 8,98
BZ OPREA ALEXANDRU-ION M 8,40 9,00
AG IORDACHESCU MIHAI-CIPRIAN M 8,40 9,00
GJ DRAGOESCU LAVINIA F 8,70 9,00
PH MOCANU RAZVAN-DANIEL M 8,70 9,00
VL TURCU GEORGE-IONUT M 8,70 9,00
8/3/2019 Case Study Focsa Cristina
2/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
2
AG VASILE ROXANA-MARIA F 8,80 9,03
BC TIRON RADU-MARIAN M 8,90 9,05
AG MINCA IOANA-CATALINA F 8,90 9,05
OT CIOBANU MARIANA F 8,90 8,98
DJ BUTOI FLORIAN-COSMIN M 8,90 8,98
PH BEZNEA MIHAI-FLORIN M 9,00 9,00
OT ANDOR VLAD-CRISTIAN M 9,00 9,00
PH CONSTANTIN MADALIN M 9,00 9,00
BC SPANU CONSTANTIN M 9,00 9,00
BZ RACOREANU COSMIN-LAURENTIU M 9,00 9,00
AG ANGHELOIU LOREDANA-ELENA F 9,00 9,00
CT PETCULESCU ELENA-ADELINA F 9,00 9,00
GJ DRAGOTA LARISA-PETRUTA F 9,00 8,93
MH POPA IOANA F 9,00 8,93
AG PATRU ANDREEA-GEORGIANA F 9,00 8,93
MM DRAGOMIR RADU-STEFAN M 9,00 8,93
BZ DIMOFTE CRISTIAN-DANIEL M 9,10 8,95
DJ VOICULESCU ROBERT-CRISTIAN M 9,10 8,95
SV VAMANU IONELA-DANIELA F 9,20 8,98
GJ OGORANU IONUT-ADRIAN M 9,20 8,90
NT PETRESCU TEDY-FLORIN M 9,30 8,93
DJ RADUT ROXANA-FLORENTINA F 9,70 9,03
BZ BESNEA CATALIN-GEORGE M 9,80 9,05
VN TANASE CATALIN-TITEL F 9,80 9,05
BZ MATEI RAZVAN-COSMIN M 9,80 9,05
GJ GRECU FLAVIUS-ANDREI M 10,00 9,03
SV IVANUTA LAURA-VASILICA F 10,00 9,03
IF SIMA MARIAN-IONUT M 10,00 9,03
DB ATANASIE ALIN-IONUT M 10,00 9,03
Source:
http://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdf
http://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdfhttp://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdfhttp://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdf8/3/2019 Case Study Focsa Cristina
3/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
3
I For each of the two variables:
a) Calculate and interpret the average, standard deviation and the coefficient
of variation for row data. Interpret the results. Is the data series homogenous?
a.1) Average (or mean) is the arithmetic average of the scores (Baccalaureate
Mark and Admission Mark), also the average is a measure of central tendency
(a parameter enabling the researcher to determine the average score of a group
of scores).
In order to give an answer to this question and to offer an interpretation I have to
understand the relationship between the average, the median and the mode and
then to interpret the skewness parameter for the both cases (X and Y).
Firstly, by choosing the Descriptive Statistics tool from the Data Analysis
toolpack provided by MS EXCEL, I will have an overview picture of the
measures of central tendency, dispersion of data, skew of data and the kurtosis
of data.
Descriptive statistics will help us to examine:
1. central tendency (location) of data, i.e. where data tend to fall, as
measured by the mean, median, and mode.
2. dispersion (variability) of data, i.e. how spread out data are, as measured
by the variance and its square root, the standard deviation.
3. skew (symmetry) of data, i.e. how concentrated data are at the low or high
end of the scale, as measured by the skew index.
4. kurtosis (peakedness) of data, i.e. how concentrated data are around a
single value, as measured by the kurtosis index.
Baccalaureate Mark Admision Mark
Mean 8,6060 Mean 8,9570
Standard Error 0,1352 Standard Error 0,0106
Median 8,9000 Median 8,9750
Mode 9,0000 Mode 9,0000
Standard Deviation 0,9561 Standard Deviation 0,0751
Sample Variance 0,9140 Sample Variance 0,0056
Kurtosis -0,1215 Kurtosis 0,9812Skewness -0,6893 Skewness -1,1143
8/3/2019 Case Study Focsa Cristina
4/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
4
Range 3,5000 Range 0,3000
Minimum 6,5000 Minimum 8,7500
Maximum 10,0000 Maximum 9,0500
Sum 430,3000 Sum 447,8500Count 50,0000 Count 50,0000
Largest(1) 10,0000 Largest(1) 9,0500
Smallest(1) 6,5000 Smallest(1) 8,7500
Confidence Level(95,0%) 0,2717
Confidence
Level(95,0%) 0,0213
So, I computed the average of X and Y as follows:
X Y
AVERAGE
X
AVERAGE
Y
8,61 8,96
Comparing the averages computed (X and Y) and those provided by the
Descriptive Statistics tool, I will find them identical.
In both cases the average () is different from their median and mode, meaning
that the arrays (X and Y ) arent normal distributed data.
Comparing the average with the median and mode, I can see that:
average < median < mode in the both cases (X: 8,60 < 8,90 < 9,00) and
(Y: 8,95 < 8,97 < 9,00),
so this 3 parameters show us that the distributions of data in our datas are non-
normal distributions (skewed distributions) and, in this case, I have for X and Y
a non-bell-shaped distribution of scores.
Looking at the Skewness parameter provided by the Descriptive Statistics tool, I
see that the Skewness is negative in both cases, so both of the arrays (X and Y )
are negatively skewed or skewed left, meaning that the left tail is longer.
8/3/2019 Case Study Focsa Cristina
5/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
5
As the skewness in the Y array is between 1 and ( -0,6893), the distribution
is moderately skewed.
As the skewness in the X array is less than 1 (-1,1143), the distribution ishighly skewed.
If the data is very skewed, then the arithmetic mean might become misleading,
so we can conclude that in the Y data the average is not a good parameter to
measure the central tendency, while in the X data, the average might be taken
into consideration when talking about the central tendency.
a.2) Standard deviation is the square root of variance providing an index ofvariability in the distribution of scores, also the standard deviation is a measure
of variability (a parameter enabling the researcher to indicate how spread out a
group of scores are).
Computation in excel:
Total No. of
observations
= N
Standard
Deviation
X
(X -
X)
variance
of X
=
standard
dev of X
Standard
Deviation
Y
(Y -
Y)
variance
of Y
=
standard
dev of Y
50 0,9561 4,4352 0,9140 0,9561 0,0751 0,0003 0,0056 0,0751
4,4352 0,0032
4,4352 0,0428
3,2616 0,0428
2,5792 0,0246
2,5792 0,0246
1,2232 0,0114
1,2232 0,0114
0,6496 0,0010
0,6496 0,0010
0,4984 0,0000
0,4984 0,0000
0,3672 0,0032
0,3672 0,0032
0,3672 0,0032
0,0936 0,0003
8/3/2019 Case Study Focsa Cristina
6/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
6
0,0424 0,0018
0,0424 0,0018
0,0088 0,0018
0,0088 0,0018
0,0088 0,0018
0,0376 0,0046
0,0864 0,0086
0,0864 0,0086
0,0864 0,0003
0,0864 0,0003
0,1552 0,0018
0,1552 0,0018
0,1552 0,0018
0,1552 0,0018
0,1552 0,0018
0,1552 0,0018
0,1552 0,0018
0,1552 0,0010
0,1552 0,0010
0,1552 0,0010
0,1552 0,0010
0,2440 0,0000
0,2440 0,0000
0,3528 0,0003
0,3528 0,0032
0,4816 0,0010
1,1968 0,0046
1,4256 0,0086
1,4256 0,0086
1,4256 0,0086
1,9432 0,0046
1,9432 0,0046
1,9432 0,0046
1,9432 0,0046
8/3/2019 Case Study Focsa Cristina
7/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
7
Comparing standard deviation computed ( X and Y) and those provided by
the Descriptive Statistics tool, I will find them identical.
I have X= 0,9561 and Y= 0,0751.
The values computed are not so relevant measure of dispersion in their
relationship with the average because in both cases (X and Y) I have non-bell-
shaped distributions, as I have negatively skewed or skewed left distributions.
Even though, by comparing the two values of the standard deviation, I can see
that X is bigger than Y even if the Y average is bigger than the X average.
With this information I conclude by saying that the X array has a great variety ofvariables while the Y array has every variable proximate to the Y average.
a.3) Coefficient of variation: measures relative dispersion.
I have chosen to express the coefficient of variation in percentage and the values
computed are:
Coefficient of variation
CVx= x
/ X
CVy= y /
Y
11% 1%
The coefficient of variation values certify what I concluded in the interpretation
of the standard deviation values. Once again, I can say that Y has a lower
relative variability than X.
a.4) Is the data series homogenous?
Homogeneity measures the differences or similarities between the several
variables.
8/3/2019 Case Study Focsa Cristina
8/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
8
Having a 11% coefficient of variation in the first array, I can say that
homogeneity is low while the 1% coefficient of variation show us that the
second array has a high homogeneity.
b) Summarize the data in an appropriate number of classes. Construct the
frequency distribution.
b.1) To solve this point, I have to identify the lowest and highest values in the
list for X and Y, so I compute the min and max:
Min X Max X Min Y Max Y
6,50 10,00 8,75 9,05
Secondly, I have to compute the Range (Maximum Value Minimum Value)
for each variable (X, Y):
Range X Range Y
3,50 0,30
Thirdly, I compute the number of classes by using the grouping rule:
> N.
In our case, I have N (no. of observations) =50 and Ill have k=6 classes.
For identifying the exact classes I have to divide each ranges by k (Lx= 0, 5 and
Ly: 0, 05) to establish the length of the interval:
For X: For Y:
LL UL
6,500 7,0007,000 7,500
8/3/2019 Case Study Focsa Cristina
9/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
9
b.2) Frequency distribution for X:
LL ULMidpointmarks =
mxi
Nostudents=
xfi
6,500 7,000 6,750 6
7,000 7,500 7,250 2
7,500 8,000 7,750 7
8,000 8,500 8,250 3
8,500 9,000 8,750 19
9,500 10,000 9,750 13
Total 50
Frequency distribution for Y:
LL ULMidpointmarks =
myi
Nostudents=
yfi
8,750 8,800 8,775 4
8,800 8,850 8,825 1
8,850 8,900 8,875 6
8,900 8,950 8,925 9
8,950 9,000 8,975 19
9,000 9,050 9,025 11
total 50
c)Calculate and interpret for the frequency distribution the average, standard
deviation and coefficient of variance. Compare with the results from point a).
Explain the differences.
c.1) Average of the frequency distribution ():
LL ULMidpointmarks =
No students=xfi
mxi*xfi X
7,500 8,000
8,000 8,500
8,500 9,000
9,500 10,000
LL UL
8,750 8,800
8,800 8,850
8,850 8,900
8,900 8,950
8,950 9,000
9,000 9,050
8/3/2019 Case Study Focsa Cristina
10/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
10
mxi
6,500 7,000 6,750 6 40,500 8,540
7,000 7,500 7,250 2 14,500
7,500 8,000 7,750 7 54,250
8,000 8,500 8,250 3 24,750
8,500 9,000 8,750 19 166,250
9,500 10,000 9,750 13 126,750
Total 50 427,000
LL ULMidpointmarks =
myi
Nostudents=
yfi
myi*yfi Y
8,750 8,800 8,775 4 35,100 8,946
8,800 8,850 8,825 1 8,825
8,850 8,900 8,875 6 53,250
8,900 8,950 8,925 9 80,325
8,950 9,000 8,975 19 170,525
9,000 9,050 9,025 11 99,275
total 50 447,300
c.2) Standard deviation of the frequency distribution ():
LL ULMidpointmarks =
mxi
Nostudents=
xfimxi*xfi X
xfi(mxi-X)
Standard
Deviation
X
6,500 7,000 6,750 6 40,500 8,540 19,225 0,970
7,000 7,500 7,250 2 14,500 3,328
7,500 8,000 7,750 7 54,250 4,369
8,000 8,500 8,250 3 24,750 0,252
8,500 9,000 8,750 19 166,250 0,838
9,500 10,000 9,750 13 126,750 19,033
Total 50 427,000 47,045
8/3/2019 Case Study Focsa Cristina
11/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
11
LL ULMidpointmarks =
myi
Nostudents=
yfimyi*yfi Y
yfi(myi-
Y)
Standard
Deviation
Y
8,750 8,800 8,775 4 35,100 8,946 0,117 0,071
8,800 8,850 8,825 1 8,825 0,015
8,850 8,900 8,875 6 53,250 0,030
8,900 8,950 8,925 9 80,325 0,004
8,950 9,000 8,975 19 170,525 0,016
9,000 9,050 9,025 11 99,275 0,069
total 50 447,300 0,250
c.3) Coefficient of variance of the frequency distribution (CV):
LL ULMidpointmarks =
mxi
Nostudents=
xfimxi*xfi X
xfi(mxi-
X)
Standard
Deviation
X
CVx=
x / X
6,500 7,000 6,750 6 40,500 8,540 19,225 0,970 11%
7,000 7,500 7,250 2 14,500 3,328
7,500 8,000 7,750 7 54,250 4,369
8,000 8,500 8,250 3 24,750 0,252
8,500 9,000 8,750 19 166,250 0,838
9,500 10,000 9,750 13 126,750 19,033
Total 50 427,000 47,045
LL ULMidpointmarks =
myi
Nostudents=
yfimyi*yfi Y
yfi(myi-
Y)
Standard
Deviation
Y
CVy=
y /
Y
8,750 8,800 8,775 4 35,100 8,946 0,117 0,071 1%
8,800 8,850 8,825 1 8,825 0,015
8,850 8,900 8,875 6 53,250 0,030
8,900 8,950 8,925 9 80,325 0,004
8,950 9,000 8,975 19 170,525 0,016
9,000 9,050 9,025 11 99,275 0,069
8/3/2019 Case Study Focsa Cristina
12/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
12
total 50 447,300 0,250
c.4) Compare with the results from point a). Explain the differences.
Values computed for the row data: Values computed for the grouped data:
Average_X 8,61
Average_Y 8,96
Standard_deviation_X 0,9561
Standard_deviation_Y: 0,0751
Coefficient_of_variation_X 11%
Coefficient_of_variation_Y 1%
Average_X 8,64
Average_Y 8,94
Standard_deviation_X 0,970
Standard_deviation_Y: 0,071
Coefficient_of_variation_X 11%
Coefficient_of_variation_Y 1%
There are small differences between the average and the standard deviationparameters computed for the row data and those computed for the grouped data
and the values resulted in the second place are more accurate because I interpret
them by grouping the arrays and narrow the errors that might occur.
d) Construct a histogram and describe the shape of the distribution based on
the histogram.
d.1) Construct a histogram:
Firstly, Ive created the intervals (bins) by using the CONCATENATE function,
to merge the low limit with the upper limit and then Ive copied the frequencies
as follows (I have chosen to do this way because its more elegant than using the
Histogram Tool from the Data Analysis Tool pack):
HYSTOGRAM X HYSTOGRAM Y
Intervals Xfi Intervals Yfi
6,5-7 6 8,75 - 8,8 4
7-7,5 2 8,8 - 8,85 1
7,5-8 7 8,85 - 8,9 6
8/3/2019 Case Study Focsa Cristina
13/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
13
8-8,5 3 8,9 - 8,95 9
8,5-9 19 8,95 - 9 19
9,5-10 13 9 - 9,05 11
The next step was to copy and paste special values in the sheet called
HISTOGRAM and to select the frequencies, click Insert - Column Chart, delete
the Series legend, right click on the edge of the graph and choose Select data,
and enter the Intervals for the Horizontal-Axis.
Then I modified some Layout and Design elements, I added a trendline, and
now looks like this:
0
2
4
6
8
10
12
14
16
18
20
6,5-7 7-7,5 7,5-8 8-8,5 8,5-9 9,5-10
F
r
e
qu
e
n
c
y
Baccalaureate Marks grouped in classes
Frequency distribution of the baccalaureate marks
8/3/2019 Case Study Focsa Cristina
14/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
14
d.2) Describe the shape of the distribution based on the histogram.
Our both histograms are one right heaped with a longer tail on the left this is
way this histograms are negatively skewed (and moderately skewed).
e)In which interval is expected that about 95% of the data will fall? Is this
assumption true for this data?
X Y
LL UP freq Percentageof data
LL UP freq Percentageof data
7,65 9,56 42,00 84% 8,88 9,03 45 90%
6,69 10,52 8,00 16% 8,81 9,10718356 5 10%
5,74 11,47 0,00 0% 8,73172466 9,18227534 0 0%
0
2
4
6
8
10
12
14
16
18
20
8,75 - 8,8 8,8 - 8,85 8,85 - 8,9 8,9 - 8,95 8,95 - 9 9 - 9,05
F
r
e
q
u
e
n
c
y
Admision Marks grouped in classes
Frequency distribution of the admision marks
8/3/2019 Case Study Focsa Cristina
15/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
15
total obs 50,00 total obs 50
II Using the Pivot Table Wizard in EXCEL, build a pivot table on your
spreadsheet (using also the second variable). You may have to change the
order of the rows (You should define the intervals first using VLookup
function).
I start by creating a new work sheet called VLookUp which contains the
following columns:
Unitatea
selectoareGender Baccalaureate
Mark
Baccalaureate
Categories
Admision
Mark
Admision
Categories
The next step will be to create 2 table arrays with the intervals and categories foreach mark (baccalaureate and admission):
Using the VLookUp formula I will fill the Categories columns with the values
presented .above:
table array
LL Baccalaureate
6,5 lucky
,25 normal
10 smart
table array
LL Admision
8,75 extreme low
8,98 normal
9,05 extreme high
Unitatea
selectoare
Gender Baccalaureate
Mark
Baccalaureate
Categories
Admision
Mark
Admision
Categories
AG F 6,50 lucky 8,75 extreme low
AG F 6,50 lucky 8,90 extreme low
AG F 6,50 lucky 8,98 extreme low
AR F 6,80 lucky 8,75 extreme low
AG M 7,00 lucky 8,80 extreme low
8/3/2019 Case Study Focsa Cristina
16/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
16
Using the data above, I clicked on the Pivot Table button from the Insert Fieldand I have created the pivot table below:
AG M 7,00 lucky 8,80 extreme low
DJ F 7,50 lucky 8,85 extreme low
AG F 7,50 lucky 8,85 extreme low
AG M 7,80 lucky 8,93 extreme low
CS M 7,80 lucky 8,93 extreme low
BR M 7,90 lucky 8,95 extreme low
AG M 7,90 lucky 8,95 extreme low
SV F 8,00 lucky 8,90 extreme low
BT F 8,00 lucky 8,90 extreme low
AG M 8,00 lucky 8,90 extreme low
BT F 8,30 normal 8,98 extreme low
BZ M 8,40 normal 9,00 normal
AG M 8,40 normal 9,00 normal
GJ F 8,70 normal 9,00 normal
PH M 8,70 normal 9,00 normalVL M 8,70 normal 9,00 normal
AG F 8,80 normal 9,03 normal
BC F 8,90 normal 8,98 extreme low
AG M 8,90 normal 8,98 extreme low
OT M 8,90 normal 9,05 normal
DJ F 8,90 normal 9,05 extreme high
PH F 9,00 normal 8,93 extreme low
OT F 9,00 normal 8,93 extreme low
PH F 9,00 normal 8,93 extreme low
BC M 9,00 normal 8,93 extreme low
BZ M 9,00 normal 9,00 normal
AG M 9,00 normal 9,00 normal
CT M 9,00 normal 9,00 normal
GJ M 9,00 normal 9,00 normal
MH M 9,00 normal 9,00 normal
AG F 9,00 normal 9,00 normal
MM F 9,00 normal 9,00 normal
BZ M 9,10 normal 8,95 extreme low
DJ M 9,10 normal 8,95 extreme low
SV M 9,20 normal 8,90 extreme low
GJ F 9,20 normal 8,98 extreme lowNT M 9,30 normal 8,93 extreme low
DJ F 9,70 normal 9,03 normal
BZ M 9,80 normal 9,05 extreme high
VN F 9,80 normal 9,05 extreme high
BZ M 9,80 normal 9,05 extreme high
GJ M 10,00 smart 9,03 normal
SV M 10,00 smart 9,03 normal
IF F 10,00 smart 9,03 normal
DB M 10,00 smart 9,03 normal
8/3/2019 Case Study Focsa Cristina
17/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
17
Admision
Categories(All)
Baccalaure
ate
Categories
(All)
Count of
Gender
Admisi
on
Mark
Unitatea
selectoare9,05
9,
0
5
9,
0
3
9,
0
3
9,
0
0
8,
9
8
8,
9
8
8,
9
5
8,
9
5
8,
9
3
8,
9
3
8,
9
0
8,
9
0
8,
8
5
8,
8
5
8,
8
0
8,
7
5
Gran
d
Total
AG 1 3 2 1 1 1 1 1 2 1 14
AR 1 1
BC 1 1 2BR 1 1
BT 1 1 2
BZ 2 2 1 5
CS 1 1
CT 1 1
DB 1 1
DJ 1 1 1 1 4
GJ 1 2 1 4
IF 1 1
MH 1 1MM 1 1
NT 1 1
OT 1 1 2
PH 1 2 3
SV 1 1 1 3
VL 1 1
VN 1 1
Grand Total 4 1 3 31
21 4 2 2 5 2 3 2 1 1 2 2 50
The pivot tables shows us:
- the most students who were enrolled this year came from ARGES county
(14 students from AG);
- the most students who were enrolled this year passed the admission exam
with 9,00 (12 students). Notice that we can make the same observations
8/3/2019 Case Study Focsa Cristina
18/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
18
for the baccalaureate marks too if we switch Admission Mark with
Baccalaureate Mark in the Column Labels section of the pivot table.
- By choosing the from report filter the extreme high admission
category we can find which mark was the extreme high mark at the
admission exam, how many students took it and from which counties:
Admision Categories extreme high
Baccalaureate Categories (All)
Count of Gender Admision Mark
Unitatea selectoare 9,05Grand
Total
BZ 2 2
DJ 1 1
VN 1 1
Grand Total 4 4
III Calculate the regression line and interpret and test the regression
coefficients, coefficient of determination and coefficient of correlation.
Interpret the results.
a) Calculate the regression line
The regression line is described as: y = a+ bx, can be computed like this:
430.3 =50 a+ 447.85 b
3856.865 = 447.85 a + 3747.95b
a= (430.3-447.85 b)/50
3856.865=447.85(430.3-447.85 b)/50 +3747.95
b=0,059
8/3/2019 Case Study Focsa Cristina
19/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
19
a= 8,444
Coefficients
Intercept 8,44436615
Baccalaureate
Mark 0,059567029
The a parameter is 8,444 and it represents the intercept of the regression
function and it does not have any economic significance. Geometrically, it is
the point where the regression line intersects OX axis.
The b parameter 0,059 is the slope of the regression line and it is called
regression coefficient. Because it is positive we can say that the relationship
between the two marks is a positive relationship. This parameter shows the fact
that when the Baccalaureate makrs increase by 0,5 points, the Admision Marks
increases by 0,059 points.
b) Test the regression coefficients, coefficient of determination and
coefficient of correlation.
Regression Statistics
y = 0,059x + 8,444
R = 0,575
r=0,758
8.7
8.8
8.9
9
9.1
6.5 7 7.5 8 8.5 9 9.5 10
A
d
m
i
s
i
o
n
M
a
r
k
s
BAC grades
Bac grades vs
Admision Mark
8/3/2019 Case Study Focsa Cristina
20/20
Student: FOCSA CRISTINA
Master program: International Project Management
Course: MANAGERIAL DATA ANALYSIS
Multiple R 0,758398211
R Square 0,575167847
AdjustedR
Square 0,566317177
Standard Error 0,049451391
Observations 50
The coefficient of correlation
r =
[ [ = 0,758
R = 0,575
We can say by interpreting the coefficient of correlation (Multiple R =
0,758398211) that we have a relationship between the two variable and the fact
that the coefficient of correlation is very close to 1 leads to the conclusion that
between the baccalaureat marks and the admision one is a strong relationship(75%).
The coefficient of determination
R square is 0,575167847, meaning that 57.51 % of the variation of the
Admission Marks can be explained by the variation of the Baccalaureate Marks
and rest of percentage by the variation of other factors.
The Adjusted R Square value is 0,566317177, meaning that 56% of the
evolution of the Admission Marks can be explained by the regression model
y = 0,059x + 8,444