Case Study Focsa Cristina

Embed Size (px)

Citation preview

  • 8/3/2019 Case Study Focsa Cristina

    1/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    1

    CASE STUDY

    Record 4060 data for two statistical variables (X and Y) at your choice:

    Admision - sesion July 2011

    "Alexandru Ioan Cuza" Police Academy

    List of admitted candidates

    Police - Law

    X -

    independent

    variable

    Y -

    dependen

    t variable

    Selection

    unit

    (county)

    NUME PRENUME GenderBaccalaureate

    Mark

    Admision

    Mark

    AG GHINESCU ANDREEA-FLORENTINA F 6,50 8,98

    AG MORARU ANCA-NICOLETA F 6,50 8,90

    AG ALEXE SIDONIA-CRESCENZIA F 6,50 8,75

    AR RADU LUIZA-CLAUDIA F 6,80 8,75

    AG ARSENE STEFAN-ALEXANDRU M 7,00 8,80

    AG FLORESCU IONUT-ANDREI M 7,00 8,80

    DJ MARCULESCU ROXANA F 7,50 8,85

    AG TRACHE CAMELIA-ELENA F 7,50 8,85

    AG CALUGAROIU LIVIU-MARIAN M 7,80 8,93

    CS TEODORESCU SILVIU-PETRU M 7,80 8,93

    BR STOILESCU FLORIAN M 7,90 8,95

    AG ANITA ALEXANDRU M 7,90 8,95

    SV DUMITRAS CORINA-LAVINIA F 8,00 8,90BT ISTRATE DANIELA-ANDREEA F 8,00 8,90

    AG CIOBANU DANIEL-VALENTIN M 8,00 8,90

    BT GABOR ANDREEA-CATALINA F 8,30 8,98

    BZ OPREA ALEXANDRU-ION M 8,40 9,00

    AG IORDACHESCU MIHAI-CIPRIAN M 8,40 9,00

    GJ DRAGOESCU LAVINIA F 8,70 9,00

    PH MOCANU RAZVAN-DANIEL M 8,70 9,00

    VL TURCU GEORGE-IONUT M 8,70 9,00

  • 8/3/2019 Case Study Focsa Cristina

    2/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    2

    AG VASILE ROXANA-MARIA F 8,80 9,03

    BC TIRON RADU-MARIAN M 8,90 9,05

    AG MINCA IOANA-CATALINA F 8,90 9,05

    OT CIOBANU MARIANA F 8,90 8,98

    DJ BUTOI FLORIAN-COSMIN M 8,90 8,98

    PH BEZNEA MIHAI-FLORIN M 9,00 9,00

    OT ANDOR VLAD-CRISTIAN M 9,00 9,00

    PH CONSTANTIN MADALIN M 9,00 9,00

    BC SPANU CONSTANTIN M 9,00 9,00

    BZ RACOREANU COSMIN-LAURENTIU M 9,00 9,00

    AG ANGHELOIU LOREDANA-ELENA F 9,00 9,00

    CT PETCULESCU ELENA-ADELINA F 9,00 9,00

    GJ DRAGOTA LARISA-PETRUTA F 9,00 8,93

    MH POPA IOANA F 9,00 8,93

    AG PATRU ANDREEA-GEORGIANA F 9,00 8,93

    MM DRAGOMIR RADU-STEFAN M 9,00 8,93

    BZ DIMOFTE CRISTIAN-DANIEL M 9,10 8,95

    DJ VOICULESCU ROBERT-CRISTIAN M 9,10 8,95

    SV VAMANU IONELA-DANIELA F 9,20 8,98

    GJ OGORANU IONUT-ADRIAN M 9,20 8,90

    NT PETRESCU TEDY-FLORIN M 9,30 8,93

    DJ RADUT ROXANA-FLORENTINA F 9,70 9,03

    BZ BESNEA CATALIN-GEORGE M 9,80 9,05

    VN TANASE CATALIN-TITEL F 9,80 9,05

    BZ MATEI RAZVAN-COSMIN M 9,80 9,05

    GJ GRECU FLAVIUS-ANDREI M 10,00 9,03

    SV IVANUTA LAURA-VASILICA F 10,00 9,03

    IF SIMA MARIAN-IONUT M 10,00 9,03

    DB ATANASIE ALIN-IONUT M 10,00 9,03

    Source:

    http://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdf

    http://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdfhttp://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdfhttp://www.academiadepolitie.ro/old/Facdepol/admitere/2011/rezultate/politie_drept%20-%20admisi.pdf
  • 8/3/2019 Case Study Focsa Cristina

    3/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    3

    I For each of the two variables:

    a) Calculate and interpret the average, standard deviation and the coefficient

    of variation for row data. Interpret the results. Is the data series homogenous?

    a.1) Average (or mean) is the arithmetic average of the scores (Baccalaureate

    Mark and Admission Mark), also the average is a measure of central tendency

    (a parameter enabling the researcher to determine the average score of a group

    of scores).

    In order to give an answer to this question and to offer an interpretation I have to

    understand the relationship between the average, the median and the mode and

    then to interpret the skewness parameter for the both cases (X and Y).

    Firstly, by choosing the Descriptive Statistics tool from the Data Analysis

    toolpack provided by MS EXCEL, I will have an overview picture of the

    measures of central tendency, dispersion of data, skew of data and the kurtosis

    of data.

    Descriptive statistics will help us to examine:

    1. central tendency (location) of data, i.e. where data tend to fall, as

    measured by the mean, median, and mode.

    2. dispersion (variability) of data, i.e. how spread out data are, as measured

    by the variance and its square root, the standard deviation.

    3. skew (symmetry) of data, i.e. how concentrated data are at the low or high

    end of the scale, as measured by the skew index.

    4. kurtosis (peakedness) of data, i.e. how concentrated data are around a

    single value, as measured by the kurtosis index.

    Baccalaureate Mark Admision Mark

    Mean 8,6060 Mean 8,9570

    Standard Error 0,1352 Standard Error 0,0106

    Median 8,9000 Median 8,9750

    Mode 9,0000 Mode 9,0000

    Standard Deviation 0,9561 Standard Deviation 0,0751

    Sample Variance 0,9140 Sample Variance 0,0056

    Kurtosis -0,1215 Kurtosis 0,9812Skewness -0,6893 Skewness -1,1143

  • 8/3/2019 Case Study Focsa Cristina

    4/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    4

    Range 3,5000 Range 0,3000

    Minimum 6,5000 Minimum 8,7500

    Maximum 10,0000 Maximum 9,0500

    Sum 430,3000 Sum 447,8500Count 50,0000 Count 50,0000

    Largest(1) 10,0000 Largest(1) 9,0500

    Smallest(1) 6,5000 Smallest(1) 8,7500

    Confidence Level(95,0%) 0,2717

    Confidence

    Level(95,0%) 0,0213

    So, I computed the average of X and Y as follows:

    X Y

    AVERAGE

    X

    AVERAGE

    Y

    8,61 8,96

    Comparing the averages computed (X and Y) and those provided by the

    Descriptive Statistics tool, I will find them identical.

    In both cases the average () is different from their median and mode, meaning

    that the arrays (X and Y ) arent normal distributed data.

    Comparing the average with the median and mode, I can see that:

    average < median < mode in the both cases (X: 8,60 < 8,90 < 9,00) and

    (Y: 8,95 < 8,97 < 9,00),

    so this 3 parameters show us that the distributions of data in our datas are non-

    normal distributions (skewed distributions) and, in this case, I have for X and Y

    a non-bell-shaped distribution of scores.

    Looking at the Skewness parameter provided by the Descriptive Statistics tool, I

    see that the Skewness is negative in both cases, so both of the arrays (X and Y )

    are negatively skewed or skewed left, meaning that the left tail is longer.

  • 8/3/2019 Case Study Focsa Cristina

    5/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    5

    As the skewness in the Y array is between 1 and ( -0,6893), the distribution

    is moderately skewed.

    As the skewness in the X array is less than 1 (-1,1143), the distribution ishighly skewed.

    If the data is very skewed, then the arithmetic mean might become misleading,

    so we can conclude that in the Y data the average is not a good parameter to

    measure the central tendency, while in the X data, the average might be taken

    into consideration when talking about the central tendency.

    a.2) Standard deviation is the square root of variance providing an index ofvariability in the distribution of scores, also the standard deviation is a measure

    of variability (a parameter enabling the researcher to indicate how spread out a

    group of scores are).

    Computation in excel:

    Total No. of

    observations

    = N

    Standard

    Deviation

    X

    (X -

    X)

    variance

    of X

    =

    standard

    dev of X

    Standard

    Deviation

    Y

    (Y -

    Y)

    variance

    of Y

    =

    standard

    dev of Y

    50 0,9561 4,4352 0,9140 0,9561 0,0751 0,0003 0,0056 0,0751

    4,4352 0,0032

    4,4352 0,0428

    3,2616 0,0428

    2,5792 0,0246

    2,5792 0,0246

    1,2232 0,0114

    1,2232 0,0114

    0,6496 0,0010

    0,6496 0,0010

    0,4984 0,0000

    0,4984 0,0000

    0,3672 0,0032

    0,3672 0,0032

    0,3672 0,0032

    0,0936 0,0003

  • 8/3/2019 Case Study Focsa Cristina

    6/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    6

    0,0424 0,0018

    0,0424 0,0018

    0,0088 0,0018

    0,0088 0,0018

    0,0088 0,0018

    0,0376 0,0046

    0,0864 0,0086

    0,0864 0,0086

    0,0864 0,0003

    0,0864 0,0003

    0,1552 0,0018

    0,1552 0,0018

    0,1552 0,0018

    0,1552 0,0018

    0,1552 0,0018

    0,1552 0,0018

    0,1552 0,0018

    0,1552 0,0010

    0,1552 0,0010

    0,1552 0,0010

    0,1552 0,0010

    0,2440 0,0000

    0,2440 0,0000

    0,3528 0,0003

    0,3528 0,0032

    0,4816 0,0010

    1,1968 0,0046

    1,4256 0,0086

    1,4256 0,0086

    1,4256 0,0086

    1,9432 0,0046

    1,9432 0,0046

    1,9432 0,0046

    1,9432 0,0046

  • 8/3/2019 Case Study Focsa Cristina

    7/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    7

    Comparing standard deviation computed ( X and Y) and those provided by

    the Descriptive Statistics tool, I will find them identical.

    I have X= 0,9561 and Y= 0,0751.

    The values computed are not so relevant measure of dispersion in their

    relationship with the average because in both cases (X and Y) I have non-bell-

    shaped distributions, as I have negatively skewed or skewed left distributions.

    Even though, by comparing the two values of the standard deviation, I can see

    that X is bigger than Y even if the Y average is bigger than the X average.

    With this information I conclude by saying that the X array has a great variety ofvariables while the Y array has every variable proximate to the Y average.

    a.3) Coefficient of variation: measures relative dispersion.

    I have chosen to express the coefficient of variation in percentage and the values

    computed are:

    Coefficient of variation

    CVx= x

    / X

    CVy= y /

    Y

    11% 1%

    The coefficient of variation values certify what I concluded in the interpretation

    of the standard deviation values. Once again, I can say that Y has a lower

    relative variability than X.

    a.4) Is the data series homogenous?

    Homogeneity measures the differences or similarities between the several

    variables.

  • 8/3/2019 Case Study Focsa Cristina

    8/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    8

    Having a 11% coefficient of variation in the first array, I can say that

    homogeneity is low while the 1% coefficient of variation show us that the

    second array has a high homogeneity.

    b) Summarize the data in an appropriate number of classes. Construct the

    frequency distribution.

    b.1) To solve this point, I have to identify the lowest and highest values in the

    list for X and Y, so I compute the min and max:

    Min X Max X Min Y Max Y

    6,50 10,00 8,75 9,05

    Secondly, I have to compute the Range (Maximum Value Minimum Value)

    for each variable (X, Y):

    Range X Range Y

    3,50 0,30

    Thirdly, I compute the number of classes by using the grouping rule:

    > N.

    In our case, I have N (no. of observations) =50 and Ill have k=6 classes.

    For identifying the exact classes I have to divide each ranges by k (Lx= 0, 5 and

    Ly: 0, 05) to establish the length of the interval:

    For X: For Y:

    LL UL

    6,500 7,0007,000 7,500

  • 8/3/2019 Case Study Focsa Cristina

    9/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    9

    b.2) Frequency distribution for X:

    LL ULMidpointmarks =

    mxi

    Nostudents=

    xfi

    6,500 7,000 6,750 6

    7,000 7,500 7,250 2

    7,500 8,000 7,750 7

    8,000 8,500 8,250 3

    8,500 9,000 8,750 19

    9,500 10,000 9,750 13

    Total 50

    Frequency distribution for Y:

    LL ULMidpointmarks =

    myi

    Nostudents=

    yfi

    8,750 8,800 8,775 4

    8,800 8,850 8,825 1

    8,850 8,900 8,875 6

    8,900 8,950 8,925 9

    8,950 9,000 8,975 19

    9,000 9,050 9,025 11

    total 50

    c)Calculate and interpret for the frequency distribution the average, standard

    deviation and coefficient of variance. Compare with the results from point a).

    Explain the differences.

    c.1) Average of the frequency distribution ():

    LL ULMidpointmarks =

    No students=xfi

    mxi*xfi X

    7,500 8,000

    8,000 8,500

    8,500 9,000

    9,500 10,000

    LL UL

    8,750 8,800

    8,800 8,850

    8,850 8,900

    8,900 8,950

    8,950 9,000

    9,000 9,050

  • 8/3/2019 Case Study Focsa Cristina

    10/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    10

    mxi

    6,500 7,000 6,750 6 40,500 8,540

    7,000 7,500 7,250 2 14,500

    7,500 8,000 7,750 7 54,250

    8,000 8,500 8,250 3 24,750

    8,500 9,000 8,750 19 166,250

    9,500 10,000 9,750 13 126,750

    Total 50 427,000

    LL ULMidpointmarks =

    myi

    Nostudents=

    yfi

    myi*yfi Y

    8,750 8,800 8,775 4 35,100 8,946

    8,800 8,850 8,825 1 8,825

    8,850 8,900 8,875 6 53,250

    8,900 8,950 8,925 9 80,325

    8,950 9,000 8,975 19 170,525

    9,000 9,050 9,025 11 99,275

    total 50 447,300

    c.2) Standard deviation of the frequency distribution ():

    LL ULMidpointmarks =

    mxi

    Nostudents=

    xfimxi*xfi X

    xfi(mxi-X)

    Standard

    Deviation

    X

    6,500 7,000 6,750 6 40,500 8,540 19,225 0,970

    7,000 7,500 7,250 2 14,500 3,328

    7,500 8,000 7,750 7 54,250 4,369

    8,000 8,500 8,250 3 24,750 0,252

    8,500 9,000 8,750 19 166,250 0,838

    9,500 10,000 9,750 13 126,750 19,033

    Total 50 427,000 47,045

  • 8/3/2019 Case Study Focsa Cristina

    11/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    11

    LL ULMidpointmarks =

    myi

    Nostudents=

    yfimyi*yfi Y

    yfi(myi-

    Y)

    Standard

    Deviation

    Y

    8,750 8,800 8,775 4 35,100 8,946 0,117 0,071

    8,800 8,850 8,825 1 8,825 0,015

    8,850 8,900 8,875 6 53,250 0,030

    8,900 8,950 8,925 9 80,325 0,004

    8,950 9,000 8,975 19 170,525 0,016

    9,000 9,050 9,025 11 99,275 0,069

    total 50 447,300 0,250

    c.3) Coefficient of variance of the frequency distribution (CV):

    LL ULMidpointmarks =

    mxi

    Nostudents=

    xfimxi*xfi X

    xfi(mxi-

    X)

    Standard

    Deviation

    X

    CVx=

    x / X

    6,500 7,000 6,750 6 40,500 8,540 19,225 0,970 11%

    7,000 7,500 7,250 2 14,500 3,328

    7,500 8,000 7,750 7 54,250 4,369

    8,000 8,500 8,250 3 24,750 0,252

    8,500 9,000 8,750 19 166,250 0,838

    9,500 10,000 9,750 13 126,750 19,033

    Total 50 427,000 47,045

    LL ULMidpointmarks =

    myi

    Nostudents=

    yfimyi*yfi Y

    yfi(myi-

    Y)

    Standard

    Deviation

    Y

    CVy=

    y /

    Y

    8,750 8,800 8,775 4 35,100 8,946 0,117 0,071 1%

    8,800 8,850 8,825 1 8,825 0,015

    8,850 8,900 8,875 6 53,250 0,030

    8,900 8,950 8,925 9 80,325 0,004

    8,950 9,000 8,975 19 170,525 0,016

    9,000 9,050 9,025 11 99,275 0,069

  • 8/3/2019 Case Study Focsa Cristina

    12/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    12

    total 50 447,300 0,250

    c.4) Compare with the results from point a). Explain the differences.

    Values computed for the row data: Values computed for the grouped data:

    Average_X 8,61

    Average_Y 8,96

    Standard_deviation_X 0,9561

    Standard_deviation_Y: 0,0751

    Coefficient_of_variation_X 11%

    Coefficient_of_variation_Y 1%

    Average_X 8,64

    Average_Y 8,94

    Standard_deviation_X 0,970

    Standard_deviation_Y: 0,071

    Coefficient_of_variation_X 11%

    Coefficient_of_variation_Y 1%

    There are small differences between the average and the standard deviationparameters computed for the row data and those computed for the grouped data

    and the values resulted in the second place are more accurate because I interpret

    them by grouping the arrays and narrow the errors that might occur.

    d) Construct a histogram and describe the shape of the distribution based on

    the histogram.

    d.1) Construct a histogram:

    Firstly, Ive created the intervals (bins) by using the CONCATENATE function,

    to merge the low limit with the upper limit and then Ive copied the frequencies

    as follows (I have chosen to do this way because its more elegant than using the

    Histogram Tool from the Data Analysis Tool pack):

    HYSTOGRAM X HYSTOGRAM Y

    Intervals Xfi Intervals Yfi

    6,5-7 6 8,75 - 8,8 4

    7-7,5 2 8,8 - 8,85 1

    7,5-8 7 8,85 - 8,9 6

  • 8/3/2019 Case Study Focsa Cristina

    13/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    13

    8-8,5 3 8,9 - 8,95 9

    8,5-9 19 8,95 - 9 19

    9,5-10 13 9 - 9,05 11

    The next step was to copy and paste special values in the sheet called

    HISTOGRAM and to select the frequencies, click Insert - Column Chart, delete

    the Series legend, right click on the edge of the graph and choose Select data,

    and enter the Intervals for the Horizontal-Axis.

    Then I modified some Layout and Design elements, I added a trendline, and

    now looks like this:

    0

    2

    4

    6

    8

    10

    12

    14

    16

    18

    20

    6,5-7 7-7,5 7,5-8 8-8,5 8,5-9 9,5-10

    F

    r

    e

    qu

    e

    n

    c

    y

    Baccalaureate Marks grouped in classes

    Frequency distribution of the baccalaureate marks

  • 8/3/2019 Case Study Focsa Cristina

    14/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    14

    d.2) Describe the shape of the distribution based on the histogram.

    Our both histograms are one right heaped with a longer tail on the left this is

    way this histograms are negatively skewed (and moderately skewed).

    e)In which interval is expected that about 95% of the data will fall? Is this

    assumption true for this data?

    X Y

    LL UP freq Percentageof data

    LL UP freq Percentageof data

    7,65 9,56 42,00 84% 8,88 9,03 45 90%

    6,69 10,52 8,00 16% 8,81 9,10718356 5 10%

    5,74 11,47 0,00 0% 8,73172466 9,18227534 0 0%

    0

    2

    4

    6

    8

    10

    12

    14

    16

    18

    20

    8,75 - 8,8 8,8 - 8,85 8,85 - 8,9 8,9 - 8,95 8,95 - 9 9 - 9,05

    F

    r

    e

    q

    u

    e

    n

    c

    y

    Admision Marks grouped in classes

    Frequency distribution of the admision marks

  • 8/3/2019 Case Study Focsa Cristina

    15/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    15

    total obs 50,00 total obs 50

    II Using the Pivot Table Wizard in EXCEL, build a pivot table on your

    spreadsheet (using also the second variable). You may have to change the

    order of the rows (You should define the intervals first using VLookup

    function).

    I start by creating a new work sheet called VLookUp which contains the

    following columns:

    Unitatea

    selectoareGender Baccalaureate

    Mark

    Baccalaureate

    Categories

    Admision

    Mark

    Admision

    Categories

    The next step will be to create 2 table arrays with the intervals and categories foreach mark (baccalaureate and admission):

    Using the VLookUp formula I will fill the Categories columns with the values

    presented .above:

    table array

    LL Baccalaureate

    6,5 lucky

    ,25 normal

    10 smart

    table array

    LL Admision

    8,75 extreme low

    8,98 normal

    9,05 extreme high

    Unitatea

    selectoare

    Gender Baccalaureate

    Mark

    Baccalaureate

    Categories

    Admision

    Mark

    Admision

    Categories

    AG F 6,50 lucky 8,75 extreme low

    AG F 6,50 lucky 8,90 extreme low

    AG F 6,50 lucky 8,98 extreme low

    AR F 6,80 lucky 8,75 extreme low

    AG M 7,00 lucky 8,80 extreme low

  • 8/3/2019 Case Study Focsa Cristina

    16/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    16

    Using the data above, I clicked on the Pivot Table button from the Insert Fieldand I have created the pivot table below:

    AG M 7,00 lucky 8,80 extreme low

    DJ F 7,50 lucky 8,85 extreme low

    AG F 7,50 lucky 8,85 extreme low

    AG M 7,80 lucky 8,93 extreme low

    CS M 7,80 lucky 8,93 extreme low

    BR M 7,90 lucky 8,95 extreme low

    AG M 7,90 lucky 8,95 extreme low

    SV F 8,00 lucky 8,90 extreme low

    BT F 8,00 lucky 8,90 extreme low

    AG M 8,00 lucky 8,90 extreme low

    BT F 8,30 normal 8,98 extreme low

    BZ M 8,40 normal 9,00 normal

    AG M 8,40 normal 9,00 normal

    GJ F 8,70 normal 9,00 normal

    PH M 8,70 normal 9,00 normalVL M 8,70 normal 9,00 normal

    AG F 8,80 normal 9,03 normal

    BC F 8,90 normal 8,98 extreme low

    AG M 8,90 normal 8,98 extreme low

    OT M 8,90 normal 9,05 normal

    DJ F 8,90 normal 9,05 extreme high

    PH F 9,00 normal 8,93 extreme low

    OT F 9,00 normal 8,93 extreme low

    PH F 9,00 normal 8,93 extreme low

    BC M 9,00 normal 8,93 extreme low

    BZ M 9,00 normal 9,00 normal

    AG M 9,00 normal 9,00 normal

    CT M 9,00 normal 9,00 normal

    GJ M 9,00 normal 9,00 normal

    MH M 9,00 normal 9,00 normal

    AG F 9,00 normal 9,00 normal

    MM F 9,00 normal 9,00 normal

    BZ M 9,10 normal 8,95 extreme low

    DJ M 9,10 normal 8,95 extreme low

    SV M 9,20 normal 8,90 extreme low

    GJ F 9,20 normal 8,98 extreme lowNT M 9,30 normal 8,93 extreme low

    DJ F 9,70 normal 9,03 normal

    BZ M 9,80 normal 9,05 extreme high

    VN F 9,80 normal 9,05 extreme high

    BZ M 9,80 normal 9,05 extreme high

    GJ M 10,00 smart 9,03 normal

    SV M 10,00 smart 9,03 normal

    IF F 10,00 smart 9,03 normal

    DB M 10,00 smart 9,03 normal

  • 8/3/2019 Case Study Focsa Cristina

    17/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    17

    Admision

    Categories(All)

    Baccalaure

    ate

    Categories

    (All)

    Count of

    Gender

    Admisi

    on

    Mark

    Unitatea

    selectoare9,05

    9,

    0

    5

    9,

    0

    3

    9,

    0

    3

    9,

    0

    0

    8,

    9

    8

    8,

    9

    8

    8,

    9

    5

    8,

    9

    5

    8,

    9

    3

    8,

    9

    3

    8,

    9

    0

    8,

    9

    0

    8,

    8

    5

    8,

    8

    5

    8,

    8

    0

    8,

    7

    5

    Gran

    d

    Total

    AG 1 3 2 1 1 1 1 1 2 1 14

    AR 1 1

    BC 1 1 2BR 1 1

    BT 1 1 2

    BZ 2 2 1 5

    CS 1 1

    CT 1 1

    DB 1 1

    DJ 1 1 1 1 4

    GJ 1 2 1 4

    IF 1 1

    MH 1 1MM 1 1

    NT 1 1

    OT 1 1 2

    PH 1 2 3

    SV 1 1 1 3

    VL 1 1

    VN 1 1

    Grand Total 4 1 3 31

    21 4 2 2 5 2 3 2 1 1 2 2 50

    The pivot tables shows us:

    - the most students who were enrolled this year came from ARGES county

    (14 students from AG);

    - the most students who were enrolled this year passed the admission exam

    with 9,00 (12 students). Notice that we can make the same observations

  • 8/3/2019 Case Study Focsa Cristina

    18/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    18

    for the baccalaureate marks too if we switch Admission Mark with

    Baccalaureate Mark in the Column Labels section of the pivot table.

    - By choosing the from report filter the extreme high admission

    category we can find which mark was the extreme high mark at the

    admission exam, how many students took it and from which counties:

    Admision Categories extreme high

    Baccalaureate Categories (All)

    Count of Gender Admision Mark

    Unitatea selectoare 9,05Grand

    Total

    BZ 2 2

    DJ 1 1

    VN 1 1

    Grand Total 4 4

    III Calculate the regression line and interpret and test the regression

    coefficients, coefficient of determination and coefficient of correlation.

    Interpret the results.

    a) Calculate the regression line

    The regression line is described as: y = a+ bx, can be computed like this:

    430.3 =50 a+ 447.85 b

    3856.865 = 447.85 a + 3747.95b

    a= (430.3-447.85 b)/50

    3856.865=447.85(430.3-447.85 b)/50 +3747.95

    b=0,059

  • 8/3/2019 Case Study Focsa Cristina

    19/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    19

    a= 8,444

    Coefficients

    Intercept 8,44436615

    Baccalaureate

    Mark 0,059567029

    The a parameter is 8,444 and it represents the intercept of the regression

    function and it does not have any economic significance. Geometrically, it is

    the point where the regression line intersects OX axis.

    The b parameter 0,059 is the slope of the regression line and it is called

    regression coefficient. Because it is positive we can say that the relationship

    between the two marks is a positive relationship. This parameter shows the fact

    that when the Baccalaureate makrs increase by 0,5 points, the Admision Marks

    increases by 0,059 points.

    b) Test the regression coefficients, coefficient of determination and

    coefficient of correlation.

    Regression Statistics

    y = 0,059x + 8,444

    R = 0,575

    r=0,758

    8.7

    8.8

    8.9

    9

    9.1

    6.5 7 7.5 8 8.5 9 9.5 10

    A

    d

    m

    i

    s

    i

    o

    n

    M

    a

    r

    k

    s

    BAC grades

    Bac grades vs

    Admision Mark

  • 8/3/2019 Case Study Focsa Cristina

    20/20

    Student: FOCSA CRISTINA

    Master program: International Project Management

    Course: MANAGERIAL DATA ANALYSIS

    Multiple R 0,758398211

    R Square 0,575167847

    AdjustedR

    Square 0,566317177

    Standard Error 0,049451391

    Observations 50

    The coefficient of correlation

    r =

    [ [ = 0,758

    R = 0,575

    We can say by interpreting the coefficient of correlation (Multiple R =

    0,758398211) that we have a relationship between the two variable and the fact

    that the coefficient of correlation is very close to 1 leads to the conclusion that

    between the baccalaureat marks and the admision one is a strong relationship(75%).

    The coefficient of determination

    R square is 0,575167847, meaning that 57.51 % of the variation of the

    Admission Marks can be explained by the variation of the Baccalaureate Marks

    and rest of percentage by the variation of other factors.

    The Adjusted R Square value is 0,566317177, meaning that 56% of the

    evolution of the Admission Marks can be explained by the regression model

    y = 0,059x + 8,444