SPAD7 Data Miner Guide.pdf


  • 8/10/2019 SPAD7 Data Miner Guide.pdf


    22 quai Gallieni - 92150 Suresnes - France

    Tel: +33 1 57 32 60 60 - Fax: +33 1 5732 62 [email protected]
    www.coheris.com
    Siret: 39946792700105 - APE: 5829C - Training registration number: 11-92-1522492

    DATA MINER GUIDE

    Descriptive Statistics - Factorial Analyses - Clustering

    Linear Models - Discriminant Analyses

    Scoring - Decision Trees


    Table of contents

    DESCRIPTIVE STATISTICS WITH SPAD

    STATS - MARGINAL DISTRIBUTIONS, HISTOGRAMS

    DEMOD - AUTOMATIC CHARACTERIZATION OF A QUALITATIVE VARIABLE

    DESCO - AUTOMATIC CHARACTERIZATION OF A CONTINUOUS VARIABLE

    TABLE - CROSS TABLES

    BIVAR - BIVARIATE ANALYSIS

    FACTORIAL ANALYSES WITH SPAD

    PCA - PRINCIPAL COMPONENT ANALYSIS

    SCA - SIMPLE CORRESPONDENCE ANALYSIS

    MCA - MULTIPLE CORRESPONDENCE ANALYSIS

    CLUSTERING WITH SPAD

    RECIP/SEMIS - CLUSTERING ON FACTOR SCORES

    PARTI-DECLA - CUT OF THE TREE AND CLUSTERS DESCRIPTION

    CLASS-MINER - CLUSTERS DESCRIPTION

    ESCAL - STORING THE FACTORIAL AXES AND THE PARTITIONS

    THE LINEAR MODEL AND ITS APPLICATIONS

    REGRESSION AND ANALYSIS OF VARIANCE, GENERAL LINEAR MODEL

    OPTIMAL REGRESSIONS RESEARCH

    LOGISTIC REGRESSION

    THE DISCRIMINANT AND ITS METHODS

    FUWILD - OPTIMAL DISCRIMINANT ANALYSIS

    DIS2GD - LINEAR DISCRIMINANT ANALYSIS BASED ON CONTINUOUS VARIABLES

    DIS2GFP - LINEAR DISCRIMINANT ANALYSIS BASED ON PRINCIPAL FACTORS

    DISCO - DISCRIMINANT ANALYSIS BASED ON QUALITATIVE VARIABLES

    SCORE - SCORING FUNCTION

    IDT1 - INTERACTIVE DECISION TREE 1

    IDT2 - INTERACTIVE DECISION TREE 2


    DESCRIPTIVE STATISTICS WITH SPAD

    STATS: marginal distributions, histograms, matrix plot, box plot

    DEMOD: automatic characterization of a qualitative variable

    DESCO: automatic characterization of a continuous variable

    TABLE: Crossed tables

    BIVAR: Bivariate analysis

    Descriptive Statistics with SPAD

    STATS - MARGINAL DISTRIBUTIONS, HISTOGRAMS

    This procedure supplies a rapid and automatic description of your nominal and continuous variables.

    The Survey.sba base, an opinion survey file, is used for this example. It is supplied with the application and installed automatically on your PC.

    SET THE PARAMETERS FOR A METHOD

    Before it can be executed, a method must have its parameters set.

    To access the parameter settings of a method, right-click on the method and choose the Set the method command, or double-click on the method icon.

    The calculation rules and parameter settings of each method are described in the online help.

    The Cases, Weighting and Parameters tabs are available for almost all SPAD methods:

    Cases: the Cases tab lets you select the cases used by the method.

    Weighting: the Weighting tab allows you to adjust the distribution of the cases in the sample.

    Parameters: the options and settings of the method.


    The Cases tab

    The Cases tab lets you select the cases with one of the following methods:
      All the available cases
      One or more logical filters (selection criteria combined with AND/OR)
      A named list of cases
      A selection made in one or more intervals
      A random draw

    Apply a logical filter

    Click on Logical filter, select the chosen variable, click on the operator, then click on the operand, and click on Validate. The Global Definition area of the window shows the complete filter.

    In case of error, you can delete an expression from the filter: select the expression to discard and click on Delete.

    The cases satisfying the filter are considered active, while the others are supplementary.

    Select the individuals from a list

    Select List as the selection method, choose your cases in the Available list and use the transfer buttons to select them. Then select the status of the cases.


    Select cases by interval

    Select By interval as the selection method and select the status of the cases. Define the interval as a function of the rank of the cases in the SPAD base, then click on the arrow button to move your choice to the cases status window.

    You can save the definition of the selection by clicking on the Save button, so that you can reuse it later.

    Do a Random Draw

    This selection lets you apply the method to a sample before applying it to the entire SPAD base.

    It also lets you test the stability of the results of the method: execute the same method several times, taking care to change the number of preliminary requests between executions so that each run uses a different sample.

    To run a random draw, click on the Yes radio button, then click on Define to set the parameters of the random draw:
      Indicate the number of preliminary requests for the random draw. When you execute the selection again, you do not need to change this number unless you want to generate a different draw.
      Enter the percentage of cases to draw at random, or the sample size after the draw.
      Click on OK.
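The reproducibility rule described above (same number of preliminary requests, same draw) can be illustrated with a short Python sketch. It assumes that the number of preliminary requests behaves like a count of values taken from a fixed-seed random generator before the draw starts; `random_draw`, its parameters and the seed value are hypothetical and do not describe SPAD's internal generator.

```python
import random


def random_draw(case_ids, percentage=None, sample_size=None,
                preliminary_requests=0, seed=12345):
    """Draw a reproducible random subsample of cases (hypothetical helper).

    Assumption: the "number of preliminary requests" is modelled as the number
    of values discarded from a fixed-seed generator before drawing, so the same
    number gives the same sample and a different number gives a different one.
    """
    if (percentage is None) == (sample_size is None):
        raise ValueError("give either a percentage or a sample size")
    rng = random.Random(seed)
    for _ in range(preliminary_requests):
        rng.random()  # advance the generator without using the value
    if sample_size is None:
        sample_size = round(len(case_ids) * percentage / 100)
    # sample without replacement, returned in base order (rank in the base)
    return sorted(rng.sample(case_ids, sample_size))


cases = list(range(1, 316))  # ranks 1..315, the size of the Survey base
first = random_draw(cases, percentage=20, preliminary_requests=7)
again = random_draw(cases, percentage=20, preliminary_requests=7)
other = random_draw(cases, percentage=20, preliminary_requests=8)
print(len(first), first == again, first == other)
```

Running the same method on `first` and on `other` and comparing the results is the stability check mentioned above.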


    The Weighting tab

    The Weighting tab allows you to adjust the distribution of the cases in the sample:
      According to a weighting variable already in the file.
      As a function of one or more theoretical percentages (calculation by adjustment).

    To set up the weighting, select the weighting type. For a calculation by adjustment, choose the variable used for the correction in the Available variables window and click on the Define button. For each category, enter the theoretical percentage and press Enter, then click on OK.

    You can repeat this operation for another variable. In this way you obtain an adjustment on several variables with a single weighting variable. This requires a calculation by successive approximations, whose settings are in the options window of the weighting system: click on Options in the first window to open it. You can keep the default options or change the fitting options.
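Adjusting on several variables at once by successive approximations is commonly done with iterative proportional fitting (raking): the categories of each variable are rescaled in turn until every margin matches its theoretical percentages. The Python sketch below shows that technique, assuming SPAD's fitting works in a comparable way; the function `rake`, its tolerance and its iteration cap are hypothetical stand-ins for the fitting options, not SPAD settings.

```python
def rake(cases, targets, max_iter=200, tol=1e-8):
    """Weights matching several marginal percentages (iterative proportional fitting).

    cases   : one dict per case, mapping variable name -> category
    targets : {variable: {category: theoretical percentage}}; each variable's
              percentages are assumed to sum to 100 and to cover every category
              observed in the data.
    Returns one weight per case; the weights sum to the number of cases.
    """
    n = len(cases)
    weights = [1.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for var, shares in targets.items():
            # current weighted total of each category of this variable
            totals = {cat: 0.0 for cat in shares}
            for w, case in zip(weights, cases):
                totals[case[var]] += w
            # factor bringing each category to its target total
            factors = {cat: shares[cat] / 100 * n / totals[cat]
                       for cat in shares if totals[cat] > 0}
            for i, case in enumerate(cases):
                weights[i] *= factors[case[var]]
            largest_change = max(largest_change,
                                 max(abs(f - 1) for f in factors.values()))
        if largest_change < tol:  # every margin already fitted: stop
            break
    return weights


survey = [
    {"sex": "M", "region": "A"}, {"sex": "M", "region": "A"},
    {"sex": "M", "region": "B"}, {"sex": "F", "region": "A"},
    {"sex": "F", "region": "B"}, {"sex": "F", "region": "B"},
    {"sex": "F", "region": "B"},
]
w = rake(survey, {"sex": {"M": 50, "F": 50}, "region": {"A": 40, "B": 60}})
print([round(x, 3) for x in w])
```

Each pass fits one variable exactly and slightly disturbs the others, which is why several passes (successive approximations) are needed.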


    Attention: the weighting calculated in the Weighting tab of a method is temporary (the weighting variable is not saved). This approach lets you run quick tests and measure the influence of the weighting on the results of the method. Once you have obtained a satisfactory weighting variable, it is preferable to create a permanent one with the Tools > Weighting command of the main menu (Data Management Manual, paragraph 4.3).

    Then, in the Weighting tab of a method, select this variable as the weighting variable.

    The Marginal distributions tab

    Select the categorical variables in the list.

    The Parameters button lets you choose whether to display categories that have no respondents, and whether to display missing data as a separate category.

    The Statistics button displays summary statistics on the selected variables. For example, select Region where the respondent lives (V1), then click on the Statistics button. A window opens with statistics on the variable.

    For categorical variables, this statistics window shows the count and percentage of each category. For continuous variables, it shows the count, the mean, the standard deviation, the minimum and the maximum.
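The statistics listed for a continuous variable can be reproduced with a few lines of Python. This is a minimal sketch: `summary_statistics` is a hypothetical helper, missing values are assumed to be coded as `None`, and using the total weight (rather than the total weight minus one) as the divisor of the standard deviation is an assumption about SPAD's convention.

```python
import math


def summary_statistics(values, weights=None):
    """Count, weight, weighted mean, standard deviation, minimum and maximum.

    Missing values are given as None and skipped. The standard deviation uses
    the total weight as divisor (an assumption about SPAD's convention).
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total_weight
    variance = sum(w * (v - mean) ** 2 for v, w in pairs) / total_weight
    observed = [v for v, _ in pairs]
    return {"count": len(pairs), "weight": total_weight, "mean": mean,
            "std_dev": math.sqrt(variance),
            "min": min(observed), "max": max(observed)}


stats = summary_statistics([2, 4, 4, 4, 5, 5, 7, 9, None])
print(stats)
```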


    The Histograms - Categorization tab

    This tab lets you select continuous variables both for histograms and summary statistics, and for categorization (the marginal distribution of the variable's values).

    The Parameters button lets you set global or variable-specific parameters for the histograms, such as the number of classes, the minimum and maximum bounds, and the histogram bar width.

    When a continuous variable is selected for categorization, each of its distinct values is displayed with its frequency. This is a preliminary step before splitting the continuous variable into classes.

    A variable cannot be selected for both histograms and categorization.
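The histogram parameters above (number of classes, minimum and maximum bounds, bar width) can be illustrated with an equal-width binning sketch. In the sample listing for Age of respondent, the classes cover [16, 88) in steps of 8 and each X appears to stand for 2 cases ("BAR INTERVAL WIDTH = 2.00"); that reading of the bar width is inferred from the listing, and `histogram` is a hypothetical helper, not SPAD's algorithm.

```python
def histogram(values, low, high, n_classes, bar_unit):
    """Equal-width histogram laid out like the STATS listing (hypothetical helper).

    Classes cover [low, high). Each row gives the class lower limit, the mean of
    the values in the class, their count and a bar with one 'X' per bar_unit
    cases. Values outside [low, high) are counted separately.
    """
    width = (high - low) / n_classes
    members = [[] for _ in range(n_classes)]
    below = above = 0
    for v in values:
        if v < low:
            below += 1
        elif v >= high:
            above += 1
        else:
            k = min(int((v - low) // width), n_classes - 1)  # guard rounding
            members[k].append(v)
    rows = []
    for k, class_values in enumerate(members):
        count = len(class_values)
        mean = sum(class_values) / count if count else None
        rows.append((low + k * width, mean, count, "X" * int(count // bar_unit)))
    return rows, below, above


ages = [18, 20, 25, 30, 33, 41, 87, 90]
rows, below, above = histogram(ages, low=16, high=88, n_classes=9, bar_unit=1)
for low_limit, mean, count, bar in rows:
    print(low_limit, mean, count, bar)
print("below:", below, "above:", above)
```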


    The Marginal distributions by categories tab

    This tab is useful for variables that share the same categories. The categories of these variables must have the same labels and be listed in the same order (you can check this with the Marginal distributions tab).

    The Parameters tab

    This tab lets you choose whether or not to export the results to Excel.


    Once you have specified your request, validate the method by clicking on the OK button.

    RESULTS

    Results are accessible in the Execution view, or by right-clicking on the method and choosing the Results command. Depending on the method, you can then choose between the results editor, the Graphics gallery and the Excel results.

    The results editor

    The Result Editor opens in a new window.

    The information list has a tree structure.

    Click on the expand icon to open a branch of the tree and on the collapse icon to close it. You can use the mouse to navigate through the tree.

    Double-click on a title to display the corresponding results in the window.

    The Layout option of the File menu lets you customize how results are displayed on the screen. The results can be printed or copied into your word processor, but they cannot be modified in this editor.


    THE RESULTS OF THE STATS METHOD

    SUMMARY STATISTICS OF THE VARIABLES

    MARGINAL DISTRIBUTIONS OF CATEGORICAL VARIABLES
    --------------------------------------------------------------------------------
                                      --------- COUNTS ---------
                                      ACTUAL   %/TOTAL   %/EXPR.   HISTOGRAM OF WEIGHTS
    --------------------------------------------------------------------------------
    1. Region where the respondent lives
    Rg1 - Paris region                    56     17.78     17.78   *********
    Rg2 - Paris Basin                     51     16.19     16.19   ********
    Rg3 - north                           24      7.62      7.62   ****
    Rg4 - east                            29      9.21      9.21   *****
    Rg5 - west                            45     14.29     14.29   *******
    Rg6 - south-west                      38     12.06     12.06   ******
    Rg7 - center east                     36     11.43     11.43   ******
    Rg8 - mediterranean                   36     11.43     11.43   ******
    OVERALL                              315    100.00    100.00
    --------------------------------------------------------------------------------
    2. Urban area size (number of inhabitants)
    Agg1 - less than 2000                 84     26.67     26.67   *************
    Agg2 - 2001 to 5000                   18      5.71      5.71   ***
    Agg3 - 5001 to 10000                  18      5.71      5.71   ***
    Agg4 - 10001 to 20000                 12      3.81      3.81   **
    Agg5 - 20001 to 50000                 23      7.30      7.30   ****
    Agg6 - 50001 to 100000                18      5.71      5.71   ***
    Agg7 - 100001 to 200000               28      8.89      8.89   *****
    Agg8 - more than 200000               68     21.59     21.59   **********
    Agg9 - paris, paris. agglo            46     14.60     14.60   *******
    OVERALL                              315    100.00    100.00
    --------------------------------------------------------------------------------
    3. Sex of respondent
    Sex1 - male                          138     43.81     43.81   *********************
    Sex2 - female                        177     56.19     56.19   **************************
    OVERALL                              315    100.00    100.00
    --------------------------------------------------------------------------------
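The count and percentage columns of this listing can be recomputed from the raw responses. The sketch below assumes that %/TOTAL divides by all cases and %/EXPR. by the expressed (non-missing) responses, which is why both columns coincide here where no response is missing; `marginal_distribution` is a hypothetical helper, and the region counts are taken from the listing above.

```python
from collections import Counter


def marginal_distribution(responses, missing=None):
    """Count, %/TOTAL and %/EXPR. for each category (hypothetical helper).

    Assumption: %/EXPR. is the percentage of the expressed (non-missing)
    responses; %/TOTAL uses all cases, missing ones included.
    """
    counts = Counter(r for r in responses if r != missing)
    total = len(responses)
    expressed = sum(counts.values())
    return {cat: (n, round(100 * n / total, 2), round(100 * n / expressed, 2))
            for cat, n in sorted(counts.items())}


region_counts = {"Rg1": 56, "Rg2": 51, "Rg3": 24, "Rg4": 29,
                 "Rg5": 45, "Rg6": 38, "Rg7": 36, "Rg8": 36}
responses = [cat for cat, n in region_counts.items() for _ in range(n)]
table = marginal_distribution(responses)
for cat, row in table.items():
    print(cat, *row)
```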

    MARGINAL DISTRIBUTIONS OF CATEGORIZED VARIABLES
    --------------------------------------------------------------------------------
                -------------- COUNTS --------------
                ACTUAL   %/TOTAL   %/EXPR.    % CUM.   HISTOGRAM OF WEIGHTS
    --------------------------------------------------------------------------------
    26. Number of persons in a housing
    1.000           38     12.06     12.06     12.06   ******
    2.000           90     28.57     28.57     40.63   *************
    3.000           69     21.90     21.90     62.54   **********
    4.000           71     22.54     22.54     85.08   **********
    5.000           34     10.79     10.79     95.87   *****
    6.000            7      2.22      2.22     98.10   *
    7.000            4      1.27      1.27     99.37   *
    8.000            2      0.63      0.63    100.00   *
    OVERALL        315    100.00    100.00
    --------------------------------------------------------------------------------
    28. Number of children
    0.000           70     22.22     22.22     22.22   **********
    1.000           67     21.27     21.27     43.49   **********
    2.000           94     29.84     29.84     73.33   *************
    3.000           54     17.14     17.14     90.48   ********
    4.000            9      2.86      2.86     93.33   **
    5.000           11      3.49      3.49     96.83   **
    6.000            2      0.63      0.63     97.46   *
    7.000            2      0.63      0.63     98.10   *
    8.000            2      0.63      0.63     98.73   *
    9.000            4      1.27      1.27    100.00   *
    OVERALL        315    100.00    100.00
    --------------------------------------------------------------------------------
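Categorization lists every distinct value of a continuous variable with its count, its percentage and the cumulative percentage. The Python sketch below recomputes those columns for "Number of persons in a housing" from the counts in the listing above; `categorize` is a hypothetical helper, and missing values are not handled.

```python
from collections import Counter


def categorize(values):
    """Each distinct value with its count, percentage and cumulative percentage."""
    counts = Counter(values)
    total = len(values)
    rows, cumulated = [], 0
    for value in sorted(counts):
        cumulated += counts[value]
        rows.append((value, counts[value],
                     round(100 * counts[value] / total, 2),
                     round(100 * cumulated / total, 2)))
    return rows


# "Number of persons in a housing": counts taken from the listing above
persons_counts = {1: 38, 2: 90, 3: 69, 4: 71, 5: 34, 6: 7, 7: 4, 8: 2}
persons = [v for v, n in persons_counts.items() for _ in range(n)]
rows = categorize(persons)
for row in rows:
    print(row)
```

Reading the cumulative column is a quick way to choose class boundaries before splitting the variable into classes.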

    SUMMARY STATISTICS OF CONTINUOUS VARIABLES

    TOTAL COUNT  : 315
    TOTAL WEIGHT : 315.00
    +------------------------------------------+---------------------+--------------------+--------------------+
    | NUM. LABEL                COUNT   WEIGHT |      MEAN STD. DEV. |  MINIMUM   MAXIMUM |   MIN. 2    MAX. 2 |
    +------------------------------------------+---------------------+--------------------+--------------------+
    |   4. Age of respondent      315   315.00 |    43.756    16.581 |   18.000    86.000 |   19.000    83.000 |
    |  41. Family, children : i   315   315.00 |     6.651     1.062 |    1.000     7.000 |    2.000     6.000 |
    |  42. Work, profession : i   315   315.00 |     5.956     1.544 |    1.000     7.000 |    2.000     6.000 |
    |  43. Free time, relax: im   315   315.00 |     5.295     1.454 |    0.000     7.000 |    1.000     6.000 |
    |  44. Friends, acquaintanc   315   315.00 |     5.190     1.424 |    1.000     7.000 |    2.000     6.000 |
    |  45. Relatives, brothers,   315   315.00 |     5.629     1.436 |    1.000     7.000 |    2.000     6.000 |
    |  46. Religion : importanc   315   315.00 |     3.241     2.022 |    0.000     7.000 |    1.000     6.000 |
    |  47. Politic, political l   315   315.00 |     3.111     1.770 |    0.000     7.000 |    1.000     6.000 |
    |  50. State benefits : ave   283   283.00 |   533.795   926.899 |    0.000  5100.000 |   15.000  4980.000 |
    |  51. Salary of the respon   267   267.00 |  4408.547  4575.339 |    0.000 40000.000 |  300.000 24000.000 |
    +------------------------------------------+---------------------+--------------------+--------------------+

    HISTOGRAMS OF CONTINUOUS VARIABLES

    VARIABLE 4 : Age of respondent

    LOW. LIMIT |     MEAN | WEIGHT | HISTOGRAM (BETWEEN 16.00 INCLUDED AND 88.00 EXCLUDED, BAR INTERVAL WIDTH = 2.00)
    -----------+----------+--------+------------------------------------------------------------
         16.00 |    20.93 |     28 | XXXXXXXXXXXXXX
         24.00 |    27.85 |     68 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
         32.00 |    35.31 |     58 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
         40.00 |    43.35 |     37 | XXXXXXXXXXXXXXXXXX
         48.00 |    52.08 |     39 | XXXXXXXXXXXXXXXXXXX
         56.00 |    59.06 |     33 | XXXXXXXXXXXXXXXX
         64.00 |    67.09 |     33 | XXXXXXXXXXXXXXXX
         72.00 |    74.71 |     14 | XXXXXXX
         80.00 |    82.20 |      5 | XX

    +------------+-----------------------+-----------------------+
    |            | OVERALL               | HISTOGRAM             |
    |            | (FROM 18.00 TO 86.00) | (FROM 16.00 TO 88.00) |
    +------------+-----------------------+-----------------------+
    | WEIGHT     |                315.00 |                315.00 |
    | MEAN       |                43.756 |                43.756 |
    | STD. DEV.  |                16.581 |                16.440 |
    +------------+-----------------------+-----------------------+
    WEIGHTS OF REMAINING CASES: STRICTLY LESS THAN ........ 16.00 : 0.00
                                GREATER THAN OR EQUAL TO .. 88.00 : 0.00


    DEMOD - AUTOMATIC CHARACTERIZATION OF A QUALITATIVE VARIABLE

    This powerful procedure automatically characterizes any categorical variable. It is the ideal procedure for finding out everything about a variable in a single request, and its well-structured outputs form comprehensive study reports.

    You can characterize either each category of a variable, or the variable itself as a whole. All the available elements (active and illustrative) can take part in the characterization: the categories of the categorical variables, the categorical variables themselves, and the continuous variables.

    The following table summarizes the capabilities of the DEMOD procedure:

    Elements to characterize                   | Characterizing elements
    -------------------------------------------+------------------------------------------------------
    Groups of cases (defined by the categories | Categories, categorical variables and continuous
    of the variable to characterize)           | variables. Each category is described with all its
                                               | significant characterizing elements.
    -------------------------------------------+------------------------------------------------------
    The categorical variable to characterize   | Categories, categorical variables and continuous
                                               | variables. The variable is crossed with all the
                                               | characterizing elements, and only the elements that
                                               | depend on the variable to characterize are displayed.

    A group of cases is defined by a category of the variable to characterize, so there are as many groups of cases as the variable to characterize has categories.

    Double-click on the DEMOD icon to access the settings of the method.


    THE VARIABLES TAB

    The drop-down menu lets you select the variables to characterize and the characterizing elements.

    In this example, the variable to characterize is V8, "The family is the only place where you feel well". All the other variables, whether categorical or continuous, are selected as characterizing elements.


    THE PARAMETERS TAB

    This tab lets you modify the default parameters of the DEMOD method.

    Once you have set the parameters, validate the method by clicking on the OK button, then run the chain.


    THE DEMOD RESULTS

    THE DEMOD-5 EXCEL SHEET

    % of category in group: frequency of the category in the group divided by the frequency of the group.

    % of category in set: frequency of the category in the whole population divided by the size of the population.

    % of group in category: frequency of the group in the category divided by the frequency of the category.

    Test-value: a positive test-value means that the category is over-represented in the group; a negative test-value means that it is under-represented. By default, SPAD displays only the characterizing elements whose test-value is greater than or equal to 1.96 in absolute value (i.e. a probability lower than or equal to 0.025 for a one-sided test).

    Probability: the probability measures the significance of the difference between the percentage of the category in the group and its percentage in the population. The lower the probability, the more significant the difference and the greater the associated test-value (the test-value is the fractile of the normal distribution that corresponds to the same probability).

    Weight: weight of the cases in the category.

    Characterisation by categories of groups of
    The family is the only place where you feel well

    Group: Yes (Count: 230 - Percentage: 73.02)

    Variable label | Characteristic categories | % of category in group | % of category in set | % of group in category | Test-value | Probability | Weight
    Marital status | married | 78.26 | 70.79 | 80.72 | 4.55 | 0.000 | 223
    Do you watch TV | every day | 62.61 | 55.87 | 81.82 | 3.83 | 0.000 | 176
    Opinion about marriage | indissoluble | 31.30 | 25.71 | 88.89 | 3.79 | 0.000 | 81
    Are you worried about the risk of a nuclear plant accident | a lot | 32.61 | 28.25 | 84.27 | 2.76 | 0.003 | 89
    Do you have children | yes | 81.30 | 77.14 | 76.95 | 2.68 | 0.004 | 243
    Are you worried about the risk of a road accident | a lot | 40.87 | 36.51 | 81.74 | 2.55 | 0.005 | 115
    Educational level of the respondent | primary school | 20.43 | 17.14 | 87.04 | 2.50 | 0.006 | 54
    Current situation of the respondent | retired people | 20.43 | 17.14 | 87.04 | 2.50 | 0.006 | 54
    Are you worried about the risk of a mugging | a lot | 33.04 | 29.21 | 82.61 | 2.38 | 0.009 | 92
    Do you think the society needs to change | I do not know | 11.30 | 9.21 | 89.66 | 2.01 | 0.022 | 29
    Current situation of the respondent | unemployed person | 5.22 | 7.30 | 52.17 | -2.02 | 0.022 | 23
    Are you worried about the risk of a mugging | not at all | 23.04 | 26.35 | 63.86 | -2.02 | 0.022 | 83
    Current situation of the respondent | student | 2.17 | 3.81 | 41.67 | -2.06 | 0.020 | 12
    Educational level of the respondent | technical and GCSE | 3.48 | 5.40 | 47.06 | -2.10 | 0.018 | 17
    Marital status | cohabitation | 3.04 | 5.08 | 43.75 | -2.30 | 0.011 | 16
    Do you have work-personal life problems | yes | 20.43 | 24.13 | 61.84 | -2.33 | 0.010 | 76
    Urban area size (number of inhabitants) | more than 200000 | 17.83 | 21.59 | 60.29 | -2.46 | 0.007 | 68
    Your opinion on the life conditions in the future | improving a lot | 3.91 | 6.67 | 42.86 | -2.81 | 0.002 | 21
    Do you watch TV | quite often | 19.57 | 24.13 | 59.21 | -2.90 | 0.002 | 76
    Marital status | single | 9.57 | 13.33 | 52.38 | -2.93 | 0.002 | 42
    Do you have children | no | 17.39 | 21.90 | 57.97 | -2.96 | 0.002 | 69
    Opinion about marriage | dissolved if agreem | 30.87 | 36.19 | 62.28 | -3.07 | 0.001 | 114
    Are you worried about the risk of a road accident | a little | 15.65 | 20.32 | 56.25 | -3.13 | 0.001 | 64
    Educational level of the respondent | more high school | 9.13 | 13.65 | 48.84 | -3.49 | 0.000 | 43
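The definitions above say that the test-value is the normal fractile matching the probability of the observed difference. A common way to obtain that probability for a category, sketched below in Python, treats the group as a random draw without replacement from the population, so that the number of group members having the category follows a hypergeometric law. This is an assumption about how SPAD computes it, and `category_test_value` is a hypothetical helper. For the first row, the group Yes has 230 members, 180 of whom are married (78.26% of 230), against 223 married respondents among 315; the sketch gives a value close to the 4.55 of the table.

```python
import math
from statistics import NormalDist


def category_test_value(n, n_group, n_cat, n_cat_in_group):
    """Test-value and probability for a category in a group (hypothetical helper).

    n               : size of the population
    n_group         : size of the group
    n_cat           : cases having the characterizing category in the population
    n_cat_in_group  : cases having it inside the group
    Assumption: n_cat_in_group follows a hypergeometric law; its tail
    probability is turned into the standard normal fractile with the same
    probability.
    """
    total = math.comb(n, n_group)

    def pmf(k):
        return math.comb(n_cat, k) * math.comb(n - n_cat, n_group - k) / total

    k_min = max(0, n_group - (n - n_cat))
    k_max = min(n_group, n_cat)
    if n_cat_in_group >= n_group * n_cat / n:  # over-represented: upper tail
        p = sum(pmf(k) for k in range(n_cat_in_group, k_max + 1))
        return -NormalDist().inv_cdf(p), p
    p = sum(pmf(k) for k in range(k_min, n_cat_in_group + 1))  # lower tail
    return NormalDist().inv_cdf(p), p


# "Marital status = married" in the group "Yes"
t, p = category_test_value(n=315, n_group=230, n_cat=223, n_cat_in_group=180)
print(round(t, 2), p)
```

An under-represented category (for example "unemployed person": 12 of 230 in the group, 23 of 315 overall) uses the lower tail and therefore gets a negative test-value.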


    THE DEMOD-13 EXCEL SHEET

    Category mean: weighted mean of the variable in the category.

    Overall mean: weighted mean of the variable in the overall population.

    Interpretation: Age of respondent is the continuous variable that best characterizes the group who answered Yes to the question "The family is the only place where you feel well" (test-value 4.12). This group is significantly older than the average respondent: its mean age is 46.1 years, compared with 43.76 years for the overall population.

    Characterisation by continuous variables of categories of
    The family is the only place where you feel well

    Yes (Weight = 230.00 Count = 230)

    Characteristic variables | Category mean | Overall mean | Category std. deviation | Overall std. deviation | Test-value | Probability
    Age of respondent | 46.100 | 43.756 | 16.752 | 16.581 | 4.12 | 0.000
    Religion : importance given | 3.383 | 3.241 | 2.081 | 2.022 | 2.04 | 0.021
    Relatives, brothers, sisters ... : importance given | 5.726 | 5.629 | 1.380 | 1.436 | 1.98 | 0.024
    Salary of the respondent | 4044.990 | 4408.550 | 3690.140 | 4575.340 | -2.09 | 0.018

    No (Weight = 83.00 Count = 83)

    Characteristic variables | Category mean | Overall mean | Category std. deviation | Overall std. deviation | Test-value | Probability
    Salary of the respondent | 5377.780 | 4408.550 | 6311.000 | 4575.340 | 2.10 | 0.018
    Number of children | 1.542 | 1.860 | 1.772 | 1.671 | -2.02 | 0.022
    Age of respondent | 36.855 | 43.756 | 13.971 | 16.581 | -4.41 | 0.000
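For a continuous variable, the test-value compares the category mean with the overall mean, scaled by the standard error of a mean of n_group cases drawn without replacement from n. The Python sketch below uses the classical formula t = (category mean - overall mean) / sqrt(overall sd² / n_group × (n - n_group) / (n - 1)), with the probability taken as the one-sided normal tail. It is offered as an assumption about SPAD's computation: with the figures of the table it reproduces 4.12 and -4.41 for Age of respondent (n = 315). For Salary, which is known for 267 respondents only, it reproduces -2.09 with the 193 salaried "Yes" respondents shown in the DESCO results. `mean_test_value` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def mean_test_value(group_mean, overall_mean, overall_sd, n_group, n):
    """Test-value and one-sided probability for a group mean (hypothetical helper).

    Variance of the mean of n_group cases drawn without replacement from n:
    overall_sd**2 / n_group * (n - n_group) / (n - 1).
    """
    variance = overall_sd ** 2 / n_group * (n - n_group) / (n - 1)
    t = (group_mean - overall_mean) / math.sqrt(variance)
    return t, 1 - NormalDist().cdf(abs(t))


# Age of respondent, group "Yes" (230 cases) and group "No" (83 cases), n = 315
print(mean_test_value(46.100, 43.756, 16.581, 230, 315))
print(mean_test_value(36.855, 43.756, 16.581, 83, 315))
```

The same formula applies to the "description by categories" part of DESCO, where each category of a categorical variable plays the role of the group.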


    DESCO - AUTOMATIC CHARACTERIZATION OF A CONTINUOUS VARIABLE

    This procedure provides the statistical characterization of one or more continuous variables by:
      The other continuous variables, using correlations.
      The categories of the categorical variables, by comparison of means.
      The categorical variables themselves, using Fisher's statistic.
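For the third item, Fisher's statistic is the one-way analysis-of-variance ratio of the between-category variance to the within-category variance. The sketch below rebuilds it from the summary figures printed in the DESCO results for Salary of the respondent by Sex of respondent: male 117 cases with mean 6533.19, female 150 cases with mean 2751.33, and an overall mean of 4408.547 with a standard deviation of 4575.339 on 267 cases. Assuming that the overall standard deviation uses divisor n, it gives F close to 53.58 with 265 denominator degrees of freedom, as in the "description by categorical variables" table. The conversion of F into a test-value through its probability is not shown, and `fisher_from_summary` is a hypothetical helper.

```python
def fisher_from_summary(groups, overall_mean, overall_sd, n):
    """One-way analysis-of-variance F from summary figures (hypothetical helper).

    groups: list of (count, mean) pairs, one per category; counts sum to n.
    Assumption: overall_sd uses divisor n, so the total sum of squares is
    n * overall_sd**2. Returns (F, numerator df, denominator df).
    """
    k = len(groups)
    ss_between = sum(count * (mean - overall_mean) ** 2 for count, mean in groups)
    ss_total = n * overall_sd ** 2
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Salary of the respondent by Sex of respondent (male: 117 cases, female: 150)
f, df1, df2 = fisher_from_summary([(117, 6533.19), (150, 2751.33)],
                                  overall_mean=4408.547, overall_sd=4575.339, n=267)
print(round(f, 2), df1, df2)
```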

    THE VARIABLES TAB

    A continuous variable can be characterized by the other variables, whether categorical or continuous, called characterizing variables.

    The drop-down menu lets you select the variables to characterize and the characterizing elements.


    THE PARAMETERS TAB

    The Minimum relative weight of characterizing elements parameter hides the characterizing categories whose frequency in the population is lower than a threshold (2% by default).

    The probability threshold displays only the categories whose associated probability is lower than or equal to 0.025, which corresponds to a test-value of 1.96.
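The two display thresholds can be sketched as follows: a one-sided probability of 0.025 is converted into the matching test-value, and a row is kept only if its relative weight reaches 2% and its probability does not exceed 0.025. The row structure and `keep_characterizing` are hypothetical, and taking the 267 active cases as the reference population is an assumption; the first two rows reuse weights and probabilities from the DESCO listing, the other two are made up to show the filtering.

```python
from statistics import NormalDist


def test_value_threshold(max_probability=0.025):
    """Test-value matching a one-sided probability threshold."""
    return NormalDist().inv_cdf(1 - max_probability)


def keep_characterizing(rows, population, min_relative_weight=0.02,
                        max_probability=0.025):
    """Keep rows heavy enough in the population and significant enough.

    rows: dicts with hypothetical keys 'category', 'weight' and 'probability'.
    """
    return [r for r in rows
            if r["weight"] / population >= min_relative_weight
            and r["probability"] <= max_probability]


rows = [
    {"category": "student", "weight": 10, "probability": 0.005},
    {"category": "a lot better", "weight": 15, "probability": 0.015},
    {"category": "rare category", "weight": 4, "probability": 0.001},  # made up
    {"category": "not significant", "weight": 60, "probability": 0.10},  # made up
]
print(round(test_value_threshold(), 2))
print([r["category"] for r in keep_characterizing(rows, population=267)])
```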


    THE DESCO RESULTS

    CHARACTERISATION OF CONTINUOUS VARIABLES

    DESCRIPTION OF: Salary of the respondent

    DESCRIPTION BY CATEGORIES
    OF CONTINUOUS VARIABLE: Salary of the respondent
    ON 267.0 ACTIVE CASES     MEAN = 4408.547     STD. DEV. = 4575.339

    +--------------+--------------------+----------------------------------------------------------------------------------------+
    |   TEST PROB. |     MEAN STD. DEV. | CATEGORIES | VARIABLE LABEL | WEIGHT |
    |  VALUE       |                    |            |                |        |
    +--------------+--------------------+----------------------------------------------------------------------------------------+
    |   8.16 0.000 |  7060.53   4921.82 | yes, full time | At the moment, do you have a professional activity | 114.00 |
    |   7.58 0.000 |  6496.32   4736.16 | employed | Current situation of the respondent | 136.00 |
    |   7.28 0.000 |  6617.07   4883.30 | no | Have you been unemployed during the last twelve months | 123.00 |
    |   6.69 0.000 |  6533.19   5486.12 | male | Sex of respondent | 117.00 |
    |   4.60 0.000 |  6452.63   5414.05 | no | Do you have work-personal life problems | 76.00 |
    |   4.25 0.000 |  6698.25   6784.83 | quite often | Do you watch TV | 57.00 |
    |   3.73 0.000 |  6331.15   3880.83 | yes | Do you have work-personal life problems | 61.00 |
    |   3.47 0.000 |  6797.37   6049.03 | more high school | Educational level of the respondent | 38.00 |
    |   3.35 0.000 |  4860.06   4834.30 | no | Have you recently been depressed | 217.00 |
    |   3.18 0.001 |  5291.85   5418.67 | no | Have you recently been nervous | 135.00 |
    |   3.10 0.001 |  6950.00   5579.71 | yes | Do you have a piano | 28.00 |
    |   2.89 0.002 |  6529.41   5935.61 | yes | Do you have a second house | 34.00 |
    |   2.88 0.002 |  6330.00   7536.22 | yes | Do you have a video-tape | 40.00 |
    |   2.65 0.004 |  5937.26   6786.27 | Paris region | Region where the respondent lives | 51.00 |
    |   2.43 0.008 |  5179.34   5246.40 | a lot | Has the respondent been interested by the survey | 117.00 |
    |   2.17 0.015 |  6906.67   4638.46 | a lot better | Your opinion on the evolution of the daily personal life | 15.00 |
    |   2.10 0.018 |  5377.78   6311.00 | No | The family is the only place where you feel well | 72.00 |
    |  -2.01 0.022 |  3301.51   2735.77 | quite agree | Persons like me often feel alone | 55.00 |
    |  -2.09 0.018 |  4044.99   3690.14 | Yes | The family is the only place where you feel well | 193.00 |
    |  -2.14 0.016 |  3769.06   3573.01 | a lot | Are you worried about the risk of having a serious illness | 125.00 |
    |  -2.23 0.013 |  3196.12   3440.69 | a lot worse | Your opinion on the evolution of French people life level | 56.00 |
    |  -2.47 0.007 |  3319.48   2735.76 | a lot | Are you worried about the risk of a nuclear plant accident | 77.00 |
    |  -2.54 0.006 |  1971.43   1864.75 | unemployed person | Current situation of the respondent | 21.00 |
    |  -2.57 0.005 |   760.00   1356.61 | student | Current situation of the respondent | 10.00 |
    |  -2.66 0.004 |  2606.41   3255.77 | a lot worse | Your opinion on the evolution of the daily personal life | 39.00 |
    |  -2.86 0.002 |  3726.34   3277.03 | every day | Do you watch TV | 155.00 |
    |  -2.88 0.002 |  4069.97   3721.48 | no | Do you have a video-tape | 227.00 |
    |  -2.89 0.002 |  4099.07   4253.85 | no | Do you have a second house | 233.00 |
    |  -3.10 0.001 |  4110.81   4346.66 | no | Do you have a piano | 239.00 |
    |  -3.18 0.001 |  3505.18   3271.07 | yes | Have you recently been nervous | 132.00 |
    |  -3.35 0.000 |  2449.00   2373.53 | yes | Have you recently been depressed | 50.00 |
    |  -3.49 0.000 |  2263.04   2043.80 | no qualifications | Educational level of the respondent | 46.00 |
    |  -4.36 0.000 |   832.14   1563.89 | I have never worked | At the moment, do you have a professional activity | 28.00 |
    |  -4.85 0.000 |  2691.10   3397.40 | no | At the moment, do you have a professional activity | 103.00 |
    |  -6.54 0.000 |   488.54   1396.02 | housewife w/o prof. | Current situation of the respondent | 48.00 |
    |  -6.69 0.000 |  2751.33   2742.02 | female | Sex of respondent | 150.00 |
    |  -7.28 0.000 |  2311.41   3196.29 | missing category | Do you have work-personal life problems | 130.00 |
    |  -7.28 0.000 |  2311.41   3196.29 | missing category | Have you been unemployed during the last twelve months | 130.00 |
    +--------------+--------------------+----------------------------------------------------------------------------------------+
    |              |  4408.55   4575.34 | OVERALL | | 267.00 |
    +--------------+--------------------+----------------------------------------------------------------------------------------+

    DESCRIPTION BY CATEGORICAL VARIABLES

    OF VARIABLE : Salary of the respondent

    +------------+--------+-----------------------------------------------------------------+-----------------+--------+
    | TEST-VALUE | PROBA. | NUM . VARIABLE LABEL                                            | DEN. DEG. FREE. | FISHER |
    +------------+--------+-----------------------------------------------------------------+-----------------+--------+
    |       8.56 |  0.000 | 5 . Current situation of the respondent                         |             261 |  21.44 |
    |       8.48 |  0.000 | 18 . At the moment, do you have a professional activity         |             263 |  31.95 |
    |       7.50 |  0.000 | 20 . Have you been unemployed during the last twelve months     |             264 |  35.01 |
    |       7.28 |  0.000 | 19 . Do you have work-personal life problems                    |             264 |  32.89 |
    |       6.98 |  0.000 | 3 . Sex of respondent                                           |             265 |  53.58 |
    |       3.48 |  0.000 | 7 . Educational level of the respondent                         |             258 |   3.87 |
    |       3.47 |  0.000 | 33 . Do you watch TV                                            |             263 |   6.57 |
    |       3.38 |  0.001 | 24 . Have you recently been depressed                           |             265 |  11.69 |
    |       3.21 |  0.001 | 23 . Have you recently been nervous                             |             265 |  10.50 |
    |       3.12 |  0.002 | 16 . Do you have a piano                                        |             265 |   9.94 |
    |       2.90 |  0.004 | 17 . Do you have a second house                                 |             265 |   8.58 |
    |       2.89 |  0.004 | 15 . Do you have a video-tape                                   |             265 |   8.50 |
    |       2.04 |  0.021 | 52 . Has the respondent been interested by the survey           |             264 |   3.92 |
    |       1.92 |  0.054 | 21 . Have you recently had headaches                            |             265 |   3.74 |
    |       1.77 |  0.039 | 30 . Your opinion on the evolution of the daily personal life   |             261 |   2.38 |
    |       1.56 |  0.059 | 25 . Are you satisfied of your health                           |             263 |   2.51 |
    |       1.33 |  0.092 | 40 . Are you worried about the risk of a nuclear plant accident |             263 |   2.16 |
    |       1.31 |  0.189 | 29 . Do you regularly impose restrictions                       |             265 |   1.73 |
    |       1.24 |  0.107 | 8 . The family is the only place where you feel well            |             264 |   2.24 |
    |       1.12 |  0.132 | 1 . Region where the respondent lives                           |             259 |   1.61 |
    |       1.07 |  0.143 | 39 . Are you worried about the risk of unemployment             |             263 |   1.82 |
    |       1.03 |  0.151 | 35 . The computer science diffusion is...                       |             263 |   1.78 |
    |       1.02 |  0.154 | 34 . Do you think the society needs to change                   |             264 |   1.86 |
    |       0.92 |  0.179 | 49 . Persons like me often feel alone                           |             263 |   1.64 |
    |       0.89 |  0.186 | 31 . Your opinion on the evolution of French people life level  |             260 |   1.48 |
    |       0.86 |  0.194 | 36 . Are you worried about the risk of having a serious illness |             263 |   1.58 |
    |       0.79 |  0.428 | 22 . Have you recently had backaches                            |             265 |   0.63 |
    |       0.78 |  0.217 | 11 . Are you satisfied of your housing                          |             263 |   1.49 |
    |       0.65 |  0.257 | 37 . Are you worried about the risk of a mugging                |             263 |   1.35 |
    |       0.45 |  0.327 | 13 . Occupation status of housing                               |             262 |   1.16 |
    |       0.22 |  0.412 | 27 . Do you have children                                       |             264 |   0.88 |
    |       0.13 |  0.446 | 38 . Are you worried about the risk of a road accident          |             263 |   0.89 |
    |       0.10 |  0.459 | 6 . Marital status                                              |             262 |   0.91 |
    |       0.08 |  0.469 | 9 . Opinion about marriage                                      |             263 |   0.85 |
    |      -0.15 |  0.561 | 32 . Your opinion on the life conditions in the future          |             261 |   0.79 |
    |      -0.21 |  0.585 | 12 . Are you satisfied of your daily life                       |             263 |   0.65 |
    |      -0.23 |  0.591 | 14 . The housing expenses are for you                           |             260 |   0.77 |
    |      -0.53 |  0.702 | 10 . Housekeeping works, take care of children...               |             263 |   0.47 |
    |      -0.59 |  0.724 | 2 . Urban area size (number of inhabitants)                     |             258 |   0.66 |
    |      -0.64 |  0.740 | 48 . Your opinion on the justice running in 1986                |             261 |   0.55 |
    +------------+--------+-----------------------------------------------------------------+-----------------+--------+

    SUMMARY STATISTICS OF CONTINUOUS VARIABLES

    TOTAL COUNT : 315     TOTAL WEIGHT : 315.00
    +-------------------------------------------------+--------------------+--------------------+
    | NUM . IDEN - LABEL                COUNT  WEIGHT |    MEAN  STD. DEV. |  MINIMUM   MAXIMUM |
    +-------------------------------------------------+--------------------+--------------------+
    |  4 . Age  - Age of respondent       267  267.00 |   43.61      16.88 |    18.00     83.00 |
    | 26 . Nbpr - Number of persons in    267  267.00 |    3.04       1.43 |     1.00      8.00 |
    | 28 . Nbef - Number of children      267  267.00 |    1.85       1.69 |     0.00      9.00 |
    | 41 . Fami - Family, children : i    267  267.00 |    6.65       1.07 |     1.00      7.00 |
    | 42 . Trav - Work, profession : i    267  267.00 |    5.90       1.57 |     1.00      7.00 |
    | 43 . Lois - Free time, relax: im    267  267.00 |    5.30       1.43 |     0.00      7.00 |
    | 44 . Amis - Friends, acquaintanc    267  267.00 |    5.18       1.41 |     1.00      7.00 |
    | 45 . Part - Relatives, brothers,    267  267.00 |    5.63       1.44 |     1.00      7.00 |
    | 46 . Reli - Religion : importanc    267  267.00 |    3.15       1.96 |     1.00      7.00 |
    | 47 . Poli - Politic, political l    267  267.00 |    3.15       1.79 |     1.00      7.00 |
    | 50 . PrFm - State benefits : ave    244  244.00 |  583.10     966.04 |     0.00   5100.00 |
    | 51 . Salr - Salary of the respon    267  267.00 | 4408.55    4575.34 |     0.00  40000.00 |
    +-------------------------------------------------+--------------------+--------------------+

    CORRELATIONS WITH CONTINUOUS VARIABLES

    OF VARIABLE : Salary of the respondent

    +------------+-------+-------------+----------------------------------------------+---------+
    | TEST-VALUE | PROB. | CORRELATION | NUM . VARIABLE LABEL                         | WEIGHT  |
    +------------+-------+-------------+----------------------------------------------+---------+
    |      99.90 | 0.000 |       1.000 | 51 . Salary of the respondent                | 267.000 |
    |      -2.53 | 0.006 |      -0.162 | 50 . State benefits : average monthly amount | 244.000 |
    +------------+-------+-------------+----------------------------------------------+---------+


    TABLE - CROSS TABLES

    With this procedure, you can obtain in one go an unlimited number of cross tables of weights, means or frequencies.

    THE TABLES TAB

    This tab allows you to define the cross tables to create.

    The table cells can display weights, row percentages, column percentages, means and standard deviations, depending on the parameters and settings.

    The scrolling menu allows you to define the cross tables to display, with or without supplementary information such as the mean or the frequency of another variable.

    If a variable appears in the Means column, each cell of the cross table displays the weighted average of that variable over the cases of the cell.

    If a variable appears in the Frequencies column, each cell of the cross table displays the weighted sum of the values of that variable over the cases of the cell.
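    The sketch below shows the two cell statistics just described: the weighted mean shown for a Means variable and the weighted sum shown for a Frequencies variable. It is a minimal Python illustration, not SPAD's implementation. The case records, the field names and the helper cell_stats are hypothetical.

```python
from collections import defaultdict


def cell_stats(cases, row, col, var, weight="weight"):
    """Return {(row category, column category): (cell weight, weighted sum, weighted mean)}."""
    w_tot = defaultdict(float)   # sum of the case weights in each cell
    wx_tot = defaultdict(float)  # sum of weight * value of `var` in each cell
    for case in cases:
        cell = (case[row], case[col])
        w_tot[cell] += case[weight]
        wx_tot[cell] += case[weight] * case[var]
    return {cell: (w_tot[cell], wx_tot[cell], wx_tot[cell] / w_tot[cell]) for cell in w_tot}


# Toy cases with made-up values (not taken from the guide's survey)
cases = [
    {"opinion": "indissoluble", "sex": "male", "age": 40, "weight": 1.0},
    {"opinion": "indissoluble", "sex": "male", "age": 60, "weight": 3.0},
    {"opinion": "indissoluble", "sex": "female", "age": 30, "weight": 2.0},
]
stats = cell_stats(cases, "opinion", "sex", "age")
print(stats[("indissoluble", "male")])  # (4.0, 220.0, 55.0)
```

    The second value in each tuple corresponds to a Frequencies cell and the third to a Means cell. Both are weighted, so a case with weight 3 counts three times as much as a case with weight 1.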

    By clicking on local filter, you can define a specific filter for each command.


    THE PARAMETERS TAB


    THE TABLE RESULTS

    CROSS-TABS

    LIST OF COMMANDS

    COMMAND 1
    TABLE 1   BY ROW    : 9 . Opinion about marriage
              BY COLUMN : 3 . Sex of respondent

    COMMAND 2
    TABLE 2   BY ROW    : 9 . Opinion about marriage
              BY COLUMN : 3 . Sex of respondent
              MEANS OF  : 4 . Age of respondent

    LIST OF CROSS-TABS

    TABLE 1   BY ROW    : Opinion about marriage          TOTAL WEIGHT: 315.
              BY COLUMN : Sex of respondent

    WEIGHT               |        male  |      female  |     OVERALL
    COLUMN PERC.         |              |              |
    ROW PERC.            |              |              |
    ---------------------+--------------+--------------+--------------
                         |          41  |          40  |          81
    indissoluble         |       29.71  |       22.60  |       25.71
                         |       50.62  |       49.38  |      100.00
    ---------------------+--------------+--------------+--------------
                         |          39  |          69  |         108
    dissolved serious pb |       28.26  |       38.98  |       34.29
                         |       36.11  |       63.89  |      100.00
    ---------------------+--------------+--------------+--------------
                         |          50  |          64  |         114
    dissolved if agreem  |       36.23  |       36.16  |       36.19
                         |       43.86  |       56.14  |      100.00
    ---------------------+--------------+--------------+--------------
                         |           8  |           4  |          12
    I do not know        |        5.80  |        2.26  |        3.81
                         |       66.67  |       33.33  |      100.00
    ---------------------+--------------+--------------+--------------
                         |         138  |         177  |         315
    OVERALL              |      100.00  |      100.00  |      100.00
                         |       43.81  |       56.19  |      100.00

    -------------------------------------------------------------------
    KHI2 = 6.67 / 3 DEGREES OF FREEDOM / 0 EXPECTED FREQUENCIES LESS THAN 5
    PROB. ( KHI2 > 6.67 ) = 0.083 / TEST-VALUE = 1.38
    -------------------------------------------------------------------

    TABLE 2   BY ROW    : Opinion about marriage          TOTAL WEIGHT: 315.
              BY COLUMN : Sex of respondent
              MEANS OF  : Age of respondent

    WEIGHT               |        male  |      female  |     OVERALL
    MEAN                 |              |              |
    STD. DEV.            |              |              |
    ---------------------+--------------+--------------+--------------
                         |          41  |          40  |          81
    indissoluble         |      45.829  |      48.325  |      47.062
                         |      17.234  |      17.084  |      17.206
    ---------------------+--------------+--------------+--------------
                         |          39  |          69  |         108
    dissolved serious pb |      43.000  |      46.362  |      45.148
                         |      14.739  |      18.260  |      17.148
    ---------------------+--------------+--------------+--------------
                         |          50  |          64  |         114
    dissolved if agreem  |      41.300  |      38.484  |      39.719
                         |      15.442  |      14.330  |      14.893
    ---------------------+--------------+--------------+--------------
                         |           8  |           4  |          12
    I do not know        |      50.250  |      41.250  |      47.250
                         |      15.618  |       8.842  |      14.377
    ---------------------+--------------+--------------+--------------
                         |         138  |         177  |         315
    OVERALL              |      43.645  |      43.842  |      43.756
                         |      16.007  |      17.015  |      16.581
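    The KHI2 line printed under Table 1 can be reproduced from the cell weights alone. The sketch below is a minimal Python illustration that rests on two assumptions of ours:

    - the probability is the upper tail of a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, computed here with a short series for the incomplete gamma function (a statistics library, for example scipy.stats.chi2.sf, would normally be used instead);
    - the TEST-VALUE is the standard normal quantile that leaves the same probability in the upper tail.

    Both readings reproduce the printed figures (6.67, 0.083 and 1.38), but neither comes from SPAD's documentation.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom (series expansion)."""
    a, half_x = df / 2.0, x / 2.0
    if half_x <= 0.0:
        return 1.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:  # series of the regularized lower incomplete gamma P(a, x/2)
        k += 1
        term *= half_x / (a + k)
        total += term
    lower = math.exp(a * math.log(half_x) - half_x - math.lgamma(a)) * total
    return 1.0 - lower


# Cell weights of Table 1: rows = opinion about marriage, columns = male, female
observed = [[41, 40], [39, 69], [50, 64], [8, 4]]
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
total_weight = sum(row_tot)

chi2, n_small = 0.0, 0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        expected = row_tot[i] * col_tot[j] / total_weight  # count expected under independence
        n_small += expected < 5
        chi2 += (o - expected) ** 2 / expected
df = (len(row_tot) - 1) * (len(col_tot) - 1)
p_value = chi2_sf(chi2, df)
test_value = NormalDist().inv_cdf(1.0 - p_value)  # assumed definition of the test-value

print(f"KHI2 = {chi2:.2f} / {df} DEGREES OF FREEDOM / {n_small} EXPECTED FREQUENCIES LESS THAN 5")
print(f"PROB. ( KHI2 > {chi2:.2f} ) = {p_value:.3f} / TEST-VALUE = {test_value:.2f}")
```

    The same reading also fits the signed test-values of the DESCO tables above. For example, a test-value of -2.01 corresponds to a lower-tail probability of about 0.022.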


    BIVAR - BIVARIATE ANALYSIS

    The BIVAR procedure lets you characterize a sample from the viewpoint of two particular continuous variables (the AXES variables, or base variables). The sample can be described by categorical variables and by other continuous variables.

    THE VARIABLES TAB

    With this tab, the SPAD user selects the two continuous variables for the bivariate analysis.

    Supplementary variables (continuous or categorical) can also be included in the analysis.

    The graph editor of the BIVAR method is the same as the one used for factorial analyses. Its capabilities are described in the section Factorial analyses.


    FACTORIAL ANALYSES WITH SPAD

    PCA: Principal Component Analysis (PCA)

    SCA: Simple Correspondence Analysis (SCA)

    MCA: Multiple Correspondence Analysis (MCA)

    DEFAC: Factors description

    SPAD provides the main techniques of multidimensional exploratory analysis, combined with procedures for clustering. One area of application is the processing of large-scale surveys in market research and socio-economic research.

    The main applications of factorial analyses are (1) to reduce the number of dimensions and (2) to detect structure in the relationships between variables. Factorial analysis is therefore applied as a data reduction or structure detection method.


    VOCABULARY

    Active variables: variables used to perform the factorial analysis.

    Supplementary variables: variables that are not used to perform the analysis but are used to illustrate its main results.

    Contribution: criterion that measures the contribution of an element (category, variable, frequency or case) to the inertia (total inertia, or inertia of a dimension).

    Cosines: criterion that measures the quality of representation of an element (category, variable, case or frequency) on each dimension.

    Axes, factors, dimensions: these terms refer to the factors computed, or extracted, by the analysis. Consecutive factors are uncorrelated (orthogonal) to each other. Factors are extracted one after another, each one maximizing the variability that remains in the active data.


    PCA - PRINCIPAL COMPONENT ANALYSIS

    This method performs the principal component analysis of a sample of cases described by continuous variables. The analysis can be performed on the original variables or on normed variables (centered and standardized), depending on whether or not the active variables are on the same scale. Supplementary elements can also be introduced: cases, other continuous variables or categorical variables.

    Import the Sba dataset Cars.sba.

    Drag and drop the PCA method on the Cars dataset as follows.

    The two goals of the analysis are:

    Capture the main interrelationships between correlated variables in a small number of summary characteristics (dimension reduction).

    Identify automobile models with similar attributes, a useful step toward developing a clustering or classification model.

    The dataset contains measurements of 6 variables for 24 models: cubic capacity, power, speed, weight, length and width.

    Because the measurement scales differ strongly, we will perform a PCA on normed variables.

    KIDEN               Cubic capacity  Power  Speed  Weight  Length  Width
    Honda civic                   1396     90    174     850     369    166
    Peugeot 205 Rallye            1294    103    189     805     370    157
    Seat Ibiza SX I               1461    100    181     925     363    161
    Citroën AX Sport              1294     95    184     730     350    160
    Renault 19                    1721     92    180     965     415    169
    Fiat Tipo                     1580     83    170     970     395    170
    Peugeot 405                   1769     90    180    1080     440    169
    Renault 21                    2068     88    180    1135     446    170
    Citroën BX                    1769     90    182    1060     424    168
    Opel Omega                    1998    122    190    1255     473    177
    Peugeot 405 Break             1905    125    194    1120     439    171
    Ford Sierra                   1993    115    185    1190     451    172
    Renault Espace                1995    120    177    1265     436    177
    Nissan Vanette                1952     87    144    1430     436    169
    VW Caravelle                  2109    112    149    1320     457    184
    Audi 90 Quattro               1994    160    214    1220     439    169
    BMW 530i                      2986    188    226    1510     472    175
    Rover 827i                    2675    177    222    1365     469    175
    Renault 25                    2548    182    226    1350     471    180
    BMW 325iX                     2494    171    208    1600     432    164
    Ford Scorpio                  2933    150    200    1345     466    176
    Fiat Uno                      1116     58    145     780     364    155
    Peugeot 205                   1580     80    159     880     370    156
    Ford Fiesta                   1117     50    135     810     371    162

    The matrix plot produced with the STATS method gives a good overview of the pairwise relationships between the variables.


    THE SETTING OPTIONS

    THE VARIABLES TAB

    This tab allows the SPAD user to define the following elements:

    Active continuous variables
    Supplementary continuous variables
    Supplementary categorical variables

    In our example, we select all the available continuous variables as active. No other variables remain available as supplementary information.


    THE CASES TAB

    The Cases tab allows you to define the role of the cases in the analysis.

    The cases retained are the ACTIVE cases; those not retained are called ILLUSTRATIVE or SUPPLEMENTARY. Using the selections by list or by interval, we can also define ABANDONED cases, which are neither active nor illustrative.

    All the calculations that lead to the factorial planes, to the hierarchical classification tree and to the final partitions are carried out on the active cases only. The illustrative cases can be projected onto the factorial planes. In the partition into classes, each illustrative case is assigned to the class it is closest to, or to a missing data class.

    Abandoned cases are completely ignored in the calculations and are automatically assigned to a missing data class in the partitions.

    If you conduct many analyses on a particular sub-population, it may be preferable to create a BASE corresponding to it. To do this, use the Recoding chain in the Tools menu.

    In the Cars example, we select all the cases as active.

    THE PARAMETERS TAB

    NORMED PCA AND NOT NORMED PCA

    Case coordinates are not displayed by default.


    Normed PCA means that SPAD first centers and standardizes all the active variables. As a consequence, every variable makes the same contribution (1) to the overall inertia, so the total inertia equals the number of active variables. When the PCA is not normed (only centered), the squared distance between a variable and the origin is equal to the variance of the variable.

    In most cases, a normed analysis is advisable, so that each active variable is given the same importance. It is particularly recommended when the measurement scales are different.

    In our example, the measurement scales are strongly different, so we perform a normed PCA.

    RETAINED COORDINATES

    The number of retained coordinates matters for the methods that follow the PCA in the chain, such as DEFAC (factor description) and RECIP/SEMIS (clustering).


    THE PCA RESULTS

    PRINCIPAL COMPONENTS ANALYSIS

    SUMMARY STATISTICS OF CONTINUOUS VARIABLES

    TOTAL COUNT : 24     TOTAL WEIGHT : 24.00
    +------------------------------------------+--------------------+--------------------+
    | NUM . IDEN - LABEL         COUNT  WEIGHT |    MEAN  STD. DEV. |  MINIMUM   MAXIMUM |
    +------------------------------------------+--------------------+--------------------+
    | 1 . CYLI - Cubic capacity     24   24.00 | 1906.13     516.79 |  1116.00   2986.00 |
    | 2 . PUIS - Power              24   24.00 |  113.67      37.97 |    50.00    188.00 |
    | 3 . VITE - Speed              24   24.00 |  183.08      24.68 |   135.00    226.00 |
    | 4 . POID - Weight             24   24.00 | 1123.33     243.20 |   730.00   1600.00 |
    | 5 . LONG - Length             24   24.00 |  421.58      40.47 |   350.00    473.00 |
    | 6 . LARG - Width              24   24.00 |  168.83       7.49 |   155.00    184.00 |
    +------------------------------------------+--------------------+--------------------+

    CORRELATION MATRIX

         |  CYLI  PUIS  VITE  POID  LONG  LARG
    -----+------------------------------------
    CYLI |  1.00
    PUIS |  0.86  1.00
    VITE |  0.69  0.89  1.00
    POID |  0.90  0.77  0.51  1.00
    LONG |  0.86  0.69  0.53  0.86  1.00
    LARG |  0.71  0.55  0.36  0.70  0.86  1.00
    -----+------------------------------------
         |  CYLI  PUIS  VITE  POID  LONG  LARG

    The linear correlation coefficient measures the strength of the linear relationship between two continuous variables. It ranges from -1 to +1: the closer the coefficient is to +1 or -1, the more closely the two variables are related.

    TEST-VALUES MATRIX

         |   CYLI   PUIS   VITE   POID   LONG   LARG
    -----+------------------------------------------
    CYLI |  99.99
    PUIS |   6.35  99.99
    VITE |   4.19   7.06  99.99
    POID |   7.14   4.99   2.74  99.99
    LONG |   6.42   4.14   2.90   6.40  99.99
    LARG |   4.34   3.05   1.86   4.25   6.41  99.99
    -----+------------------------------------------
         |   CYLI   PUIS   VITE   POID   LONG   LARG

    This matrix is related to the previous one: SPAD expresses the significance test of each correlation coefficient as a test-value. The higher the test-value, the more closely the two variables are related. A test-value lower than 2 can be taken to mean that there is no linear relationship between the two variables.

    EIGENVALUES

    COMPUTATIONS PRECISION SUMMARY : TRACE BEFORE DIAGONALISATION .. 6.0000
                                     SUM OF EIGENVALUES ............ 6.0000

    HISTOGRAM OF THE FIRST 6 EIGENVALUES

    +--------+------------+------------+------------+
    | NUMBER | EIGENVALUE | PERCENTAGE | CUMULATED  |
    |        |            |            | PERCENTAGE |
    +--------+------------+------------+------------+
    |    1   |   4.6173   |    76.96   |    76.96   | ********************************************************************************
    |    2   |   0.8788   |    14.65   |    91.60   | ****************
    |    3   |   0.3035   |     5.06   |    96.66   | ******
    |    4   |   0.1055   |     1.76   |    98.42   | **
    |    5   |   0.0732   |     1.22   |    99.64   | **
    |    6   |   0.0216   |     0.36   |   100.00   | *
    +--------+------------+------------+------------+

    The second column (EIGENVALUE) gives the variance of each of the new factors, which are extracted successively. The third column expresses these values as a percentage of the total variance: factor 1 accounts for 77 percent of the variance, factor 2 for 15 percent, and so on. As expected, the sum of the eigenvalues is equal to the number of active variables (6).
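    These eigenvalues can be recomputed from the Cars data with a normed PCA:

    1. standardize each variable;
    2. build the correlation matrix R = Z'Z / n;
    3. diagonalize R.

    The sketch below is a minimal, dependency-free Python version of these steps. It uses cyclic Jacobi rotations for the diagonalization, where a linear-algebra library would normally be used; this is not SPAD's own algorithm. Standard deviations use the divisor n, which is the convention consistent with the STD. DEV. values in the summary statistics. The sign of an axis is arbitrary, so a factor score may come out with the opposite sign to SPAD's.

```python
import math

# The 24 cars of Cars.sba: cubic capacity, power, speed, weight, length, width
CARS = {
    "Honda civic": (1396, 90, 174, 850, 369, 166),
    "Peugeot 205 Rallye": (1294, 103, 189, 805, 370, 157),
    "Seat Ibiza SX I": (1461, 100, 181, 925, 363, 161),
    "Citroën AX Sport": (1294, 95, 184, 730, 350, 160),
    "Renault 19": (1721, 92, 180, 965, 415, 169),
    "Fiat Tipo": (1580, 83, 170, 970, 395, 170),
    "Peugeot 405": (1769, 90, 180, 1080, 440, 169),
    "Renault 21": (2068, 88, 180, 1135, 446, 170),
    "Citroën BX": (1769, 90, 182, 1060, 424, 168),
    "Opel Omega": (1998, 122, 190, 1255, 473, 177),
    "Peugeot 405 Break": (1905, 125, 194, 1120, 439, 171),
    "Ford Sierra": (1993, 115, 185, 1190, 451, 172),
    "Renault Espace": (1995, 120, 177, 1265, 436, 177),
    "Nissan Vanette": (1952, 87, 144, 1430, 436, 169),
    "VW Caravelle": (2109, 112, 149, 1320, 457, 184),
    "Audi 90 Quattro": (1994, 160, 214, 1220, 439, 169),
    "BMW 530i": (2986, 188, 226, 1510, 472, 175),
    "Rover 827i": (2675, 177, 222, 1365, 469, 175),
    "Renault 25": (2548, 182, 226, 1350, 471, 180),
    "BMW 325iX": (2494, 171, 208, 1600, 432, 164),
    "Ford Scorpio": (2933, 150, 200, 1345, 466, 176),
    "Fiat Uno": (1116, 58, 145, 780, 364, 155),
    "Peugeot 205": (1580, 80, 159, 880, 370, 156),
    "Ford Fiesta": (1117, 50, 135, 810, 371, 162),
}


def standardize(rows):
    """Center and scale each column (divisor n, like the printed STD. DEV.)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def jacobi_eigen(a, tol=1e-18, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix, by Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


names = list(CARS)
z = standardize([CARS[name] for name in names])
n, p = len(z), len(z[0])
# Correlation matrix of the normed variables: R = Z'Z / n
r = [[sum(row[j] * row[k] for row in z) / n for k in range(p)] for j in range(p)]
values, vectors = jacobi_eigen(r)
order = sorted(range(p), key=lambda k: values[k], reverse=True)
eig = [values[k] for k in order]
u1 = [vectors[j][order[0]] for j in range(p)]  # first normed eigenvector
honda_f1 = sum(zj * uj for zj, uj in zip(z[names.index("Honda civic")], u1))

cumulated = 0.0
for k, lam in enumerate(eig, start=1):
    cumulated += 100.0 * lam / p  # total inertia of a normed PCA = number of variables
    print(f"{k}  {lam:.4f}  {100.0 * lam / p:6.2f}  {cumulated:6.2f}")
print(f"r(CYLI, PUIS) = {r[0][1]:.2f}")
print(f"Honda civic on axis 1: {honda_f1:.2f} (the sign of an axis is arbitrary)")
```

    Up to rounding, the output should match the histogram above (4.6173, 0.8788, ...), the correlation matrix, and the first factor score of the Honda civic (-2.01, possibly with the opposite sign).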


    ANDERSON'S LAPLACE INTERVALS
    WITH 0.95 THRESHOLD

    +--------+----------------------------------------+
    | NUMBER | LOWER LIMIT   EIGENVALUE   UPPER LIMIT |
    +--------+----------------------------------------+
    |    1   |      1.9486       4.6173        7.2860 |
    |    2   |      0.3709       0.8788        1.3868 |
    |    3   |      0.1281       0.3035        0.4789 |
    |    4   |      0.0445       0.1055        0.1665 |
    |    5   |      0.0309       0.0732        0.1154 |
    +--------+----------------------------------------+

    LENGTH AND RELATIVE POSITION OF INTERVALS
    1 .................*----------------------------------------------+----------------------------------------------*.
    2 ...*--------+--------*.....................................................
    3 .*--+--*...........................................................
    4 *+*...............................................................
    5 +*................................................................

    Third and second differences, as well as Anderson's Laplace intervals, are other guidelines that help the SPAD user choose the number of dimensions to retain for further analyses.
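    The printed limits are consistent with the classical large-sample interval for an eigenvalue, λ ± 1.96 λ √(2 / (n - 1)), with n = 24 cars. The minimal Python sketch below assumes that formula, which we inferred from the figures and which is not taken from SPAD's documentation. Fed with the rounded eigenvalues, it reproduces the printed limits up to the last digit.

```python
import math

EIGENVALUES = [4.6173, 0.8788, 0.3035, 0.1055, 0.0732]  # axes 1 to 5, as printed
N_CASES = 24


def anderson_interval(eigenvalue, n, z=1.96):
    """Large-sample interval eigenvalue * (1 -/+ z * sqrt(2 / (n - 1)))."""
    half_width = z * math.sqrt(2.0 / (n - 1))
    return eigenvalue * (1 - half_width), eigenvalue * (1 + half_width)


intervals = [anderson_interval(lam, N_CASES) for lam in EIGENVALUES]
for k, (lam, (low, high)) in enumerate(zip(EIGENVALUES, intervals), start=1):
    print(f"{k}  {low:.4f}  {lam:.4f}  {high:.4f}")
```

    With only 24 cases the intervals are wide: the interval of axis 1 does not overlap that of axis 2, but the intervals of axes 3 to 5 overlap one another.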

    LOADINGS OF VARIABLES ON AXES 1 TO 5

    ACTIVE VARIABLES
    ----------------------+-------------------------------+-------------------------------+-------------------------------
    VARIABLES             |           LOADINGS            | VARIABLE-FACTOR CORRELATIONS  |      NORMED EIGENVECTORS
    IDEN - SHORT LABEL    |     1     2     3     4     5 |     1     2     3     4     5 |     1     2     3     4     5
    ----------------------+-------------------------------+-------------------------------+-------------------------------
    CYLI - Cubic capacity |  0.96  0.01 -0.15  0.04 -0.23 |  0.96  0.01 -0.15  0.04 -0.23 |  0.45  0.01 -0.27  0.11 -0.84
    PUIS - Power          |  0.90  0.38 -0.02 -0.16  0.04 |  0.90  0.38 -0.02 -0.16  0.04 |  0.42  0.41 -0.03 -0.49  0.15
    VITE - Speed          |  0.75  0.62  0.20  0.08  0.04 |  0.75  0.62  0.20  0.08  0.04 |  0.35  0.66  0.37  0.26  0.13
    POID - Weight         |  0.91 -0.18 -0.35 -0.06  0.11 |  0.91 -0.18 -0.35 -0.06  0.11 |  0.42 -0.19 -0.63 -0.18  0.42
    LONG - Length         |  0.92 -0.30  0.05  0.22  0.07 |  0.92 -0.30  0.05  0.22  0.07 |  0.43 -0.32  0.10  0.69  0.26
    LARG - Width          |  0.80 -0.48  0.34 -0.14 -0.02 |  0.80 -0.48  0.34 -0.14 -0.02 |  0.37 -0.51  0.62 -0.42 -0.06
    ----------------------+-------------------------------+-------------------------------+-------------------------------

    For a normed PCA, loadings and variable-factor correlations are identical. As the table shows, the first factor is more highly correlated with the variables than the second factor. This is expected: the factors are extracted successively, and each one accounts for less variance than the previous one.

    Normed eigenvectors are the coefficients of the linear combination of the active normed variables that defines each factor. In this example:

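    $$\mathrm{Factor}_1 = 0.45\,\frac{\mathrm{CYLI}-\mathrm{Mean}(\mathrm{CYLI})}{\mathrm{STDEV}(\mathrm{CYLI})} + 0.42\,\frac{\mathrm{PUIS}-\mathrm{Mean}(\mathrm{PUIS})}{\mathrm{STDEV}(\mathrm{PUIS})} + 0.35\,\frac{\mathrm{VITE}-\mathrm{Mean}(\mathrm{VITE})}{\mathrm{STDEV}(\mathrm{VITE})} + \dots$$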
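    As a check on the linear combination above, the sketch below applies the printed first normed eigenvector to the Honda civic, using the means and standard deviations from the summary statistics. All inputs are rounded to two decimals, so the result is approximate; it should land close to the factor score of -2.01 printed for this car in the table of cases. The dictionaries and the helper name factor_score are illustrative.

```python
# Axis 1 normed eigenvector, means and standard deviations as printed (rounded)
coef = {"CYLI": 0.45, "PUIS": 0.42, "VITE": 0.35, "POID": 0.42, "LONG": 0.43, "LARG": 0.37}
mean = {"CYLI": 1906.13, "PUIS": 113.67, "VITE": 183.08, "POID": 1123.33, "LONG": 421.58, "LARG": 168.83}
std = {"CYLI": 516.79, "PUIS": 37.97, "VITE": 24.68, "POID": 243.20, "LONG": 40.47, "LARG": 7.49}
honda_civic = {"CYLI": 1396, "PUIS": 90, "VITE": 174, "POID": 850, "LONG": 369, "LARG": 166}


def factor_score(case, coef, mean, std):
    """Weighted sum of the standardized values of one case (one factor)."""
    return sum(coef[v] * (case[v] - mean[v]) / std[v] for v in coef)


score = factor_score(honda_civic, coef, mean, std)
print(f"{score:.2f}")  # -2.01
```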

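    Note: SPAD prints neither the contributions nor the cosines of the active variables. However, they can be calculated as follows, for an axis α and an active variable j:

    $$\cos^2(\alpha, j) = \mathrm{Loading}(\alpha, j)^2 \quad \text{for a normed PCA}$$

    $$\cos^2(\alpha, j) = \mathrm{Correlation}(\alpha, j)^2 \quad \text{for both normed and not normed PCA}$$

    $$\mathrm{Contribution}(\alpha, j) = \mathrm{NormedEigenvector}(\alpha, j)^2$$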
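    The sketch below applies these formulas to axis 1, using the loadings and normed eigenvectors printed in the table of loadings. The inputs are rounded, so the checks are approximate. The contributions on an axis should add up to about 1 (100 %), and the squared loadings should add up to about the eigenvalue of the axis (4.6173).

```python
# Axis 1 values printed in the table of loadings (rounded to two decimals)
loadings = {"CYLI": 0.96, "PUIS": 0.90, "VITE": 0.75, "POID": 0.91, "LONG": 0.92, "LARG": 0.80}
eigenvector = {"CYLI": 0.45, "PUIS": 0.42, "VITE": 0.35, "POID": 0.42, "LONG": 0.43, "LARG": 0.37}

cos2 = {v: x ** 2 for v, x in loadings.items()}    # normed PCA: squared loading
ctr = {v: u ** 2 for v, u in eigenvector.items()}  # squared normed eigenvector

for v in loadings:
    print(f"{v}: cos2 = {cos2[v]:.2f}, contribution = {100 * ctr[v]:.1f} %")
print(f"sum of contributions: {sum(ctr.values()):.3f}")      # about 1
print(f"sum of squared loadings: {sum(cos2.values()):.3f}")  # about the eigenvalue 4.6173
```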


    FACTOR SCORES, CONTRIBUTIONS AND SQUARED COSINES OF CASES

    AXES 1 TO 5
    +------------------------------------+-------------------------------+-------------------------------+-------------------------------+
    | CASES                              | FACTOR SCORES                 | CONTRIBUTIONS                 | SQUARED COSINES               |
    |------------------------------------+-------------------------------+-------------------------------+-------------------------------|
    | IDENTIFIER         REL. WT.  DISTO |     1     2     3     4     5 |     1     2     3     4     5 |     1     2     3     4     5 |
    +------------------------------------+-------------------------------+-------------------------------+-------------------------------+
    | Honda civic            4.17   4.59 | -2.01  0.32  0.50 -0.44 -0.10 |   3.6   0.5   3.4   7.6   0.6 |  0.88  0.02  0.05  0.04  0.00 |
    | Peugeot 205 Rallye     4.17   7.37 | -2.25  1.49  0.14  0.09  0.19 |   4.6  10.6   0.3   0.3   2.1 |  0.69  0.30  0.00  0.00  0.00 |
    | Seat Ibiza SX I        4.17   4.73 | -1.92  0.94 -0.06 -0.36  0.00 |   3.3   4.2   0.1   5.0   0.0 |  0.78  0.19  0.00  0.03  0.00 |
    | Citroën AX Sport       4.17   8.78 | -2.60  1.29  0.47 -0.32 -0.15 |   6.1   7.9   3.0   4.0   1.2 |  0.77  0.19  0.02  0.01  0.00 |
    | Renault 19             4.17   0.92 | -0.78 -0.16  0.48  0.20 -0.12 |   0.6   0.1   3.1   1.6   0.8 |  0.66  0.03  0.25  0.04  0.01 |
    | Fiat Tipo              4.17   2.18 | -1.30 -0.43  0.43 -0.22 -0.10 |   1.5   0.9   2.5   2.0   0.6 |  0.77  0.09  0.08  0.02  0.00 |
    | Peugeot 405            4.17   0.71 | -0.30 -0.46  0.21  0.58  0.16 |   0.1   1.0   0.6  13.1   1.4 |  0.12  0.30  0.06  0.47  0.04 |
    | Renault 21             4.17   0.96 |  0.15 -0.64  0.01  0.67 -0.21 |   0.0   1.9   0.0  17.8   2.5 |  0.02  0.42  0.00  0.47  0.05 |
    | Citroën BX             4.17   0.54 | -0.52 -0.20  0.17  0.40  0.04 |   0.2   0.2   0.4   6.2   0.1 |  0.50  0.07  0.06  0.29  0.00 |
    | Opel Omega             4.17   3.25 |  1.45 -0.79  0.51  0.31  0.42 |   1.9   3.0   3.5   3.7  10.0 |  0.64  0.19  0.08  0.03  0.05 |
    | Peugeot 405 Break      4.17   0.55 |  0.57  0.13  0.39  0.15  0.19 |   0.3   0.1   2.0   0.9   2.1 |  0.58  0.03  0.27  0.04  0.07 |
    | Ford Sierra            4.17   0.82 |  0.70 -0.43  0.14  0.30  0.16 |   0.4   0.9   0.3   3.5   1.4 |  0.60  0.23  0.02  0.11  0.03 |
    | Renault Espace         4.17   1.77 |  0.86 -0.87  0.20 -0.44  0.13 |   0.7   3.6   0.5   7.7   0.9 |  0.42  0.43  0.02  0.11  0.01 |
    | Nissan Vanette         4.17   4.73 | -0.11 -1.69 -1.33 -0.05  0.24 |   0.0  13.6  24.4   0.1   3.3 |  0.00  0.61  0.38  0.00  0.01 |
    | VW Caravelle           4.17   7.58 |  1.14 -2.39  0.21 -0.69 -0.06 |   1.2  27.1   0.6  18.7   0.2 |  0.17  0.75  0.01  0.06  0.00 |
    | Audi 90 Quattro        4.17   3.43 |  1.39  1.10  0.19 -0.03  0.48 |   1.7   5.7   0.5   0.0  13.0 |  0.56  0.35  0.01  0.00  0.07 |
    | BMW 530i               4.17  15.98 |  3.88  0.85 -0.35 -0.04 -0.30 |  13.6   3.4   1.7   0.1   5.1 |  0.94  0.04  0.01  0.00  0.01 |
    | Rover 827i             4.17  10.52 |  3.15  0.75  0.13  0.05 -0.13 |   8.9   2.7   0.2   0.1   0.9 |  0.94  0.05  0.00  0.00  0.00 |
    | Renault 25             4.17  12.39 |  3.39  0.57  0.71 -0.23  0.07 |  10.4   1.5   6.9   2.1   0.3 |  0.93  0.03  0.04  0.00  0.00 |
    | BMW 325iX              4.17   8.92 |  2.20  1.17 -1.59 -0.24  0.32 |   4.4   6.5  34.6   2.3   6.0 |  0.54  0.15  0.28  0.01  0.01 |
    | Ford Scorpio           4.17   8.28 |  2.74 -0.15 -0.19  0.13 -0.83 |   6.8   0.1   0.5   0.6  39.1 |  0.91  0.00  0.00  0.00  0.08 |
    | Fiat Uno               4.17  14.29 | -3.73  0.03 -0.50  0.19  0.01 |  12.6   0.0   3.5   1.4   0.0 |  0.97  0.00  0.02  0.00  0.00 |
    | Peugeot 205            4.17   7.70 | -2.60  0.46 -0.72  0.12 -0.39 |   6.1   1.0   7.1   0.6   8.4 |  0.88  0.03  0.07  0.00  0.02 |
    | Ford Fiesta            4.17  12.99 | -3.49 -0.87 -0.13 -0.11 -0.03 |  11.0   3.6   0.2   0.5   0.1 |  0.94  0.06  0.00  0.00  0.00 |
    +------------------------------------+-------------------------------+-------------------------------+-------------------------------+

    DISTO: the squared distance between the case and the center of gravity of the overall sample. It helps distinguish the average cars (close to the center of gravity) from the more specific ones (far from the center of gravity).
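    In a PCA listing these columns are linked: DISTO equals the sum of the squared factor scores over all axes, and the squared cosine of a case on an axis is its squared score divided by DISTO. The sketch below (helper names are illustrative, not SPAD functions) recomputes both for the Honda civic row of the listing above. Because only five axes are printed and the scores are rounded, the recomputed DISTO (about 4.60) only approximates the printed 4.59.

```python
# Factor scores of the Honda civic on axes 1 to 5, copied from the listing above.
HONDA_CIVIC = [-2.01, 0.32, 0.50, -0.44, -0.10]


def disto(scores):
    """Squared distance to the center of gravity: sum of squared factor scores.

    Only the axes passed in are summed, so with the five printed axes this
    approximates the DISTO that SPAD computes on all axes."""
    return sum(s * s for s in scores)


def squared_cosines(scores):
    """Quality of representation on each axis: squared score / DISTO."""
    d = disto(scores)
    return [s * s / d for s in scores]


if __name__ == "__main__":
    print(f"DISTO ~ {disto(HONDA_CIVIC):.2f}")  # the SPAD listing shows 4.59
    print([round(c, 2) for c in squared_cosines(HONDA_CIVIC)])
```

    The rounded squared cosines match the SQUARED COSINES columns of the listing for this car.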


    THE FACTORIAL GRAPH EDITOR

    To access the factorial graph editor, click on the graph editor icon.

    To create a new factorial graph, select Graph - New; the following window appears:

    The preselection step lets you choose the elements to display in the graph: active or supplementary cases, and active or supplementary variables.

    If you forget to select an element, you must create a new graph and redo the preselection.

    THE TOOL BAR OF THE GRAPH EDITOR

    The toolbar provides the following buttons: Points selection, Factors selection, Total unselection, Framing selection, Delete the labels, Write the labels, Cancel the ghosts, Set as ghost, Information on points, Refresh, Vertical symmetric view, Horizontal symmetric view and Correlation circle.

    SAVE A GRAPH

    An internal save depends on the chain: if the chain is re-executed, or if the user deletes its results, the internal saves are deleted. This type of save uses the Save and Save as - Internal save commands of the Graphics menu.

    When you save in internal format, you give the saved graph a TITLE. You can later reload it with the Open - Internal save command of the Graphics menu.

    The advantage of the internal format is that all annotation functions and factorial plane properties remain available.

    A save in archive format, on the other hand, is independent of the chain.

    It is made with the Save as - Save archive command of the Graphics menu. When saving in archive format, you give the graph a NAME, with the mandatory extension .GFA.

    You can later recover it with the Open - Save archive command of the Graphics menu. Some formatting cannot be kept in this type of save, in particular the formatting of cases.

    The factorial plane editor can also save graphs in .BMP or .PCX format with the Save as - Screen Image BMP/PCX command; these images can then be inserted into a word-processor document. The EMF Metafile format gives the best image quality.


    GENERAL PRINCIPLES

    Building a graph after an analysis follows these general principles:

    Select Graph - New, which opens the preselection dialog box. For a single analysis, you can open several graphs at once from the Graphics menu, each with its own preselection. All the graphs you create can be saved in internal or archive format.

    To modify a graph, apply the following rule:

    Select the points with the toolbar or the Selection menu, format them with the Format menu, then deselect them to see the effect of the formatting.

    IMPORTANT: To manipulate the labels and texts on a graph (move, edit, etc.) or to enlarge the frame, you must be in standard mode, that is, no selection-mode button is highlighted and the status bar is empty.


    SCA - SIMPLE CORRESPONDENCE ANALYSIS

    This procedure performs a simple correspondence analysis (SCA) on a contingency table or on any table of non-negative numbers.

    Simple correspondence analysis is a statistical tool for the graphical analysis of contingency tables.

    The result of a simple correspondence analysis is a graphical representation, usually on two-dimensional factorial planes, of the association between the rows and the columns of the table. The plot contains a point for each row and each column of the table. Rows with similar patterns of counts produce points that are close together, and so do columns with similar patterns of counts.

    Simple correspondence analysis analyzes a contingency table made up of one or more column variables and one or more row variables.

    To illustrate this method, consider the following dataset, a typical two-way contingency table. The data describe how different kinds of alcoholic drinks are perceived.

    Select the SPAD dataset ALCOOL.SBA and import it.

                                          PASTIS  WHISKY MARTINI    SUZE   VODKA     GIN  MALIBU    BEER
    Like the taste                            49      50      42      18      25      23      25      59
    With friends                              83      83      76      60      69      68      69      74
    To relax oneself                          61      61      51      32      38      39      39      72
    Become expensive                          60      88      42      41      75      70      61      19
    Refreshing                                78      22      18      19      17      19      14      80
    Not elegant                               26      11      13      17      13      11      13      29
    Friendly product                          64      64      56      34      45      42      46      68
    Good before meals                         88      79      85      64      45      46      37      41
    Good during the day                       24      21      12      10      13      12      13      85
    Good during evening                        7      61      12      11      53      50      48      54
    For all year long                         83      87      85      79      83      82      80      90
    Liked by youngs                           45      77      36      16      65      69      76      89
    Good for guests                           88      92      87      60      70      67      67      81
    Oldy, not trendy                          12       4      13      38       5       6       8       7
    As well for men as for women              50      62      69      43      49      51      61      60
    Close to me                               38      41      27      11      16      18      17      49
    By habits                                 36      30      24      16      19      19      17      40
    Make snobish                               3      35       9       8      28      25      21       4
    We can mix it                             43      87      29      32      82      80      43      40
    For night life / bars / nightclubs        12      91      27      16      84      81      72      67

    http://www.soc.surrey.ac.uk/sru/SRU7.html#table1
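    Correspondence analysis compares rows through their profiles (counts divided by the row total) and measures the gap between two profiles with the chi-square distance, which divides each squared difference by the mass of the column. The sketch below computes that distance on the ALCOOL counts above. It is an illustration, not SPAD's implementation: it stops before the diagonalization that produces the factorial axes, and the helper names are illustrative. It shows, for instance, that "Make snobish" is much closer to "For night life / bars / nightclubs" than to "Refreshing" in the full space that the factorial map approximates.

```python
# ALCOOL.SBA counts, copied from the table above (rows: perception items).
COLUMNS = ["PASTIS", "WHISKY", "MARTINI", "SUZE", "VODKA", "GIN", "MALIBU", "BEER"]
ROWS = {
    "Like the taste": [49, 50, 42, 18, 25, 23, 25, 59],
    "With friends": [83, 83, 76, 60, 69, 68, 69, 74],
    "To relax oneself": [61, 61, 51, 32, 38, 39, 39, 72],
    "Become expensive": [60, 88, 42, 41, 75, 70, 61, 19],
    "Refreshing": [78, 22, 18, 19, 17, 19, 14, 80],
    "Not elegant": [26, 11, 13, 17, 13, 11, 13, 29],
    "Friendly product": [64, 64, 56, 34, 45, 42, 46, 68],
    "Good before meals": [88, 79, 85, 64, 45, 46, 37, 41],
    "Good during the day": [24, 21, 12, 10, 13, 12, 13, 85],
    "Good during evening": [7, 61, 12, 11, 53, 50, 48, 54],
    "For all year long": [83, 87, 85, 79, 83, 82, 80, 90],
    "Liked by youngs": [45, 77, 36, 16, 65, 69, 76, 89],
    "Good for guests": [88, 92, 87, 60, 70, 67, 67, 81],
    "Oldy, not trendy": [12, 4, 13, 38, 5, 6, 8, 7],
    "As well for men as for women": [50, 62, 69, 43, 49, 51, 61, 60],
    "Close to me": [38, 41, 27, 11, 16, 18, 17, 49],
    "By habits": [36, 30, 24, 16, 19, 19, 17, 40],
    "Make snobish": [3, 35, 9, 8, 28, 25, 21, 4],
    "We can mix it": [43, 87, 29, 32, 82, 80, 43, 40],
    "For night life / bars / nightclubs": [12, 91, 27, 16, 84, 81, 72, 67],
}


def column_masses(table):
    """Share of the grand total falling in each column."""
    grand_total = sum(sum(counts) for counts in table.values())
    n_cols = len(next(iter(table.values())))
    return [sum(counts[j] for counts in table.values()) / grand_total
            for j in range(n_cols)]


def profile(counts):
    """Row profile: each count divided by the row total."""
    total = sum(counts)
    return [c / total for c in counts]


def chi2_distance(table, row_a, row_b):
    """Chi-square distance between the profiles of two rows of the table."""
    masses = column_masses(table)
    pa, pb = profile(table[row_a]), profile(table[row_b])
    return sum((x - y) ** 2 / m for x, y, m in zip(pa, pb, masses)) ** 0.5


if __name__ == "__main__":
    for name, mass in zip(COLUMNS, column_masses(ROWS)):
        print(f"{name:8} mass {mass:.3f}")
    for other in ("For night life / bars / nightclubs", "Refreshing"):
        d = chi2_distance(ROWS, "Make snobish", other)
        print(f"Make snobish <-> {other}: {d:.3f}")
```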

    THE SETTING OPTIONS

    THE COLUMNS TAB

    Active frequencies: all

    THE ROWS TAB

    This tab is identical to the Cases tab of the descriptive statistics methods.


    THE PARAMETERS TAB

    To display the row results in Excel sheets, click the Options button and select Yes.


    The following graph was produced with the SPAD Amado procedure. Using the SCA results, rows and columns are sorted by decreasing first-factor coordinates, which gives the table a visible structure. The width of each column is proportional to its frequency (its column total).

    Figure: Amado display of the ALCOOL table (cell values are the counts of the table above). Columns, left to right: VODKA, GIN, MALIBU, WHISKY, MARTINI, SUZE, BEER, PASTIS. Rows, top to bottom: Make snobish; For night life / bars / nightclubs; Good during evening; We can mix it; Become expensive; Liked by youngs; As well for men as for women; For all year long; With friends; Good for guests; Friendly product; To relax oneself; Good before meals; Like the taste; By habits; Close to me; Not elegant; Good during the day; Oldy, not trendy; Refreshing.
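    The two Amado rules stated above (sort rows and columns by decreasing first-factor coordinate, make each column's width proportional to its frequency) can be sketched as follows, assuming the first-axis coordinates are already available from a correspondence analysis. The function name `amado_layout`, the 3 x 3 table and its coordinates are hypothetical illustrations, not SPAD output.

```python
def amado_layout(table, row_coords, col_coords):
    """Reorder a contingency table Amado-style.

    table      -- dict: row label -> dict(column label -> count)
    row_coords -- first-axis coordinate of each row (from a prior CA)
    col_coords -- first-axis coordinate of each column
    Returns (row_order, col_order, widths), where widths[c] is the share of
    the grand total in column c, i.e. a width proportional to its frequency.
    """
    row_order = sorted(table, key=lambda r: row_coords[r], reverse=True)
    col_order = sorted(col_coords, key=lambda c: col_coords[c], reverse=True)
    col_totals = {c: sum(table[r][c] for r in table) for c in col_order}
    grand_total = sum(col_totals.values())
    widths = {c: col_totals[c] / grand_total for c in col_order}
    return row_order, col_order, widths


if __name__ == "__main__":
    # Hypothetical table and hypothetical first-axis coordinates.
    table = {
        "r1": {"A": 10, "B": 20, "C": 30},
        "r2": {"A": 50, "B": 10, "C": 10},
        "r3": {"A": 40, "B": 20, "C": 10},
    }
    row_coords = {"r1": -0.8, "r2": 0.5, "r3": 0.3}
    col_coords = {"A": 0.6, "B": 0.1, "C": -0.7}
    rows, cols, widths = amado_layout(table, row_coords, col_coords)
    for r in rows:
        print(r.ljust(4), " ".join(str(table[r][c]).rjust(4) for c in cols))
    print("widths:", widths)
```

    The sign of a factorial axis is arbitrary, so sorting by increasing coordinates would give an equally valid, mirrored display.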


    MCA - MULTIPLE CORRESPONDENCE ANALYSIS

    Multiple correspondence analysis extends the properties of simple correspondence analysis to n-way tables. The procedure requires more than two active categorical variables observed on a set of cases. As with the other factorial analyses, supplementary elements can be added, such as illustrative cases and illustrative continuous or categorical variables.
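    A standard way to see MCA is as a correspondence analysis of the complete disjunctive (indicator) table: one 0/1 column per category, one row per case, and exactly one 1 per question in each row, so every row sums to the number of active questions. A minimal sketch of that coding; the question keys and the three answers are hypothetical, and only the category codes (masc, fmi, vmo1, vmo2, agc2, agc5) come from the ASPI1000 listings.

```python
def disjunctive_table(records, questions):
    """Complete disjunctive (indicator) coding of categorical answers.

    Returns the list of (question, category) columns, grouped by question,
    and a 0/1 matrix with one row per case and exactly one 1 per question."""
    columns = []
    for q in questions:
        for rec in records:
            if (q, rec[q]) not in columns:
                columns.append((q, rec[q]))
    matrix = [[1 if rec[q] == cat else 0 for q, cat in columns] for rec in records]
    return columns, matrix


if __name__ == "__main__":
    # Hypothetical answers of three cases to three questions.
    questions = ["gender", "securities", "age"]
    records = [
        {"gender": "masc", "securities": "vmo2", "age": "agc2"},
        {"gender": "fmi", "securities": "vmo1", "age": "agc2"},
        {"gender": "fmi", "securities": "vmo2", "age": "agc5"},
    ]
    columns, matrix = disjunctive_table(records, questions)
    print(columns)
    for row in matrix:
        print(row)
```

    Column sums of the matrix are the category counts, which is why the cleaning step described under the Parameters tab must keep each row with exactly one 1 per question.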

    We will perform the MCA on the ASPI1000.SBA dataset.

    DESCRIPTION OF THE VARIABLES OF THE ASPI1000.SBA DATASET

    ACTIVE CATEGORICAL VARIABLES - 7 VARIABLES - 28 CATEGORIES

    11 . Gender (2 categories)
    29 . Do you own securities? (2 categories)
    39 . Urban area size (number of inhabitants) (5 categories)
    49 . Job category (5 categories)
    51 . Diploma in 5 categories (5 categories)
    52 . Occupation status of housing in 4 categories (4 categories)
    53 . Age in 5 categories (5 categories)

    SUPPLEMENTARY CATEGORICAL VARIABLES - 35 VARIABLES - 152 CATEGORIES

    All available categorical variables

    SUPPLEMENTARY CONTINUOUS VARIABLES - 8 VARIABLES

    All available continuous variables


    THE SETTING OPTIONS

    THE VARIABLES TAB


    THE PARAMETERS TAB

    Random assignment of active categories inferior to (in %)

    To ensure the robustness of the analysis, it may be useful to take into account, when defining the axes, only the categories with a sufficient weight. For each question, the cases that belong to a category with a low total weight are assigned at random to one of the other categories of that question that has a sufficient weight. This cleaning operation preserves the completely disjunctive property of the data table.

    The parameter PCMIN sets the percentage of the total weight of the active cases below which a category is considered too light. If all cases have weight 1, PCMIN is the percentage of the number of active cases below which a category is broken down.

    If all the categories of a question (or all but one) are too light, the question itself is made illustrative for the computation of the axes. The default value (2%) suits most analyses. If the parameter is set to 0.0, only the categories with a null weight are eliminated.
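    The PCMIN rule can be sketched for a single question with unit case weights. The random scheme is an assumption: the text only says that the cases are assigned "at random" to a sufficiently heavy category, so this sketch draws that category with probability proportional to its count; SPAD may draw differently. The function name `clean_question` and the toy answers are hypothetical.

```python
import random
from collections import Counter


def clean_question(answers, pcmin=2.0, seed=0):
    """Sketch of PCMIN cleaning for one active question (unit case weights).

    Categories whose count is below pcmin % of the cases are too light; their
    cases are reassigned at random to the sufficiently heavy categories
    (ASSUMPTION: with probability proportional to those categories' counts).
    Returns None when all categories, or all but one, are too light, i.e.
    when the question should become illustrative."""
    counts = Counter(answers)
    threshold = pcmin / 100 * len(answers)  # e.g. 2 % of 1000 cases -> 20 cases
    heavy = [cat for cat, n in counts.items() if n >= threshold]
    if len(heavy) <= 1:
        return None
    rng = random.Random(seed)
    weights = [counts[cat] for cat in heavy]
    return [a if counts[a] >= threshold else rng.choices(heavy, weights=weights)[0]
            for a in answers]


if __name__ == "__main__":
    answers = ["a"] * 60 + ["b"] * 39 + ["c"]  # hypothetical question, 100 cases
    print(Counter(answers), "->", Counter(clean_question(answers)))
```

    Because every reassigned case still receives exactly one category, the table stays completely disjunctive, as the text requires.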

    Retained coordinates

    The number of retained coordinates matters for the methods that follow the MCA in the chain, such as DEFAC (factor description) and RECIP/SEMIS (clustering).

    By default, case coordinates are not displayed.


    THE MCA RESULTS

    MULTIPLE CORRESPONDENCE ANALYSIS

    ELIMINATION OF ACTIVE CATEGORIES WITH SMALL WEIGHTS

    THRESHOLD (PCMIN) : 2.00 %     WEIGHT: 20.00
    BEFORE CLEANING   : 7 ACTIVE QUESTIONS     28 ASSOCIATED CATEGORIES
    AFTER CLEANING    : 7 ACTIVE QUESTIONS     28 ASSOCIATED CATEGORIES
    TOTAL WEIGHT OF ACTIVE CASES : 1000.00

    MARGINAL DISTRIBUTIONS OF ACTIVE QUESTIONS
    -----------------------------+-----------------+----------------------------------------------
    CATEGORIES                   | BEFORE CLEANING | AFTER CLEANING
    IDENT - LABEL                |  COUNT  WEIGHT  |  COUNT  WEIGHT  HISTOGRAM OF RELATIVE WEIGHTS
    -----------------------------+-----------------+----------------------------------------------
    11 . Gender
    masc - male                  |    469  469.00  |    469  469.00  *****************************
    fmi - female                 |    531  531.00  |    531  531.00  ********************************
    -----------------------------+-----------------+----------------------------------------------
    29 . Do you own some securities?
    vmo1 - Yes                   |    121  121.00  |    121  121.00  ********
    vmo2 - No                    |    879  879.00  |    879  879.00  *************************************************
    -----------------------------+-----------------+----------------------------------------------
    39 . Urban area size (number of inhabitants)
    agg1 - Lower than 2.000      |     83   83.00  |     83   83.00  *****
    agg2 - 2.000 - 20.000        |     87   87.00  |     87   87.00  ******
    agg3 - 20.000 - 100.000      |    175  175.00  |    175  175.00  ***********
    agg4 - greater than 100.000  |    329  329.00  |    329  329.00  ********************
    agg5 - Paris                 |    326  326.00  |    326  326.00  ********************
    -----------------------------+-----------------+----------------------------------------------
    49 . Job category
    emp1 - Worker                |    263  263.00  |    263  263.00  ****************
    emp2 - Employee              |    335  335.00  |    335  335.00  *********************
    emp3 - Manager               |    229  229.00  |    229  229.00  **************
    emp4 - Other                 |     48   48.00  |     48   48.00  ==RAND. ASSIGN.==
    49_ - missing category       |    125  125.00  |    125  125.00  ********
    -----------------------------+-----------------+----------------------------------------------
    51 . Diploma in 5 categories
    die1 - None                  |    189  189.00  |    189  189.00  ************
    die2 - CEP                   |    321  321.00  |    321  321.00  ********************
    die3 - BEPC-BE-BEPS          |    158  158.00  |    158  158.00  **********
    die4 - Bac - Brevet sup.     |    182  182.00  |    182  182.00  ***********
    die5 - University            |    150  150.00  |    150  150.00  **********
    -----------------------------+-----------------+----------------------------------------------
    52 . Occupation status of housing in 4 categories
    slo1 - homeowner             |    120  120.00  |    120  120.00  ********
    slo2 - owner                 |    290  290.00  |    290  290.00  ******************
    slo3 - tenant                |    523  523.00  |    523  523.00  ********************************
    slo4 - free housing, other   |     67   67.00  |     67   67.00  *****
    -----------------------------+-----------------+----------------------------------------------
    53 . Age in 5 categories
    agc1 - Lower than 25 yo      |    150  150.00  |    150  150.00  **********
    agc2 - 25 to 34 yo           |    284  284.00  |    284  284.00  ******************
    agc3 - 35 to 49 yo           |    209  209.00  |    209  209.00  *************
    agc4 - 50 to 64 yo           |    188  188.00  |    188  188.00  ************
    agc5 - 65 yo and more        |    169  169.00  |    169  169.00  ***********
    -----------------------------+-----------------+----------------------------------------------


    EIGENVALUES

    COMPUTATIONS PRECISION SUMMARY : TRACE BEFORE DIAGONALISATION.. 2.8571
                                     SUM OF EIGENVALUES............ 2.8571

    HISTOGRAM OF THE FIRST 20 EIGENVALUES
    +--------+------------+------------+------------+
    | NUMBER | EIGENVALUE | PERCENTAGE | CUMULATED  |
    |        |            |            | PERCENTAGE |
    +--------+------------+------------+------------+
    |      1 |     0.2703 |       9.46 |       9.46 |
    |      2 |     0.2369 |       8.29 |      17.75 |
    |      3 |     0.2084 |       7.29 |      25.05 |
    |      4 |     0.1922 |       6.73 |      31.77 |
    |      5 |     0.1846 |       6.46 |      38.23 |
    |      6 |     0.1578 |       5.52 |      43.76 |
    |      7 |     0.1534 |       5.37 |      49.13 |
    |      8 |     0.1493 |       5.23 |      54.35 |
    |      9 |     0.1441 |       5.04 |      59.40 |
    |     10 |     0.1398 |       4.89 |      64.29 |
    |     11 |     0.1326 |       4.64 |      68.93 |
    |     12 |     0.1300 |       4.55 |      73.48 |
    |     13 |     0.1284 |       4.49 |      77.97 |
    |     14 |     0.1222 |       4.28 |      82.25 |
    |     15 |     0.1070 |       3.74 |      86.00 |
    |     16 |     0.1015 |       3.55 |      89.55 |
    |     17 |     0.0954 |       3.34 |      92.89 |
    |     18 |     0.0821 |       2.87 |      95.76 |
    |     19 |     0.0748 |       2.62 |      98.38 |
    |     20 |     0.0462 |       1.62 |     100.00 |
    +--------+------------+------------+------------+
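    The percentage of an axis is its eigenvalue divided by the trace (2.8571 here), and the cumulated percentage is the running sum. A small sketch (the function name is illustrative) that recomputes the first five rows of the table from the printed, rounded eigenvalues:

```python
# First five eigenvalues and the trace, copied from the MCA listing above.
TRACE = 2.8571
EIGENVALUES = [0.2703, 0.2369, 0.2084, 0.1922, 0.1846]


def inertia_table(eigenvalues, trace):
    """Percentage of inertia of each axis and cumulated percentage."""
    rows, cumulated = [], 0.0
    for number, value in enumerate(eigenvalues, start=1):
        percentage = 100 * value / trace
        cumulated += percentage
        rows.append((number, value, round(percentage, 2), round(cumulated, 2)))
    return rows


if __name__ == "__main__":
    for number, value, pct, cum in inertia_table(EIGENVALUES, TRACE):
        print(f"{number:>2} {value:.4f} {pct:6.2f} {cum:6.2f}")
```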

    RESEARCH OF IRREGULARITIES (THIRD DIFFERENCES)
    +--------------+--------------+
    | IRREGULARITY | IRREGULARITY |
    | BETWEEN      | VALUE        |
    +--------------+--------------+
    |   5 -- 6     |       -27.77 |
    | 14 -- 15     |       -10.42 |
    | 17 -- 18     |        -6.67 |
    | 13 -- 14     |        -5.44 |
    | 10 -- 11     |        -3.77 |
    |   2 -- 3     |        -3.66 |
    |   8 -- 9     |        -1.53 |
    +--------------+--------------+

    RESEARCH OF IRREGULARITIES (SECOND DIFFERENCES)
    +--------------+--------------+
    | IRREGULARITY | IRREGULARITY |
    | BETWEEN      | VALUE        |
    +--------------+--------------+
    |   5 -- 6     |        22.31 |
    |   2 -- 3     |        12.28 |
    | 14 -- 15     |         9.83 |
    |   3 -- 4     |         8.62 |
    |   1 -- 2     |         4.94 |
    | 10 -- 11     |         4.67 |
    | 11 -- 12     |         0.90 |
    |   8 -- 9     |         0.81 |
    |   6 -- 7     |         0.40 |
    +--------------+--------------+

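    Irregularity (second difference) between axes 5 and 6:

    $$[(\lambda_7 - \lambda_6) - (\lambda_6 - \lambda_5)] \times 1000 = (\lambda_5 - 2\lambda_6 + \lambda_7) \times 1000$$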

    The two tables above play the role of the scree test (Cattell test): the procedure detects the main irregularities in the decrease of the eigenvalues and ranks them by decreasing importance.
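    The second-difference criterion can be recomputed from the printed eigenvalues. In the sketch below (function names are illustrative), the irregularity between axes k and k+1 is (λk - 2λk+1 + λk+2) × 1000. With the rounded eigenvalues it gives 22.4 for the 5 -- 6 irregularity instead of the 22.31 printed by SPAD. The sketch also ranks every positive second difference, so it can list pairs (for example 17 -- 18) that the SPAD table does not show, although its top four pairs match the table.

```python
# Eigenvalues 1 to 20 as printed in the histogram above (rounded to 4 decimals).
EIGENVALUES = [0.2703, 0.2369, 0.2084, 0.1922, 0.1846, 0.1578, 0.1534, 0.1493,
               0.1441, 0.1398, 0.1326, 0.1300, 0.1284, 0.1222, 0.1070, 0.1015,
               0.0954, 0.0821, 0.0748, 0.0462]


def second_differences(eigenvalues):
    """Irregularity between axes k and k+1: (l_k - 2 l_{k+1} + l_{k+2}) * 1000."""
    return {
        (k + 1, k + 2): (eigenvalues[k] - 2 * eigenvalues[k + 1] + eigenvalues[k + 2]) * 1000
        for k in range(len(eigenvalues) - 2)
    }


def rank_irregularities(eigenvalues):
    """Positive second differences, largest first (candidate elbows)."""
    diffs = second_differences(eigenvalues)
    positive = [(pair, value) for pair, value in diffs.items() if value > 0]
    return sorted(positive, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (a, b), value in rank_irregularities(EIGENVALUES)[:5]:
        print(f"{a:>2} -- {b:<2} {value:6.2f}")
```

    The largest irregularity, between axes 5 and 6, suggests that the first five axes form a natural group to interpret.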


    LOADINGS, CONTRIBUTIONS AND SQUARED COSINES OF ACTIVE CATEGORIES

    AXES 1 TO 5
    +------------------------------------------+-------------------------------+--------------------------+--------------------------+
    | CATEGORIES                               | LOADINGS                      | CONTRIBUTIONS            | SQUARED COSINES          |
    | IDEN - LABEL             REL. WT.  DISTO |     1     2     3     4     5 |    1    2    3    4    5 |    1    2    3    4    5 |
    +------------------------------------------+-------------------------------+--------------------------+--------------------------+
    | 11 . Gender
    | masc - male                  6.70   1.13 | -0.29  0.08  0.43 -0.47 -0.25 |  2.1  0.2  6.0  7.6  2.3 | 0.07 0.01 0.16 0.19 0.06 |
    | fmi - female                 7.59   0.88 |  0.26 -0.07 -0.38  0.41  0.22 |  1.8  0.2  5.3  6.7  2.0 | 0.07 0.01 0.16 0.19 0.06 |
    +------------------------------------------+------ CUMULATED CONTRIBUTION = |  3.9  0.3 11.2 14.4  4.3 +--------------------------+
    | 29 . Do you own some securities?
    | vmo1 - Yes                   1.73   7.26 |  0.69  1.46 -0.25 -0.23  0.06 |  3.1 15.5  0.5  0.5  0.0 | 0.07 0.29 0.01 0.01 0.00 |
    | vmo2 - No                   12.56   0.14 | -0.10 -0.20  0.03  0.03 -0.01 |  0.4  2.1  0.1  0.1  0.0 | 0.07 0.29 0.01 0.01 0.00 |
    +------------------------------------------+------ CUMULATED CONTRIBUTION = |  3.5 17.6  0.6  0.6  0.0 +--------------------------+
    | 39 . Urban area size (number of inhabitants)
    | agg1 - Lower than 2.000      1.19  11.05 | -1.06  0.83 -1.06  0.75 -0.06 |  5.0  3.4  6.4  3.5  0.0 | 0.10 0.06 0.10 0.05 0.00 |
    | agg2 - 2.000 - 20.000        1.24  10.49 | -0.55  0.26  0.28  0.80 -0.61 |  1.4  0.3  0.5  4.2  2.5 | 0.03 0.01 0.01 0.06 0.04 |
    | ag