Quantitative Tools for Qualitative Data
Richard Bell
University of Melbourne
For copies of this presentation-130 slides-(about 500kb in a zipped file)
email: [email protected]
What kind of Qualitative Data can be Analysed?
• Not raw continuous text data
• Discrete text units that are replicated
• Any kind of coding that has been made
What does the data have to look like?
• It must be able to be represented by a table
• not necessarily a two-way table
• for example
  – Giegler & Klein coding of personal advertisements
  – a four-way table: magazine, sex, concept, category
                             Categorization
Magazine  Sex  Concept       Fitness  Compassion  Figure  Values  Erotic
Z         F    Self               44          99      50      11     101
Z         F    Seeking            41          12       9      11      85
Z         F    Relationship        6           0      12       5       3
Z         M    Self               67          97      67      18     207
Z         M    Seeking            80           9      11       9      37
Z         M    Relationship        1           0       3       4       1
WN        F    Self                8          14      17      18     107
WN        F    Seeking            19           1       4      38      59
WN        F    Relationship       20           0       0       3       0
WN        M    Self                9           7       4       3      42
WN        M    Seeking            11           2       6       3      19
WN        M    Relationship        1           0       1       0       0
Giegler & Klein data as a four-way table
Here the data are stored as a table: the first four columns define the cells and the last column gives the frequency in each cell.
To analyse these data at the case level, use the SPSS WEIGHT BY command, ie
WEIGHT BY FREQ.
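As a rough illustration of the "cell columns plus a frequency column" layout, the Python sketch below reshapes the first three rows of the Giegler & Klein table into one row per cell and totals the frequencies, which is the number of cases the table stands for once it is weighted. The function names `to_long` and `weighted_n` are hypothetical, and this is not SPSS's own mechanism, only a way of seeing what the layout means.

```python
# Hypothetical helpers: reshape a wide frequency table into one row per cell.
CATEGORIES = ["Fitness", "Compassion", "Figure", "Values", "Erotic"]

# First three rows of the Giegler & Klein table (magazine Z, sex F).
wide_rows = [
    ("Z", "F", "Self", [44, 99, 50, 11, 101]),
    ("Z", "F", "Seeking", [41, 12, 9, 11, 85]),
    ("Z", "F", "Relationship", [6, 0, 12, 5, 3]),
]


def to_long(rows, categories):
    """Return (magazine, sex, concept, category, freq) tuples, one per cell."""
    long_rows = []
    for magazine, sex, concept, counts in rows:
        for category, freq in zip(categories, counts):
            long_rows.append((magazine, sex, concept, category, freq))
    return long_rows


def weighted_n(long_rows):
    """Total of the frequency column: the case count after weighting."""
    return sum(freq for *_, freq in long_rows)


long_rows = to_long(wide_rows, CATEGORIES)
for row in long_rows[:5]:
    print(*row)
print("cells:", len(long_rows), "weighted N:", weighted_n(long_rows))
```

Each output row has four columns that identify the cell and a fifth that holds its frequency, which is the shape WEIGHT BY FREQ expects.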
Kinds of tables
• Rows are participants, columns are categories
• Rows are categories, columns are participants
• Rows are one set of categories, columns are another set of categories
Data in cells of table
• Indicator of presence/absence of a relationship between rows and columns
• Frequencies or counts of indicators
• Values of categories
CATEGORY / PARTICIPANT (1 2 3 4 5 6 7 8)
Neuropsychological experience
  Problems with concentration: • • • • • • •
  Takes longer to learn new information/ get something in my mind: • •
  Gaps in long term memory/ difficulty accessing memories: • •
  Thinking more affected by emotions: • • •
  Less capacity to express oneself creatively: •
  Occasional difficulty finding a word or name in conversation: • •
  Vague complaint of something not being quite right, not being able to think too well: •
Category 2: Other experiences
  Affects everything: • • •
  Ability to do things: • • • • • • • •
  Problems with medications: • • • • •
  Independence: •
  Relationships and social life: • • • • • • • •
  Speech and communication: • • • • • • • •
  Quality of sleep: • • • •
  Work/employment: • • • • • •
Category 3: amplifying factors for neuropsychological experiences (as well as other experiences)
  Personality: • •
  Stress or feeling under pressure/ emotional reaction: • • • • •
  Work/activity schedule: • • • • •
  Previous cognitive ability: •
  Social interaction where there are several conversations to follow: • •
  Noisy environment: •
Category 4: general strategies towards difficulties, which also help neuropsychological experiences
  Reviewing things, planning & prioritising, limiting activities: • • • • •
  Succeeding with activities, sense of achievement helps/ confronting the problem: • • • • •
  Writing things down: • • • • •
Indicator of presence/absence of relationship between rows and columns
Data from Huber (1997)
Site     A     B     C     D     E     F     G     H     I
SIN   0.11  0.07  0.09  0.13  0.15  0.16  0.15  0.12  0.12
SGR   0.07  0.05  0.09  0.13  0.18  0.16  0.20  0.18  0.20
TCO   0.50  0.46  0.43  0.34  0.33  0.28  0.26  0.25  0.20
TIN   0.17  0.21  0.19  0.09  0.13  0.20  0.13  0.16  0.16
TGR   0.07  0.09  0.11  0.14  0.10  0.07  0.13  0.16  0.13
TOP   0.09  0.12  0.09  0.17  0.11  0.13  0.12  0.14  0.18
SIN: Students learn as self-regulated individuals
SGR: Students learn in autonomous groups
TCO: Teacher is in control
TIN: Teacher dominates, but allows some individual autonomy
TGR: Teacher dominates, but allows some small group autonomy
TOP: Teacher dominates, but is open to students' initiatives
Proportions of Activities by Site (Frequencies)
Group      Interaction Intensity  Interaction Frequency  Feeling of Belonging  Physical Proximity  Relationship Formality
Crowd      slight                 slight                 none                  close               formal
Audience   low                    nonrecurring           slight                close               formal
Public     slight                 slight                 slight                distant             no relationship
Mob        high                   nonrecurring           high                  close               informal
Family     high                   frequent               high                  close               informal
Relatives  moderate               infrequent             variable              distant             formal
Community  low                    infrequent             variable              close               formal
Values of categories
Getting data into statistical packages such as SPSS
• Transfer data directly from qualitative packages such as Nvivo
• Use SPSS text-import wizard (best with precoded data, ie numbers)
• Enter data by hand
Transfer data from qualitative packages
• Need to be able to export tables.
• Should only be done for tables where rows (or columns) are units of analysis (ie documents or respondents)
• Should be saved as a text file (ie with the extension .txt, as in table1.txt)
Transfer data from printed table
• Type into SPSS
• Transfer table from Word document
  – Word document to Excel spreadsheet
  – Excel spreadsheet to SPSS spreadsheet
The Table in Word
Remove the heading row
Move subheadings into a column
Insert a new column into the table
Copy each subheading into the empty column cells that it applies to
Shorten text and insert headings in columns that will become SPSS variable names (ie < 9 characters, no spaces)
Select table and copy to clipboard
Open Excel
Paste from clipboard
Save spreadsheet
Open SPSS
Under the File pull-down menu, choose Open > Data
Change the file type to Excel files [.xls]
And open the saved Excel spreadsheet
If you have names as column headings in the first row of the Excel spreadsheet, SPSS can read them as its variable names
SPSS opens the file (the variable view)
The data view
Notice there are dud lines in this file; they need to be edited out
The file fixed up
Now we need to change our
a) repeated phrases (variable ‘type’)
b) symbols (variables p1 to p8)
into numbers
Do this through Automatic Recode under the ‘Transform’ tab
Need to create a numeric variable into which the values of the alphanumeric variable are transformed (alphanumeric values are saved as labels)
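The idea behind the recode step can be sketched in a few lines of Python: each distinct text value gets a consecutive integer code and the original text is kept as the label for that code. The function name `autorecode` and the example phrases are hypothetical, and the sketch assumes codes are assigned in ascending sorted order of the text values.

```python
def autorecode(values):
    """Map each distinct string to 1..k in sorted order; return codes and labels."""
    labels = {code: text for code, text in enumerate(sorted(set(values)), start=1)}
    lookup = {text: code for code, text in labels.items()}
    return [lookup[value] for value in values], labels


# Hypothetical column of repeated phrases, standing in for the variable 'type'.
type_column = [
    "Other experiences",
    "Amplifying factors",
    "Other experiences",
    "General strategies",
]

codes, value_labels = autorecode(type_column)
print(codes)         # numeric variable
print(value_labels)  # original text kept as value labels
```

The numeric codes carry no meaning beyond identifying the category, which is why the text must survive as labels.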
Transferring Cross-Category tables into SPSS
[where Rows are one set of categories, columns are another set of categories]
• Three types of table:
  – Cells of the table contain frequencies
  – Cells of the table contain other data
  – Cells of the table contain a binary indicator (yes/no, true/false, present/absent etc)
Transferring Frequency Tables: 1
If the table has only two dimensions (rows are categories of one variable, columns are categories of another)
– can feed table straight in as table
  • easy but won’t have labelled output
– feed table in cell by cell (as for more complex tables)
  • more complex but allows for labelled output and other possibilities
Feeding table in as table
• Only have cells of table as data
• Can only run one procedure (correspondence analysis) via syntax.
Feeding table in cell by cell
• Have to use syntax (data list function)
data list free
/ block slice row column frequency.
begin data.
1 1 1 287
1 1 2 143
1 2 1 94
1 2 2 23
end data.
Data list FREE / EMS PMS GENDER MARSTAT FREQ.
Weight by freq.
Begin data.
1 1 1 1 17
1 1 1 2 4
1 1 2 1 28
1 1 2 2 11
1 2 1 1 36
1 2 1 2 4
1 2 2 1 17
1 2 2 2 4
2 1 1 1 54
2 1 1 2 25
2 1 2 1 60
2 1 2 2 42
2 2 1 1 214
2 2 1 2 322
2 2 2 1 68
2 2 2 2 130
end data.
Var labels EMS, 'Extramarital Sex' / PMS, 'Premarital Sex' / GENDER, 'Gender' / MARSTAT, 'Marital Status'.
Value labels EMS, PMS, 1 'Yes' 2 'No' / GENDER, 1 'Women' 2 'Men' /
  MARSTAT, 1 'Divorced' 2 'Still Married'.
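To see what the cell-by-cell layout contains, the following Python sketch reads the same sixteen lines and collapses the four-way table onto two of its variables. It is only an illustration of the data structure; the helper names `read_cells` and `margin` are hypothetical and nothing here reproduces SPSS behaviour.

```python
from collections import Counter

# The sixteen cells of the EMS x PMS x GENDER x MARSTAT table, one per line:
# four category codes followed by the cell frequency.
CELL_DATA = """
1 1 1 1 17
1 1 1 2 4
1 1 2 1 28
1 1 2 2 11
1 2 1 1 36
1 2 1 2 4
1 2 2 1 17
1 2 2 2 4
2 1 1 1 54
2 1 1 2 25
2 1 2 1 60
2 1 2 2 42
2 2 1 1 214
2 2 1 2 322
2 2 2 1 68
2 2 2 2 130
"""


def read_cells(text):
    """Parse each line into (codes, freq), where codes is a tuple of ints."""
    cells = []
    for line in text.strip().splitlines():
        *codes, freq = (int(token) for token in line.split())
        cells.append((tuple(codes), freq))
    return cells


def margin(cells, keep):
    """Collapse the table onto the variables at the index positions in `keep`."""
    totals = Counter()
    for codes, freq in cells:
        totals[tuple(codes[i] for i in keep)] += freq
    return totals


cells = read_cells(CELL_DATA)
total_n = sum(freq for _, freq in cells)
ems_by_pms = margin(cells, keep=(0, 1))  # positions 0 and 1 are EMS and PMS
print("N =", total_n)
for key in sorted(ems_by_pms):
    print(key, ems_by_pms[key])
```

Summing the frequency column gives the number of respondents, and any lower-order table can be recovered by summing over the variables that are dropped.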
‘Traditional’ Quantitative Methods for Qualitative Data
• Miles & Huberman (1994)
  – hierarchical cluster analysis
• Giegler & Klein (1994)
  – correspondence analysis
• Bazeley (2002)
  – cluster analysis
  – correspondence analysis
Cluster Analysis
Figure 9.11 (p. 203) from Graham Gibbs (2002), Qualitative Data Analysis: Explorations with NVivo, as an SPSS data file
Cluster Analysis: Solution I
Dendrogram using Average Linkage (Between Groups): Chi-square measure
[Figure: dendrogram, Rescaled Distance Cluster Combine, scale 0 to 25]
Cases in dendrogram order (label, number): Worklink 10; Youth Training 11; Adult training 1; Redundancy Counselli 6; Start Up Business un 7; Training Access Poin 8; Workers Coops 9; Business Access Sche 3; Careers & Education 4; BCETA 2; Careers Information 5
Cluster Analysis: Solution II
Dendrogram using Average Linkage (Between Groups): Anderberg’s D Measure
[Figure: dendrogram, Rescaled Distance Cluster Combine, scale 0 to 25]
Cases in dendrogram order (label, number): Careers & Education 4; Training Access Poin 8; BCETA 2; Start Up Business un 7; Careers Information 5; Adult training 1; Worklink 10; Youth Training 11; Workers Coops 9; Redundancy Counselli 6; Business Access Sche 3
Cluster Analysis: Solution III
Dendrogram using Single Linkage
[Figure: dendrogram, Rescaled Distance Cluster Combine, scale 0 to 25]
Cases in dendrogram order (label, number): Careers & Education 4; Training Access Poin 8; Worklink 10; Youth Training 11; Workers Coops 9; Business Access Sche 3; BCETA 2; Start Up Business un 7; Careers Information 5; Adult training 1; Redundancy Counselli 6
Cluster Analysis
• Results vary according to the coefficient chosen as the measure of association between rows (or columns)
• Results vary according to the method of clustering
• Use with extreme caution
Other Quantitative Methods
• Find weights for categories of variable that maximize relationships between variables
• correspondence analysis
  – finds weights for categories of row and categories of column
• also traditional least-squares procedures
  – eg regression, principal components & others
Correspondence Analysis
• Similar to principal components
• Originally derived for tables of frequencies
  – [for statistics to apply need one respondent per cell, but can be used with multiple responses across cells]
• but can be used with indicator data
• Can produce separate maps of relationships between categories of rows or columns
• Can produce a joint map of categories of rows and columns
Giegler & Klein
• Examined personal advertisements
• in a number of German magazines
• eg
• Young man, 35 y, 176cm, slim with car, good income, looks for a lovely high-bosomed and well-developed partner for a common future.
Categories of the content analysis.
 1. CI   cultural interests: Bach, Picasso, Rilke, opera
 2. IM   intellectual mobility: brain, quick-witted
 3. AP   academic professions: doctor, lawyer, dentist
 4. HEC  high economic status: villa, yacht, car
 5. FB   fitness of body: sport, strong, robust
 6. CLC  compassion, life crises: poverty, lonely
 7. SEX  sex: tolerant, hot, leather, lover, horny
 8. BA   body attributes: attractive, slim, pretty, well developed
 9. HIP  high image profession: business man, head-clerk
10. IV   inner values: honest, sincere, frank, heart
11. SBE  social behav. erotic: flirt, candle light, romantic, necking
12. SB   social behaviour: friend, life long, emancipated, eager to help
13. FO   family orientation: mother, father, children, marriage, family
14. HT   holiday, travel: abroad, mountains, nature, south, journeys
15. NAT  nationality: African, Turk, foreigner
16. 30Y  17-30 years
17. 45Y  31-45 years
18. 60Y  46-60 years
19. OLD  over 60 years
20. PO   pleasure orientation: hedonistic, gourmet, wine, good meal
21. SDN  smoking, drinking negative: abstinent, non drinker, cigarette free
22. SOC  social activities: parties, friends, pub, cinema
23. SIN  single
24. DW   divorced, widowed
      CI IM AP HEC FB CLC SEX BA HIP IV SBE SB FO HT NAT 30Y 45Y 60Y OLD PO
1001   2  2  1   0  1   3   0  1   0  1   2  0  1  1   0   1   0   0   0  2
1002   2  1  0   0  1   0   0  0   0  1   1  1  0  1   1   0   0   1   0  1
Data:
One row per ad
Each column contains the number of instances for each coding category
ie each ad will appear a number of times in the cells of any table – the total frequency of the table is the number of codings, not the number of ads
Cut-down version of Giegler & Klein example
Category (rows) by Magazine (columns)
Category Z WN WAZ TIP EXP H&W
High SES 234 39 27 14 29 27
Fitness 239 68 58 44 55 44
Compassion 217 24 43 19 35 13
Sex 49 9 52 54 71 127
Figure 152 32 57 49 85 46
Image 43 41 25 30 36 90
Values 58 65 16 14 6 45
Erotic 434 227 303 313 268 374
Friend 303 104 132 182 197 224
Family 515 197 291 282 344 353
Travel 260 149 111 98 90 130
30yo 208 57 135 116 143 283
45yo 132 20 58 52 54 116
60yo 37 10 11 32 9 31
Old 36 108 85 68 97 44
Hedonist 165 124 187 146 156 127
Wowser 70 10 99 134 113 160
Social 14 1 32 45 29 148
Single 56 13 23 9 12 18
Separated 54 15 26 22 16 81
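Two quantities that correspondence analysis reports can be computed directly from a table like this one: the mass of each row (its share of the grand total) and the total inertia (the Pearson chi-square divided by the grand total). The Python sketch below shows both on a small slice of the table; the function names are hypothetical, and the slice is for illustration only, so its results are not those of the full analysis.

```python
def total_inertia(table):
    """Pearson chi-square of a two-way frequency table divided by its grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return chi_square / n


def masses(table):
    """Row masses: each row total as a proportion of the grand total."""
    n = sum(sum(row) for row in table)
    return [sum(row) / n for row in table]


# Illustrative slice only: first three categories by magazines Z, WN, WAZ.
table_slice = [
    [234, 39, 27],  # High SES
    [239, 68, 58],  # Fitness
    [217, 24, 43],  # Compassion
]

print("total inertia:", round(total_inertia(table_slice), 4))
print("row masses:", [round(m, 3) for m in masses(table_slice)])
```

A table whose rows all have the same profile has zero inertia; the further the row profiles depart from one another, the larger the inertia that the dimensions then have to account for.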
Correspondence Analysis
• In SPSS, Correspondence Analysis is one of the data reduction options (like factor analysis) [can be run as syntax or point-and-click]
• There is also a syntax-only option called ANACOR, which is more limited but can analyse a table directly when the only data in the SPSS spreadsheet are the table frequencies.
ANACOR syntax example: Huber proportions table shown earlier
* "free" indicates data values separated by spaces, and the names after the slash identify the columns.
data list free / A B C D E F G H J.
begin data.
0.11 0.07 0.09 0.13 0.15 0.16 0.15 0.12 0.12
0.07 0.05 0.09 0.13 0.18 0.16 0.2 0.18 0.2
0.5 0.46 0.43 0.34 0.33 0.28 0.26 0.25 0.2
0.17 0.21 0.19 0.09 0.13 0.2 0.13 0.16 0.16
0.07 0.09 0.11 0.14 0.1 0.07 0.13 0.16 0.13
0.09 0.12 0.09 0.17 0.11 0.13 0.12 0.14 0.18
end data.
* Changes data values from proportions to percentages.
do repeat xs = A to J.
compute xs = xs * 100.
end repeat.
* Simplest ANACOR syntax (just identifies numbers of rows & columns).
ANACOR TABLE = ALL (6,9).
Correspondence Analysis: The point-and-click way
                                                        Proportion of Inertia      Confidence Singular Value
Dimension  Singular Value  Inertia  Chi Square  Sig.     Accounted for  Cumulative  Standard Deviation  Correlation 2
1          .299            .089                          .576           .576        .008                .161
2          .198            .039                          .252           .828        .009
3          .141            .020                          .129           .956
4          .062            .004                          .024           .981
5          .055            .003                          .019           1.000
Total                      .155     1948.580    .000(a)  1.000          1.000
a. Five possible dimensions
b. Singular value – square root of eigenvalue
c. Inertia – eigenvalues (variance)
d. Chi-square – could be partitioned between dimensions (only valid if cells in table are independent)
- How many dimensions?
- Fit of Solution
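The relations among the columns of this summary can be checked by hand. The Python sketch below squares the reported singular values to get the inertias, sums them, and forms the proportion and cumulative proportion for each dimension. It works from the three-decimal figures in the table, so the recomputed proportions can differ from the printed ones in the last decimal place.

```python
# Singular values as printed in the summary table (rounded to three decimals).
singular_values = [0.299, 0.198, 0.141, 0.062, 0.055]

# Inertia of a dimension is the square of its singular value.
inertias = [sv ** 2 for sv in singular_values]
total = sum(inertias)

# Share of total inertia per dimension, and the running total of those shares.
proportions = [value / total for value in inertias]
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

for dim, (inertia, share, cum) in enumerate(zip(inertias, proportions, cumulative), start=1):
    print(dim, round(inertia, 3), round(share, 3), round(cum, 3))
print("total inertia:", round(total, 3))
```

Reading the cumulative column is one way to approach the "how many dimensions?" question: here the first two dimensions carry most of the inertia.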
Row and Column Points
Symmetrical Normalization
[Figure: joint plot of the Content categories and the Magazines; horizontal axis: Dimension 1]
Details for Magazines
                                                Contribution
                 Score in Dimension             Of Point to Inertia   Of Dimension to Inertia
                                                of Dimension          of Point
Magazine  Mass      1       2      Inertia        1       2             1       2     Total
Z         .264    -.779    .380    .056          .537    .193          .863    .136    .999
WN        .106    -.349   -.939    .030          .043    .473          .130    .625    .755
WAZ       .143     .137   -.276    .007          .009    .055          .123    .331    .454
TIP       .138     .391   -.166    .011          .071    .019          .559    .067    .625
EXP       .149     .213   -.218    .010          .023    .036          .198    .137    .335
H&W       .199     .691    .472    .042          .318    .224          .676    .208    .884
Active Total 1.000                 .155         1.000   1.000
Score in Dimension: location in spatial representation
Contribution: different ways of describing fit of each magazine
Columns: Content; Mass; Score in Dimension (1, 2); Inertia; Contribution of Point to Inertia of Dimension (1, 2); Contribution of Dimension to Inertia of Point (1, 2, Total)
High SES .029 -1.463 .670 .022 .211 .067 .873 .121 .995
Fitness .040 -.939 .125 .011 .119 .003 .985 .011 .996
Compassion .028 -1.407 .626 .019 .185 .055 .859 .113 .971
Sex .029 .830 .438 .007 .066 .028 .801 .147 .949
Figure .034 -.418 .085 .004 .020 .001 .487 .013 .501
Image .021 .470 .012 .004 .016 .000 .360 .000 .360
Values .016 -.456 -.639 .010 .011 .034 .105 .137 .242
Erotic .153 .109 -.172 .002 .006 .023 .262 .437 .699
. .091 .040 .061 .001 .000 .002 .034 .050 .084
Family .158 -.004 -.063 .001 .000 .003 .001 .109 .110
Travel .067 -.367 -.278 .005 .030 .026 .493 .188 .681
30yo .075 .384 .384 .006 .037 .056 .548 .363 .911
45yo .034 .079 .582 .002 .001 .059 .026 .944 .970
60yo .010 .130 .350 .002 .001 .006 .030 .145 .175
Old .035 .181 -1.417 .015 .004 .354 .023 .955 .978
Hedonist .072 .118 -.578 .006 .003 .122 .048 .756 .804
Wowser .047 .814 .160 .012 .103 .006 .767 .020 .786
Social .021 1.482 .970 .020 .157 .102 .721 .204 .926
Single .010 -.676 .275 .002 .016 .004 .742 .081 .823
Separated .017 .379 .717 .004 .008 .044 .192 .455 .647
Active Total 1.000 .155 1.000 1.000
Similar Fit information for ad categorizations
More Complex versions…
• Sometimes known as Multiple Correspondence Analysis
• HOMALS: HOMogeneity analysis by Alternating Least Squares
• For example
• The complete data structure of Giegler & Klein
                             Categorization
Magazine  Sex  Concept       Fitness  Compassion  Figure  Values  Erotic
Z         F    Self               44          99      50      11     101
Z         F    Seeking            41          12       9      11      85
Z         F    Relationship        6           0      12       5       3
Z         M    Self               67          97      67      18     207
Z         M    Seeking            80           9      11       9      37
Z         M    Relationship        1           0       3       4       1
WN        F    Self                8          14      17      18     107
WN        F    Seeking            19           1       4      38      59
WN        F    Relationship       20           0       0       3       0
WN        M    Self                9           7       4       3      42
WN        M    Seeking            11           2       6       3      19
WN        M    Relationship        1           0       1       0       0
Giegler & Klein data as a four-way table
Multiple Correspondence Analysis
[HOMALS]
[Figure: category points plotted on Dimension 1 (horizontal) and Dimension 2 (vertical) for four variables: Category (Fitness, Compassion, Figure, Values, Erotic), Concept (Self, Seeking, Relationship), Sex (M, F) and Magazine (Z, WN)]
Some other questions
• How well could we predict magazine usage from the other factors?
• Could use
  – multinomial regression if cells independent (and sample size very large)
  – categorical regression if just want to look at effects
A new issue: The kind of transformation to be chosen
Kinds of transformations
• Depends on what we want to assume
• Not inherent in the data
• Basic Kinds
  – Nominal - Categorical (unordered categories)
  – Ordinal (Assumes data are ordered)
  – Numeric - Interval (Assumes data on a scale with equal intervals)
• Recent advance
  – Spline (smoothes ordinal & nominal transformations)
Model Summary
Multiple R  R Square  Adjusted R Square
.338        .115      .113
Dependent Variable: MAGAZINE
Predictors: SEX CONCEPT CATEGORY
          Standardized Coefficients
          Beta   Std. Error  df  F-ratio   Prob
SEX       -.195  .008         2   535.707  .000
CONCEPT   -.034  .008         3    16.377  .000
CATEGORY   .273  .008        20  1052.267  .000
Transformation: MAGAZINE
Optimal Scaling Level: Nominal.
[Figure: transformation plot of Quantifications (vertical axis) against Categories (H&W, EXP, TIP, WAZ, WN, Z)]
Another example: How do characteristics distinguish among groups?
• Famous example
• (Not real)
GROUP      Interaction Intensity  Interaction Frequency  Feeling of Belonging  Physical Proximity  Relationship Formality
Crowd      slight                 slight                 none                  close               formal
Audience   low                    nonrecurring           slight                close               formal
Public     slight                 slight                 slight                distant             no relationship
Mob        high                   nonrecurring           high                  close               informal
Family     high                   frequent               high                  close               informal
Relatives  moderate               infrequent             variable              distant             formal
Community  low                    infrequent             variable              close               formal
Summary of a qualitative analysis of the characteristics of groups as postulated by Gutman from Bell & Sirjamaki (1962)
Object Scores Labeled by GROUP
Cases weighted by number of objects.
[Figure: object scores for Community, Relatives, Family, Mob, Public, Audience and Crowd plotted on Dimension 1 (horizontal) and Dimension 2 (vertical)]
Quantifications
[Figure: category quantifications plotted on Dimension 1 (horizontal) and Dimension 2 (vertical) for Relationship Formality, Physical Proximity, Feeling of Belonging, Interaction Frequency and Interaction Intensity, each point labelled with its category]
Category Quantifications
• Here the data were all treated as nominal
• Dimensions were quantification values
• Different quantifications for different dimensions
• Only possible for nominal data
• Other (ordinal, numeric) must have same quantification on each dimension. Nominal can also be similarly restricted.
For example: Using regression
• Make the group the dependent variable
• Other nominal variables cannot be multiple-nominal because regression coefficients are unidimensional
• Use other variables to predict group
  – Artificial example: few cases and relatively many variables will give perfect prediction
  – Can still compare prediction & evaluate categories
Predictors of Group      Standardized Coefficients (Beta)
Interaction Intensity    -1.084
Interaction Frequency      .689
Feeling of Belonging      1.219
Physical Proximity        -.209
Relationship Formality     .060
Transformation: GROUP
Optimal Scaling Level: Nominal.
[Figure: transformation plot of Quantifications against Categories (Community, Relatives, Family, Mob, Public, Audience, Crowd)]
Transformation: Interaction Frequency
Optimal Scaling Level: Nominal.
[Figure: transformation plot of Quantifications against Categories (frequent, infrequent, nonrecurring, slight)]
Transformation: Feeling of Belonging
Optimal Scaling Level: Spline Nominal (degree 2, interior knots 2).
[Figure: transformation plot of Quantifications against Categories (high, variable, slight, none)]
Transformation: Relationship Formality
Optimal Scaling Level: Nominal.
[Figure: transformation plot of Quantifications against Categories (informal, formal, no relationship)]
Principal Components: Demographics
• Age Group [treat as ordinal]
• Education Level [treat as ordinal]
• Marital Status [ nominal ]
• Work Status [nominal – allow different quantifications for different dimensions]
Transformation: Age Group
Optimal Scaling Level: Spline Ordinal (degree 2, interior knots 2).
Variable Principal Normalization.
[Figure: transformation plot of Quantifications against Categories (18-19yrs, 20-24yrs, 25-29yrs, 30-34yrs, 35-39yrs, 40-44yrs, 45-49yrs, 50-54yrs, 55-59yrs, 60-64yrs)]
Transformation: Work Status
Dimension 1
Optimal Scaling Level: Multiple Nominal.
Variable Principal Normalization.
[Figure: transformation plot of Quantifications against Categories (Other, Unemployed, Don't work, Part-time, Full-time)]
Transformation: Work Status
Dimension 2
Optimal Scaling Level: Multiple Nominal.
Variable Principal Normalization.
[Figure: transformation plot of Quantifications against Categories (Other, Unemployed, Don't work, Part-time, Full-time)]
Transformation: Marital Status
Optimal Scaling Level: Nominal.
Variable Principal Normalization.
[Figure: transformation plot of Quantifications against Categories (Single, Widowed, Divorced-Separated, living together, 2nd marriage, 1st marriage)]
Component Loadings and Centroids
Variable Principal Normalization.
[Figure: plot on Dimension 1 (horizontal) and Dimension 2 (vertical) showing Work Status centroids (Other, Unemployed, Don't work, Part-time, Full-time) and component loadings for Marital Status, Education Level and Age Group]
Combining Qualitative & Quantitative Data
• The availability of numeric and other transformations makes the combining of quantitative & qualitative data simple
Combining Qualitative & Quantitative Data
• Use Categorical Regression setting measurement levels appropriately
• Use Categorical Principal Components setting measurement levels appropriately
• Save transformed variables and use ordinary regression or factor analysis for better options (eg hierarchical regression or factor rotation)
Combining Qualitative & Quantitative Data
• Preserve independence of sets of data
• Generalized (more than two sets) non-linear canonical variate analysis
• OVERALS
OVERALS
• A tool for relating sets of variables
• A variant that is a common statistical model is canonical variate analysis (producing a canonical correlation between two sets of variables)
• OVERALS
  – Allows for more than two sets
  – Allows variables to be numeric, categorical or ordinal
A current data set
• PhD project by Simone Pica
• People with psychosis featuring social withdrawal
  – 19 young people suffering from psychosis with symptoms of social withdrawal
  – Unstructured interviews
  – Standard psychiatric measures also completed
Data
• Interviews transcribed, categories formed from content, coding made
• Diagnosis (DSM III-R)
• Scores on quantitative measures
  – Premorbid Adjustment Scale (PAS)
  – Scale for the Assessment of Negative Symptoms (SANS)
Raw material
Um, when I got home I thought it was probably a good thing I didn’t
go because um, it sort of relates to motivation as well, I wasn’t really that motivated to go out and deal with people and stuff. If more of my friends were there, I’d probably would have gone, if it was a party and all my friends were there I would have thought cool you know, I’d have to go even if I only had a few dollars, that’s cool, I can go without drinks, cigarettes, I’d just want to be there you know but probably because there would have been only a couple of people I would have known there and the rest of them I wouldn’t have known. I sort of thought no, I wouldn’t have a good time because if I wanted to meet people, I like meeting people, but when I meet people I always have to talk about my psychosis, and whenever I have to talk about my psychosis, its like everyone is listening you know, and they all just stop what they are doing and they listen, “psychosis, what is that?” and then I have to explain everything about it and they are all listening type of thing, honing in type of thing.
Classified material
• 3. EXPERIENCED DIFFICULTY COMMUNICATING
• He couldn’t talk because he became jumbled, he couldn’t focus on one thing he kept thinking about whether his ex-friend was going to mention the letter to other people there
• He stayed in small groups of people throughout the evening in order to avoid saying something inappropriate that would draw attention to him
• When he felt comfortable he found it easier to talk
• He found that the comfortable feeling didn’t last, it wore off when the ‘wall’ came and he found it difficult to think of things to talk about
• When he was with the group of people he didn’t know what to talk to people about so he remained silent
• He didn’t know what to talk about because he couldn’t think of anything intelligent to say
• When he was with people and he didn’t know what to talk about his mind was blank, he didn’t think anything
Case  felt different  stressed uncomfortable  difficulty communicating  concern about others views of them
1     Absent          Present                 Absent                    Absent
2     Absent          Present                 Present                   Present
3     Absent          Absent                  Absent                    Present
4     Present         Absent                  Absent                    Present
5     Present         Present                 Present                   Present
6     Present         Absent                  Present                   Present
Qualitative Data: eg Presence of categories in interview transcripts
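Before a presence/absence table like this can be analysed, each cell is turned into a 0/1 indicator. The Python sketch below does that for the six cases shown and counts how many cases show each category; the names `to_indicator`, `indicator` and `column_totals` are hypothetical, and the 1 = Present, 0 = Absent coding is an assumption made for the illustration.

```python
COLUMNS = [
    "felt different",
    "stressed uncomfortable",
    "difficulty communicating",
    "concern about others views of them",
]

# The six cases shown in the table, in column order.
CODED = {
    1: ["Absent", "Present", "Absent", "Absent"],
    2: ["Absent", "Present", "Present", "Present"],
    3: ["Absent", "Absent", "Absent", "Present"],
    4: ["Present", "Absent", "Absent", "Present"],
    5: ["Present", "Present", "Present", "Present"],
    6: ["Present", "Absent", "Present", "Present"],
}


def to_indicator(row):
    """Assumed coding: 1 where the category is Present, 0 where it is Absent."""
    return [1 if cell == "Present" else 0 for cell in row]


indicator = {case: to_indicator(row) for case, row in CODED.items()}
column_totals = [sum(column) for column in zip(*indicator.values())]

for name, count in zip(COLUMNS, column_totals):
    print(f"{name}: present in {count} of {len(indicator)} cases")
```

The resulting rows are one set of variables per respondent, ready to sit alongside the diagnosis and scale scores.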
DSM-IIIR diagnosis Frequency Percent Cumulative Percent
Schizophrenic 11 55.0 57.9
Schizophreniform 3 15.0 73.7
Schizoaffective 2 10.0 84.2
Delusional 2 10.0 94.7
Bipolar 1 5.0 100.0
Qualitative measures: eg DSM diagnosis
Case  PAS Child  PAS Adolesc  PAS Adult
1     4          6            9
2     1          6            8
3     5          8            11
4     6          5            4
5     4          5            5
6     4          6            7
7     4          4            5
Quantitative Measures: eg Premorbid Adjustment Scales
[Figure: variables plotted on DIM1 (horizontal) and DIM2 (vertical), marked by SET (Diagnosis, Interview, SANS, PAS). Diagnosis: DSM-IIIR Dimension 1, DSM-IIIR Dimension 2. Interview: self boring, want to be alone, shy/inferior, stigma judged reject, concern others views, stressed incommunica. SANS: Attention, Anhedonia, Avolition, Alogia, Affect. PAS: Adult, Adolesc, Child.]
Dimension 1 Transformation Plot for DSM-IIIR
[Figure: Category Quantifications for DSM-IIIR (vertical axis) against DSM-IIIR categories (Bipolar, Delusional, Szoaffective, Szphreniform, Szphrenic)]
Dimension 2 Transformation Plot for DSM-IIIR
[Figure: Category Quantifications for DSM-IIIR (vertical axis) against DSM-IIIR categories (Bipolar, Delusional, Szoaffective, Szphreniform, Szphrenic)]
Fit of Solution
Summary of Analysis
                   Dimension
                   1      2      Sum
Loss       Set 1   .220   .545   .764
           Set 2   .359   .267   .626
           Set 3   .284   .302   .585
           Set 4   .119   .326   .445
           Mean    .245   .360   .605
Eigenvalue         .755   .640
Fit                              1.395
Summary of Analysis
                   Dimension
                   1      2      Sum
Loss       PAS     .220   .545   .764          31.6%
           SANS    .359   .267   .626          25.9%
           Text    .284   .302   .585          24.1%
           DSM     .119   .326   .445          18.4%
           Mean    .245   .360   .605 (Loss)   30%
Eigenvalue         .755   .640   1.395 (Fit)   70%
Total              1.000  1.000  2.000         100%
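The percentages in the second summary follow from simple arithmetic on the loss values, which the Python sketch below reproduces: the eigenvalue of a dimension is one minus the mean loss on that dimension, the fit is the sum of the eigenvalues, and fit plus mean loss equals the number of dimensions. The variable names are hypothetical, and because the inputs are the rounded three-decimal figures from the table, some recomputed percentages differ from the printed ones in the last digit.

```python
# Loss per set on dimensions 1 and 2, as printed in the Summary of Analysis.
LOSS = {
    "PAS": (0.220, 0.545),
    "SANS": (0.359, 0.267),
    "Text": (0.284, 0.302),
    "DSM": (0.119, 0.326),
}
n_dims = 2

# Mean loss per dimension across the four sets.
mean_loss = [sum(loss[d] for loss in LOSS.values()) / len(LOSS) for d in range(n_dims)]

# Eigenvalue = 1 - mean loss; fit = sum of eigenvalues.
eigenvalues = [1 - m for m in mean_loss]
fit = sum(eigenvalues)
fit_share = fit / n_dims  # fit as a share of the maximum possible (one per dimension)

# Each set's share of the total loss.
set_loss = {name: sum(loss) for name, loss in LOSS.items()}
total_loss = sum(set_loss.values())
loss_share = {name: value / total_loss for name, value in set_loss.items()}

print("mean loss per dimension:", [round(m, 3) for m in mean_loss])
print("eigenvalues:", [round(e, 3) for e in eigenvalues])
print("fit:", round(fit, 3), "=", round(100 * fit_share), "% of maximum")
for name, share in loss_share.items():
    print(name, f"{100 * share:.1f}% of total loss")
```

Read this way, the set with the largest share of the loss is the one the two-dimensional solution represents least well.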
Fit of Solution
Object Scores Labeled by ID
Cases weighted by number of objects.
[Figure: object scores for cases 1 to 20 plotted on Dimension 1 (horizontal) and Dimension 2 (vertical)]
Some pointers for Optimal Scaling
• for SPSS optimal scaling
  – CATREG & CATPCA have most sophisticated options
  – CATREG produces standard regression output
  – Both CATREG & CATPCA can
    • save transformed variables (for repeating analysis in ordinary mode eg for rotating components)
    • eliminate need to specify range (unlike HOMALS & OVERALS which must have range 1 to n specified)
Some pointers for Optimal Scaling
• Cautions
• In general category quantifications only hold for the set of variables in the analysis
• (Incredibly) there is little published experience with these techniques
• Remember to use in exploratory mode
  – Change transformations and see what happens
  – Delete outlying variables/categories