Descriptive Statistics in SPSS
When first looking at a dataset, it is wise to use descriptive statistics to get some idea of
what your data look like. Here is a simple dataset, showing three different variables –
type of psychological problem someone was treated for, type of treatment approach used,
and symptom level at the end of treatment. One of these variables (symptom) is on an
interval-level scale; the other two are grouping variables (i.e., they are measured on a
nominal-level scale).
The type of descriptive statistics you can calculate depends on the level of data that you
have to work with. Here are the options:
• For simple description of nominal-level variables (groups) use frequencies
• For more complex description of nominal-level variables use crosstabs
• For simple description of interval- or ratio-level variables (items measured on a
scale) use the descriptives command
• For more complex description of interval- or ratio-level variables use the explore
command
Each of these options will be shown in more detail below.
One Important Note: Even though “problem” and “treatment” are nominal-level
variables, they have to be coded as numbers (not as text) in order to use the following
procedures. See the instructions on How to Recode Variables if you need to convert from
a text variable to a number variable.
Here are the various choices. All of them are found in the “Analyze” menu in SPSS,
under the sub-menu for “Descriptive Statistics”:
We will start by looking at frequencies for a nominal-level variable, like “problem” in
this dataset. Open the “Frequencies” dialog box:
Select the variable that you are interested in from the left-hand list, and use the
arrow button to move it into the right-hand list.
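Use the “Charts” button to see a graphical output of the frequencies on your variable. In
this case, I have selected a histogram. You can also use pie charts or bar charts; for a
nominal-level variable like this one, a bar chart is usually the better choice, because the
mean and standard deviation printed on a histogram are not meaningful for category codes.
If you aren’t happy with the way your data are displayed (ascending vs. descending order,
etc.), try some of the options found by clicking on the “Format” button.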
Hit “OK” in the main dialog box to continue. The output for this variable looks like this:

Patient Diagnosis
                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Anxiety                   5      31.3            31.3                 31.3
        Depression                5      31.3            31.3                 62.5
        Eating Disorder           6      37.5            37.5                100.0
        Total                    16     100.0           100.0

This is the frequency table, which is the basic output from this procedure. It shows the
number of people in each group, the percent of the total in each group, the percent in each
group as a proportion of just the people with complete data (that’s the “valid percent”),
and the cumulative percent reached as you add each new group to the previous ones
(that’s the “cumulative percent”). Notice that the numeric values have been replaced with
text labels – these were entered in the “values” column in the “variable view” of the data.
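
The arithmetic behind the four columns can be checked by hand. The following is a minimal Python sketch, not SPSS output: it takes the three counts from the frequency table above and assumes there are no missing cases (which is why "Percent" and "Valid Percent" agree). The variable names are illustrative only.

```python
# Counts copied from the frequency table; names are illustrative.
counts = {"Anxiety": 5, "Depression": 5, "Eating Disorder": 6}
missing = 0  # assumption: no missing cases in this dataset

n_valid = sum(counts.values())   # cases with a valid diagnosis code
n_total = n_valid + missing      # all cases, including any missing ones

rows = []
cumulative = 0.0
for label, freq in counts.items():
    percent = 100 * freq / n_total        # share of all cases
    valid_percent = 100 * freq / n_valid  # share of non-missing cases
    cumulative += valid_percent           # running total of valid percent
    rows.append((label, freq, percent, valid_percent, cumulative))

for label, freq, percent, valid_percent, cum in rows:
    print(f"{label:<16}{freq:>4}{percent:>8.2f}{valid_percent:>8.2f}{cum:>8.2f}")
```

SPSS displays these values rounded to one decimal place, so 31.25 appears as 31.3 in the table.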
Here’s the graphical output: This histogram shows the number of people in each
category.
[Figure: Histogram of Patient Diagnosis (x-axis: Patient Diagnosis; y-axis: Frequency).
Mean = 2.06, Std. Dev. = 0.854, N = 16]
For a more detailed breakdown of nominal-level data, use the crosstabs command in the
“Analyze/Descriptive Statistics” sub-menu. This is appropriate if you want to consider
two variables in combination – for example, what percentage of the patients with an
anxiety disorder were treated with cognitive-behavioral therapy?
The “Crosstabs” dialog box lets you enter one variable (or more) as “rows” in a
frequency table, and another variable as the “columns” in the same table.
Use the “Cells” button to get percentages for each row and column:
Hit “OK” in the main dialog box to continue. The output from this procedure looks like
this:

Patient Diagnosis * Type of Therapy Crosstabulation

                                                  Type of Therapy
Patient Diagnosis                                  CBT       IPT      Total
Anxiety           Count                              3         2          5
                  % within Patient Diagnosis     60.0%     40.0%     100.0%
                  % within Type of Therapy       37.5%     25.0%      31.3%
                  % of Total                     18.8%     12.5%      31.3%
Depression        Count                              2         3          5
                  % within Patient Diagnosis     40.0%     60.0%     100.0%
                  % within Type of Therapy       25.0%     37.5%      31.3%
                  % of Total                     12.5%     18.8%      31.3%
Eating Disorder   Count                              3         3          6
                  % within Patient Diagnosis     50.0%     50.0%     100.0%
                  % within Type of Therapy       37.5%     37.5%      37.5%
                  % of Total                     18.8%     18.8%      37.5%
Total             Count                              8         8         16
                  % within Patient Diagnosis     50.0%     50.0%     100.0%
                  % within Type of Therapy      100.0%    100.0%     100.0%
                  % of Total                     50.0%     50.0%     100.0%

The central area of this table shows you the total number of patients who have each
individual combination of the various levels of the two variables.
The far-right column shows you the total for each diagnosis, as a percent of all patients.
The bottom-most row slices the data the other way, showing you the total for each type of
treatment, as a percent of all patients.
The interior cells of the table also show you each individual cell’s total as a percentage of
the patients with that particular diagnosis, as a percentage of the patients who received
that particular type of therapy, and as a percentage of the overall total.
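
To see where the three percentages in each cell come from, here is a minimal Python sketch that recomputes them from the cell counts in the crosstabulation. It is an illustration of the arithmetic, not of how SPSS works internally, and the function name `cell_percentages` is hypothetical.

```python
# Cell counts copied from the crosstabulation
# (rows: diagnosis, columns: type of therapy).
counts = {
    "Anxiety": {"CBT": 3, "IPT": 2},
    "Depression": {"CBT": 2, "IPT": 3},
    "Eating Disorder": {"CBT": 3, "IPT": 3},
}
therapies = ["CBT", "IPT"]

row_totals = {dx: sum(cells.values()) for dx, cells in counts.items()}
col_totals = {t: sum(cells[t] for cells in counts.values()) for t in therapies}
grand_total = sum(row_totals.values())


def cell_percentages(diagnosis, therapy):
    """Return (% within diagnosis, % within therapy, % of total) for one cell."""
    n = counts[diagnosis][therapy]
    return (
        100 * n / row_totals[diagnosis],
        100 * n / col_totals[therapy],
        100 * n / grand_total,
    )


row_pct, col_pct, total_pct = cell_percentages("Anxiety", "CBT")
print(f"{row_pct:.2f}% of anxiety patients received CBT")
print(f"{col_pct:.2f}% of CBT patients had an anxiety diagnosis")
print(f"{total_pct:.2f}% of all patients were anxiety patients treated with CBT")
```

The first of these numbers answers the question posed earlier: the share of patients with an anxiety disorder who were treated with cognitive-behavioral therapy is the "% within Patient Diagnosis" figure in the Anxiety/CBT cell.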
For interval- or ratio-level variables (i.e., “scales”), use the descriptives sub-command
on the “Analyze/Descriptive Statistics” menu:
This opens a dialog box where you can select the variable that you want descriptive
statistics on. Make sure that what you are selecting is actually an interval- or ratio-level
variable. Because all of the data are entered as numbers, SPSS will actually calculate
descriptive statistics on any of these variables; but these results are only meaningful for
interval- or ratio-level variables (in this case, just the “symptom” variable).
As usual, select the variable(s) that you are interested in from the left-hand column, and
move them to the right-hand column. Use the “Options” button to select the specific
descriptive statistics that you are interested in. Usually, good choices include the mean,
standard deviation, maximum, and minimum. If you are concerned about the impact of
outliers on your data, a measure of skewness may also be appropriate.
The output from this procedure looks like this:

Descriptive Statistics
                                      N   Minimum   Maximum      Mean   Std. Deviation
Symptom Severity (higher = worse)    16     19.00     33.00   24.9375          4.15482
Valid N (listwise)                   16

This shows you the specific results for each variable that you entered into the analysis. It
is possible to get a table that gives you basic descriptive statistics for many variables
simultaneously, just by moving them all at once from the left-hand to the right-hand list
in the main dialog box.
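
For readers who want to verify the numbers in the Descriptive Statistics table, here is a minimal Python sketch using only the standard library. The raw data file is not reproduced in this handout, so the 16 scores below are an assumption: they are read off the stem-and-leaf plot that appears in the Explore output later on.

```python
from statistics import mean, stdev

# Assumption: scores read off the stem-and-leaf plot shown later in this
# handout; the raw data file itself is not listed.
symptom = [19, 20, 21, 21, 22, 24, 24, 24, 25, 25, 25, 26, 27, 31, 32, 33]

summary = {
    "N": len(symptom),
    "Minimum": min(symptom),
    "Maximum": max(symptom),
    "Mean": mean(symptom),
    "Std. Deviation": stdev(symptom),  # sample SD, dividing by N - 1
}

for name, value in summary.items():
    print(f"{name:<16}{value:>10.4f}")
```

Note that the standard deviation here is the sample version (N - 1 in the denominator), which is the one that matches the SPSS table.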
Finally, to get more complex descriptive results for an interval- or ratio-level variable,
use the explore command in the “Analyze/Descriptive Statistics” sub-menu.
The following dialog box will appear:
The variable that you are interested in goes into the “dependent” list. For now, we won’t
use the other lists, but we will use the “plots” button to select additional options.
The “plots” dialog box lets you select options for graphical display of the data, including
a stem-and-leaf plot like this:
Frequency Stem & Leaf
1.00 1 . 9
7.00 2 . 0112444
5.00 2 . 55567
3.00 3 . 123
Stem width: 10.00
Each leaf: 1 case(s)
A boxplot like this:
[Figure: Boxplot of Symptom Severity (higher = worse); axis runs from 18 to 33]
A histogram like this:
[Figure: Histogram of Symptom Severity (higher = worse) (y-axis: Frequency).
Mean = 24.9375, Std. Dev. = 4.15482, N = 16]
Or a plot to test whether the data are normally distributed, like this:
[Figure: Normal Q-Q Plot of Symptom Severity (higher = worse) (x-axis: Observed Value;
y-axis: Expected Normal)]
(This plot shows a slight deviation from normality, based on the distance between the
actual data points and the theoretical line they should fall on, but probably nothing to
worry about. The “tests of normality” found earlier in the printout give you a formal test
of whether the data depart from a normal distribution. If these tests are statistically
significant [p < .05], you may need to transform the data to correct for non-normality
before you run any statistical tests. Here, neither test is significant):

Tests of Normality
                                     Kolmogorov-Smirnov(a)         Shapiro-Wilk
                                     Statistic   df   Sig.     Statistic   df   Sig.
Symptom Severity (higher = worse)         .181   16   .166          .924   16   .196
a. Lilliefors Significance Correction

The “explore” command also gives you descriptive statistics for each variable, with more
extensive results (median, interquartile range, etc.) than the “descriptives” command:

Descriptives
Symptom Severity (higher = worse)
                                                   Statistic   Std. Error
Mean                                                 24.9375      1.03870
95% Confidence Interval for Mean    Lower Bound      22.7236
                                    Upper Bound      27.1514
5% Trimmed Mean                                      24.8194
Median                                               24.5000
Variance                                              17.263
Std. Deviation                                       4.15482
Minimum                                                19.00
Maximum                                                33.00
Range                                                  14.00
Interquartile Range                                     5.50
Skewness                                                .669         .564
Kurtosis                                               -.205        1.091

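
Several of the extra statistics in the Explore output can be reproduced by hand. The sketch below is a rough check in Python, under two stated assumptions: the raw scores are recovered from the stem-and-leaf plot shown earlier (each leaf is one case, and the stem width is 10), and the quartiles are computed with Python's default "exclusive" method, which for these data agrees with the interquartile range in the table. It does not attempt the confidence interval, trimmed mean, skewness, or kurtosis.

```python
from math import sqrt
from statistics import median, quantiles, stdev, variance

# Rows of the stem-and-leaf plot as (stem, leaves) pairs.
stem_rows = [(1, "9"), (2, "0112444"), (2, "55567"), (3, "123")]
stem_width = 10  # "Stem width: 10.00" in the SPSS output

# Each leaf is one case: value = stem * stem width + leaf digit.
scores = [
    stem * stem_width + int(leaf)
    for stem, leaves in stem_rows
    for leaf in leaves
]

n = len(scores)
sd = stdev(scores)                       # sample SD (n - 1 denominator)
std_error = sd / sqrt(n)                 # standard error of the mean
value_range = max(scores) - min(scores)  # maximum minus minimum

# Quartiles via the default "exclusive" method (an assumption about which
# percentile definition to use; other definitions can give other values).
q1, _, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(f"Median              {median(scores):.4f}")
print(f"Variance            {variance(scores):.3f}")
print(f"Std. Deviation      {sd:.5f}")
print(f"Std. Error of Mean  {std_error:.5f}")
print(f"Range               {value_range:.2f}")
print(f"Interquartile Range {iqr:.2f}")
```

The standard error is simply the standard deviation divided by the square root of N, which is why it shrinks as the sample gets larger.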
One additional feature of the “explore” command is the ability to get separate descriptive
results for different sub-groups within your dataset. The way to do this is by grouping
your data based on some additional variable, which is treated as a “factor” for breaking
down the analyses. Go back to the main dialog box to add one of your nominal-level
variables as a “factor” (the grouping variable always has to be nominal-level):
After you add this factor (here, the type of therapy), the results will show you two
separate sets of statistics and plots – one for just those patients in the CBT group, and a
second set for just those patients in the IPT group.