Statistics for Librarians, Session 2: Descriptive statistics


DESCRIPTION

The second in a series of four seminars presented to University of North Texas librarians. This presentation focuses on organizing and presenting basic descriptive statistics, including measures of central tendency and variation.


EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS

REVIEW

Results

Bias?

Sampling Error?

Invalid Measures?

Random Error?

Other Factors?

PURPOSE OF STATISTICS

VARIABLES

Independent

Subjects

Factors

Effects of…

Dependent

Objects

Outcomes

Effects on…

SCALES OF DATA (NOIR)

Nominal
• Counts by category
• Binary (Yes/No)
• No meaning between the categories (Blue is not better than Red)

Ordinal
• Ranks
• Scales
• Space between ranks is subjective

Interval
• Integers
• Zero is just another value; it doesn’t mean “absence of”
• Space between values is equal and objective, but discrete

Ratio
• Interval data with a baseline
• Zero (0) means “absence of”
• Space between is continuous
• Includes simple counts

ANOTHER WAY

Qualitative
• Counts by Categories
• Ranks
• Scales

Quantitative
• Measurements
• Composite scores
• Simple Counts

EXAMPLE DATA SET: PACS FACULTY CITATION ANALYSIS

RESEARCH QUESTION

Does UNT Libraries provide access to the resources used by PACS faculty, based on references in their published works?

PACS STUDY VARIABLES

Faculty
• Department
• Years at UNT

Published
• # published by type
• Rankings of journals

Cited
• # cited by type
• Rankings of journals
• UNT accessible

IV

DV

PACS STUDY VARIABLES BY SCALE

Qualitative
• # of publications by type
• # of citations by type
• # references available

Quantitative
• Years at UNT
• Years since PhD

EXPLORATORY DATA ANALYSIS

GETTING TO KNOW YOUR DATA, INTIMATELY

DISTRIBUTIONS

QUALITATIVE DATA

Tables
• Counts
• Percentages/Ratios
• By row and column

Excel
• Pivot Tables

TABLES

Department                          Num Faculty   % of Faculty
Anthropology                        20            18%
Behavior Analysis                   17            15%
Criminal Justice                    18            16%
Public Administration               19            17%
Rehab, Social Work, & Addictions    18            16%
Sociology                           21            19%
Totals                              113           100%
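The percentage column above is each department's count divided by the grand total. A minimal Python sketch of that calculation follows; the counts are copied from the table, while the function and variable names are illustrative assumptions rather than part of the seminar materials.

```python
# Faculty counts per department, copied from the table above.
faculty_counts = {
    "Anthropology": 20,
    "Behavior Analysis": 17,
    "Criminal Justice": 18,
    "Public Administration": 19,
    "Rehab, Social Work, & Addictions": 18,
    "Sociology": 21,
}


def percent_of_total(counts):
    """Return each category's share of the grand total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


faculty_percent = percent_of_total(faculty_counts)
for name, pct in faculty_percent.items():
    print(f"{name:35s}{faculty_counts[name]:4d}{pct:5d}%")
print(f"{'Totals':35s}{sum(faculty_counts.values()):4d}")
```

Because each percentage is rounded separately, the rounded values may not add up to exactly 100%.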

Department                                     Articles   % Articles   Other
Anthropology                                   73         61%          46
Behavior Analysis                              65         81%          15
Criminal Justice                               54         69%          24
Public Administration                          64         58%          47
Rehabilitation, Social Work, and Addictions    49         82%          11
Sociology                                      83         62%          50
Totals                                         388        67%          193

Availability       # Refs   %
Available          586      79.62%
Title not avail    134      17.66%
Year not avail     23       2.72%
Grand Total        743      100.00%

Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152                  666                     2012
Behavior Analysis                              1412                  289                     1740
Criminal Justice                               1220                  624                     2003
Public Administration                          966                   561                     1724
Rehabilitation, Social Work, and Addictions    852                   365                     1282
Sociology                                      2238                  1558                    3970
Totals

Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152      57%         666    33%      194     2012
Behavior Analysis                              1412      81%         289    17%      39      1740
Criminal Justice                               1220      61%         624    31%      159     2003
Public Administration                          966       56%         561    33%      197     1724
Rehabilitation, Social Work, and Addictions    852       66%         365    28%      65      1282
Sociology                                      2238      56%         1558   39%      174     3970
Totals                                         7840      (avg) 63%   4063   (avg) 30%   828   12731
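The Article % and Book % columns are row percentages: each count divided by that department's own total. The sketch below reproduces them in Python from the counts in the table; the dictionary layout and function name are assumptions made for illustration.

```python
# Reference counts by type for each department, copied from the table above.
references = {
    "Anthropology": {"Article": 1152, "Book": 666, "Other": 194},
    "Behavior Analysis": {"Article": 1412, "Book": 289, "Other": 39},
    "Criminal Justice": {"Article": 1220, "Book": 624, "Other": 159},
    "Public Administration": {"Article": 966, "Book": 561, "Other": 197},
    "Rehabilitation, Social Work, and Addictions": {"Article": 852, "Book": 365, "Other": 65},
    "Sociology": {"Article": 2238, "Book": 1558, "Other": 174},
}


def row_percentages(row):
    """Return each type's share of the row total, rounded to a whole percent."""
    total = sum(row.values())
    return {kind: round(100 * n / total) for kind, n in row.items()}


for department, row in references.items():
    pct = row_percentages(row)
    print(f"{department}: Article {pct['Article']}%, Book {pct['Book']}%, total {sum(row.values())}")

# Column totals across all departments.
column_totals = {
    kind: sum(row[kind] for row in references.values())
    for kind in ("Article", "Book", "Other")
}
print(column_totals)
```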

ACTIVITY 1

GRAPHS

[Chart: % Articles by Department; y-axis from 0% to 80%]

[Chart: % of Faculty, by department: Anthropology; Behavior Analysis; Criminal Justice; Public Administration; Rehabilitation, Social Work, and Addictions; Sociology]

GRAPH & CHART RULES OF THUMB

• Trends; connection across the X-axis

• Categorical comparisons: grouped, stacked, relative stacked

• Categorical; few categories; differences are wide

ACTIVITY 2

Draw a bar graph of References by Type

Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152      57%         666    33%      194     2012
Behavior Analysis                              1412      81%         289    17%      39      1740
Criminal Justice                               1220      61%         624    31%      159     2003
Public Administration                          966       56%         561    33%      197     1724
Rehabilitation, Social Work, and Addictions    852       66%         365    28%      65      1282
Sociology                                      2238      56%         1558   39%      174     3970
Totals                                         7840      (avg) 63%   4063   (avg) 30%   828   12731

[Bar chart: References by Type; series: Other, Book, Article; axis from 0 to 4000]

QUANTITATIVE DISTRIBUTIONS

Stem & Leaf

Histogram

Distribution graphs

EXPLORATORY DATA ANALYSIS

• John W. Tukey
• Exploratory Data Analysis
• Examining your data visually
• Stem & Leaf
• Hinges
• Box plots
• Scatter plots, etc.

STEM-AND-LEAF

Stem (first digit(s))   Leaf (last digit)
0                       1122223334445555666666677777899
1                       000011122222222333346677889
2                       0122234468
3                       1112355888
4                       12
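A stem-and-leaf table can be built by splitting each value into its leading digit(s) and its last digit. The following Python function is a minimal sketch of that idea; the function name and the sample values are hypothetical and are not the seminar data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last) and leaf (last digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {
        stem: "".join(str(leaf) for leaf in rows.get(stem, []))
        for stem in range(min(rows), max(rows) + 1)
    }


# Hypothetical values for illustration only.
years = [1, 1, 2, 14, 16, 23, 8, 41, 42, 10]
table = stem_and_leaf(years)
for stem, leaves in table.items():
    print(f"{stem} | {leaves}")
```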

ACTIVITY 3

Create a stem-and-leaf table for Years at UNT.

Stem Leaf

0 01112222222222222233333344445556666677788899

1 0000000011122223333356778899

2 00122234444799

3 0245

FROM STEM-AND-LEAF TO HISTOGRAMS

Stem   Leaf                               Count
0      1122223334445555666666677777899    31
1      000011122222222333346677889        27
2      0122234468                         10
3      1112355888                         11
4      12                                 2

Range   Count
0-9     31
10-19   27
20-29   10
30-39   11
40-49   2

[Histogram of Years at UNT; x-axis ranges 0-9, 10-19, 20-29, 30-39, 40-49; y-axis from 0 to 40]
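Moving from a stem-and-leaf table to a histogram amounts to counting the values that fall in each equal-width range. Here is a small Python sketch of that step; the function name, the bin width default, and the sample values are assumptions for illustration, not the seminar data.

```python
from collections import Counter


def bin_counts(values, width=10):
    """Count values per equal-width bin; returns {(low, high): count} in ascending order."""
    counts = Counter(v // width for v in values)
    return {
        (b * width, b * width + width - 1): counts.get(b, 0)
        for b in range(min(counts), max(counts) + 1)
    }


# Hypothetical values for illustration only.
years = [1, 1, 2, 8, 10, 14, 16, 23, 41, 42]
histogram = bin_counts(years)
for (low, high), n in histogram.items():
    # One '#' per value gives a quick text version of the bars.
    print(f"{low:2d}-{high:<3d}{'#' * n} ({n})")
```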

ACTIVITY 4

Create a histogram of the Years at UNT

Stem Leaf

0 01112222222222222233333344445556666677788899

1 0000000011122223333356778899

2 00122234444799

3 0245

Stem   Leaf                                            Count
0      01112222222222222233333344445556666677788899    44
1      0000000011122223333356778899                    28
2      00122234444799                                  14
3      0245                                            4

[Histogram: Years at UNT; x-axis ranges 0-9, 10-19, 20-29, 30-39; y-axis from 0 to 50]

PIVOT TABLES

Select Data
• Highlight table
• Insert->Pivot Table

Select Variables
• Categories (Row Labels)
• Values

Change Settings
• Percentage of Grand Total
• Average
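For readers who want to see what a pivot table computes, the sketch below groups rows by a category and reports the count, the percentage of the grand total, and the average for each group. It is a plain-Python illustration, not a description of how Excel works internally; the records and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of (department, years at UNT); not the seminar data.
records = [
    ("Anthropology", 4),
    ("Anthropology", 12),
    ("Sociology", 2),
    ("Sociology", 9),
    ("Sociology", 25),
]


def pivot(rows):
    """Summarize values by category: count, percent of grand total, and average."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    grand_total = len(rows)
    return {
        category: {
            "count": len(values),
            "pct_of_total": 100 * len(values) / grand_total,
            "average": sum(values) / len(values),
        }
        for category, values in groups.items()
    }


summary = pivot(records)
for category, stats in summary.items():
    print(category, stats)
```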

DEMONSTRATION OF PIVOT TABLES IN EXCEL

HISTOGRAMS IN EXCEL

Analysis ToolPak
• Options
• Add-ins
• Manage Add-ins

Set ranges
• Equal spacing
• Enter the highest # for each range
• Ceiling (“more”)

Create Histogram
• Data
• Data Analysis
• Histogram

Create Graph
• Insert Bar Chart
• Highlight histogram
• Select bars & Format Selection
• Gap Width=0%

DEMONSTRATION OF HISTOGRAM IN EXCEL

MEASURES OF CENTRAL TENDENCY

Mean
• Average

Median
• Middle

Mode
• Most Common

CENTRAL TENDENCY BY SCALES

Quantitative
• Mean
• Median

Qualitative
• Median (not for nominal data)
• Mode

ACTIVITY 5

# Available
• Mode

# References by Type
• Mode

Years Since PhD
• Mean
• Median

Years at UNT
• Mean
• Median

MEAN

Sum of all the values divided by the count of values

$\bar{X} = \frac{\sum X}{n}$

$\bar{X}$ = sample mean
∑ = “sum of…”
X = values of the variable
n = number of values

EXCEL FUNCTIONS FOR MEASURES OF CENTRAL TENDENCY

Mean
• =Average(range)

Median
• =Median(range)

Mode
• =Mode(range)
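The same three measures can be computed outside Excel. The sketch below uses Python's standard `statistics` module and also works out the mean by hand as the sum divided by the count; the sample values are hypothetical.

```python
import statistics

# Hypothetical values for illustration only.
values = [2, 3, 3, 5, 7, 10]

# Mean by hand: sum of all the values divided by the count of values.
mean_by_hand = sum(values) / len(values)

mean = statistics.mean(values)      # average
median = statistics.median(values)  # middle value (average of the two middle values here)
mode = statistics.mode(values)      # most common value

print(mean_by_hand, mean, median, mode)
```

With an even number of values, the median is the average of the two middle values, which is why it need not appear in the data.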

SPREAD (REVIEW)

Quantitative

• Range

• Quartiles or Quintiles

• Standard Deviation

Qualitative

• Distribution Tables

• Bar Graphs

How variable is the data?

RANGE & QUARTILES

PRESENTATION OF SPREAD

• Box plots
  • Median
  • Upper & lower quartiles
  • Outliers

• Cross-tabulations

• Bar graphs

BOXPLOT IN EXCEL

Set parameters
• Median
• Quartile 1
• Minimum
• Maximum
• Quartile 3

Use Excel functions
• =Median(range)
• =Quartile.inc(range,1)
• =Min(range)
• =Max(range)
• =Quartile.inc(range,3)

Insert Chart
• Highlight both columns
• Select a bar chart
• Switch the columns & rows
• Modify the formats of each element
• YouTube tutorial
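The five box plot parameters can also be computed in Python. This sketch uses `statistics.quantiles` with `method="inclusive"`, which interpolates between data points in the same way as an inclusive quartile function, so it should agree with Quartile.inc; treat that equivalence as an assumption to verify against your own spreadsheet. The sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return minimum, quartile 1, median, quartile 3, and maximum (Python 3.8+)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical values for illustration only.
values = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(values)
print(summary)
```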

STANDARD DEVIATION

• Measure of dispersion of data

• Square root of the average squared difference from the mean

STANDARD DEVIATION WORKED OUT

Years since PhD (X)   Mean ($\bar{X}$)   Difference from Mean   Difference from Mean Squared
1                     14.86              -13.86                 192.216
1                     14.86              -13.86                 192.216
2                     14.86              -12.86                 165.4876
...
14                    14.86              -0.86                  0.746837
16                    14.86              1.14                   1.290047
...
41                    14.86              26.14                  683.0802
42                    14.86              27.14                  736.3518
n=81                  14.86              0.00                   9931.506

WORK IT OUT

$s = \sqrt{\frac{9931.506}{81 - 1}}$

$s = \sqrt{\frac{9931.506}{80}}$

$s = \sqrt{124.1438}$

$s = 11.14198$
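The arithmetic above can be checked in a few lines of Python. The first part plugs in the sum of squared differences (9931.506) and n (81) from the worked example; the second part is a general sample standard deviation function, compared against `statistics.stdev` on hypothetical values. Function and variable names are illustrative assumptions.

```python
import math
import statistics

# Figures from the worked example: sum of squared differences and sample size.
sum_of_squares = 9931.506
n = 81
s = math.sqrt(sum_of_squares / (n - 1))
print(round(s, 5))


def sample_sd(values):
    """Sample standard deviation: square root of the summed squared differences over n - 1."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (len(values) - 1))


# Hypothetical values for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(values), statistics.stdev(values))
```

Dividing by n - 1 rather than n is what makes this the sample standard deviation, which matches =STDEV.S in Excel.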

SPREAD IN EXCEL

Range
• =Min(range)
• =Max(range)

Quantiles
• =Percentile.inc(range, %)
• =Quartile.inc(range, {1,2,3,4})

Standard Deviation
• =STDEV.S(range)

WHAT DOES THE STANDARD DEVIATION TELL YOU?

Greater variation, less certainty

Lower variation, more certainty

FROM HISTOGRAMS TO FREQUENCY DISTRIBUTIONS

NORMAL DISTRIBUTIONS

BIVARIATE ANALYSIS

SCATTER PLOT

Relationship of two variables

Quantitative Only

CORRELATIONS

Direct
• As x increases, y increases

Indirect
• As x increases, y decreases

No Correlation
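One common way to put a number on these three patterns is Pearson's correlation coefficient r, which is near +1 for a direct relationship, near -1 for an indirect one, and near 0 when there is no linear correlation. The slides do not name a specific coefficient, so treat this choice as an assumption; the data below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired differences from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical values for illustration only.
x = [1, 2, 3, 4, 5]
direct = pearson_r(x, [2, 4, 6, 8, 10])    # y rises as x rises
indirect = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
none = pearson_r(x, [3, 1, 4, 1, 3])       # no linear pattern
print(direct, indirect, none)
```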

DEMONSTRATION OF SCATTER PLOT IN EXCEL

Select Data
• Highlight both columns

Insert graph
• Scatter
• Layout 9

Change Labels
• X-axis label
• Y-axis label

CROSS-TABULATIONS

Qualitative Two Variables

Fewer Categories

Row Percentage

Column Percentage

Pivot Tables in Excel

CONTINGENCY TABLE

Test A/B   Yes   No   Total
Yes        10    15   25
No         50    25   75
Totals     60    40   100

Simple Cross-tab

Two Binomial Variables

• Odds Ratios & Risk Ratios

Powerful Statistics
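As a rough illustration of the odds ratio and risk ratio, the sketch below applies both to the four cells of the contingency table above. It assumes the rows are the two groups being compared and the columns are the outcome (Yes/No); the slides do not say which variable plays which role, so that reading is an assumption, as are the function names.

```python
def risk_ratio(a, b, c, d):
    """Risk of the outcome in group 1 (a of a+b) divided by risk in group 2 (c of c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome in group 1 (a to b) divided by odds in group 2 (c to d)."""
    return (a / b) / (c / d)


# Cells from the contingency table: first row (10, 15), second row (50, 25).
a, b = 10, 15
c, d = 50, 25

rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"Risk ratio: {rr:.2f}")
print(f"Odds ratio: {odds:.2f}")
```

Under that reading, a ratio below 1 means the outcome is less likely in the first row's group than in the second.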

IMPORTANCE OF DESCRIPTIVE STATISTICS

Describes
• Population
• Sample
• Results

Compares
• Sample to Population
• Sub-groups
• Correlations

Summarizes
• Central Tendency
• Spread

PROGRESSION FROM DESCRIPTIVE TO INFERENTIAL STATISTICS

Central Tendency

Spread

Distributions

Probability

Inferential Statistics
