EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS

Statistics for Librarians, Session 2: Descriptive statistics


DESCRIPTION

The second in a series of four seminars presented to University of North Texas librarians. This presentation focuses on organizing and presenting basic descriptive statistics, including measures of central tendency and variation.


Page 1: Statistics for Librarians, Session 2: Descriptive statistics

EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS

Page 2: Statistics for Librarians, Session 2: Descriptive statistics

REVIEW

Page 3: Statistics for Librarians, Session 2: Descriptive statistics

PURPOSE OF STATISTICS

Results
• Bias?
• Sampling Error?
• Invalid Measures?
• Random Error?
• Other Factors?

Page 4: Statistics for Librarians, Session 2: Descriptive statistics

VARIABLES

Independent

Subjects

Factors

Effects of…

Dependent

Objects

Outcomes

Effects on…

Page 5: Statistics for Librarians, Session 2: Descriptive statistics

SCALES OF DATA (NOIR)

Nominal
• Counts by category
• Binary (Yes/No)
• No meaning between the categories (Blue is not better than Red)

Ordinal
• Ranks
• Scales
• Space between ranks is subjective

Interval
• Integers
• Zero is just another value; it doesn’t mean “absence of”
• Space between values is equal and objective, but discrete

Ratio
• Interval data with a baseline
• Zero (0) means “absence of”
• Space between is continuous
• Includes simple counts

Page 6: Statistics for Librarians, Session 2: Descriptive statistics

ANOTHER WAY

Qualitative
• Counts by Categories
• Ranks
• Scales

Quantitative
• Measurements
• Composite scores
• Simple Counts

Page 7: Statistics for Librarians, Session 2: Descriptive statistics

EXAMPLE DATA SET

PACS FACULTY CITATION ANALYSIS

Page 8: Statistics for Librarians, Session 2: Descriptive statistics

RESEARCH QUESTION

Does UNT Libraries provide access to the resources used by PACS faculty, based on references in their published works?

Page 9: Statistics for Librarians, Session 2: Descriptive statistics

PACS STUDY VARIABLES

Faculty
• Department
• Years at UNT

Published
• # published by type
• Rankings of journals

Cited
• # cited by type
• Rankings of journals
• UNT accessible

IV

DV

Page 10: Statistics for Librarians, Session 2: Descriptive statistics

PACS STUDY VARIABLES BY SCALE

Qualitative
• # of publications by type
• # of citations by type
• # references available

Quantitative
• Years at UNT
• Years since PhD

Page 11: Statistics for Librarians, Session 2: Descriptive statistics

EXPLORATORY DATA ANALYSIS

GETTING TO KNOW YOUR DATA, INTIMATELY

Page 12: Statistics for Librarians, Session 2: Descriptive statistics

DISTRIBUTIONS

Page 13: Statistics for Librarians, Session 2: Descriptive statistics

QUALITATIVE DATA

Tables
• Counts
• Percentages/Ratios
• By row and column

Excel
• Pivot Tables

Page 14: Statistics for Librarians, Session 2: Descriptive statistics

TABLES

Department                          Num Faculty   % of Faculty
Anthropology                        20            18%
Behavior Analysis                   17            15%
Criminal Justice                    18            16%
Public Administration               19            17%
Rehab, Social Work, & Addictions    18            16%
Sociology                           21            19%
Totals                              113           100%

Department                                     Articles   % Articles   Other
Anthropology                                   73         61%          46
Behavior Analysis                              65         81%          15
Criminal Justice                               54         69%          24
Public Administration                          64         58%          47
Rehabilitation, Social Work, and Addictions    49         82%          11
Sociology                                      83         62%          50
Totals                                         388        67%          193

Availability       # Refs   %
Available          586      79.62%
Title not avail    134      17.66%
Year not avail     23       2.72%
Grand Total        743      100.00%

Page 15: Statistics for Librarians, Session 2: Descriptive statistics

ACTIVITY 1

Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152                  666                     2012
Behavior Analysis                              1412                  289                     1740
Criminal Justice                               1220                  624                     2003
Public Administration                          966                   561                     1724
Rehabilitation, Social Work, and Addictions    852                   365                     1282
Sociology                                      2238                  1558                    3970
Totals

Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152      57%         666    33%      194     2012
Behavior Analysis                              1412      81%         289    17%      39      1740
Criminal Justice                               1220      61%         624    31%      159     2003
Public Administration                          966       56%         561    33%      197     1724
Rehabilitation, Social Work, and Addictions    852       66%         365    28%      65      1282
Sociology                                      2238      56%         1558   39%      174     3970
Totals                                         7840      (avg) 63%   4063   30%      828     12731
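The percentage columns in the completed table can be checked with a short Python sketch. The counts are copied from the table; the function and variable names are illustrative only. It also shows why the Totals row is labeled "(avg)": the 63% is the average of the six department percentages, which is not the same as pooling all references together.

```python
# Reference counts copied from the completed table: (articles, books, other).
refs = {
    "Anthropology": (1152, 666, 194),
    "Behavior Analysis": (1412, 289, 39),
    "Criminal Justice": (1220, 624, 159),
    "Public Administration": (966, 561, 197),
    "Rehabilitation, Social Work, and Addictions": (852, 365, 65),
    "Sociology": (2238, 1558, 174),
}


def article_share(counts):
    """Return articles as a fraction of all references in one row."""
    articles, books, other = counts
    return articles / (articles + books + other)


# Row percentage for each department.
shares = {dept: article_share(counts) for dept, counts in refs.items()}

# Average of the department percentages (the "(avg)" figure in the Totals row).
avg_of_shares = sum(shares.values()) / len(shares)

# Pooled percentage: all articles divided by all references.
total_articles = sum(counts[0] for counts in refs.values())
total_refs = sum(sum(counts) for counts in refs.values())
pooled_share = total_articles / total_refs

for dept, share in shares.items():
    print(f"{dept}: {share:.0%}")
print(f"Average of department percentages: {avg_of_shares:.0%}")
print(f"Pooled percentage: {pooled_share:.0%}")
```

The two summary figures differ because departments with many references (such as Sociology) weigh more heavily in the pooled percentage.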

Page 16: Statistics for Librarians, Session 2: Descriptive statistics

GRAPHS

[Chart: % Articles by Department; axis from 0% to 80%; departments: Anthropology; Behavior Analysis; Criminal Justice; Public Administration; Rehabilitation, Social Work, and Addictions; Sociology]

[Chart: % of Faculty]

Page 17: Statistics for Librarians, Session 2: Descriptive statistics

GRAPH & CHART RULES OF THUMB

• Trends
• Connection across the X-axis

• Categorical
• Comparisons
• Grouped
• Stacked
• Relative Stacked

• Categorical
• Few Categories
• Differences are Wide

Page 18: Statistics for Librarians, Session 2: Descriptive statistics

ACTIVITY 2

Draw a bar graph of References by Type

Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152      57%         666    33%      194     2012
Behavior Analysis                              1412      81%         289    17%      39      1740
Criminal Justice                               1220      61%         624    31%      159     2003
Public Administration                          966       56%         561    33%      197     1724
Rehabilitation, Social Work, and Addictions    852       66%         365    28%      65      1282
Sociology                                      2238      56%         1558   39%      174     3970
Totals                                         7840      (avg) 63%   4063   30%      828     12731

[Bar graph: References by Type; axis from 0 to 4000; series: Other, Book, Article]

Page 19: Statistics for Librarians, Session 2: Descriptive statistics

QUANTITATIVE DISTRIBUTIONS

Stem & Leaf

Histogram

Distribution graphs

Page 20: Statistics for Librarians, Session 2: Descriptive statistics

EXPLORATORY DATA ANALYSIS

• John W. Tukey
• Exploratory Data Analysis
• Examining your data visually.
• Stem & Leaf
• Hinges
• Box plots
• Scatter plots, etc.

Page 21: Statistics for Librarians, Session 2: Descriptive statistics

STEM-AND-LEAF

Stem   Leaf
0      1122223334445555666666677777899
1      000011122222222333346677889
2      0122234468
3      1112355888
4      12

Stem = first digit(s); Leaf = last digit
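The split into stems and leaves can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole numbers; the function name and the sample values are hypothetical and are not the study data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by their first digit(s) (stem) and
    collect the last digit of each value (leaf), in sorted order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 42 -> stem 4, leaf 2
        rows[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}


# Hypothetical values for illustration only.
years = [1, 1, 2, 14, 16, 23, 8, 41, 42, 10]
table = stem_and_leaf(years)
for stem, leaves in table.items():
    print(stem, leaves)
```

Note that this sketch skips stems with no values (there is no row for stem 3 here); a hand-drawn table would normally keep the empty row so the shape of the distribution stays visible.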

Page 22: Statistics for Librarians, Session 2: Descriptive statistics

ACTIVITY 3

Create a stem-and-leaf table for Years at UNT.

Stem Leaf

0 01112222222222222233333344445556666677788899

1 0000000011122223333356778899

2 00122234444799

3 0245

Page 23: Statistics for Librarians, Session 2: Descriptive statistics

FROM STEM-AND-LEAF TO HISTOGRAMS

Page 24: Statistics for Librarians, Session 2: Descriptive statistics

Stem   Leaf                               Count
0      1122223334445555666666677777899    31
1      000011122222222333346677889        27
2      0122234468                         10
3      1112355888                         11
4      12                                 2

Range   Count
0-9     31
10-19   27
20-29   10
30-39   11
40-49   2

[Figure: Histogram of Years at UNT; x-axis ranges 0-9, 10-19, 20-29, 30-39, 40-49; y-axis from 0 to 40]
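Going from stems to histogram bars amounts to counting the values in each range of ten. The Python sketch below shows that step and prints a rough text histogram; the function names are illustrative, and the counts in `slide_counts` are copied from the Range/Count table above.

```python
from collections import Counter


def decade_counts(values):
    """Count values in each 10-wide range (0-9, 10-19, ...), one range per stem."""
    tally = Counter(value // 10 for value in values)
    return {
        f"{d * 10}-{d * 10 + 9}": tally[d]  # Counter returns 0 for empty ranges
        for d in range(max(tally) + 1)
    }


def text_histogram(counts):
    """Return one line per range, with one '#' per counted value."""
    return [f"{label:>5} | {'#' * n} {n}" for label, n in counts.items()]


# Counts copied from the Range/Count table.
slide_counts = {"0-9": 31, "10-19": 27, "20-29": 10, "30-39": 11, "40-49": 2}
for line in text_histogram(slide_counts):
    print(line)
```

Unlike the stem-and-leaf table, the histogram keeps only the counts; the individual values are no longer visible.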

Page 25: Statistics for Librarians, Session 2: Descriptive statistics

ACTIVITY 4

Create a histogram of the Years at UNT

Stem Leaf

0 01112222222222222233333344445556666677788899

1 0000000011122223333356778899

2 00122234444799

3 0245

Stem   Leaf                                            Count
0      01112222222222222233333344445556666677788899    44
1      0000000011122223333356778899                    28
2      00122234444799                                  14
3      0245                                            4

[Figure: histogram of Years at UNT; x-axis ranges 0-9, 10-19, 20-29, 30-39; y-axis from 0 to 50]

Page 26: Statistics for Librarians, Session 2: Descriptive statistics

PIVOT TABLES

Select Data
• Highlight table
• Insert->Pivot Table

Select Variables
• Categories (Row Labels)
• Values

Change Settings
• Percentage of Grand Total
• Average
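For readers who want to see what a pivot table computes, here is a rough Python equivalent of the three steps: group rows by a category, count them, and report each count as a percentage of the grand total along with an average. The records and the function name are hypothetical; they are not the study data or an Excel API.

```python
from collections import defaultdict


def pivot(records):
    """Summarize (category, value) rows the way a simple pivot table would."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)

    grand_total = len(records)
    return {
        category: {
            "count": len(values),
            "pct_of_total": len(values) / grand_total,
            "average": sum(values) / len(values),
        }
        for category, values in sorted(groups.items())
    }


# Hypothetical rows: (department, years at the university).
records = [
    ("Anthropology", 3), ("Anthropology", 9),
    ("Criminal Justice", 12), ("Criminal Justice", 2),
    ("Sociology", 1), ("Sociology", 5), ("Sociology", 20), ("Sociology", 6),
]
summary = pivot(records)
for department, stats in summary.items():
    print(department, stats["count"], f"{stats['pct_of_total']:.0%}", stats["average"])
```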

Page 27: Statistics for Librarians, Session 2: Descriptive statistics

DEMONSTRATION OF PIVOT TABLES IN EXCEL

Page 28: Statistics for Librarians, Session 2: Descriptive statistics

HISTOGRAMS IN EXCEL

Analysis ToolPak
• Options
• Add-ins
• Manage Add-ins

Set ranges
• Equal spacing
• Enter the highest # for each range
• Ceiling (“more”)

Create Histogram
• Data
• Data Analysis
• Histogram

Create Graph
• Insert Bar Chart
• Highlight histogram
• Select bars & Format Selection
• Gap Width=0%
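The "highest # for each range" idea can be sketched in Python. This sketch assumes each value is counted in the first range whose upper bound is greater than or equal to it, with anything above the last bound falling into "More"; that is how the Excel tool is understood to behave here, but treat it as an assumption. The function name and sample values are hypothetical.

```python
from bisect import bisect_left


def histogram_bins(values, upper_bounds):
    """Count values per range, where each range is defined by its highest value.

    upper_bounds must be sorted in ascending order.
    """
    labels = [f"<= {bound}" for bound in upper_bounds] + ["More"]
    counts = [0] * len(labels)
    for value in values:
        # Index of the first upper bound that is >= value;
        # equals len(upper_bounds) when the value exceeds every bound.
        counts[bisect_left(upper_bounds, value)] += 1
    return dict(zip(labels, counts))


# Hypothetical values and equally spaced upper bounds.
values = [0, 3, 9, 10, 19, 20, 35, 52]
result = histogram_bins(values, [9, 19, 29, 39])
print(result)
```

With bounds of 9, 19, 29 and 39, a value of 9 lands in the first range and a value of 10 in the second, matching the 0-9 and 10-19 ranges used in the earlier slides.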

Page 29: Statistics for Librarians, Session 2: Descriptive statistics

DEMONSTRATION OF HISTOGRAM IN EXCEL

Page 30: Statistics for Librarians, Session 2: Descriptive statistics

MEASURES OF CENTRAL TENDENCY

Mean
• Average

Median
• Middle

Mode
• Most Common

Page 31: Statistics for Librarians, Session 2: Descriptive statistics

CENTRAL TENDENCY BY SCALES

Quantitative

Mean

Median

Qualitative

Median (not for nominal data)

Mode

Page 32: Statistics for Librarians, Session 2: Descriptive statistics

ACTIVITY 5

# Available
• Mode

# References by Type
• Mode

Years Since PhD
• Mean
• Median

Years at UNT
• Mean
• Median

Page 33: Statistics for Librarians, Session 2: Descriptive statistics

MEAN

Sum of all the values divided by the count of values

$\bar{X} = \dfrac{\sum X}{n}$

$\bar{X}$ = sample mean
∑ = “sum of…”
X = values of the variable
n = number of values

Page 34: Statistics for Librarians, Session 2: Descriptive statistics

EXCEL FUNCTIONS FOR MEASURES OF CENTRAL TENDENCY

Mean
• =Average(range)

Median
• =Median(range)

Mode
• =Mode(range)
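Outside Excel, the same three measures are available in Python's standard `statistics` module. The values below are hypothetical and chosen so that the three measures come out different.

```python
import statistics

# Hypothetical values for illustration only.
values = [2, 3, 3, 5, 7, 10, 12]

mean_value = statistics.mean(values)      # sum of the values divided by their count
median_value = statistics.median(values)  # middle value once the data are sorted
mode_value = statistics.mode(values)      # most common value

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```

Here the mean (6) sits above the median (5) because the two largest values pull the average upward, while the mode (3) only reflects the single most repeated value.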

Page 35: Statistics for Librarians, Session 2: Descriptive statistics

SPREAD (REVIEW)

How variable is the data?

Quantitative
• Range
• Quartiles or Quintiles
• Standard Deviation

Qualitative
• Distribution Tables
• Bar Graphs

Page 36: Statistics for Librarians, Session 2: Descriptive statistics

RANGE & QUARTILES

Page 37: Statistics for Librarians, Session 2: Descriptive statistics

PRESENTATION OF SPREAD

• Box plots
  • Median
  • Upper & lower quartiles
  • Outliers
• Cross-tabulations
• Bar graphs

Page 38: Statistics for Librarians, Session 2: Descriptive statistics

BOXPLOT IN EXCEL

Set parameters
• Median
• Quartile 1
• Minimum
• Maximum
• Quartile 3

Use Excel functions
• =Median(range)
• =Quartile.inc(range, 1)
• =Min(range)
• =Max(range)
• =Quartile.inc(range, 3)

Insert Chart
• Highlight both columns
• Select a bar chart
• Switch the columns & rows
• Modify the formats of each element
• YouTube tutorial
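The five box plot parameters can also be computed in Python. This sketch uses `statistics.quantiles` with `method="inclusive"`, which is intended to mirror the interpolation used by Quartile.inc; treat that correspondence as an assumption and spot-check it against your own spreadsheet. The function name and sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartile 1, median, quartile 3, and maximum."""
    # n=4 asks for the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical values for illustration only.
summary = five_number_summary([9, 1, 2, 3, 4, 5, 6, 7, 8])
print(summary)
```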

Page 39: Statistics for Librarians, Session 2: Descriptive statistics

STANDARD DEVIATION

• Measure of dispersion of data
• Square root of the average squared deviation from the mean
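Written as a formula, and assuming the sample version of the standard deviation (the one that divides by n − 1), this reads as follows, where X is each value, $\bar{X}$ is the sample mean, and n is the number of values:

```latex
s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}
```

Each difference from the mean is squared so that values above and below the mean do not cancel out; the square root returns the result to the original units (years, in the examples here).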

Page 40: Statistics for Librarians, Session 2: Descriptive statistics

STANDARD DEVIATION WORKED OUT

Years since PhD (X)   Mean ($\bar{X}$)   Difference from Mean   Difference from Mean Squared
1                     14.86              -13.86                 192.216
1                     14.86              -13.86                 192.216
2                     14.86              -12.86                 165.4876
…
14                    14.86              -0.86                  0.746837
16                    14.86              1.14                   1.290047
…
41                    14.86              26.14                  683.0802
42                    14.86              27.14                  736.3518
n=81                  14.86              0.00                   9931.506

Page 41: Statistics for Librarians, Session 2: Descriptive statistics

WORK IT OUT

$s = \sqrt{\dfrac{9931.506}{81 - 1}}$

$s = \sqrt{\dfrac{9931.506}{80}}$

$s = \sqrt{124.1438}$

$s = 11.14198$
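The same arithmetic can be reproduced in Python. The first part uses the sum of squared differences (9931.506) and n (81) from the worked table; the second part is a general function, checked against the standard library on hypothetical values. The function and variable names are illustrative.

```python
import math
import statistics

# Totals taken from the worked table: sum of squared differences and n.
sum_of_squares = 9931.506
n = 81
s = math.sqrt(sum_of_squares / (n - 1))
print(round(s, 3))  # about 11.142, matching the worked result


def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    differences from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (len(values) - 1))


# Hypothetical values, compared with the standard library's sample SD.
sample = [1, 2, 14, 16, 41, 42]
print(sample_sd(sample), statistics.stdev(sample))
```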

Page 42: Statistics for Librarians, Session 2: Descriptive statistics

SPREAD IN EXCEL

Range
• =Min(range)
• =Max(range)

Quantiles
• =Percentile.inc(range, %)
• =Quartile.inc(range, {1,2,3,4})

Standard Deviation
• =STDEV.S(range)

Page 43: Statistics for Librarians, Session 2: Descriptive statistics

WHAT DOES THE STANDARD DEVIATION TELL YOU?

Greater variation, less certainty

Lower variation, more certainty

Page 44: Statistics for Librarians, Session 2: Descriptive statistics

FROM HISTOGRAMS TO FREQUENCY DISTRIBUTIONS

Page 45: Statistics for Librarians, Session 2: Descriptive statistics

NORMAL DISTRIBUTIONS

Page 48: Statistics for Librarians, Session 2: Descriptive statistics

BIVARIATE ANALYSIS

Page 49: Statistics for Librarians, Session 2: Descriptive statistics

SCATTER PLOT

Relationship of two variables

Quantitative Only

Page 50: Statistics for Librarians, Session 2: Descriptive statistics

CORRELATIONS

Direct
• As x increases, y increases

Indirect
• As x increases, y decreases

No Correlation
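One common way to put a number on these three cases is Pearson's correlation coefficient r, which is not named in the slides and is used here only as an illustration. It ranges from +1 (direct) through 0 (no linear correlation) to -1 (indirect). The function name and the data below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


x = [1, 2, 3, 4]
direct = pearson_r(x, [2, 4, 6, 8])      # y rises as x rises
indirect = pearson_r(x, [8, 6, 4, 2])    # y falls as x rises
no_corr = pearson_r(x, [1, -1, -1, 1])   # no linear pattern
print(direct, indirect, no_corr)
```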

Page 51: Statistics for Librarians, Session 2: Descriptive statistics

DEMONSTRATION OF SCATTER PLOT IN EXCEL

Select Data
• Highlight both columns

Insert graph
• Scatter
• Layout 9

Change Labels
• X-axis label
• Y-axis label

Page 52: Statistics for Librarians, Session 2: Descriptive statistics

CROSS-TABULATIONS

Qualitative Two Variables

Fewer Categories

Row Percentage

Column Percentage

Pivot Tables in Excel
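The difference between row and column percentages can be sketched in Python using the Articles/Other counts by department from the tables slide earlier in this session. The function names are illustrative. A row percentage answers "what share of this department's publications are articles?", while a column percentage answers "what share of all articles come from this department?".

```python
# Counts copied from the Articles/Other table earlier in the session.
counts = {
    "Anthropology": {"Articles": 73, "Other": 46},
    "Behavior Analysis": {"Articles": 65, "Other": 15},
    "Criminal Justice": {"Articles": 54, "Other": 24},
    "Public Administration": {"Articles": 64, "Other": 47},
    "Rehabilitation, Social Work, and Addictions": {"Articles": 49, "Other": 11},
    "Sociology": {"Articles": 83, "Other": 50},
}


def row_percentages(table):
    """Divide each cell by the total of its row."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Divide each cell by the total of its column."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


by_row = row_percentages(counts)
by_col = column_percentages(counts)
print(f"{by_row['Anthropology']['Articles']:.0%} of Anthropology publications are articles")
print(f"{by_col['Anthropology']['Articles']:.0%} of all articles come from Anthropology")
```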

Page 53: Statistics for Librarians, Session 2: Descriptive statistics

CONTINGENCY TABLE

Test A/B   Yes   No   Total
Yes        10    15   25
No         50    25   75
Totals     60    40   100

Simple Cross-tab

Two Binomial Variables

Powerful Statistics
• Odds Ratios & Risk Ratios
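As a rough illustration of the two ratios, the Python sketch below uses the four cell counts from the contingency table. It assumes the rows are the two groups being compared (Test A: Yes/No) and the columns are the outcome (Test B: Yes/No); the slide does not say which test is which, so this layout is an assumption, as are the function names.

```python
def risk_ratio(a, b, c, d):
    """Risk of a 'Yes' outcome in the first row divided by the risk in the second row.

    Cell layout (assumed):
                 outcome Yes   outcome No
    row 1             a            b
    row 2             c            d
    """
    risk_row1 = a / (a + b)
    risk_row2 = c / (c + d)
    return risk_row1 / risk_row2


def odds_ratio(a, b, c, d):
    """Odds of a 'Yes' outcome in the first row divided by the odds in the second row."""
    return (a / b) / (c / d)


# Cell counts from the contingency table: Yes row = 10, 15; No row = 50, 25.
rr = risk_ratio(10, 15, 50, 25)
odds = odds_ratio(10, 15, 50, 25)
print(f"Risk ratio: {rr:.2f}")
print(f"Odds ratio: {odds:.2f}")
```

Under this assumed layout, both ratios are below 1, meaning a "Yes" outcome is less likely in the first row than in the second.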

Page 54: Statistics for Librarians, Session 2: Descriptive statistics

IMPORTANCE OF DESCRIPTIVE STATISTICS

Describes
• Population
• Sample
• Results

Compares
• Sample to Population
• Sub-groups
• Correlations

Summarizes
• Central Tendency
• Spread

Page 55: Statistics for Librarians, Session 2: Descriptive statistics

PROGRESSION FROM DESCRIPTIVE TO INFERENTIAL STATISTICS

Central Tendency

Spread

Distributions

Probability

Inferential Statistics