The second in a series of four seminars presented to University of North Texas librarians. This presentation focuses on organizing and presenting basic descriptive statistics, including measures of central tendency and variation.
EXPLORATORY DATA ANALYSIS
DESCRIPTIVE STATISTICS
REVIEW
Results
Bias?
Sampling Error?
Invalid Measures?
Random Error?
Other Factors?
PURPOSE OF STATISTICS
VARIABLES
Independent
Subjects
Factors
Effects of…
Dependent
Objects
Outcomes
Effects on…
SCALES OF DATA (NOIR)
Nominal
• Counts by category
• Binary (Yes/No)
• No meaning between the categories (Blue is not better than Red)
Ordinal
• Ranks
• Scales
• Space between ranks is subjective
Interval
• Integers
• Zero is just another value; it doesn't mean "absence of"
• Space between values is equal and objective, but discrete
Ratio
• Interval data with a baseline
• Zero (0) means "absence of"
• Space between is continuous
• Includes simple counts
ANOTHER WAY
Qualitative
• Counts by Categories
• Ranks
• Scales
Quantitative
• Measurements
• Composite scores
• Simple Counts
EXAMPLE DATA SET: PACS FACULTY CITATION ANALYSIS
RESEARCH QUESTION
Does UNT Libraries provide access to the resources used by PACS faculty, based on references in their published works?
PACS STUDY VARIABLES
Faculty
• Department
• Years at UNT
Published
• # published by type
• Rankings of journals
Cited
• # cited by type
• Rankings of journals
• UNT accessible
IV
DV
PACS STUDY VARIABLES BY SCALE
Qualitative
• # of publications by type
• # of citations by type
• # references available
Quantitative
• Years at UNT
• Years since PhD
EXPLORATORY DATA ANALYSIS
GETTING TO KNOW YOUR DATA, INTIMATELY
DISTRIBUTIONS
QUALITATIVE DATA
Tables
• Counts
• Percentages/Ratios
• By row and column
Excel
• Pivot Tables
TABLES
Department                          Num Faculty   % of Faculty
Anthropology                        20            18%
Behavior Analysis                   17            15%
Criminal Justice                    18            16%
Public Administration               19            17%
Rehab, Social Work, & Addictions    18            16%
Sociology                           21            19%
Totals                              113           100%
Department                                     Articles   % Articles   Other
Anthropology                                   73         61%          46
Behavior Analysis                              65         81%          15
Criminal Justice                               54         69%          24
Public Administration                          64         58%          47
Rehabilitation, Social Work, and Addictions    49         82%          11
Sociology                                      83         62%          50
Totals                                         388        67%          193
Availability       # Refs   %
Available          586      79.62%
Title not avail    134      17.66%
Year not avail     23       2.72%
Grand Total        743      100.00%
Department                                     Article   Article %   Book   Book %   Other   Total
Anthropology                                   1152                  666                     2012
Behavior Analysis                              1412                  289                     1740
Criminal Justice                               1220                  624                     2003
Public Administration                          966                   561                     1724
Rehabilitation, Social Work, and Addictions    852                   365                     1282
Sociology                                      2238                  1558                    3970
Totals
Department                                     Article   Article %   Book   Book %     Other   Total
Anthropology                                   1152      57%         666    33%        194     2012
Behavior Analysis                              1412      81%         289    17%        39      1740
Criminal Justice                               1220      61%         624    31%        159     2003
Public Administration                          966       56%         561    33%        197     1724
Rehabilitation, Social Work, and Addictions    852       66%         365    28%        65      1282
Sociology                                      2238      56%         1558   39%        174     3970
Totals                                         7840      (avg) 63%   4063   (avg) 30%  828     12731
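The "(avg)" label in the Totals row matters: 63% is the mean of the six departmental percentages, not the share of all references that are articles. The following is a minimal Python sketch of both calculations, using the article and total counts from the table; the variable names are illustrative and not part of the original slides.

```python
# Article count and total reference count per department, copied from the table.
refs = {
    "Anthropology": (1152, 2012),
    "Behavior Analysis": (1412, 1740),
    "Criminal Justice": (1220, 2003),
    "Public Administration": (966, 1724),
    "Rehabilitation, Social Work, and Addictions": (852, 1282),
    "Sociology": (2238, 3970),
}

# Percentage of articles within each department.
pct_by_dept = {dept: 100 * art / total for dept, (art, total) in refs.items()}

# Average of the six departmental percentages (every department weighs the same).
avg_of_pcts = sum(pct_by_dept.values()) / len(pct_by_dept)

# Pooled percentage: all articles divided by all references.
total_articles = sum(art for art, _ in refs.values())
total_refs = sum(total for _, total in refs.values())
pooled_pct = 100 * total_articles / total_refs

print(f"average of percentages: {avg_of_pcts:.0f}%")
print(f"pooled percentage:      {pooled_pct:.0f}%")
```

The two figures differ because departments with more references (Sociology, for example) pull the pooled percentage toward their own value, while the simple average treats every department equally.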
ACTIVITY 1
GRAPHS
[Chart: % Articles by Department; vertical axis 0% to 80%]
[Chart: % of Faculty by department: Anthropology; Behavior Analysis; Criminal Justice; Public Administration; Rehabilitation, Social Work, and Addictions; Sociology]
GRAPH & CHART RULES OF THUMB
• Trends: connection across the X-axis
• Categorical comparisons: grouped, stacked, relative stacked
• Categorical, few categories: differences are wide
ACTIVITY 2
Draw a bar graph of References by Type
Department                                     Article   Article %   Book   Book %     Other   Total
Anthropology                                   1152      57%         666    33%        194     2012
Behavior Analysis                              1412      81%         289    17%        39      1740
Criminal Justice                               1220      61%         624    31%        159     2003
Public Administration                          966       56%         561    33%        197     1724
Rehabilitation, Social Work, and Addictions    852       66%         365    28%        65      1282
Sociology                                      2238      56%         1558   39%        174     3970
Totals                                         7840      (avg) 63%   4063   (avg) 30%  828     12731
[Bar graph: References by Type (Article, Book, Other); vertical axis 0 to 4000]
QUANTITATIVE DISTRIBUTIONS
Stem & Leaf
Histogram
Distribution graphs
EXPLORATORY DATA ANALYSIS
• John W. Tukey
• Exploratory Data Analysis
• Examining your data visually.
• Stem & Leaf
• Hinges
• Box plots
• Scatter plots, etc.
STEM-AND-LEAF
Stem = first digit(s); Leaf = last digit

Stem  Leaf
0     1122223334445555666666677777899
1     000011122222222333346677889
2     0122234468
3     1112355888
4     12
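A stem-and-leaf table can also be built programmatically. The sketch below is one possible approach, assuming non-negative whole-number values; the function name `stem_and_leaf` and the sample values are hypothetical and are not the study data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group integers by stem (all but the last digit) and join the leaves (last digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {stem: "".join(rows[stem]) for stem in range(min(rows), max(rows) + 1)}

# Hypothetical sample values (not the study data).
years = [1, 1, 2, 14, 16, 23, 41, 42]
table = stem_and_leaf(years)
for stem, leaves in table.items():
    print(stem, leaves)
```

Keeping the empty stem (3 in this sample) preserves the shape of the distribution, which is the point of the display.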
ACTIVITY 3
Create a stem-and-leaf table for Years at UNT.
Stem Leaf
0 01112222222222222233333344445556666677788899
1 0000000011122223333356778899
2 00122234444799
3 0245
FROM STEM-AND-LEAF TO HISTOGRAMS
Stem  Leaf                              Count
0     1122223334445555666666677777899   31
1     000011122222222333346677889       27
2     0122234468                        10
3     1112355888                        11
4     12                                2

Range   Count
0-9     31
10-19   27
20-29   10
30-39   11
40-49   2

[Histogram of Years at UNT: ranges 0-9, 10-19, 20-29, 30-39, 40-49; vertical axis 0 to 40]
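The step from counts per stem to a histogram can be sketched in a few lines of Python. The counts below are copied from the Range/Count table; the helper `bin_counts` and the raw values passed to it are hypothetical, added only to show how equal-width ranges could be tallied from unbinned data.

```python
from collections import Counter

def bin_counts(values, width=10):
    """Count values per equal-width range, e.g. 0-9, 10-19, ..."""
    counts = Counter(v // width for v in values)
    return {
        f"{b * width}-{b * width + width - 1}": counts.get(b, 0)
        for b in range(min(counts), max(counts) + 1)
    }

# Counts taken from the Range/Count table.
slide_counts = {"0-9": 31, "10-19": 27, "20-29": 10, "30-39": 11, "40-49": 2}

# A text histogram: one '#' per person in the range.
for label, count in slide_counts.items():
    print(f"{label:>5} | {'#' * count} {count}")

# Binning hypothetical raw values (not the study data) yields the same kind of table.
example = bin_counts([1, 1, 2, 14, 16, 23, 41, 42])
print(example)
```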
ACTIVITY 4
Create a histogram of the Years at UNT
Stem Leaf
0 01112222222222222233333344445556666677788899
1 0000000011122223333356778899
2 00122234444799
3 0245
Stem  Leaf                                           Count
0     01112222222222222233333344445556666677788899   44
1     0000000011122223333356778899                   28
2     00122234444799                                 14
3     0245                                           4

[Histogram: Years at UNT; ranges 0-9, 10-19, 20-29, 30-39; vertical axis 0 to 50]
PIVOT TABLES
Select Data
• Highlight table
• Insert->Pivot Table
Select Variables
• Categories (Row Labels)
• Values
Change Settings
• Percentage of Grand Total
• Average
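For readers working outside Excel, the same three steps (select data, pick a category for the row labels, summarize as a percentage of the grand total) can be approximated in Python. This is only a sketch: the per-faculty records are rebuilt here from the department counts in the faculty table, because the raw spreadsheet is not part of these slides.

```python
from collections import Counter

# One record per faculty member, rebuilt from the department counts
# in the faculty table (the raw spreadsheet is not available here).
dept_sizes = {
    "Anthropology": 20,
    "Behavior Analysis": 17,
    "Criminal Justice": 18,
    "Public Administration": 19,
    "Rehab, Social Work, & Addictions": 18,
    "Sociology": 21,
}
records = [dept for dept, n in dept_sizes.items() for _ in range(n)]

# Row labels = categories; values = count of records in each category.
counts = Counter(records)
grand_total = sum(counts.values())

# Equivalent of the "Percentage of Grand Total" setting.
pivot = {dept: (n, round(100 * n / grand_total)) for dept, n in counts.items()}
for dept, (n, pct) in pivot.items():
    print(f"{dept:<35}{n:>4}{pct:>5}%")
```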
DEMONSTRATION OF PIVOT TABLES IN EXCEL
HISTOGRAMS IN EXCEL
Analysis ToolPak
• Options
• Add-ins
• Manage Add-ins
Set ranges
• Equal spacing
• Enter the highest # for each range
• Ceiling ("more")
Create Histogram
• Data
• Data Analysis
• Histogram
Create Graph
• Insert Bar Chart
• Highlight histogram
• Select bars & Format Selection
• Gap Width=0%
DEMONSTRATION OF HISTOGRAM IN EXCEL
MEASURES OF CENTRAL TENDENCY
Mean
• Average
Median
• Middle
Mode
• Most Common
CENTRAL TENDENCY BY SCALES
Quantitative
Mean
Median
Qualitative
Median (not for Nominal data)
Mode
ACTIVITY 5
# Available: Mode
# References by Type: Mode
Years Since PhD: Mean, Median
Years at UNT: Mean, Median
MEAN
Sum of all the values divided by the count of values
$\bar{X} = \frac{\sum X}{n}$
$\bar{X}$ = sample mean
∑ = "sum of…"
X = values of the variable
n = number of values
EXCEL FUNCTIONS FOR MEASURES OF CENTRAL TENDENCY
Mean
• =Average(range)
Median
• =Median(range)
Mode
• =Mode(range)
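The same three measures are available in Python's standard `statistics` module. The sketch below uses a small illustrative sample rather than the full study data, so the results will not match the activity answers.

```python
import statistics

# Small illustrative sample (not the full study data).
sample = [1, 1, 2, 14, 16, 41, 42]

mean = statistics.mean(sample)      # sum of values / number of values
median = statistics.median(sample)  # middle value of the sorted data
mode = statistics.mode(sample)      # most common value

print(f"mean={mean:.2f} median={median} mode={mode}")
```

In this sample the mean (about 16.7) sits above the median (14) because the two large values, 41 and 42, pull the average upward.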
SPREAD (REVIEW)
Quantitative
• Range
• Quartiles or Quintiles
• Standard Deviation
Qualitative
• Distribution Tables
• Bar Graphs
How variable is the data?
RANGE & QUARTILES
PRESENTATION OF SPREAD
• Box plots: median, upper & lower quartiles, outliers
• Cross-tabulations
• Bar graphs
BOXPLOT IN EXCEL
Set parameters
• Median
• Quartile 1
• Minimum
• Maximum
• Quartile 3
Use Excel functions
• Median(range)
• Quartile.inc(range,1)
• Min(range)
• Max(range)
• Quartile.inc(range,3)
Insert Chart
• Highlight both columns
• Select a bar chart
• Switch the columns & rows
• Modify the formats of each element
• YouTube tutorial
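The five box-plot parameters can also be computed outside Excel. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which interpolates between data points including the minimum and maximum and should therefore correspond to Quartile.inc; the helper name `five_number_summary` and the sample values are illustrative assumptions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, quartile 1, median, quartile 3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Small illustrative sample (not the full study data).
sample = [1, 1, 2, 14, 16, 41, 42]
summary = five_number_summary(sample)
print(summary)
```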
STANDARD DEVIATION
• Measure of dispersion of data
• Square root of the average squared difference from the mean (dividing by n − 1 for a sample)
STANDARD DEVIATION WORKED OUT
Years since PhD (X)   Mean (X̄)   Difference from Mean   Difference from Mean Squared
1                     14.86      -13.86                 192.216
1                     14.86      -13.86                 192.216
2                     14.86      -12.86                 165.4876
…
14                    14.86      -0.86                  0.746837
16                    14.86      1.14                   1.290047
…
41                    14.86      26.14                  683.0802
42                    14.86      27.14                  736.3518
n=81                  14.86      0.00                   9931.506
WORK IT OUT
$s = \sqrt{\frac{9931.506}{81 - 1}}$
$s = \sqrt{\frac{9931.506}{80}}$
$s = \sqrt{124.1438}$
$s = 11.14198$
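The arithmetic above can be checked in Python. The first part reuses the slide's totals (sum of squared differences 9931.506, n = 81); the second part applies the same formula to a small illustrative sample, since the full data set is not reproduced here. The function name `sample_sd` is an assumption for this sketch.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared differences / (n - 1))."""
    mean = sum(values) / len(values)
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (len(values) - 1))

# Reproduce the slide's calculation from its totals.
s_slide = math.sqrt(9931.506 / (81 - 1))
print(f"s = {s_slide:.5f}")

# Small illustrative sample; the hand-written formula and the
# standard library function should agree.
sample = [1, 1, 2, 14, 16, 41, 42]
print(sample_sd(sample), statistics.stdev(sample))
```

`statistics.stdev` divides by n − 1, as does Excel's STDEV.S.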
SPREAD IN EXCEL
Range
• =Min(range)
• =Max(range)
Quantiles
• =Percentile.inc(range, %)
• =Quartile.inc(range, {1,2,3,4})
Standard Deviation
• =STDEV.S(range)
WHAT DOES THE STANDARD DEVIATION TELL YOU?
Greater variation, less certainty
Lower variation, more certainty
FROM HISTOGRAMS TO FREQUENCY DISTRIBUTIONS
NORMAL DISTRIBUTIONS
NORMAL DISTRIBUTION
SKEWED DISTRIBUTIONS
BIVARIATE ANALYSIS
SCATTER PLOT
Relationship of two variables
Quantitative Only
CORRELATIONS
Direct
• As x increases, y increases
Indirect
• As x increases, y decreases
No Correlation
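The three cases can be illustrated numerically with a correlation coefficient. The slides do not name a specific statistic; the sketch below uses Pearson's r as one common choice, with made-up data chosen to show each pattern.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
direct = pearson_r(x, [2, 4, 6, 8, 10])    # y rises as x rises
indirect = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
no_corr = pearson_r(x, [3, 1, 4, 1, 3])    # no linear pattern

print(direct, indirect, no_corr)
```

Values near +1 indicate a direct relationship, values near -1 an indirect one, and values near 0 no linear relationship.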
DEMONSTRATION OF SCATTER PLOT IN EXCEL
Select Data
• Highlight both columns
Insert graph
• Scatter
• Layout 9
Change Labels
• X-axis label
• Y-axis label
CROSS-TABULATIONS
Qualitative Two Variables
Fewer Categories
Row Percentage
Column Percentage
Pivot Tables in Excel
CONTINGENCY TABLE
Test A/B Yes No Total
Yes 10 15 25
No 50 25 75
Totals 60 40 100
Simple Cross-tab
Two Binomial Variables
• Odds Ratios & Risk Ratios
Powerful Statistics
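As a rough illustration of the two ratios, the sketch below computes them from the four cells of the contingency table. The slide does not say which variable is the grouping and which is the outcome, so treating the rows as groups and the "Yes" column as the outcome is an assumption made here.

```python
# Cell counts from the contingency table. Treating rows as the groups
# and the "Yes" column as the outcome is an assumption.
a, b = 10, 15   # row "Yes": column Yes, column No
c, d = 50, 25   # row "No":  column Yes, column No

# Risk = share of each row that falls in the "Yes" column.
risk_row_yes = a / (a + b)   # 10 / 25
risk_row_no = c / (c + d)    # 50 / 75
risk_ratio = risk_row_yes / risk_row_no

# Odds = "Yes" count divided by "No" count within each row.
odds_ratio = (a / b) / (c / d)

print(f"risk ratio = {risk_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```

Under this reading, a ratio below 1 means the outcome is less common in the first row than in the second.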
IMPORTANCE OF DESCRIPTIVE STATISTICS
Describes
• Population
• Sample
• Results
Compares
• Sample to Population
• Sub-groups
• Correlations
Summarizes
• Central Tendency
• Spread
PROGRESSION FROM DESCRIPTIVE TO INFERENTIAL STATISTICS
Central Tendency
Spread
Distributions
Probability
Inferential Statistics
RESOURCES
• Rice Virtual Lab in Statistics
• Excel Tutorials for Statistical Analysis
• Khan Academy (videos)
• Basic Research Methods for Librarians (ebook)
• Descriptive Statistical Techniques for Librarians (ebook)