BAD 6243: Applied Univariate Statistics
Understanding Data and
Data Distributions
Professor Laku Chidambaram
Price College of Business
University of Oklahoma
Summarizing Data
Measures of Central Tendency
• Mean
  – Summarizing sample data (continuous data)
  – Estimating the population parameter (µ) from the sample statistic (x̄)
  – Arithmetic average (sum of scores/number of scores)
• Median
  – Midpoint of distribution
  – Can be used in summarizing ordinal data
• Mode
  – Most frequently occurring value
  – Does not consider distribution of all scores
  – E.g., more males than females
An Example
• Take a sample distribution of hourly billing rates for a group of IT workers: 30, 20, 40, 50, 60, 60, 100, 60, 30
Σx = 450
n = 9
Mean = Σx/n = 450/9 = 50
Median: 20, 30, 30, 40, 50, 60, 60, 60, 100 → middle (5th) value = 50
Mode = 60
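The three measures for the billing-rate example can be checked with a minimal Python sketch. It uses only the standard-library `statistics` module; the variable names are illustrative and not from the slides.

```python
import statistics

# Hourly billing rates from the example
rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]

total = sum(rates)                  # sum of scores
n = len(rates)                      # number of scores
mean = statistics.mean(rates)       # arithmetic average
median = statistics.median(rates)   # middle value of the sorted scores
mode = statistics.mode(rates)       # most frequently occurring value

print(f"sum={total}, n={n}, mean={mean}, median={median}, mode={mode}")
```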
Measures of Dispersion
• Range
  – Difference between highest and lowest score
  – Does not take into account all scores in distribution
• Variance
  – Measure of how much scores deviate from mean on average
  – Use sample statistic (s²) to estimate population parameter (σ²)
• Standard Deviation
  – Square root of the variance
  – Measure of consistency of scores
Example (contd.)
x          x − x̄     (x − x̄)²
20         −30        900
30         −20        400
30         −20        400
40         −10        100
50         0          0
60         10         100
60         10         100
60         10         100
100        50         2500
50         0          4600
(Average)  (Sum)      (Sum)

Range = 100 − 20 = 80
Σ(x − x̄)² = 4600 (sum of squared differences)
s² = Σ(x − x̄)²/(n − 1) = 4600/8 = 575 (variance)
s = √575 ≈ 24 (standard deviation)
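The dispersion figures can be reproduced step by step with a short Python sketch. It assumes the same nine billing rates and the sample (n − 1) divisor used on the slide; the variable names are illustrative.

```python
import math
import statistics

rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]

mean = statistics.mean(rates)
value_range = max(rates) - min(rates)            # highest minus lowest score
squared_devs = [(x - mean) ** 2 for x in rates]  # (x - mean)^2 for each score
sum_sq = sum(squared_devs)                       # sum of squared differences
variance = sum_sq / (len(rates) - 1)             # sample variance, n - 1 divisor
std_dev = math.sqrt(variance)                    # sample standard deviation

print(f"range={value_range}, sum of squares={sum_sq}, "
      f"s^2={variance}, s={std_dev:.2f}")
```

`statistics.variance(rates)` and `statistics.stdev(rates)` use the same n − 1 divisor and should agree with the hand computation.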
Frequencies
Box Plot
Nature of the Distribution
• Skewness
  – Symmetry of the distribution
  – Value is zero in normal distribution
  – If skewed positively (pile up of scores on left) or negatively (pile up of scores on right), standardized z scores can be useful
• Kurtosis
  – “Peak”edness of the distribution
  – Value is zero in normal distribution
  – If positive (peaked) or negative (flat), standardized z scores can be useful
• Statistical Tests

Tests of Normality
            Kolmogorov-Smirnov(a)       Shapiro-Wilk
            Statistic   df   Sig.       Statistic   df   Sig.
VAR00001    .227        9    .199       .909        9    .310
a. Lilliefors Significance Correction
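As a rough illustration of skewness and kurtosis, the sketch below computes simple moment-based versions for the billing-rate data. It assumes the population (divisor n) formulas; statistical packages such as SPSS typically report bias-adjusted versions, so their values will differ somewhat. The helper `central_moment` is hypothetical and not from the slides, and the normality tests in the table are not reproduced here.

```python
import statistics

rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]

def central_moment(data, k):
    """k-th central moment using divisor n (an assumption, see lead-in)."""
    m = statistics.fmean(data)
    return sum((x - m) ** k for x in data) / len(data)

m2 = central_moment(rates, 2)

# Skewness: third central moment scaled by m2^(3/2); zero for a symmetric shape
skewness = central_moment(rates, 3) / m2 ** 1.5

# Excess kurtosis: fourth central moment scaled by m2^2, minus 3,
# so that a normal distribution gives zero
excess_kurtosis = central_moment(rates, 4) / m2 ** 2 - 3

print(f"skewness = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
```

A positive skewness here reflects the single high score (100) stretching the right tail.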
A Summary of Results
Histogram with Normal Curve
Understanding Data Distributions
Data Distributions
• A data distribution is a way of representing the frequency of occurrence of values for a variable
• Data distributions can be discrete (e.g., Bernoulli, Binomial, Poisson) or continuous (e.g., Normal, Exponential, Uniform)
• A histogram, which approximates a probability density function, depicts a data distribution
• Data distributions are defined by the functional form and the values of parameters
• Our focus is on the shape of such distributions and their implications for statistical inference
Normal Distribution
• Refers to a family of distributions (a.k.a. Gaussian distributions) that are bell shaped and:
  – Represent a continuous probability distribution
  – Are symmetric (with scores concentrated in the middle)
  – Can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ)
  – Have one mode
  – Are asymptotic
An Example
Standard Normal Distribution
The area P under the standard normal probability curve, with the respective z-statistic
The z Distribution
• The standard normal distribution, sometimes called the z distribution (as indicated by the formula below), is a normal distribution with a mean of 0 and a standard deviation of 1
• Normal distributions can be transformed to a standard normal distribution using the formula:
  z = (X − μ)/σ
  where X is a score from the original normal distribution, μ is its mean and σ is the standard deviation
• A z-score represents the number of standard deviations above or below the mean
• Note that the z distribution will only be a normal distribution if the original distribution (X) is normal
Areas Under the Curve
The Empirical Rule: 68-95-99.7
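The 68-95-99.7 figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to compute the area within 1, 2 and 3 standard deviations of the mean; the dictionary name `areas` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} SD of the mean: {area:.4f}")
```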
An Example
• If IQ scores are normally distributed, with a mean of 100 and a standard deviation of 15,
  – what proportion of scores would be greater than 125?
  – what proportion of scores would fall between 90 and 120?
  – what proportion of scores would be less than 85?
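One way to work the three questions is sketched below, assuming the stated normal distribution (mean 100, standard deviation 15) and using `statistics.NormalDist` from the Python standard library. The variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# P(X > 125): area to the right of z = (125 - 100)/15
p_above_125 = 1 - iq.cdf(125)

# P(90 < X < 120): area between z = (90 - 100)/15 and z = (120 - 100)/15
p_between_90_120 = iq.cdf(120) - iq.cdf(90)

# P(X < 85): area to the left of z = (85 - 100)/15 = -1
p_below_85 = iq.cdf(85)

print(f"P(X > 125)      = {p_above_125:.4f}")
print(f"P(90 < X < 120) = {p_between_90_120:.4f}")
print(f"P(X < 85)       = {p_below_85:.4f}")
```

The results should come out at roughly 5%, 66% and 16% respectively. The last agrees with the empirical rule, since 85 is exactly one standard deviation below the mean.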
Some Key Concepts
• Central Limit Theorem
  – As sample size increases, the sampling distribution of the mean for simple random samples of n cases, taken from a population with a mean equal to μ and a finite variance equal to σ², approximates a normal distribution
• Sampling Distribution of the Mean*
• Standard Deviation vs. Standard Error of the Mean
• Sample Size vs. Number of Samples
• Other Distributions
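A small simulation can illustrate these concepts. The sketch below is an assumption-laden example, not material from the slides: it draws repeated samples from a skewed (exponential) population with mean 1 and standard deviation 1, then compares the spread of the sample means with the standard error σ/√n. The seed, sample size and number of samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(6243)    # arbitrary seed so the run is repeatable

n = 30               # sample size: cases per sample
num_samples = 2000   # number of samples drawn

# Population: exponential with rate 1 (mean = 1, standard deviation = 1)
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to mu = 1
se_observed = statistics.stdev(sample_means)    # spread of the sample means
se_theory = 1.0 / math.sqrt(n)                  # standard error = sigma / sqrt(n)

print(f"mean of sample means   = {mean_of_means:.3f}")
print(f"SD of sample means     = {se_observed:.3f}")
print(f"theoretical std. error = {se_theory:.3f}")
```

Increasing `n` (sample size) narrows the sampling distribution and brings it closer to normal; increasing `num_samples` (number of samples) only gives a smoother picture of that same distribution.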
The Central Limit Theorem