Introduction to Statistics STATISTICAL ANALYSIS
YHJ@SOM, USM Page - 1
Introduction to Statistics
Objectives
• To define statistics
• To discuss the wide range of applications of statistics
• To discuss key statistical concepts
• To understand the branches of statistics
• To describe the levels of measurement of data
What is Statistics?
• The science of collecting, organizing, presenting, analyzing, and interpreting data for the purpose of assisting in making more effective decisions
• A branch of mathematics
• Facts and figures
• A subject or discipline
• Collections of data
Applications of Statistics
Statistical techniques are used in a wide range of scientific and social research, including biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research.
Some fields of inquiry use applied statistics so
extensively that they have specialized terminology.
These disciplines include:
Actuarial science
Applied information economics
Biostatistics
Business statistics
Chemometrics (for analysis of data from chemistry)
Data mining
Demography
Econometrics
Energy statistics
Engineering statistics
Epidemiology
Geography and Geographic Information Systems, specifically in Spatial analysis
Image processing
Psychological statistics
Reliability engineering
Social statistics
Engineering statistics is a branch of statistics that has several subtopics particular to engineering:
• Design of Experiments (DOE) uses statistical techniques to test and construct models of engineering components and systems.
• Quality control and process control use statistics as a tool to manage the conformance of manufacturing processes and their products to specifications.
• Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.
• Reliability engineering measures the ability of a system to perform its intended function (for its intended time) and has tools for improving performance.
• Probabilistic design involves the use of probability in product and system design.
Applications of Statistics in Business
• Accounting – auditing and cost estimation
• Finance – investments and portfolio management
• Human resources – compensation, job satisfaction, performance measurement
• Operations – quality management, forecasting, MIS, capacity planning, materials control
• Marketing – market analysis, consumer research, pricing
• Economics – regional, national, and international economic performance
• International business – market and demographic analysis
Key Statistical Concepts…
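• Population
  A population is the group of all items of interest to a statistics practitioner.
• Sample
  A sample is a set of data drawn from the population, i.e. a subset of it.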
Key Statistical Concepts…
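• Parameter
  A parameter is a descriptive measure of a population (e.g. the population mean µ).
• Statistic
  A statistic is a descriptive measure of a sample (e.g. the sample mean x̄).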
Key Statistical Concepts…
Populations have Parameters,
Samples have Statistics.
[Figure: a sample is a subset of the population; the population is described by a parameter, the sample by a statistic.]
Branches of Statistics
Statistics
  • Descriptive Statistics
  • Inferential Statistics
      ◦ Parametric Statistics
      ◦ Non-Parametric Statistics
Descriptive Statistics…
• …are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
  ◦ Graphical techniques
  ◦ Numerical techniques
• The actual method used depends on what information we would like to extract. Are we interested in…
  ◦ measure(s) of central location? and/or
  ◦ measure(s) of variability (dispersion)?
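As a minimal sketch of the numerical techniques above, the following Python uses only the standard `statistics` module to compute measures of central location (mean, median) and of variability (range, sample variance, sample standard deviation). The `describe` helper and the `prices` values are hypothetical, made-up illustrations, not data from any real study.

```python
import statistics

def describe(data):
    """Return common measures of central location and variability."""
    return {
        # Measures of central location
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Measures of variability (dispersion)
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),      # sample standard deviation
    }

# Hypothetical sample of prices (illustrative data only)
prices = [12.0, 15.0, 11.0, 18.0, 14.0]
summary = describe(prices)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Which of these numbers is worth reporting depends, as the slide says, on whether the question is about where the data are centred, how spread out they are, or both.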
Inferential Statistics…
• Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
Statistical Inference…
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Figure: a sample is drawn from the population; the sample statistic is used to make an inference about the population parameter.]
What can we infer about a Population’s Parameters based on a Sample’s Statistics?
Statistical Inference…
We use statistics to make inferences about parameters.
Therefore, we can make an estimate, prediction, or decision about a population based on sample data.
Thus, we can apply what we know about a sample to the larger population from which it was drawn!
Statistical Inference…
• Rationale:
  ◦ Large populations make investigating each member impractical and expensive.
  ◦ It is easier and cheaper to take a sample and make estimates about the population from the sample.
• However:
  ◦ Such conclusions and estimates are not always going to be correct.
  ◦ For this reason, we build “measures of reliability” into the statistical inference, namely the confidence level and the significance level.
Confidence & Significance Levels…
The confidence level is the proportion of times that an estimating procedure will be correct in the long run.
E.g. a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.
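The long-run meaning of a 95% confidence level can be checked by simulation. The sketch below assumes a normally distributed population with a known standard deviation, so that the familiar z-interval x̄ ± 1.96·σ/√n applies; the population values (µ = 50, σ = 10), the sample size, the number of repetitions and the helper name `ci_coverage` are arbitrary, hypothetical choices for illustration.

```python
import math
import random
import statistics

def ci_coverage(mu=50.0, sigma=10.0, n=25, reps=2000, z=1.96, seed=1):
    """Fraction of 95% z-intervals (sigma assumed known) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw a fresh random sample and build one interval from it
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

coverage = ci_coverage()
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains µ or it does not; the 95% describes the procedure over many repetitions, and the simulated proportion settles closer to 0.95 as `reps` grows.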
When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run.
E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
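The long-run meaning of a 5% significance level can be illustrated in the same way: repeatedly test a claim that is actually true and count how often it is wrongly rejected. This sketch assumes a two-sided z-test of the claim µ = 50 with a known σ; the helper name `false_rejection_rate` and all numbers are hypothetical choices for illustration.

```python
import math
import random
import statistics

def false_rejection_rate(mu0=50.0, sigma=10.0, n=25, reps=2000, z_crit=1.96, seed=2):
    """Share of samples (drawn with the claim mu = mu0 true) in which a two-sided z-test rejects it."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # The population mean really is mu0, so every rejection is a wrong conclusion
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

rate = false_rejection_rate()
print(f"Proportion of wrong rejections: {rate:.3f}")
```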
Process of Inferential Statistics
[Figure: select a random sample from the population (parameter µ), then calculate the sample statistic x̄ to estimate µ.]
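The process in the figure (select a random sample, calculate x̄, use it to estimate µ) can be sketched in Python. The population below is simulated and purely hypothetical; in practice µ is unknown, which is exactly why it has to be estimated from the sample.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 measurements (in practice we could not list it all)
population = [rng.gauss(170.0, 8.0) for _ in range(10_000)]
mu = statistics.fmean(population)   # parameter: usually unknown

# Step 1: select a simple random sample (without replacement)
sample = rng.sample(population, k=100)

# Step 2: calculate the statistic x-bar from the sample
x_bar = statistics.fmean(sample)

# Step 3: use x-bar as the estimate of mu
print(f"Population mean (parameter) mu = {mu:.2f}")
print(f"Sample mean (statistic)  x-bar = {x_bar:.2f}")
```

Only 100 of the 10,000 values are examined, which is the cost saving described under "Rationale"; the gap between x̄ and µ is the estimation error that confidence levels are meant to quantify.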
Parametric Statistics
Parametric statistics is a branch of statistics that assumes the data come from a particular type of probability distribution and makes inferences about the parameters of that distribution. Most well-known elementary statistical methods are parametric.
Non-parametric statistics
• Distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution.
• A non-parametric statistic can refer to a statistic (a function of a sample) whose interpretation does not depend on the population fitting any parametrized distribution.
• Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data.
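To contrast the two approaches, the sketch below estimates the same quantity, the proportion of values above a threshold, in two ways: parametrically, by assuming a normal distribution and estimating its parameters µ and σ from the sample with `statistics.NormalDist.from_samples` (Python 3.8+); and non-parametrically, by counting the sample values above the threshold, which assumes no distributional form. The delivery-time data and the threshold are made-up illustrations.

```python
from statistics import NormalDist

# Hypothetical sample of delivery times in days (illustrative data only)
times = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.5, 5.1, 6.0, 9.5]
threshold = 5.0

# Parametric: assume a normal distribution and estimate its parameters from the data
fitted = NormalDist.from_samples(times)   # estimates mu (mean) and sigma (stdev)
p_parametric = 1 - fitted.cdf(threshold)

# Non-parametric: no distributional assumption, just the empirical proportion
p_nonparametric = sum(t > threshold for t in times) / len(times)

print(f"Estimated mu = {fitted.mean:.2f}, sigma = {fitted.stdev:.2f}")
print(f"P(time > {threshold}) parametric:     {p_parametric:.2f}")
print(f"P(time > {threshold}) non-parametric: {p_nonparametric:.2f}")
```

The two answers differ partly because a normal model may not suit these right-skewed data (note the 9.5); the parametric estimate is only as good as its distributional assumption.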
The most frequently used non-parametric tests include:
• Anderson–Darling test
• Cliff's delta
• Cochran's Q
• Cohen's kappa
• Efron–Petrosian test
• Friedman two-way analysis of variance by ranks
• Kendall's tau
• Kendall's W
• Kolmogorov–Smirnov test
• Kruskal–Wallis one-way analysis of variance by ranks
• Kuiper's test
• Mann–Whitney U or Wilcoxon rank-sum test
• Median test
• Pitman's permutation test
• Rank products
• Siegel–Tukey test
• Spearman's rank correlation coefficient
• Van Elteren stratified Wilcoxon rank-sum test
• Wald–Wolfowitz runs test
• Wilcoxon signed-rank test
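To illustrate how such tests work with orderings rather than distribution parameters, the sketch below computes the Mann–Whitney U statistic from its pairwise-comparison definition: every pair (x, y) scores 1 if x > y and 0.5 if they tie. Only the statistic is computed; a p-value would need exact tables or a library routine such as SciPy's `scipy.stats.mannwhitneyu`, which is not used here. The helper name `mann_whitney_u` and the two groups of scores are hypothetical.

```python
from itertools import product

def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u

# Hypothetical satisfaction scores for two independent groups
group_a = [7, 9, 6, 8, 9]
group_b = [5, 6, 4, 7, 3]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
# U_a + U_b always equals len(group_a) * len(group_b)
print(f"U_a = {u_a}, U_b = {u_b}")
```

Only comparisons between values are used, so the statistic does not depend on the data following any particular distribution; a U far from half of n₁·n₂ (here 12.5) suggests one group tends to score higher.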
Types of Data & Information
Data (at least for purposes of Statistics) fall into three main groups:
• Interval Data
• Nominal Data
• Ordinal Data
Interval Data…
Interval data
• Real numbers, e.g. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data…
Nominal Data
• The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!)
Nominal data are also called qualitative or categorical.
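With nominal data, counting is the only valid calculation. Using the marital-status coding above, this sketch tallies category frequencies with `collections.Counter` and reports the mode; the list of coded responses is hypothetical.

```python
from collections import Counter

# Codes from the slide: arbitrary labels, not quantities
MARITAL_STATUS = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Hypothetical coded survey responses
responses = [2, 1, 2, 3, 2, 1, 4, 2, 1, 3]

# Count observations per category (averaging the codes would be meaningless)
counts = Counter(MARITAL_STATUS[code] for code in responses)
for category, freq in counts.most_common():
    print(f"{category:<9} {freq:>2}  ({freq / len(responses):.0%})")

# The mode (most frequent category) is a valid summary for nominal data
mode_category, _ = counts.most_common(1)[0]
print(f"Mode: {mode_category}")
```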
Ordinal Data…
Ordinal data appear to be categorical in nature, but their values have an order, a ranking, to them:
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, the order is maintained no matter what numeric values are assigned to each category, as long as the values follow the ranking.
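For ordinal data, comparisons and rank-based summaries such as the median category are meaningful, while sums and ratios are not. A minimal sketch using Python's `enum.IntEnum` for the course-rating scale; the list of ratings and the alternative coding in `recode` are hypothetical. The last lines show that recoding the categories with different numbers, as long as their order is kept, leaves the ranking unchanged.

```python
from enum import IntEnum

class Rating(IntEnum):
    """Course rating scale from the slide; the integers encode order only."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4
    EXCELLENT = 5

# Hypothetical ratings from eight students
ratings = [Rating.GOOD, Rating.FAIR, Rating.EXCELLENT, Rating.GOOD,
           Rating.VERY_GOOD, Rating.POOR, Rating.GOOD, Rating.VERY_GOOD]

# Valid ordinal operations: comparisons, sorting, and the median category
print(Rating.EXCELLENT > Rating.POOR, Rating.FAIR < Rating.VERY_GOOD)
ranked = sorted(ratings)
# Lower median (a convention chosen here) so the result is always an actual category
median_rating = ranked[(len(ranked) - 1) // 2]
print(f"Median rating: {median_rating.name}")

# A different but order-preserving coding (10, 20, 35, 60, 100) gives the same ranking
recode = {Rating.POOR: 10, Rating.FAIR: 20, Rating.GOOD: 35,
          Rating.VERY_GOOD: 60, Rating.EXCELLENT: 100}
same_order = sorted(ratings, key=recode.get) == ranked
print(f"Ranking unchanged by recoding: {same_order}")
```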
Calculations for Types of Data
As mentioned above,
• All calculations are permitted on interval data.
• Only calculations involving a ranking process are allowed for ordinal data.
• No calculations are allowed for nominal data; only counting the number of observations in each category is possible.
This lends itself to the following “hierarchy of data”…
Hierarchy of Data…
Interval
• Values are real numbers.
• All calculations are valid.
• Data may be treated as ordinal or nominal.
Ordinal
• Values must represent the ranked order of the data.
• Calculations based on an ordering process are valid.
• Data may be treated as nominal but not as interval.
Nominal
• Values are the arbitrary numbers that represent categories.
• Only calculations based on the frequencies of occurrence are valid.
• Data may not be treated as ordinal or interval.
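The hierarchy can be sketched in code: interval data can be converted down to ordinal data (by ranking) or to categories that are then only counted (nominal treatment), but not the other way round. The helper names `to_ranks` and `to_categories`, the cut-off values, the labels and the heights are all hypothetical illustrations.

```python
from bisect import bisect_right
from collections import Counter

def to_ranks(values):
    """Interval -> ordinal: replace each value by its rank (1 = smallest).
    Tied values share the average of the ranks they occupy."""
    positions = {}
    for rank, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(rank)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def to_categories(values, cutoffs, labels):
    """Interval -> categories: label each value by the band its cut-offs place it in."""
    return [labels[bisect_right(cutoffs, v)] for v in values]

# Hypothetical heights in cm (interval data)
heights_cm = [172.5, 160.0, 181.2, 165.4, 172.5]

mean_height = sum(heights_cm) / len(heights_cm)   # arithmetic: valid at the interval level only
ranks = to_ranks(heights_cm)                      # treated as ordinal: only the order is kept
groups = to_categories(heights_cm, cutoffs=[165.0, 175.0],
                       labels=["short", "medium", "tall"])
group_counts = Counter(groups)                    # treated as nominal: only frequencies are used

print(f"Mean height: {mean_height:.2f} cm")
print(f"Ranks: {ranks}")
print(f"Category counts: {dict(group_counts)}")
```

Going the other way, from ranks or labels back to the original heights, is impossible because the information has been discarded, which is why ordinal and nominal data may not be treated as interval.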
End of discussion