Sociology
Paper: Methodology of Research in Sociology
Module: Statistical Procedures: Parametric and Non-parametric
Module Detail and its Structure

| Subject Name | Sociology |
|---|---|
| Paper Name | Methodology of Research in Sociology |
| Module Name/Title | Statistical Procedure: Parametric and Non-parametric |
| Module Id | RMS 22 |
| Pre-requisites | Some knowledge of descriptive and inferential statistics. Knowledge of statistical monographs and numerical symbols. Social scientific application of quantitative presentation. |
| Objectives | To introduce the learner to the use of the dual statistical method in sociological research. This would include introduction to the basic concepts and strategies of the non-parametric and parametric methods, and their scope and limitations. |
| Keywords | Statistics; Parametric and Non-parametric Processes |

Role in Content Development

| Role | Name | Affiliation |
|---|---|---|
| Principal Investigator | Prof. Sujata Patel | Dept. of Sociology, University of Hyderabad |
| Paper Co-ordinator | Prof. Biswajit Ghosh | Professor, Department of Sociology, The University of Burdwan, Burdwan 713104. Email: [email protected]; Ph.: +91 9002769014 |
| Content Writers | Subhasis Bandyopadhyay | Assistant Professor, IIEST-S. Email: [email protected]; M: 9836945013 |
| | Biplab Nayak | Statistical Consultant and Freelance Analyst. Email: [email protected]; M: +91 9836364035 |
| Content Reviewer (CR) & Language Editor (LE) | Prof. Biswajit Ghosh | Professor, Department of Sociology, The University of Burdwan, Burdwan |
Contents
1. Introduction
2. Learning Outcome
3. Utility of Statistics in Social Research
4. Parametric Tests and Analogous Non-parametric Procedures
5. Fundamentals of Parametric Tests
6. Standard Deviation, t-test and ANOVA
   Self-Check Exercise 1
7. Fundamentals of Non-parametric Tests
   i) Goodness-of-Fit Test Based on Empirical Process: Two-sample Kolmogorov-Smirnov Tests (Non-parametric Tests)
   ii) Chi-Square Test
   iii) The Two-Sample Rank-Sum Test
   iv) The Kruskal-Wallis Test
   v) Spearman's Rank Correlation
8. Why Don't We Always Use Non-parametric Tests?
   Self-Check Exercise 2
9. Summary
   Notes
10. References
1. Introduction
Parametric and non-parametric are two broad classifications of statistical procedures. The discipline of statistics plays an important role in the social sciences because it is usually impossible to collect data from all individuals of interest (the population) in a given space and time. The only solution available to the social researcher is to collect data from a subset (sample) of the individuals of interest. Meanwhile, the real objective of the research is to know the "truth" about the population.
Quantities such as means, standard deviations and proportions are all meaningful values and are
generally called “parameters” when social scientists are talking about a population. Since
researchers usually cannot get data from the whole population, they cannot show the values of the
parameters for that population. Researchers can, however, calculate estimates of these quantities
for their sample. When calculated from sample data, these quantities are called “statistics.”
When applying a statistical method, it is common to differentiate between quantitative and
qualitative features and variables. Nominal and ordinal variables and data are usually considered
as qualitative (attributive), while interval variables and ratio variables are considered as
quantitative (Ferguson 1966, Krneta 1987). Also, it is common to apply non-parametric statistical methods to nominal and ordinal data, while parametric methods are used for interval and ratio data (Ferguson 1966).
A statistic estimates a parameter. Parametric statistical procedures rely on assumptions about the
shape of the distribution (i.e., assume a normal distribution) in the underlying population and
about the form or parameters (i.e., means and standard deviations) of the assumed distribution.
These are tests that assume that the data follow a certain distribution (generally the normal distribution), are measured at the interval level, and exhibit equality of variances. The t-test, z-test and ANOVA are used in parametric procedures. Parametric distributions are used as arguments to higher-level functions that compute probabilities, expectations, random variates, or parameter estimates from data. Distributions with undetermined parameters can be used throughout, and later the parameters can be solved for or optimized over.¹
Parametric statistics is a branch of statistics which assumes that the data have come from a type
of probability distribution and makes inferences about the parameters of the distribution (Geisser
and Johnson 2006).
A recurrent question in research is whether a given association between two variables is significant or not, that is, whether the association between the two variables is strong, important and worth reporting. In this context, parametric statistics assumes significance. As the name suggests, parametric statistics are those that make certain assumptions about the parameters describing the population from which the sample is selected. Here, the term "significance" does not imply "importance" as in the general sense of the expression. It refers only to the likelihood that a relationship observed in a sample could be attributed to sampling error alone (Babbie 2013: 470).
Non-parametric statistical procedures rely on no or few assumptions about the shape or
parameters of the population distribution from which the sample was drawn. Non-parametric
statistics assumes no normality in the data and no functional form for the distribution, and makes no assumption about the parent population to which a sample belongs. It is significant to note that, given the internal flexibility of non-parametric procedures, a greater number of social conditions are amenable to non-parametric statistical treatment. Hence, in the numerical section, we will find various types of non-parametric treatment of behavioural and social situations.
2. Learning Outcome
This module will be helpful in understanding the use of the dual statistical methods in sociological research. It includes an introduction to the basic concepts and strategies of the non-parametric and parametric methods, and to their scope and limitations.
3. Utility of Statistics in Social Research
Sociologists seek the help of statistical tools to study cultural change in society, family patterns, and industrial systems, to name a few. They also study statistically the relation between income and education, occupational mobility and migration, and the like. Thus, statistics is of immense use in various sociological studies. In fact, research in most social science disciplines requires a large amount of quantitative data. The process of handling, managing and interpreting quantitative data collected in the course of research very often necessitates statistical exercise. Hence, there is a strong case for arguing that sociology needs statistics.
Statistics and statistical methods have highly significant applications in sociology. The functions of statistics are numerous: the methods of descriptive statistics have an important application in describing observed phenomena; inferential statistics is used for inductive reasoning about unknown properties of a larger group using known indicators; hypothesis testing most frequently concerns the effects of one, two or more causes, on the basis of which it is possible to draw conclusions on the research problem by accepting or refuting an initial hypothesis; and regression and correlation analysis, in the simplest case, examines the influence and dependence between two variables. If the relationship among a greater number of variables is examined, this is multiple regression and correlation.
Yet there are many limitations to the use of statistics in social science. To begin with, statistical laws are true only on average: statistics are aggregates of facts, so a single observation is not a statistic; statistics deals with groups and aggregates only. Second, statistical methods are best applicable to quantitative data. Third, statistics cannot be applied to heterogeneous data. Fourth, if sufficient care is not exercised in collecting, analyzing and interpreting the data, statistical results might be misleading. Finally, some errors are possible in statistical decisions; in particular, inferential statistics involves certain errors, and we do not know whether an error has been committed or not.
4. Parametric tests and analogous Non-parametric procedures
Table 1: Difference between Parametric and Non-parametric procedures
| Analysis Type | Example | Parametric Procedure | Non-parametric Procedure |
|---|---|---|---|
| Compare means between two distinct/independent groups | Is the mean value of a variable in the control group different from the mean of the same variable in the experimental group? | Two-sample t-test | Wilcoxon rank-sum test |
| Compare two quantitative measurements taken from the same individual | Is there a significant change in a parameter between the baseline and the six-month follow-up measurement? | Paired t-test | Wilcoxon signed-rank test |
| Compare means between three or more distinct/independent groups | If the experiment has three groups, does the mean value of an important variable at baseline differ among the three groups? | Analysis of Variance (ANOVA) | Kruskal-Wallis test |
| Estimate the degree of association between two quantitative variables | Is one significant variable measure associated with some other variable within the group? | Pearson coefficient of correlation | Spearman's rank correlation |
5. Fundamentals of Parametric Tests
Generally we assume that in a parametric setting the population data have a normal distribution, whose curve is often called a bell-curve.
A normal distribution is a distribution that occurs naturally in many situations. For example, such a distribution is seen in tests like the CBSE and CAT. The bulk of students will score around the average (C), while smaller numbers of students will score a B or a D. An even smaller percentage of students score an F or an A. Half of the data will fall to the left of the mean; half will fall to the right.
Many groups follow a normal distribution pattern. That is why it is widely used in business, statistics and in government bodies like the Planning Commission. Some examples are:
Heights of people.
Measurement errors.
Blood pressure.
Points on a test.
IQ scores.
Salaries.
The empirical rule (the three-sigma rule) tells you what percentage of your data falls within a certain number of standard deviations from the mean (a short simulation follows the list):
• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.
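To make the rule concrete, here is a minimal Python sketch (illustrative, not part of the original module) that draws a large normal sample and checks the three percentages; the mean of 100 and SD of 15 are arbitrary, IQ-like choices.

```python
import numpy as np

# Draw a large sample from a normal distribution
# (mean 100, SD 15 -- arbitrary, IQ-like values).
rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=15, size=100_000)

# Share of observations within 1, 2 and 3 SDs of the mean
for k in (1, 2, 3):
    share = np.mean(np.abs(x - 100) <= k * 15)
    print(f"within {k} SD: {share:.3f}")
# Prints values close to 0.683, 0.954 and 0.997
```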
6. Standard Deviation, t-test and ANOVA
i) The standard deviation (SD)
SD controls the spread of the normal distribution. Standard deviation is a measure of dispersion
in statistics. It shows you how much your data is spread out around the mean or average. A
smaller standard deviation means that the data is tightly clustered around the mean; the normal
distribution will be taller. A larger standard deviation means that the data is spread out around
the mean; the normal distribution will be flatter and wider. The expression of SD is
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^{2}}$$

where $x_1, x_2, \ldots, x_N$ are the $N$ values in the population and $\mu$ is their mean.
A normal distribution has the following characteristics:
The mean, mode and median are all equal.
The curve is symmetric at the centre.
Exactly half of the values are to the left of centre and exactly half the values are to the right.
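As an illustration of the formula above, the following Python sketch computes the standard deviation of a small hypothetical data set by hand and checks it against NumPy's built-in function; the income figures are invented for the example.

```python
import numpy as np

# Hypothetical sample: monthly incomes (in thousands) of 8 households
data = np.array([42, 55, 38, 61, 47, 52, 44, 58])

mean = data.mean()
# Population SD: square root of the mean squared deviation,
# exactly as in the formula above
sd = np.sqrt(((data - mean) ** 2).sum() / len(data))

print(sd)                    # matches np.std(data)
print(np.std(data, ddof=1))  # sample SD (divides by N - 1 instead of N)
```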
ii) t-Test of significance:
Most t-test statistics have the form t = Z/s, where Z and s are functions of the data. Typically, Z is
designed to be sensitive to the alternative hypothesis (i.e., its magnitude tends to be larger when
the alternative hypothesis is true), whereas s is a scaling parameter that allows the distribution
of t to be determined. As an example, in the one-sample t-test,
$$t = \frac{Z}{s} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{S/\sigma} = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$

where $\bar{X}$ is the sample mean from a sample $X_1, X_2, \ldots, X_n$ of size $n$, $s = S/\sigma$ is the ratio of the sample standard deviation $S$ to the population standard deviation, $\sigma$ is the population standard deviation of the data, and $\mu$ is the population mean. The assumptions underlying a t-test are that

$\bar{X}$ follows a normal distribution with mean $\mu$ and variance $\sigma^2/n$;

$s^2$ follows a $\chi^2$ distribution with $p$ degrees of freedom under the null hypothesis, where $p$ is a positive constant;

$Z$ and $s$ are independent.
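A one-sample t-test of the kind described above can be run in a few lines. The sketch below uses SciPy; the study-hours data and the hypothesised population mean of 10 are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: weekly study hours reported by 12 students
hours = np.array([10, 12, 9, 14, 11, 13, 10, 15, 12, 9, 11, 13])

# One-sample t-test of H0: the population mean is 10 hours
t_stat, p_value = stats.ttest_1samp(hours, popmean=10)
print(t_stat, p_value)  # reject H0 at alpha = 0.05 if p_value < 0.05
```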
iii) Analysis of Variance (ANOVA):
ANOVA is a statistical test which analyzes variance. It is helpful in making comparison of two or
more means which enables a researcher to draw various results and predictions about two or more
sets of data. The test includes one-way ANOVA, two-way ANOVA or multiple ANOVA
depending upon the type and arrangement of the data.
Like any other test, the ANOVA test has its own test statistic. The statistic for ANOVA is called the F statistic, which we get from the F test. The F statistic takes into consideration:

the number of samples taken ($I$)
the sample size of each sample ($n_1, n_2, \ldots, n_I$)
the means of the samples ($\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_I$)
the standard deviations of each sample ($s_1, s_2, \ldots, s_I$)
Example²: The following data are given about the cricket teams of three countries:
| Countries | Number of Players | Average Runs | Standard Deviation |
|---|---|---|---|
| India | 11 | 60 | 15 |
| New Zealand | 11 | 50 | 10 |
| South Africa | 11 | 70 | 12 |
Find the ANOVA coefficient.

Solution: Construct the following table:

| Cricket Teams | n | x̄ | S | S² |
|---|---|---|---|---|
| India | 11 | 60 | 15 | 225 |
| New Zealand | 11 | 50 | 10 | 100 |
| South Africa | 11 | 70 | 12 | 144 |

Here n = 11 in each group, the number of groups is p = 3, and the total number of observations is N = 33.

Grand mean: x̄ = (60 + 50 + 70)/3 = 60

SST = Σ nᵢ(x̄ᵢ − x̄)² = 11(60 − 60)² + 11(50 − 60)² + 11(70 − 60)² = 2200
MST = SST/(p − 1) = 2200/(3 − 1) = 1100

SSE = Σ (nᵢ − 1)Sᵢ² = 10 × 225 + 10 × 100 + 10 × 144 = 4690
MSE = SSE/(N − p) = 4690/(33 − 3) = 156.33

F = MST/MSE = 1100/156.33 = 7.036
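The cricket calculation above can be reproduced in plain Python from the summary statistics alone, which makes each step of the F computation explicit:

```python
# Reproducing the cricket example from its summary statistics
n = [11, 11, 11]       # players per team
means = [60, 50, 70]   # average runs
sds = [15, 10, 12]     # standard deviations

p = len(n)             # number of groups (3)
N = sum(n)             # total observations (33)
grand_mean = sum(ni * m for ni, m in zip(n, means)) / N  # 60

# Between-group and within-group sums of squares
sst = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))  # 2200
sse = sum((ni - 1) * s ** 2 for ni, s in zip(n, sds))             # 4690

mst = sst / (p - 1)    # 1100
mse = sse / (N - p)    # 156.33
print(mst / mse)       # F = 7.036
```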
Self-Check Exercise 1
Q 1. What is standard deviation?
The amount of dispersion or variation in a set of data values, i.e. the spread of the numbers in a data-set around its mean value, is called standard deviation. The symbol is s for a sample and σ for a population.
Q 2. What is ANOVA?
The full form is Analysis of Variance, a statistical test used to detect differences in means among two or more groups on some variable.
Q 3. Define inferential statistics
Inferential statistics is used for inductive reasoning about unknown properties of a larger
group using the known indicators of the causes.
7. Fundamentals of Non-parametric Tests
Some examples of non-parametric tests and their application in sociological case are given
below:
i) Goodness-of-Fit test based on Empirical Process: Two-sample Kolmogorov-Smirnov
Tests (Non-parametric tests)
The Kolmogorov-Smirnov test (K-S test) is used to decide if a sample comes from a population with a specific distribution (Chakravarti et al. 1967). The K-S test is based on the empirical cumulative distribution function (ECDF). Given $N$ ordered data points $Y_1, Y_2, \ldots, Y_N$, the ECDF is defined as

$$E_N = \frac{n(i)}{N}$$

where $n(i)$ is the number of points less than $Y_i$, and the $Y_i$ are ordered from smallest to largest value. This is a step function that increases by $1/N$ at the value of each ordered data point. [Figure in the original module: the empirical distribution function plotted against a normal cumulative distribution function for 100 normal random numbers.] The K-S test is based on the maximum distance between these two curves.
An attractive feature of this test is that the distribution of the K-S test statistic itself does not
depend on the underlying cumulative distribution function being tested. Another advantage is
that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size
for the approximations to be valid). Despite these advantages, the K-S test has several
important limitations:
1. It only applies to continuous distributions.
2. It tends to be more sensitive near the centre of the distribution than at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified. That
is, if location, scale, and shape parameters are estimated from the data, the critical region
of the K-S test is no longer valid. It typically must be determined by simulation.
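In practice the K-S test is rarely computed by hand. A minimal SciPy sketch (with simulated data, since the module's own plot is not reproduced here) shows both the one-sample form, against a fully specified normal distribution, and the two-sample form:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0, scale=1, size=100)

# One-sample K-S test against a fully specified N(0, 1);
# remember that estimating the parameters from the data
# would invalidate the usual critical values.
print(stats.kstest(sample, "norm", args=(0, 1)))

# Two-sample K-S test: do two samples share one distribution?
other = rng.normal(loc=0.5, scale=1, size=100)
print(stats.ks_2samp(sample, other))
```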
ii) Chi-Square Test:
An important non-parametric test often used in sociological analysis is the chi-square test. Application of the chi-square distribution and chi-square test is important in cases with multiple qualitative variables for which it is known or assumed that the variables are interrelated. The chi-square test is quite a common test, based on summing, over all cells, the square of the difference between the observed and expected frequencies divided by the expected frequency.
The primary use of the chi-square test is to examine whether two variables are independent or
not. What does it mean to be independent, in this sense? It means that the two factors are not
related. Typically in social science research, we are interested in finding factors that are
dependent upon each other—education and income, occupation and prestige, age and voting
behaviour. By ruling out independence of the two variables, the chi-square can be used to
assess whether two variables are, in fact, dependent or not. More generally, we say that one
variable is "not correlated with" or "independent of" the other if an increase in one variable is
not associated with an increase in another. If two variables are correlated, their values tend to
move together, either in the same or in the opposite direction. Chi-square examines a special
kind of correlation: that between two nominal variables.
In the following example, we'll use a chi-square test to determine whether there is a
relationship between gender and getting in trouble at school (both nominal variables). Below
is the table documenting the raw scores of boys and girls and their respective behaviour issues
(or lack thereof):
Gender and Getting in Trouble at School

| | Got in Trouble | Did Not Get in Trouble | Total |
|---|---|---|---|
| Boys | 46 | 71 | 117 |
| Girls | 37 | 83 | 120 |
| Total | 83 | 154 | 237 |
To examine statistically whether boys got in trouble in school more often, we need to frame
the question in terms of hypotheses. The null hypothesis is that the two variables are
independent (i.e. no relationship or correlation) and the research hypothesis is that the two
variables are related. In this case, the specific hypotheses are:
H0: There is no relationship between gender and getting in trouble at school.
H1: There is a relationship between gender and getting in trouble at school.
As is customary in the social sciences, we will set our alpha level at 0.05.
Next we need to calculate the expected frequency for each cell. These values represent what we would expect to see if there really were no relationship between the two variables. We calculate the expected frequency for each cell by multiplying the row total by the column total and dividing by the total number of observations. To get the expected count for the upper left cell (boys who got in trouble), we multiply the row total (117) by the column total (83) and divide by the total number of observations (237): (83 × 117)/237 = 40.97. If the two variables were independent, we would expect 40.97 boys to get in trouble. Or, to put it another way, if there were no relationship between the two variables, we would expect the number of students who got in trouble to be distributed across the two genders in proportion to the numbers of boys and girls.
We do the same thing for the other three cells and end up with the following expected counts
(in parentheses next to each raw score):
Gender and Getting in Trouble at School (expected counts in parentheses)

| | Got in Trouble | Did Not Get in Trouble | Total |
|---|---|---|---|
| Boys | 46 (40.97) | 71 (76.03) | 117 |
| Girls | 37 (42.03) | 83 (77.97) | 120 |
| Total | 83 | 154 | 237 |
With these figures, we calculate the chi-square statistic as follows. For each cell, we square the difference between the observed frequency and the expected frequency and divide that number by the expected frequency; then we add all of the terms (there will be four, one for each cell) together:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
After we have crunched all those numbers, we end up with an obtained statistic of 1.87. Please
note here that a chi-square statistic cannot be negative because nominal variables do not have
directionality. If your obtained statistic turns out to be negative, you might want to check your
math. But before we can come to a conclusion, we need to find our critical statistic, which
entails finding our degrees of freedom. In this case, the number of degrees of freedom is equal
to the number of columns in the table minus one multiplied by the number of rows in the table
minus one, or (r-1)(c-1).
In our case, we have (2-1)(2-1), or one degree of freedom.
We also need to reference our alpha, which we set at 0.05. From a table of critical chi-square values, the critical statistic for an alpha level of 0.05 and one degree of freedom is 3.841, which is larger than our obtained statistic of 1.87. Because the critical statistic is greater than our obtained statistic, we cannot reject our null hypothesis.
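The whole gender-and-trouble calculation can be checked with SciPy. The sketch below passes the observed table from above to chi2_contingency; Yates' continuity correction is switched off so that the result matches the hand calculation of 1.87.

```python
import numpy as np
from scipy import stats

# Observed counts from the table above (rows: boys, girls;
# columns: got in trouble, did not get in trouble)
observed = np.array([[46, 71],
                     [37, 83]])

# correction=False disables Yates' continuity correction so the
# result matches the hand calculation
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 2))  # 1.87
print(dof)             # 1
print(expected)        # [[40.97..., 76.02...], [42.03..., 77.97...]]
```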
iii) The Two Sample Rank-Sum Test
The two-sample rank-sum test is a nonparametric alternative to the two-sample t-test, based solely on the order in which the observations from the two samples fall.
The logic underlying the two-sample rank-sum test is straightforward. The data consist of two
independent samples drawn from identically distributed populations. Let x1, x2, . . . , xn denote
the first random sample of size n and let y1, y2, . . . , ym denote the second random sample of
size m. Assign the ranks 1 to n + m to the combined observations from smallest to largest without regard to sample membership, and let Rk denote the rank assigned to the k-th observation for k = 1, . . . , n + m. Let Tx and Ty denote the sums of the ranks from the first and second samples, respectively, and let T = Tx. Finally, note that Tx + Ty = (n + m)(n + m + 1)/2. The null hypothesis simply states that each of the possible arrangements of the n + m
observations to the two samples with n values in the first sample and m values in the second
sample occurs with equal probability. The exact lower (upper) one-sided probability value of
an observed value of T, T0, is the proportion of all possible T values less (greater) than or
equal to T0.
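A hedged sketch of the rank-sum test in SciPy follows; the two score samples are invented for illustration. stats.ranksums implements the large-sample version of the test described above, while mannwhitneyu is the equivalent U-statistic formulation.

```python
from scipy import stats

# Two hypothetical independent samples, e.g. scores of two groups
x = [12, 15, 9, 20, 17, 14]
y = [8, 11, 13, 7, 10, 12, 9]

# Wilcoxon rank-sum test (large-sample normal approximation)
print(stats.ranksums(x, y))

# Mann-Whitney U: the same hypothesis via the U statistic,
# with exact p-values available for small samples
print(stats.mannwhitneyu(x, y, alternative="two-sided"))
```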
iv) The Kruskal-Wallis Test
This test was developed jointly by Kruskal and Wallis (1952) and is named after them. The Kruskal-Wallis test is a nonparametric (distribution-free) test, and is used when the assumptions of ANOVA are not met. Both tests assess whether there are significant differences on a continuous dependent variable across a grouping independent variable (with three or more groups). In ANOVA, we assume that the distribution of each group is normal and that there is approximately equal variance in the scores of each group. In the Kruskal-Wallis test, however, we do not make any of these assumptions. Like all non-parametric tests, the Kruskal-Wallis test is not as powerful as ANOVA. The following account has been
taken from www.statisticssolutions.com.
Null hypothesis: Null hypothesis assumes that the samples are from identical populations.
Alternative hypothesis: Alternative hypothesis assumes that the samples come from different
populations.
Questions like the following are answered:
How do test scores differ between the different grade levels in elementary school?
Do marketing scores differ between the different grade levels in elementary school?
Procedure:
1. Arrange the data of both samples in a single series in ascending order.
2. Assign ranks in ascending order. In the case of a repeated value, or a tie, assign ranks by averaging the tied rank positions.
3. Then sum up the ranks R1, R2, R3, ..., separately for each of the different groups.
4. To calculate the H value, apply the following formula:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$

where
$H$ = Kruskal-Wallis test statistic
$N$ = total number of observations in all samples
$k$ = number of groups
$n_i$ = number of observations in the $i$-th sample
$T_i$ = sum of the ranks assigned to the $i$-th sample
The Kruskal-Wallis test statistic approximately follows a chi-square distribution with k − 1 degrees of freedom, provided each $n_i$ is greater than 5. If the calculated value of the Kruskal-Wallis test is less than the critical chi-square value, then the null hypothesis cannot be rejected. If the calculated value is greater than the critical chi-square value, then we can reject the null hypothesis and say that the samples come from different populations. (A short SciPy sketch follows the assumptions below.)
Assumptions:
1. We assume that the samples drawn from the population are random.
2. We also assume that the cases of each group are independent.
3. The measurement scale should be at least ordinal.
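The following sketch runs the test in SciPy on three invented groups of test scores, echoing the elementary-school question above:

```python
from scipy import stats

# Hypothetical test scores for three grade levels
grade3 = [78, 82, 91, 68, 75, 88]
grade4 = [85, 90, 94, 80, 87, 92]
grade5 = [70, 65, 77, 72, 68, 74]

# H is referred to a chi-square distribution with k - 1 = 2 df
h_stat, p_value = stats.kruskal(grade3, grade4, grade5)
print(h_stat, p_value)
```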
v) Spearman’s Rank Correlation.
The following description draws substantially on the source www.statisticssolutions.com: Spearman Correlation Coefficient is also referred to as
Spearman Rank Correlation or Spearman's rho. It is typically denoted either with the Greek
letter rho (ρ), or rs. It is one of the few cases where a Greek letter denotes a value of a sample
and not the characteristic of the general population. Like all correlation coefficients,
Spearman's rho measures the strength of association of two variables. As such, the Spearman
Correlation Coefficient is a close sibling to Pearson's Bivariate Correlation Coefficient, Point-
Biserial Correlation, and the Canonical Correlation.
All correlation analyses express the strength of linkage or co-occurrence between two variables in a single value between -1 and +1. This value is called the correlation coefficient. A positive correlation coefficient indicates a positive relationship between the two variables (the larger A, the larger B) while a negative correlation coefficient expresses a negative relationship (the larger A, the smaller B). A correlation coefficient of 0 indicates that no relationship between the variables exists at all. However, correlations are limited to linear relationships between variables: even if the correlation coefficient is zero, a non-linear relationship might exist.
Compared to Pearson's bivariate correlation coefficient, the Spearman correlation does not require continuous-level data (interval or ratio), because it uses ranks instead of assumptions about the distributions of the two variables. This allows us to analyze the association between variables of ordinal measurement levels. Moreover, the Spearman correlation is a non-parametric test, which does not assume that the variables approximate a multivariate normal distribution. Spearman correlation analysis can therefore be used in many cases where the assumptions of Pearson's bivariate correlation (continuous-level variables, linearity, and multivariate normal distribution of the variables to test for significance) are not met.
Typical questions the Spearman Correlation Analysis answers are as follows:
Sociology: Do people with a higher level of education have a stronger opinion of whether
or not tax reforms are needed?
Medicine: Does the number of symptoms a patient has indicate a higher severity of
illness?
Biology: Is mating choice influenced by body size in bird species A?
Business: Are consumers more satisfied with products that are higher ranked in quality?
Theoretically, the Spearman correlation calculates the Pearson correlation for variables that
are converted to ranks. Similar to Pearson's bivariate correlation, the Spearman correlation
also tests the null hypothesis of independence between two variables. However, this can lead to difficult interpretations. Kendall's tau-b rank correlation improves on this by reflecting the strength of the dependence between the variables being compared.
Since both variables need to be of ordinal scale or ranked data, Spearman's correlation
requires converting interval or ratio scales into ranks before it can be calculated.
Mathematically, Spearman correlation and Pearson correlation are very similar in the way they use difference measurements to calculate the strength of association: Pearson correlation uses standard deviations, while Spearman correlation uses differences in ranks. However, this leads
to an issue with the Spearman correlation when tied ranks exist in the sample. An example of
this is when a sample of marathon results awards two silver medals but no bronze medal. A
statistician is even crueller to these runners because a rank is defined as average position in
the ascending order of values. For a statistician, the marathon result would have one first
place, two places with a rank of 2.5, and the next runner ranks 4. If tied ranks occur, a more complicated formula has to be used to calculate rho, but SPSS automatically and correctly handles tied ranks.³
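A short SciPy sketch of Spearman's rho on invented education and opinion data; spearmanr assigns average ranks to ties automatically, as discussed above.

```python
from scipy import stats

# Hypothetical data: years of education and strength of opinion
# on tax reform (1-10), one pair per respondent
education = [8, 10, 12, 12, 14, 16, 16, 18, 20]
opinion = [3, 4, 4, 6, 5, 7, 8, 7, 9]

# spearmanr converts the values to ranks (averaging ties) and
# computes Pearson's correlation on the ranks
rho, p_value = stats.spearmanr(education, opinion)
print(rho, p_value)
```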
8. Why don’t we always use Non-parametric tests?
Although non-parametric tests have the very desirable property of making fewer assumptions
about the distribution of measurements in the population from which we drew our sample, they
have two main drawbacks. The first is that they generally are less statistically powerful than the
analogous parametric procedure when the data truly are approximately normal. “Less powerful”
means that there is a smaller probability that the procedure will tell us that two variables are
associated with each other when in fact they are truly associated. If you are planning a study and trying to determine how many participants to include, a non-parametric test will require a slightly larger sample size to have the same power as the corresponding parametric test. The second
drawback associated with non-parametric tests is that their results are often less easy to interpret
than the results of parametric tests. Many non-parametric tests use rankings of the values in the
data rather than using the actual data. Knowing that the difference in mean ranks between two
groups is five does not really help our intuitive understanding of the data. On the other hand,
knowing that the mean systolic blood pressure of patients taking the new drug was five mmHg
lower than the mean systolic blood pressure of patients on the standard treatment is both intuitive
and useful. In short, non-parametric procedures are useful in many cases and necessary in some,
but they are not a perfect solution.
Self Check Exercise 2
Q: What is the major advantage of non-parametric statistics?
It makes fewer assumptions about the distribution of measurements in the population.
Q: What type of measurement level is used in Spearman's correlation?
The ordinal level of measurement is used in most cases.
Q: Which test is carried out when the preconditions for ANOVA are absent?
The Kruskal-Wallis test, since it does not require the assumptions of ANOVA.
9. Summary
Here is a summary of the major points and how they might affect statistical analyses you
perform:
Parametric and non-parametric are two broad classifications of statistical procedures.
Parametric tests are based on assumptions about the distribution of the underlying
population from which the sample was taken.
The most common parametric assumption is that data are approximately normally
distributed.
Non-parametric tests do not rely on assumptions about the shape or parameters of the
underlying population distribution: If the data deviate strongly from the assumptions of a
parametric procedure, using the parametric procedure could lead to incorrect conclusions.
Researchers should be aware of the assumptions associated with a parametric procedure
and should learn methods to evaluate the validity of those assumptions: If you determine
that the assumptions of the parametric procedure are not valid, use an analogous non-
parametric procedure instead.
The parametric assumption of normality is particularly worrisome for small sample sizes
(n < 30): Non-parametric tests are often a good option for these data.
It can be difficult to decide whether to use a parametric or non-parametric procedure in
some cases: Non-parametric procedures generally have less power for the same sample
size than the corresponding parametric procedure if the data truly are normal.
Interpretation of non-parametric procedures can also be more difficult than for parametric
procedures. Consult a statistician if you are in doubt about whether parametric or non-parametric procedures are more appropriate for your data. The book Practical Non-parametric Statistics (Conover 1971) is an excellent resource.
Notes
1. Parametric Statistical Distribution in Wolfram Language and System Documentation
Center. Available at:
https://reference.wolfram.com/.../ParametricStatisticalDistributions.html
2. This example is taken from the following website: formulas.tutorvista.com/math/anova-
formula.html
3. Sub-sections ii – v on Non-parametric Tests are taken essentially from
www.statisticssolutions.com
10. References
Bagdonavičius, Vilijandas, Kruopis, Julius and Nikulin, Mikhail S. Nonparametric Tests for Complete Data. New York: ISTE & John Wiley & Sons, 2011.
Chakravarti, I.M., Laha, R.G. and Roy, J. Handbook of Methods of Applied Statistics, Volume I. New York: John Wiley and Sons, 1967.
Conover, W.J. Practical Non-parametric Statistics. New York: John Wiley, 1971.
Fisher, R.A. Contributions to Mathematical Statistics. New York: John Wiley, 1950.
Gibbons, J.D. and Chakraborti, S. Non-parametric Statistical Inference. Boca Raton, FL: CRC Press, 2009.
Govindarajulu, Z. Non-parametric Inference. Singapore: World Scientific, 2007.
Maritz, J.S. Distribution-Free Statistical Methods. New York: Chapman & Hall/CRC, 1995.
Rao, C.R. Linear Statistical Inference and its Applications. New York: John Wiley & Sons, 2002.
van der Vaart, A.W. Asymptotic Statistics. Cambridge: Cambridge University Press, 2000.
Website: www.statisticssolutions.com