A term paper on Statistics, Data, Representation of Data, Measures of Central Tendency and Dispersion. Submitted to: Shiv Chandra Dhakal Assistant professor, Dept. of Agricultural Economics and Agribusiness Mgmt. AFU, Rampur, Chitwan Submitted by: Kapil Khanal AEC 10 M Dept. of Agricultural Economics and Agribusiness Mgmt. AFU, Rampur, Chitwan

Statistics, Data, Measures of Central Tendency and Dispersion. · 2015. 6. 12. · Data, Representation of Data, Measures of Central Tendency and Dispersion. Submitted to: Shiv Chandra


A term paper on

Statistics,

Data,

Representation of Data,

Measures of Central Tendency and Dispersion.

Submitted to:

Shiv Chandra Dhakal

Assistant professor,

Dept. of Agricultural Economics and Agribusiness Mgmt.

AFU, Rampur, Chitwan

Submitted by:

Kapil Khanal

AEC – 10 –M

Dept. of Agricultural Economics and Agribusiness Mgmt.

AFU, Rampur, Chitwan


Table of Contents

Statistics
    Functions of Statistics
    Limitations of statistics
DATA
    Organization of data
Frequency Distribution
CLASSIFICATION AND TABULATION OF DATA
    Methods of Classification
Tabulation of data
Representation of data
Measures of central tendency
Measures of dispersion


Statistics

The word “statistics” comes from the Italian word ‘statista’ (meaning “statesman”) or the German word ‘Statistik’, each of which refers to a political state. Statistics can be defined as the body of concepts, principles and methods dealing with the collection, summarization, analysis and interpretation of data. The term was first used in 1749 by Gottfried Achenwall (1719-1772), a professor at Marburg, to refer to the subject-matter as a whole.

Definitions:

1. Professor Gottfried Achenwall defined statistics as “the political science of the several

countries”.

2. Webster defined statistics as “the classified facts representing the conditions of the people

in a state…..especially those facts which can be stated in numbers or in tables of numbers

or in any tabular or classified arrangement”.

3. Prof. Horace Secrist defined statistics as “aggregates of facts affected to a marked extent

by multiplicity of causes, numerically expressed, enumerated or estimated according to

reasonable standards of accuracy, collected in a systematic manner for a pre-determined

purpose and placed in relation to each other”.

This definition clearly points out certain characteristics which numerical data must possess in

order that they may be called statistics.

1. Statistics are aggregates of facts.

2. Statistics are affected to a marked extent by multiplicity of causes.

3. Statistics are numerically expressed.

4. Statistics are enumerated or estimated according to reasonable standards.

5. Statistics are collected in a systematic manner.

6. Statistics are collected for a pre-determined purpose.

7. Statistics should be placed in relation to each other.

In the absence of the above characteristics, numerical data cannot be called statistics and

hence “all statistics are numerical statements of facts but all numerical statements of facts

are not statistics”.

Statistical theories are expressed comprehensively by the use of mathematical tools. Hence,

theoretical statistics is frequently called Mathematical Statistics. The branch of statistics dealing

with summarization of data is called Descriptive Statistics. The branch of statistics dealing with

analysis and interpretation of data is called inferential statistics.

Stages in a statistical investigation:

1. Collection

2. Organization

3. Presentation


4. Analysis

5. Interpretation

Functions of Statistics:

1. It presents facts in a definite form.

2. It simplifies masses of figures.

3. It facilitates comparison.

4. It helps in formulating and testing hypotheses.

5. It helps in prediction.

6. It helps in the formulation of suitable policies.

Role of statistics in agricultural research:

Statistics is used in agricultural research for the efficient planning of experiments and for interpreting experimental data. In agricultural research, efforts are made to discover the hidden regularities in some aspects of soil, crops and other related biological phenomena. These may include the yield potential of a crop, the effects of pest incidence on crop yields, the effects of climate change on pests and crops, the effect of fertilizer application on crop yields, the effect of cross-fertilization of different crop varieties, and the like.

Limitations of statistics:

1. Statistics studies facts relating to a group but not the individuals.

2. Statistics can be applied only in quantitative facts but not in general for qualitative facts.

3. Statistical theories can be applied only when there is variability in the experimental

material.

4. Statistical laws are only approximations. They are true on an average in the long run.

5. Statistics is liable to be misused. It can easily be manipulated, and conclusions can be drawn to suit selfish ends.

6. Statistics is only one of the methods of studying a problem.

DATA

Data are the facts or figures from which conclusions can be drawn. They are the foundation stone of any research: information of any kind, often comprising scores, measurements or descriptions of individual sampling units. These measurements provide the information that the decision maker uses.

Types of data:

Depending on their source, data may be called primary data or secondary data.


1. Primary data: If the data are collected directly from the field of investigation by the

researcher or his/her representative, they are called primary data.

2. Secondary data: If a researcher uses data from any published or unpublished documents

of other organizations or persons, then they are called secondary data.

Secondary data are not as costly as primary data, but their reliability is open to question because they are obtained from others rather than collected directly by the researcher.

Based on the nature of the characteristic observed, the statistical data are classified as attribute

data and measurement data.

Observed data presented as they were collected are called raw data. When the observed data are arranged into groups or classes, they are known as grouped data.

Organization of data:

Data should be organized such that they:

1. Are concise without losing the details.

2. Arouse interest in the reader.

3. Become simple and meaningful to form impressions.

4. Need few words to explain.

5. Define the problem and suggest the solution too; and

6. Become helpful in further analysis.

Methods of collecting Primary data:

Primary data may be obtained by applying one of the following methods:

1. Direct personal interviews

2. Indirect oral interviews

3. Information from correspondents

4. Mailed questionnaire method

5. Schedules sent through enumerators

Sources of secondary data:

The sources of secondary data can broadly be classified under two heads:

1. Published sources: Reports and official publications of Governmental bodies,

International Organizations, private institutes, etc.

2. Unpublished sources: Records maintained by various Government and private offices,

studies made by research institutions, scholars, etc.


Frequency Distribution:

The fundamental requirement in data analysis is to count how many times each distinct value of a variable has occurred. The number of times a category or class occurs is called the frequency and is denoted by ‘f’. Sorting data into categories or classes leads to the formation of a frequency distribution. Frequency distributions are shown in the form of a table called the frequency table. Such an arrangement of data is called the tabulation of data.

Table: Frequency distribution of the number of seeds germinated out of five in each of 50 pots.

No. of seeds germinated (X)    No. of pots (f)
0                              3
1                              14
2                              18
3                              8
4                              4
5                              3
Total                          50
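The counting step behind a frequency table can be sketched in a few lines of Python. This is only an illustration: the raw list below is reconstructed from the frequencies in the table, and the variable names are our own.

```python
from collections import Counter

# Raw data: seeds germinated (out of five) in each of 50 pots,
# expanded from the frequencies given in the table.
raw = [0] * 3 + [1] * 14 + [2] * 18 + [3] * 8 + [4] * 4 + [5] * 3

# Count how many times each distinct value of X occurs.
frequency = Counter(raw)

print("X   f")
for x in sorted(frequency):
    print(f"{x}   {frequency[x]}")
print("Total", sum(frequency.values()))
```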

Frequency Distribution using class intervals:

Class intervals are non-overlapping, contiguous intervals selected arbitrarily in such a way

that each and every value in the set of data can be placed in one, and only one, of the

intervals. This is done by dividing the range (lowest to highest value) into equal intervals of a

given size and then tabulating the frequencies associated with each interval. The number of

class intervals depends on the numbers of observations we are describing. It is generally

advised that the number of classes should lie between 5 and 20. After the number of classes

is decided, the class width is fixed by using the relationship

Class width = (l - s)/k

Where,

l= largest value

s= smallest value in the data set

k= no. of classes

Sturges (1926) has suggested a rule to determine the number of class intervals, which is as follows:


k = 1 + 3.322 (log10 n)

Where,

k= no. of classes

n= no. of observations
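As a rough sketch, the two relationships can be evaluated in Python. Sturges' rule is k = 1 + log2 n, which is approximately 1 + 3.322 log10 n. Rounding k up to a whole number is our assumption, and the function names are illustrative only. The sample values (50 observations ranging from 21.5 to 24.2) are taken from the plant growth table in this section.

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, k = 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))

def class_width(largest, smallest, k):
    """Class width = (l - s) / k."""
    return (largest - smallest) / k

k = sturges_classes(50)              # 50 observations
width = class_width(24.2, 21.5, k)   # largest and smallest observed values
print(k, round(width, 2))
```

In practice the width is usually rounded to a convenient figure, such as 0.4 here.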

Table: Frequency distribution of plant growth using class intervals.

Class Growth(cm) Frequency

1 21.5-21.8 5

2 21.9-22.2 8

3 22.3-22.6 9

4 22.7-23.0 12

5 23.1-23.4 8

6 23.5-23.8 5

7 23.9-24.2 3

Cumulative frequency distribution tables:

The cumulative frequency of a class is the total frequency up to and including that class. The

table of cumulative frequencies is called a cumulative frequency distribution table. There are

two types of cumulative frequencies, i.e., (1) less than (or, from below) cumulative

frequency, and (2) more than (or, from above) cumulative frequencies. In the less than type

the cumulative frequency of each class-interval is obtained by adding the frequencies of the

given class and all the preceding classes, when the classes are arranged in the ascending

order of the value of the variable. In the more than type the cumulative frequency of each

class-interval is obtained by adding the frequencies of the given class and the succeeding

classes.

Table: Cumulative frequency distribution of the weight of 68 students.

Class boundary points    Cumulative frequency
(wt. in kg)              Less than    More than
9.5                      0            68
19.5                     6            62
29.5                     16           52
39.5                     41           27
49.5                     56           12
59.5                     68           0
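Both cumulative columns can be derived from the class frequencies, as the following sketch shows. The class frequencies (6, 10, 25, 15, 12) are not listed in the source; they are inferred here from successive differences of the "less than" column.

```python
from itertools import accumulate

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [6, 10, 25, 15, 12]   # inferred class frequencies, total 68

# "Less than" type: running total of the frequencies below each boundary.
less_than = [0] + list(accumulate(frequencies))

# "More than" type: what remains at or above each boundary.
total = sum(frequencies)
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(b, lt, mt)
```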


CLASSIFICATION AND TABULATION OF DATA

Connor defined classification as: “the process of arranging things in groups or classes according to their resemblances and affinities and gives expression to the unity of attributes that may subsist amongst a diversity of individuals”.

Importance of classification:

1. Classification condenses the data by dropping out unnecessary details. Millions of figures

can be arranged in a few classes having common features.

2. It facilitates comparison between different sets of data clearly showing the different

points of agreement and disagreement.

3. It enables us to study the relationship between several characteristics and make further

statistical treatment like tabulation, etc.

4. It pinpoints the most significant features of the data at a glance.

Methods of Classification:

There are four types of classification, viz.,

(i) Qualitative classification: It is done according to attributes or non-measurable

characteristics; like social status, sex, nationality, occupation, etc. For example,

the population of the whole country can be classified into four categories as

married, unmarried, widowed and divorced. When only one attribute, e.g., sex, is

used for classification, it is called simple classification. When more than one

attribute, e.g., deafness, sex and religion, is used for classification, it is called

manifold classification.

(ii) Quantitative classification: It is done according to numerical size like weights in

kg or heights in cm. Here we classify the data by assigning arbitrary limits

known as class-limits. The quantitative phenomenon under study is called a

variable. For example, the population of the whole country may be classified

according to different variables like age, income, wage, price, etc. Hence this

classification is often called ‘classification by variables’.

(1) Discrete variable: A variable which can take only whole-number values and not fractional values is called a ‘discrete’ variable. The number of workmen in a factory, members of a family, students in a class, births in a certain year, telephone calls in a month, etc., are examples of discrete variables.

(2) Continuous variable: A variable which can take up any numerical


value (integral/fractional) within a certain range is called a ‘continuous’ variable.

Height, weight, rainfall, time, temperature, etc., are examples of continuous

variables. Age of students in a school is a continuous variable as it can be

measured to the nearest fraction of time, i.e., years, months, days, etc.

(iii) Temporal classification: It is done according to time, e.g., index numbers

arranged over a period of time, population of a country for several decades,

exports and imports of Nepal for different five year plans, etc.

(iv) Spatial classification: It is done with respect to space or places, e.g., production

of cereals in quintals in various states, population of a country according to

states, etc.

According to the nature of the variable under study, we can have two types of classified frequency distribution. When dealing with continuous variables like height, weight, distance, etc., we use the exclusive type of classification. It has the following characteristics:

1. The upper limits are not included in the concerned classes.

2. Upper limits of the preceding classes are the lower limits of the succeeding classes.

3. There is no gap between two consecutive classes.

An example of such type of classification is given below:

Weight(kg) No. of goats

20.5-25.5 5

25.5-30.5 8

30.5-35.5 10

35.5-40.5 7

40.5-45.5 5

For discrete variables like the number of insects, the number of endangered species in a site, the number of patches of wetland, etc., we use the other type of classification, called the inclusive type. It has the following characteristics:

1. The upper limits are included in the concerned classes.

2. Upper limits of the preceding classes are not the lower limits of the succeeding classes.

3. There is a gap between two consecutive classes.

An example of such type of classification is given below:

No. of insects No. of plants

15-19 3

20-24 7

25-29 2


30-34 4

Converting inclusive type into exclusive classification:

We can convert inclusive classes into exclusive classes by defining the “correction factor” as

c.f=(L-U)/2

Where,

L= Lower limit of succeeding class

U= Upper limit of preceding class

Let us consider the following inclusive classification

Class 5-9 10-14 15-19 20-24

Frequency 3 7 2 4

Here, c.f = (L-U)/2 = (10-9)/2 = (15-14)/2 = 0.5

The correction factor thus obtained is subtracted from each of the lower limits of the classes and

is added to each of the upper limits of them.

So that,

5-0.5=4.5 and 9+0.5=9.5

And so on.
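The conversion can be sketched in Python as follows. The function name and the representation of classes as (lower, upper) pairs are our own choices, and the sketch assumes the gap between consecutive classes is the same throughout.

```python
def to_exclusive(classes):
    """Convert inclusive (lower, upper) class limits to exclusive boundaries."""
    lower_next = classes[1][0]   # L: lower limit of the succeeding class
    upper_prev = classes[0][1]   # U: upper limit of the preceding class
    cf = (lower_next - upper_prev) / 2
    # Subtract c.f from every lower limit and add it to every upper limit.
    return [(lo - cf, hi + cf) for lo, hi in classes]

inclusive = [(5, 9), (10, 14), (15, 19), (20, 24)]
exclusive = to_exclusive(inclusive)
print(exclusive)
```

After conversion the upper boundary of each class equals the lower boundary of the next, so no gap remains between consecutive classes.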

Principles of classification:

1. Decide the number of classes.

2. Find the class interval.

3. Select the lower limit of the first class as either the lowest score or a value slightly less

than the lowest score.

4. Add the class interval to the lower limit of the first class to get the lower limit of the

second class. Add the class interval to the lower limit of the second class to get the lower

limit of the third class and so on.

5. List the lower class limits in a vertical column and enter corresponding upper class limits

(which are also identified by adding class interval to the corresponding lower class

limits).

6. Represent each score in a class by a tally.


7. Replace the tally marks in each class with the total frequency count of that class.
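The seven steps can be sketched in Python for whole-number scores grouped into inclusive classes. The scores below are hypothetical, chosen only for illustration, and rounding the class interval up to a whole number is our assumption.

```python
import math

def grouped_frequency(scores, k):
    """Build k inclusive classes for whole-number scores and count each class."""
    lowest, highest = min(scores), max(scores)
    # Step 2: class interval wide enough for k classes to cover every score.
    width = math.ceil((highest - lowest + 1) / k)
    # Steps 3 to 5: lower limits start at the lowest score and rise by the
    # class interval; each upper limit is one less than the next lower limit.
    limits = [(lowest + i * width, lowest + (i + 1) * width - 1) for i in range(k)]
    # Steps 6 and 7: tally each score into its class and keep the counts.
    counts = [0] * k
    for score in scores:
        counts[(score - lowest) // width] += 1
    return list(zip(limits, counts))

# Hypothetical scores (e.g. number of insects per plant).
scores = [15, 17, 19, 20, 21, 22, 22, 23, 24, 24, 25, 27, 30, 31, 33, 34]
table = grouped_frequency(scores, 4)
for (lower, upper), f in table:
    print(f"{lower}-{upper}  {f}")
```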


Tabulation of data:

One of the simplest and most revealing devices for summarizing data and presenting them in a

meaningful fashion is the statistical table. A table is a systematic arrangement of statistical data

in columns and rows.

Significance of tabulation:

1. It simplifies complex data.

2. It facilitates comparison.

3. It gives identity to the data.

4. It reveals patterns.

Parts of a table:

1. Table number

2. Title of the table

3. Caption

4. Stub

5. Body of the table

6. Head note

7. Footnote

The following is a specimen of table indicating the above parts:

[Figure: Format of a table, showing the table number, title, head note, stub heading, caption, column headings, stub entries, body, footnotes and source.]


Types of tables:

Tables may be broadly classified into two categories:

1. Simple and complex tables

2. General purpose and special purpose (or summary) tables

1. Simple and complex tables

i. Simple table: In a simple table, only one characteristic is shown. Hence, this type of table is also known as a one-way table. It is the simplest type of table. The following is an illustration of such a table:

No. of employees in a bank according to age group

Age(in years) No. of employees

Below 20 4

20-30 14

30-40 8

40-50 6

Above 50 3

Total:35

ii. Complex tables: In a complex table, two or more characteristics are shown. When

two characteristics are shown, such a table is known as two-way table or double

tabulation. When three characteristics are shown in a table, it is known as treble

tabulation.

A two-way table is formed when either the stub or the caption is divided into two

coordinate parts. The following is the illustration of such a table:

No. of employees in a bank in different age groups according to sex

Age (in years)    Males    Females    Total
Below 20          2        2          4
20-30             10       4          14
30-40             5        3          8
40-50             3        3          6
Above 50          2        1          3
Total             22       13         35

iii. Higher order table: When three or more characteristics are represented in the same

table, it is called higher order table.


2. General purpose and special purpose (or summary) tables

i. General purpose table: It provides information for general use or reference. It usually

contains detailed information and is not constructed for specific discussion.

ii. Special purpose (or summary) tables: They provide information for a particular discussion. These tables are also called derivative tables since they are often derived from general tables.

Representation of data:

A very important role of statistics is to reduce a huge mass of data in order to make it intelligible to the investigator, so that he or she may be able to draw some definite conclusions about the population. If the data are presented by some kind of diagram, the relation between the quantities can be easily illustrated, since a pictorial representation usually makes a clearer impression on the mind than columns of figures and prepares the ground for further analysis and interpretation.

Merits of diagrammatic representation:

1. Attractiveness and effectiveness

2. Readily intelligible

3. Comparison possible

4. Saving of time and energy

5. Visible and clear at a glance

Types of diagrams:

1. One dimensional diagrams: eg. Simple bar diagram, multiple bar diagram, sub-divided

bar diagram

a. Simple bar diagram- Simple bars are thick or thin bars of equal width, without any sub-division, whose lengths are proportional to the given figures. They are drawn at equal distances.

Example: The following table gives the population of 3 districts of Nepal in 2001 and 2011.

Year    Population (lakhs)
        Chitwan    Kathmandu    Jumla
2001    4          10           3
2011    6          15           4

b. Multiple bar diagram-

2. Two dimensional diagrams: eg. Pie-diagram


Measures of central tendency:

A value which can effectively represent all the values of a distribution is called a measure of central tendency or an average. Measures of central tendency are so named because they are intended to capture the value at the center of a particular group of data.

Averages are the representative of the frequency distribution. They give group characteristics in

an abbreviated form. They do not tell anything about any individual case.

Characteristics of a good average:

1. It should take all the items into consideration.

2. It should not be affected by extreme values.

3. It should be stable from sample to sample.

4. It should be as close to maximum number of observed values as possible.

5. It should be capable of being rapidly and easily calculated, and easily be understandable.

6. It should be capable of being used for further statistical analysis.

Mean: The mean is the value that represents the whole data of a population. It is obtained by dividing the sum of the observations by the total number of observations. Symbolically, for a sample of ‘n’ observations X1, X2, ..., Xn:

$\bar{X} = \frac{\sum X}{n}$

Mean is of three types:

(i) Arithmetic mean

Arithmetic mean is the most common and widely used method to compute measures of

central tendency. It is obtained by adding the scores and dividing the total by the number

of scores.

Advantages:

It is rigidly defined, based on all observations.

It can be easily and readily calculated.

The general nature of this average is easily comprehensible.

Its algebraic treatment is easy.

It is also least affected by fluctuations of sampling.

Disadvantages:

It is very much affected by values at extremes.

Its value may not coincide with any of the given values.

Unlike the median and mode, it can neither be located on the frequency curve nor be obtained by inspection.

(ii) Geometric mean


Geometric mean is the nth root of the product of n positive scores. It is denoted by ‘G’.

(iii)Harmonic mean

Harmonic mean is the reciprocal of arithmetic mean of the reciprocals of the given set of

values. It is denoted by ‘H’.
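The three means can be compared side by side with a small sketch. The data values are hypothetical and the function names are our own. The geometric mean is computed through logarithms, which is equivalent to taking the nth root of the product when all values are positive.

```python
import math

def arithmetic_mean(values):
    """Sum of the scores divided by the number of scores."""
    return sum(values) / len(values)

def geometric_mean(values):
    """nth root of the product of n positive scores, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

data = [2, 4, 8]   # hypothetical positive values
A = arithmetic_mean(data)
G = geometric_mean(data)
H = harmonic_mean(data)
print(round(A, 3), round(G, 3), round(H, 3))
```

For positive values that are not all equal, the arithmetic mean exceeds the geometric mean, which in turn exceeds the harmonic mean.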

Weighted Arithmetic mean

One of the limitations of arithmetic mean discussed above is that it gives equal

importance to all the items. But there are cases where the relative importance of different items is

not the same. When this is so, we compute weighted arithmetic mean. The term ‘weight’ stands

for the relative importance of different items. It is calculated as

Weighted Arithmetic mean = $\frac{\sum WX}{\sum W}$

Where;

X represents the variable values, i.e. X1, X2….., Xn

W represents the weights attached to the variable values i.e.W1, W2,…., Wn respectively.
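A minimal sketch of the weighted arithmetic mean follows. The marks and weights are hypothetical, chosen only to show how unequal weights shift the result away from the simple mean.

```python
def weighted_mean(values, weights):
    """Sum of W*X divided by the sum of W."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

marks = [60, 70, 80]     # hypothetical variable values X
weights = [1, 2, 3]      # hypothetical weights W
wm = weighted_mean(marks, weights)
print(round(wm, 2))      # the simple (unweighted) mean of the marks is 70
```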

Median: Median is the middlemost item that divides the distribution into two equal parts when the items are arranged in ascending order.

Median = value of the $\left(\frac{n+1}{2}\right)$th item
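The position rule can be sketched in Python. When n is even, (n+1)/2 falls halfway between two items; averaging those two items is the usual convention and is assumed here, since the text does not state it.

```python
def median(values):
    """Value of the ((n + 1) / 2)th item of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the middle item
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower
    # Even n: the position lies between two items, so average them.
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median([11, 3, 1, 9, 5]))   # odd number of items
print(median([1, 3, 5, 9]))       # even number of items
```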

Mode: Mode is the item which occurs most frequently. For a frequency distribution, the mode may or may not exist. Even if it exists, the distribution may be unimodal, bimodal or multimodal.

Mode = $l + \frac{f_s}{f_p + f_s} \times c$

Where;

$l$ = lower limit of the modal class

$f_p$ = frequency of the class preceding the modal class

$f_s$ = frequency of the class succeeding the modal class

$c$ = class interval
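The formula can be applied to the goat-weight distribution given earlier in this paper (classes 20.5-25.5 to 40.5-45.5). The sketch below is illustrative: the function name is our own, a missing neighbouring class is treated as having frequency 0, and the case where both neighbours are 0 is not handled.

```python
def grouped_mode(lower_limits, frequencies, c):
    """Mode = l + fs / (fp + fs) * c for the class with the highest frequency."""
    i = frequencies.index(max(frequencies))            # modal class
    fp = frequencies[i - 1] if i > 0 else 0            # preceding class
    fs = frequencies[i + 1] if i < len(frequencies) - 1 else 0   # succeeding class
    return lower_limits[i] + fs / (fp + fs) * c

lower_limits = [20.5, 25.5, 30.5, 35.5, 40.5]   # weight classes (kg)
frequencies = [5, 8, 10, 7, 5]                  # number of goats
mode = grouped_mode(lower_limits, frequencies, 5)
print(round(mode, 2))
```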

Percentile and Quartile: The percentiles are the values which divide the whole distribution into 100 equal parts, whereas the quartiles are the values which divide the whole distribution into 4 equal parts.


Conditions for using different averages:

1. If the data are qualitative, the mode can be used.

2. If the data are quantitative, any of the averages can be used.

3. If the frequency distribution is skewed, the median or the mode is the proper average.

4. In the case of a symmetrical frequency distribution, the mean, median or mode may be used, but the mean is preferred.

5. Harmonic mean is used while dealing with rate, speed and prices.

6. If we are interested in relative change such as cell division, bacterial growth etc.,

geometric mean is the most appropriate average.

Measures of dispersion

According to Bowley, “Dispersion is the measure of the variation of the items”.

The measures of dispersion measure only the degree but not the direction of the variation. The

measures of dispersion are also called averages of the second order because they are based on the

deviations of the different values from the mean or other measures of central tendency which are

called averages of the first order.

An average is the representative of a frequency distribution, but it does not tell anything about the scatter of observations within the distribution.

Following are the measures of dispersion:

Range

Mean Deviation

Quartile Deviation

Standard Deviation

Coefficient of variation

1. Range

It is defined as the difference between the largest and the smallest observations in a given set of data. It is the simplest measure of dispersion.

R = L – S

Where, L= largest observation S= smallest observation

Ex. Find out the range of the given distribution: 1, 3, 5, 9, 11

The range is 11 – 1 = 10.

Range takes only maximum and minimum value into consideration. So it is unreliable or

unstable. It is affected by extreme values.

2. Mean Deviation


Mean deviation is the mean of the absolute deviations of individual values from their average, which is usually the arithmetic mean and sometimes the median.

It is computed by first summing the absolute differences of each observation from the mean and then dividing the sum by the number of observations. The signs of the deviations are ignored, i.e. only absolute values are used, since the sum of the deviations from the mean is always zero.
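As a short sketch, the mean deviation about the arithmetic mean can be computed as follows, using the same five observations as the range example (1, 3, 5, 9, 11). The function name is our own.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [1, 3, 5, 9, 11]
mean = sum(data) / len(data)
signed_sum = sum(x - mean for x in data)   # signed deviations cancel out
md = mean_deviation(data)
print(round(signed_sum, 10), round(md, 2))
```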

3. Standard Deviation

This measure of dispersion is the one most widely accepted by statisticians, since it has nearly all the properties of a good measure of dispersion.

It involves squaring the individual deviations from the mean. These squared deviations are summed and then averaged, and the square root of this average is the standard deviation.
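The procedure described above (square, sum, average, take the square root) can be sketched as below. Note that it divides by n, as the text describes; the sample standard deviation commonly used for inference divides by n - 1 instead. The data are the observations from the range example.

```python
import math

def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    squared = [(x - mean) ** 2 for x in values]   # squared deviations
    return math.sqrt(sum(squared) / n)

data = [1, 3, 5, 9, 11]
sd = standard_deviation(data)
print(round(sd, 3))
```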

4. Quartile Deviation

It is estimated using following formula:

Quartile Deviation (QD) = (Q3 – Q1)/2

Where, Q3= third quartile Q1= first quartile

Thus quartile deviation gives the average amount by which the two quartiles differ from the median. In data that are not strongly skewed, a lower value of the quartile deviation indicates that the observations are more concentrated around the median, and a higher value indicates that they are less concentrated.

For symmetrical and moderately skewed distributions, the quartile deviation is usually about two-thirds of the standard deviation:

Q.D. = (2/3) × S.D.
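A sketch of the quartile deviation for ungrouped data is given below. The source does not say how the quartiles are located; this example assumes the common textbook convention that Q1 is the (n+1)/4th item and Q3 the 3(n+1)/4th item of the ordered data, with linear interpolation between neighbouring items. Other conventions give slightly different values.

```python
def item_at(ordered, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    position = min(max(position, 1), len(ordered))   # keep within the data
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def quartile_deviation(values):
    """(Q3 - Q1) / 2 under the assumed (n + 1)-based position rule."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return (q3 - q1) / 2

qd = quartile_deviation([1, 3, 5, 9, 11])
print(qd)
```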

5. Coefficient of Variation

The coefficient of variation is a relative measure of dispersion. It is a stable measure of variation based on all the observations.

C.V. = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$

The C.V. is a unit-free measure. It is always expressed as a percentage. The C.V. will be small if the variation is small. Of two groups, the one with the smaller C.V. is said to be more consistent.

If the means are widely different, or if they are expressed in different units of measurement, the C.V. should be used to compare variation. The C.V. is unreliable if the mean is near zero. It is also unstable if the measurement scale used is not a ratio scale.
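The comparison of two groups by C.V. can be illustrated with a small sketch. The yield figures for the two varieties are hypothetical, and the standard deviation uses the divisor n.

```python
import math

def coefficient_of_variation(values):
    """Standard deviation (divisor n) as a percentage of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100

# Hypothetical yields of two crop varieties with the same mean (40).
variety_a = [40, 42, 38, 41, 39]
variety_b = [30, 50, 40, 45, 35]
cv_a = coefficient_of_variation(variety_a)
cv_b = coefficient_of_variation(variety_b)
print(round(cv_a, 2), round(cv_b, 2))
```

Both varieties have the same mean, but variety A has the smaller C.V. and is therefore the more consistent of the two.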


Skewness and kurtosis:

Skewness and kurtosis deal with the nature of frequency curves, i.e. the shape of a frequency distribution.

A. Skewness

Skewness means lack of symmetry or degree of asymmetry. It estimates the direction in

which and the extent to which a curve of a frequency distribution is away from

symmetry. It occurs due to the existence of extremely large or small values in the data

set.

A frequency distribution when it is not symmetrical about the mean is said to be

asymmetrical or skewed. Similarly, a distribution is said to be symmetrical when the

values are uniformly distributed around the mean.

For a symmetrical distribution (normal distribution), the mean, median and mode are

equal.

A positively skewed distribution means that it has a long tail in the positive direction i.e.

a long right tail. For positively skewed distribution,

Mean>median>mode

A negatively skewed distribution means that it has a long tail in the negative direction i.e.

a long left tail. For negatively skewed distribution,

Mean<median<mode

Measure of skewness:

The relative measure of skewness is known as the coefficient of skewness. Several measures are used to express the direction and extent of skewness; Karl Pearson's coefficient is:

Coefficient of skewness = $\frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
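Pearson's coefficient can be sketched with Python's standard `statistics` module. The data are the five observations from the range example, and the standard deviation is the population form (`pstdev`, divisor n), which is an assumption on our part.

```python
import statistics

def pearson_skewness(values):
    """3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)   # population standard deviation (divisor n)
    return 3 * (mean - median) / sd

sk = pearson_skewness([1, 3, 5, 9, 11])
print(round(sk, 3))
```

Here the mean (5.8) exceeds the median (5), so the coefficient is positive, which agrees with the rule that the mean lies above the median in a positively skewed distribution.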

Properties of skewness:

Karl Pearson's coefficient of skewness lies between -3 and +3.

Skewness is also determined by the coefficient β1 (b1). When β1 = 0 or b1 = 0, it means that the curve is symmetrical (not skewed).

When the coefficient is less than 0, the skewness is negative; when it is greater than 0, the skewness is positive.


Fig.: Types of Skewness

B. Kurtosis

Kurtosis comes from a Greek word meaning ‘bulginess’. Kurtosis is a measure of the peakedness or flatness in the region about the mode of a frequency curve. It is of three types.

1. Leptokurtic (Leaping curve): The curve is more highly peaked than the normal curve. In such a case, items are more closely bunched around the mode. It has β2 > 3 and γ2 > 0, so the value of kurtosis is positive.

2. Mesokurtic (Normal curve): It is a normal curve. It has β2 = 3 and γ2 = 0.

3. Platykurtic (Flat curve): It has a flatter top than the normal curve. Here, β2 < 3 and γ2 < 0. The value of kurtosis is negative.
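The three types can be told apart numerically. The sketch below assumes the standard moment definitions, which the text does not spell out: β2 is the fourth central moment divided by the square of the second central moment, and γ2 = β2 - 3. The data values and function names are hypothetical.

```python
def kurtosis_measures(values):
    """Return (beta2, gamma2) using central moments with divisor n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    beta2 = m4 / m2 ** 2
    return beta2, beta2 - 3

def kurtosis_type(gamma2, tolerance=1e-9):
    """Classify a curve by the sign of gamma2."""
    if gamma2 > tolerance:
        return "leptokurtic"
    if gamma2 < -tolerance:
        return "platykurtic"
    return "mesokurtic"

beta2, gamma2 = kurtosis_measures([1, 2, 3, 4, 5])   # evenly spread values
print(round(beta2, 2), round(gamma2, 2), kurtosis_type(gamma2))
```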

Fig.: Types of kurtosis