PSYCHOLOGY
PAPER 4: Applied Psychometrics
MODULE 1: Basic Statistical Concepts in Measurement

Subject: Psychology
Paper No and Title: Paper 4: Applied Psychometrics
Module No and Title: Module 3: Basic Statistical Concepts in Measurement
Module Tag: PSY_P4_M3

PSY P4 M3 e-Text - INFLIBNET Centre


TABLE OF CONTENTS

1. Learning Outcomes
2. Introduction
3. Frequency Distribution
4. Measures of Central Tendency
5. Measures of Variability
6. The Normal Curve
7. Divergence from Normality: Skewness and Kurtosis
8. p-value and Statistical Significance
9. Standard Scores
   9.1 z score
   9.2 T score
   9.3 Stanine scores
   9.4 Percentile Rank
10. Summary


1. Learning Outcomes

After studying this module, you shall be able to:

• Review the basic statistical concepts that form the basis of psychometric theory and principles.
• Understand frequency distributions and measures of central tendency and variability.
• Understand the normal curve and divergence from normality.
• Define and differentiate among various types of standard scores.

2. Introduction

One of the primary roles of a psychometrician is that of a test developer. Psychometricians design tests and use them in settings such as business, education, clinics and counseling. An individual's performance on a psychological test is usually expressed as a score: a number that is then subjected to statistical methods so that it can be interpreted. Data collected from tests and experiments have little significance to a psychometrician until the scores are arranged and classified in some meaningful way. The primary task, therefore, is to classify and organize the test scores so that meaning can be derived from them. Statistics is the science of classifying, organizing and analyzing data. Without statistical tools, a psychometrician cannot comprehend the meaning of test scores or the results of a carefully controlled experiment. It is therefore useful to review some basic statistical methods so that the more advanced computations used in psychological assessment can be understood. In this module, we review frequency distributions, measures of central tendency, measures of variability, the characteristics of the normal distribution, skewness and kurtosis, and p-values and statistical significance. Finally, the conversion of raw scores into the standard scores used extensively in the measurement literature is discussed.

3. Frequency Distribution

“A distribution can be defined as a set of test scores arranged in a systematic way for study purpose”. A distribution helps to compare the performance of one test taker with that of others. Of the many ways to arrange test scores, the most common is to organize them by the frequency with which each score occurs. A frequency distribution groups scores under subheads or into classes according to defined rules. In a frequency distribution, scores are tabulated together with the number of times each score occurred.

“A grouped frequency distribution is another kind of frequency distribution in which class intervals are used to represent the actual test scores”. The total number of class intervals and the width of each class interval are chosen by the statistician for convenience. Commonly used interval widths are 3, 5 and 10 units, as these are somewhat easier to work with in later calculations. In a grouped frequency distribution, the highest score of a class interval is called its ‘upper limit’ and the lowest score of a class interval is called its ‘lower limit’.


Exercise:

The following are the marks obtained by a class of 30 students in their psychology mid-term examination. Tabulate the raw scores into (a) a frequency distribution, (b) a grouped frequency distribution with a class interval of 3, and (c) a grouped frequency distribution with a class interval of 5.

84 92 82 84 84 79 82 78 90 78 90 85 89 83 77 92 86 89 78 86 79 90 85 91 83 87 85 91 83 90

Solution:

Table 1: Frequency Distribution (n = 30)

Score    Frequency
77       1
78       3
79       2
82       2
83       3
84       3
85       3
86       2
87       1
89       2
90       4
91       2
92       2
Total    30

Table 2: Grouped Frequency Distribution (Class Interval = 3)

Class Interval    Frequency
77 - 79           6
80 - 82           2
83 - 85           9
86 - 88           3
89 - 91           8
92 - 94           2
Total             30

Table 3: Grouped Frequency Distribution (Class Interval = 5)

Class Interval    Frequency
77 - 81           6
82 - 86           13
87 - 91           9
92 - 96           2
Total             30
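The tabulation in this exercise can also be reproduced programmatically. The following is a minimal Python sketch, not part of the original module; the function names `frequency_distribution` and `grouped_frequency` are illustrative, and the sketch assumes that class intervals start at the lowest observed score, as in Tables 2 and 3.

```python
from collections import Counter

# Marks of the 30 students from the exercise above.
marks = [84, 92, 82, 84, 84, 79, 82, 78, 90, 78, 90, 85, 89, 83, 77,
         92, 86, 89, 78, 86, 79, 90, 85, 91, 83, 87, 85, 91, 83, 90]


def frequency_distribution(scores):
    """Map each observed score to the number of times it occurs."""
    return dict(sorted(Counter(scores).items()))


def grouped_frequency(scores, width):
    """Count scores per class interval of the given width.

    Assumption: the first interval starts at the lowest observed score.
    Keys are (lower limit, upper limit) pairs.
    """
    start = min(scores)
    counts = Counter((s - start) // width for s in scores)
    n_classes = (max(scores) - start) // width + 1
    return {
        (start + i * width, start + i * width + width - 1): counts.get(i, 0)
        for i in range(n_classes)
    }


for score, freq in frequency_distribution(marks).items():
    print(score, freq)

for width in (3, 5):
    print(f"Class interval = {width}")
    for (lower, upper), freq in grouped_frequency(marks, width).items():
        print(f"{lower} - {upper}: {freq}")
```

Changing `width` to 10 would give the third commonly used grouping mentioned in the text.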

4. Measures of Central Tendency

“Central Tendency is a statistical tool that measures the ‘average’ or ‘central position’ score between the extreme scores in a distribution”. It gives a concise description of the performance of a group as a whole and helps an experimenter / researcher to compare two or more groups in terms of a typical performance. The three most commonly used measures of central tendency are arithmetic mean (also called average), median (also called middle score of the distribution) and mode.

“Arithmetic mean or average represents sum of total scores in a distribution divided by total number of scores”. The mean is considered the ‘balance point’ of the distribution, about which the sum of the negative deviations equals the sum of the positive deviations. The mean is highly sensitive to the presence or absence of extreme scores in a distribution.

“The Median also termed as ‘middle score of the distribution’, divides the score in a distribution into exact two halves- the lower half and the upper half”. The median is responsive to the number of scores above or below it but not to their magnitude, and thus is less affected by extreme scores than the mean. If the total number of scores (n) in a raw distribution is odd, the median is the exact middle score, with half of the remaining scores lying above it and the other half lying below it. If the total number of scores (n) is even, the median is computed as the average of the two middle scores.

The mode is the third measure of central tendency and is generally used with qualitative data measured on a nominal scale. “Mode is the most frequently occurring score in a distribution of scores”. Though easy to compute, the mode tends to vary from sample to sample. In addition, a frequency distribution may have more than one mode for a particular set of scores.
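The different sensitivity of the three measures to extreme scores can be illustrated with a short Python sketch. The score lists below are hypothetical and chosen only for illustration; `statistics.multimode` (Python 3.8 or later) is used so that a distribution with more than one mode can be shown.

```python
from statistics import mean, median, multimode

# Hypothetical scores, chosen only to illustrate the point.
typical = [48, 50, 50, 52, 55]
with_extreme = typical + [95]   # one extreme score added at the high end

for label, scores in (("typical", typical), ("with extreme score", with_extreme)):
    print(label, "mean:", round(mean(scores), 2),
          "median:", median(scores),
          "mode(s):", multimode(scores))

# A distribution may have more than one mode.
bimodal = [3, 3, 4, 5, 5]
print("modes of bimodal set:", multimode(bimodal))
```

In this sketch the single extreme score moves the mean by several points, while the median shifts only slightly and the mode does not change.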

Example:

Calculate the mean, median, and mode of the following set of raw scores.

5, 12, 7, 14, 8, 9, 7

Solution

The mean is

$$M = \frac{5 + 12 + 7 + 14 + 8 + 9 + 7}{7} = \frac{62}{7} \approx 8.86$$

To find the median, order the scores as 5, 7, 7, 8, 9, 12, 14. The median is 8, as this is the exact middle point which divides the scores into equal upper and lower halves. From the same set of raw scores, you can see that 7 is the only score that occurs more than once. So, the mode is 7.

5. Measures of Variability

To define a distribution, we must provide a measure of variability along with the measure of central tendency. “Measures of variability determine how scores in a distribution are scattered or clustered”. Two distributions may have the same arithmetic mean and yet differ in variability, that is, in the extent to which the scores are clustered around or scattered away from their central value. The commonly used measures of variability are: the range, the interquartile and semi-interquartile range, the average deviation, the variance and the standard deviation.

The range is the simplest measure of variability and is often used for a quick comparison of two or more groups. Statistically, “it is the difference between the highest and the lowest score in a distribution”. Although easy to calculate, the range is only a rough measure and is of little use in inferential statistics.


“The interquartile range, represented by Q, is the difference between the 75th (Q3) and 25th (Q1) quartile points in a distribution” and “the semi-interquartile range is one half of the interquartile range”. Both the interquartile range and the semi-interquartile range closely correspond to the median in their characteristics and properties.

“The average deviation is the mean of the deviations of all of the individual scores in a series taken from their mean”. The average deviation is computed by ignoring the positive or negative signs of the deviations and treating all deviation scores as positive. The deviation scores are then added and divided by the total number of scores. The variance instead squares each deviation before averaging: it is the mean of the squared deviations of the scores from their mean. The variance is a widely used measure of variability since it is resistant to sampling variation.

Finally, “the standard deviation is the square root of the variance” and is the most widely used measure of variability in descriptive and inferential statistics. Comparing the means of two distributions becomes simple and meaningful when the standard deviation is used as the measure of variability.
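As a rough illustration of how these measures relate to one another, the following Python sketch computes each of them for a small set of hypothetical scores. Two choices here are assumptions rather than part of the module: the quartiles use the default method of `statistics.quantiles` (textbooks differ on the exact quartile convention), and the variance and standard deviation are the descriptive versions that divide by the number of scores.

```python
from statistics import mean, pstdev, pvariance, quantiles

# Hypothetical scores, chosen only for illustration.
scores = [4, 7, 8, 10, 12, 15, 20]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Interquartile and semi-interquartile range (Q3 - Q1 and half of it).
q1, _q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1
semi_iqr = iqr / 2

# Average deviation: mean of the absolute deviations from the mean.
m = mean(scores)
average_deviation = sum(abs(x - m) for x in scores) / len(scores)

# Variance: mean of the squared deviations; SD: its square root.
variance = pvariance(scores)
sd = pstdev(scores)

print("range:", score_range)
print("interquartile range:", iqr, "semi-interquartile range:", semi_iqr)
print("average deviation:", round(average_deviation, 2))
print("variance:", round(variance, 2), "standard deviation:", round(sd, 2))
```

Inferential work usually divides by n − 1 instead of n; in Python that corresponds to `statistics.variance` and `statistics.stdev`.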

Exercise:

Calculate the standard deviation of the scores 14, 6, 9, 15, 13, 8, 10, and 11 obtained by eight students in their psychology class.

Solution:

First, computing the mean, we get

$$M = \frac{14 + 6 + 9 + 15 + 13 + 8 + 10 + 11}{8} = \frac{86}{8} = 10.75$$

Next, a deviation score is calculated for each raw score by subtracting the mean from it. Each deviation score is then squared, and the squared deviations are added.

X        X − M     (X − M)²
14        3.25     10.56
6        -4.75     22.56
9        -1.75      3.06
15        4.25     18.06
13        2.25      5.06
8        -2.75      7.56
10       -0.75      0.56
11        0.25      0.06
Total     0.00     67.48

Finally, dividing 67.48 by 8 − 1 = 7 and taking the square root, we get

$$s = \sqrt{\frac{67.48}{7}} = \sqrt{9.64} \approx 3.10$$
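The steps of this solution can be checked with a few lines of Python. This is a sketch rather than part of the original solution. It carries the squared deviations at full precision, so the sum of squares comes out as 67.5 rather than the 67.48 obtained from values rounded to two decimals, and the final result may differ from the hand calculation in the second decimal place.

```python
from statistics import mean, stdev

scores = [14, 6, 9, 15, 13, 8, 10, 11]

# Step 1: the mean.
m = mean(scores)

# Step 2: deviation of each raw score from the mean.
deviations = [x - m for x in scores]

# Step 3: sum of the squared deviations (unrounded).
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 4: divide by n - 1 and take the square root.
s = (sum_of_squares / (len(scores) - 1)) ** 0.5

print("mean:", m)
print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum_of_squares)
print("standard deviation:", round(s, 3))

# The library function uses the same n - 1 divisor.
print("statistics.stdev:", round(stdev(scores), 3))
```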


6. The Normal Curve

Scores obtained on psychological tests are often said to be ‘normally distributed’ in a population. “The Normal distribution is a bell-shaped, smooth, symmetrical, unimodal, continuous curve that peaks at the center and as one moves towards the ‘tails’ of the distribution, the curve tapers on both the sides”. The tails of a normal curve are asymptotic to the horizontal axis, so that the area under the curve is greatest in the middle and becomes smaller and smaller towards both sides. Because the curve is symmetrical, with each side an exact replica of the other, the mean, the median and the mode of a normal distribution have the same value.

Fig 1: The Normal Curve
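The proportions of scores that fall within 1, 2 and 3 standard deviations of the mean of a normal curve can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Area between the mean and 1 standard deviation above it.
print("mean to +1 SD:", round(standard_normal.cdf(1) - standard_normal.cdf(0), 4))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean:", round(area_within(k), 4))
```

The rounded percentages usually quoted for the normal curve (about 34 %, 68 %, 95 % and 99.7 %) are approximations of these areas.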

Characteristics of the Normal Curve:

• 50 % of the scores occur above the mean and 50 % of the scores lie below the mean.
• About 34 % of all scores occur between the mean and 1 standard deviation above the mean.
• About 34 % of all scores lie between the mean and 1 standard deviation below the mean.
• About 68 % of all scores lie within ±1 standard deviation of the mean.
• About 95 % of all scores lie within ±2 standard deviations of the mean.
• About 99.7 % of all scores lie within ±3 standard deviations of the mean.

7. Divergence from Normality: Skewness and Kurtosis

A frequency distribution can be described by the nature and extent of its symmetry. If the distribution looks the same on the left and right sides of the centre point, it is called a symmetrical distribution; the normal curve is the best-known example. In a symmetrical unimodal distribution, the values of the mean, median and mode are equal. In contrast, a lack of symmetry makes a distribution skewed: the mean and the median fall at different points in the distribution, and the centre or balance point is shifted to either the left or the right side. The presence of relatively extreme scores at the high end of the distribution makes it positively skewed. In a positively skewed distribution, the relationship between the mean, median and mode is: Mean > Median > Mode. A distribution is said to be negatively skewed when extreme scores are present at the low end of the distribution. The relationship between the mean, median and mode in a negatively skewed distribution is: Mean < Median < Mode.

Kurtosis describes the shape of a frequency distribution as compared with the normal curve, that is, whether the distribution is peaked or flattened. The three general types of curve on the basis of their shape are (a) leptokurtic- a frequency distribution more peaked than the normal; (b) platykurtic- a curve relatively flatter than the normal curve; and (c) mesokurtic- a curve with the same degree of peakedness as the normal curve.

(a) Skewness (b) Kurtosis

Fig 2: Divergence from Normality
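Skewness and kurtosis can also be expressed as numbers. The Python sketch below uses the moment-based coefficients, which are one common convention and an assumption here, since the module does not give formulas; the scores are hypothetical and chosen to have a few extreme values at the high end.

```python
from statistics import mean, median, mode

# Hypothetical scores with a few extreme values at the high end.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]


def central_moment(data, k):
    """k-th moment of the scores about their mean."""
    m = mean(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    """Moment coefficient of skewness: positive for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment coefficient of kurtosis minus 3.

    Under this convention a normal (mesokurtic) curve scores about 0,
    a leptokurtic curve above 0 and a platykurtic curve below 0.
    """
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


print("mean:", mean(scores), "median:", median(scores), "mode:", mode(scores))
print("skewness:", round(skewness(scores), 2))
print("excess kurtosis:", round(excess_kurtosis(scores), 2))
```

For these scores the mean exceeds the median, which exceeds the mode, and the skewness coefficient is positive, matching the pattern described for a positively skewed distribution.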

8. p-value and Statistical Significance

Consider an example in which a researcher wants to test the hypothesis that girls exhibit more shy behavior than boys. To answer the question empirically, the researcher decides to collect data. A sample of high school students, comprising an equal number of boys and girls, is selected and given a questionnaire rating the level of shyness. Suppose 70 % of the responses agreed that girls are shyer than boys and the remaining 30 % disagreed. Would it be justifiable to conclude that girls exhibit more shy and bashful behavior than boys? What would the researcher have concluded if the responses had been in the ratio of 55 % agreement to 45 % disagreement? What if the percentages were 65 % to 35 %? These questions show that there should be a “criterion” by which the researcher can claim that a “statistically significant difference” exists between girls and boys in their level of shyness. This criterion is commonly known as the ‘α level of significance’. It reports that “the difference found between the groups in a sample would happen by chance alone X number of times out of 100”. Usually, X equals 1 or 5, which means that the difference in percentages found in the sample would have to be large enough to occur by chance only 1 time out of 100 (α = .01) or 5 times out of 100 (α = .05).

Although the level of significance should be decided upon before any data are collected, many researchers fail to report the α level of significance and instead report only the p-value: “the probability level of the sample outcome if null hypothesis is true”. Suppose the p-value for a particular statistical test is equal to 0.06. This means that results like those from the particular sample would occur by chance alone 6 times out of 100. If the p-value equals .050, the results from the particular sample would occur by chance alone 50 times out of 1,000, that is, 5 times out of 100. Similarly, a p-value of 0.007 means that the results from the particular sample would occur by chance alone 7 times out of 1,000. In the psychological sciences, researchers generally make decisions and evaluate the outcome of tests according to the .05 and .01 levels of significance.
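To make the shyness example concrete, the sketch below computes a p-value for each of the three response splits. Several things are assumptions added for illustration and are not given in the module: the sample size of 100 respondents, the null hypothesis that agreement and disagreement are equally likely (50 %), the treatment of responses as independent, and the use of a one-sided exact binomial test. The function name `upper_tail_p` is hypothetical.

```python
from math import comb

N_RESPONDENTS = 100   # assumed sample size; the module does not state one
ALPHAS = (0.05, 0.01)


def upper_tail_p(k, n, p_null=0.5):
    """P(at least k agreements out of n) if the null hypothesis is true."""
    return sum(
        comb(n, i) * p_null ** i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


for agree in (55, 65, 70):
    p_value = upper_tail_p(agree, N_RESPONDENTS)
    verdicts = ", ".join(
        f"{'significant' if p_value < alpha else 'not significant'} at α = {alpha}"
        for alpha in ALPHAS
    )
    print(f"{agree} % agreement: p = {p_value:.4f} ({verdicts})")
```

Under these assumptions a 55 % to 45 % split is not significant at either level, while the 65 % and 70 % splits are significant at both. With a smaller assumed sample the same percentages would give larger p-values, which is why the criterion must be stated in terms of probability rather than raw percentages.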

9. Standard Scores

Julia and Robert scored 123 and 86 marks, respectively, in their psychology exam. To describe their performance correctly, one needs a standard frame of reference against which their marks can be compared and interpreted. Suppose the mean of the psychology test scores of the class is 113, and the highest and the lowest marks obtained by the class are 145 and 76, respectively. This means that Julia scored above the class average, whereas Robert scored below it and fairly close to the lowest mark. Thus raw scores, originally without meaning, become interpretable when they are related to a measure of central tendency and a measure of variability.

With psychological measures, meaningful interpretation of raw scores usually requires converting them from one scale to another scale- the latter typically being one that is more widely used and interpretable, and that has some arbitrarily defined mean and standard deviation. This conversion transforms raw scores into standard scores. “A standard score provides meaning by relating the position of a raw score to other scores in the same distribution or to a distribution of scores by a representative group”. With standard scores, the relative position of a testtaker’s performance among other testtakers can be known. Many types of standard scores are used to interpret the meaning of raw scores in psychological assessments.

Standard scores derived from raw scores may involve linear or nonlinear transformations. A linear transformation of raw scores gives the scores a fixed mean and standard deviation while preserving the proportionality of interscore distances, and thus the shape of the distribution. The magnitude of differences between standard scores exactly parallels the differences between the corresponding raw scores. The z score and the T score are examples of linear transformations of scores.

A nonlinear transformation of data is required when the raw scores are not normally distributed and comparisons with the normal distribution need to be made. When such a nonlinear transformation is done, the original distribution is said to have been “normalized”. Conceptually, normalizing a distribution involves “stretching” the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores- a scale that is technically referred to as a normalized standard score scale. Percentile rank and stanine scores are examples of normalized standard score scales.
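One way to picture a nonlinear, normalizing transformation is to convert each raw score first to a percentile rank and then to the z value that cuts off the same percentage of a normal curve. The Python sketch below is an illustration under stated assumptions and not the module's own procedure: the scores are hypothetical, the percentile rank uses the midpoint convention (scores below plus half of the tied scores), and the function names are illustrative.

```python
from statistics import NormalDist

# Hypothetical, positively skewed raw scores for a reference group.
group = [2, 3, 3, 4, 4, 4, 5, 6, 9, 15]


def percentile_rank(score, scores):
    """Percentage of the group scoring below `score`.

    Assumption: midpoint convention, so half of the tied scores are
    counted as below. `score` is expected to occur in `scores`.
    """
    below = sum(1 for s in scores if s < score)
    ties = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * ties) / len(scores)


def normalized_z(score, scores):
    """z value with the same percentile rank under a normal curve."""
    return NormalDist().inv_cdf(percentile_rank(score, scores) / 100)


for raw in sorted(set(group)):
    print(raw, percentile_rank(raw, group), round(normalized_z(raw, group), 2))
```

Because the conversion goes through ranks, the large raw-score gap between 9 and 15 is compressed in the normalized scale, which is the sense in which the skewed distribution is reshaped toward a normal one.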

9.1 z- score: One type of standard score is the z score. A z score represents “how many standard deviation a raw score falls above or below the mean of the distribution (Minium )”. The mean and the standard deviation of a set of z scores are 0 and 1, respectively. The standard score z is calculated by using the formula:

$$z_i = \frac{X_i - M}{SD}$$

where z_i represents the standard score, X_i represents the raw score, M is the mean of the distribution and SD is the standard deviation of the distribution.

Example: Say we have a distribution of test scores with a mean of 80 and a standard deviation of 10. In this distribution, a raw score of 92 is equal to a standard score of 1.2.

Solution:

$$z = \frac{92 - 80}{10} = \frac{12}{10} = 1.200$$

Properties of z score:

• The mean of a set of z-scores is always 0.
• The standard deviation of a set of standardized scores is always 1.
• The distribution of a set of standardized scores has the same shape as the distribution of the unstandardized scores.
• Scores on a z score distribution can have both positive and negative values.
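The formula and the first two properties can be verified with a short Python sketch. The function names are illustrative, and the raw scores are the eight class scores used earlier in the module. One assumption is labeled in the code: the standard deviation is the descriptive version that divides by the number of scores, so that the z scores have a standard deviation of exactly 1 under the same convention.

```python
from statistics import mean, pstdev


def z_score(raw, m, sd):
    """Standard score: distance of a raw score from the mean in SD units."""
    return (raw - m) / sd


def z_scores(raw_scores):
    """Convert a whole set of raw scores to z scores.

    Assumption: SD is computed with divisor n (statistics.pstdev).
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    return [z_score(x, m, sd) for x in raw_scores]


# Worked example from the text: mean 80, SD 10, raw score 92.
print(z_score(92, 80, 10))

# Properties: mean 0, SD 1, both positive and negative values.
scores = [14, 6, 9, 15, 13, 8, 10, 11]
z = z_scores(scores)
print("mean of z:", round(mean(z), 10))
print("SD of z:", round(pstdev(z), 10))
print("lowest z:", round(min(z), 2), "highest z:", round(max(z), 2))
```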

9.2 T score: The T score is another standardized score that has “the mean set at 50 and a standard deviation set at 10”. The scale used in the computation of T scores is called a “fifty plus or minus ten” scale. The T score is calculated from the z score by using the formula:

$$T = 10z + 50$$

Thus, in the above example, the equivalent T score for a z score of 1.200 will be

$$T = 10z + 50 = 10(1.20) + 50 = 62$$
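The conversion is a one-line function. The sketch below, with the illustrative name `t_score`, applies it to the worked example and to a few other z values to show how the scale behaves. It also shows that the formula gives a T score below zero only for a z score below −5, a value that practically never arises in test data.

```python
def t_score(z):
    """Linear conversion of a z score to the 'fifty plus or minus ten' scale."""
    return 10 * z + 50


# Worked example from the text: z = 1.20.
print(t_score(1.2))

# A z score at the mean, and two and three SDs below it.
for z in (0, -2, -3):
    print(z, t_score(z))

# Only an extremely low z score (below -5) gives a negative T score.
print(t_score(-5.5))
```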

In practice a T score is always positive, because it would be negative only for a z score below −5; this gives it an advantage over the z score. It is to be noted that a widely used personality test, the Minnesota Multiphasic Personality Inventory (MMPI), uses T scores as its standardized measure.

9.3 Stanine scores: “Stanine score is a standard score with a mean of 5 and a standard deviation of approximately 2”. The word ‘stanine’ is a contraction of ‘standard’ and ‘nine’; thus, it is a standard score scale divided into nine units. Stanines, widely used in educational assessments, have whole number values ranging from 1 to 9. The main advantage of stanine scores is their simplicity, as just one digit is required to express a stanine’s value. However, the stanine is a coarse unit of measurement because the difference between successive stanine values is one half of a standard deviation.

9.4 Percentile Ranks also represent “a type of derived scores that describes the location of a raw score in relation to the other scores in a distribution”. The percentile rank has an advantage in terms of directness of meaning. Thus a percentile rank of 68 on a psychology test means that a student performed better than 68 % of comparable students in the class. Percentile ranks are easy to comprehend and are relatively meaningful even to persons without statistical training.

10. Summary


• Statistical tools are the basis on which psychological test scores are made interpretable. Statistical data deal with large numbers of observations. To comprehend the meaning of these observations, they are arranged in a frequency distribution, which shows the number of observations for each possible category or score value in a set of data.

• To describe a frequency distribution adequately, we must provide a measure of central tendency, as well as information about its variability and shape. A measure of central tendency is a single numerical figure that locates the centre of a distribution of scores. Measures of variability are summary figures that describe how scores are scattered or clustered in a distribution. The normal curve is a smooth, bell-shaped curve which is symmetrical and continuous, with two tails asymptotic to the horizontal axis. It describes the frequency of occurrence of many variables with a relatively high degree of accuracy. Divergence from normality is described by skewness and kurtosis.

• A standard score is a score derived from a given raw score; the derivation makes the raw score meaningful and interpretable. Different systems of standard scores exist, each unique with regard to its mean and standard deviation. The four main types are: z scores, T scores, stanines and percentile ranks.

• A knowledge of basic statistics is essential for a psychometrician to communicate the meaning of data.