16
ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006 STATISTICS IN PSYCHOLOGY SECTION-A Answer the following question in about 1000 words (wherever applicable) each. Q1. Describe the major statistical techniques for organising the data with suitable example and diagrams. Ans. The four major statistical techniques for organising the data are as follows: (1) Classification: Classification stands for the entire process of sorting out things ‘similar’ from amongst a group of things of different characteristics. In other words, a diverse heterogeneous data is classified into separate homogeneous classes according to the identity of characteristics that exist amongst the different individuals or quantities constituting the data. The process of arranging the data in groups and classes according to resemblance and similarities is technically called classification. According to Corner, “Classification is the process of arranging things (either actually or notionally) in groups or classes according to their resemblances and affinities and gives expression to the unity of attributes that may subsist amongst a diversity of individuals.” Secrist had defined classification as, “It is the process of arranging data into sequences and groups according to their common characteristics or separating them into different related parts.” According to Tuttle A.M., “A classification is a scheme for breaking a category into a set of parts called classes according to some precisely defined differing characteristics possessed by all the elements of the category.” In classification of data, units having common characteristics are placed in a class and in this fashion; the whole data are divided into a number of classes. (2) Tabulation: Tabulation is the process of summarising classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information. A table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise not obvious. Classification and tabulation as a matter of fact, are not two distinct processes. Actually, they go together. Before tabulation, data are classified and then displaced under different columns and rows of a table. Example of data in tabular form is as follows:

ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

ASSIGNMENT REFERENCE MATERIAL (2017-18)

M.P.C.-006

STATISTICS IN PSYCHOLOGY

SECTION-A

Answer the following question in about 1000 words (wherever applicable) each.

Q1. Describe the major statistical techniques for organising the data with suitable example and

diagrams.

Ans. The four major statistical techniques for organising the data are as follows:

(1) Classification: Classification stands for the entire process of sorting out things ‘similar’ from

amongst a group of things of different characteristics. In other words, a diverse heterogeneous data is

classified into separate homogeneous classes according to the identity of characteristics that exist

amongst the different individuals or quantities constituting the data. The process of arranging the data

in groups and classes according to resemblance and similarities is technically called classification.

According to Corner, “Classification is the process of arranging things (either actually or notionally)

in groups or classes according to their resemblances and affinities and gives expression to the unity of

attributes that may subsist amongst a diversity of individuals.”

Secrist had defined classification as, “It is the process of arranging data into sequences and groups

according to their common characteristics or separating them into different related parts.”

According to Tuttle A.M., “A classification is a scheme for breaking a category into a set of parts

called classes according to some precisely defined differing characteristics possessed by all the

elements of the category.”

In classification of data, units having common characteristics are placed in a class and in this fashion;

the whole data are divided into a number of classes.

(2) Tabulation: Tabulation is the process of summarising classified or grouped data in the form of a

table so that it is easily understood and an investigator is quickly able to locate the desired

information. A table is a systematic arrangement of classified data in columns and rows. Thus, a

statistical table makes it possible for the investigator to present a huge mass of data in a detailed and

orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise

not obvious. Classification and tabulation as a matter of fact, are not two distinct processes. Actually,

they go together. Before tabulation, data are classified and then displaced under different columns and

rows of a table. Example of data in tabular form is as follows:

Page 2: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

(3) Graphical Presentation: Besides formal tables, statistical data can also be presented in the form

of various types of graphs. Graphs are a useful way of conveying information very quick and in brief.

With the same ease and efficiency, they help in comparing data over time and space. They are visual

aids and have a powerful impact on the people.

It is often said, “A picture is worth a thousand words.” They attract a reader's attention to what they

are supposed to convey about the data. The graphical methods of presentation for the table 1.12 are as

follows:

(1) Histogram, Frequency Polygon and Frequency Curve Histogram is a very common type of graph

for displaying classified data. It is a set of rectangle erected vertically. It has the following features:

(i) It is a rectangular diagram;

(ii) Since the rectangles are drawn with specified width and height, histogram is a two dimensional

diagram. The width of a rectangle equals the class interval and height

(iii) The area of each rectangle is proportional to frequency of the respective class.

(4) Diagrammatical Presentation: Diagram is an ideal method of presenting statistical data in visual

form. By diagram we mean the pictorial representation of the quantitative information with a view to

make them easily and readily understandable. There are various forms of diagrams which are as

follows:

(1) Bar Diagrams

(i) Simple Bar Diagram: Like line diagrams, these figures are also used where only single dimension.

Procedure is almost the same, only one thickness of lines is measured. These can also be drawn either

vertically or horizontally. Breadth of these lines or bars should be equal. Similarly, distance between

these bars should be equal. The breadth and distance between them should be taken according to

space available on the paper.

Page 3: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

(ii) Multiple Bar Diagram: The diagram is used, when we have to make comparison between more

than two variables. The number of variables may be 2, 3, 4 or more. In case of 2 variables, pair of bars

is drawn. Similarly, in case of 3 variables, we draw triple bars. The bars are drawn on the same

proportionate basis as in case of simple bars. The same shade is given to the same item. Distance

between pairs is kept constant.

(iii) Sub-divided Bar Diagram: The data which is presented by multiple bar diagram can be presented

by this diagram. In this case, we add different variables for a period and draw it on a single bar as

shown in the following examples. The components must be kept in same order in each bar. This

diagram is more efficient if number of components is less, i.e. 3 to 5.

(2) Pie Diagram

Page 4: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

The pie diagram is a circle. Within each circle, there is a section that represents a percentage of a

whole number. These sections are displayed like sections of a pie, which is where the graph's name

originates. The pie graph can have as many pieces as necessary to represent the data. In constructing a

pie diagram, the first step is to convert the various values of components of the variable into

percentages and then the percentages transposed into corresponding degrees. The total percentage of

the various components, i.e. 100 is taken as 360o (degrees around the centre of a circle) and the degree

of various components are calculated in proportion to the percentage values of different components.

It is expressed as:

°360

100× Component's Percentage

It should be noted that in case the data comprises of more than one variable, to show the two-

dimensional effect for making comparison among the variables, we have to obtain the square root of

the total of each variable. These square roots would represent the radius of the circles and then they

will be subdivided. A pie diagram helps us in emphasising the area and in ascertaining the relationship

between the various components as well as among the variables. However, compared to a bar

diagram, a pie diagram is less effective for accurate interpretation when the components are in large

numbers.

Q2. Using ANOVA find out if significant difference exists between the scores obtained by the

three groups of employees on Work Motivation.

Ans. K=3 (i.e., 3 groups), n=10 (i.e., each group having 10 cases), N=30 (i.e., the total number of

units in the group)

Null hypothesis Ho = µ1= µ2= µ3

Page 5: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Thus,

GROUP A GROUP B GROUP C

X1 X12 X2 X2

2 X3 X3

2

12 144 67 4489 43 1849

12 144 45 2025 45 2025

14 196 32 1024 34 1156

43 1849 32 1024 33 1089

34 1156 23 529 21 441

45 2025 23 529 11 121

54 2916 12 144 16 256

34 1156 43 1849 27 729

45 2025 45 2025 26 676

65 4225 56 3136 32 1024

∑∑∑∑X1=358 ∑∑∑∑X12=15836 ∑∑∑∑X2=378 ∑∑∑∑X2

2=16774 ∑∑∑∑X3=288 ∑∑∑∑X3

2=9366

10 10 10

35.8 37.8 28.8

Page 6: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Source of variance df SS MSS F Ratio

Among the groups (K-1)

3-1 = 2

-12369.7 -6184.85

8.61

Within the groups (N-K)

30-3=27

-19393.2 -718.2

From F table for 2 and 27 df at .05 level, the F value is 21.3. Our calculated F value is 8.61, which is

greater than F value. Therefore, the F ratio is significant at 0.5 level. Thus, the null hypothesis is

rejected.

Q3. Define nonparametric statistics. Discuss assumptions, advantages and disadvantages of

nonparametric statistics.

Page 7: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Ans. Non-parametric statistics are not based on the assumption of normal distribution of population.

Therefore, these are also known as distribution free statistics. They are not bound to be used with

interval scale data or normally distributed data. The data with non-continuity are to be tackled with

such statistics. In the samples, where it is difficult to maintain the assumption of normal distribution,

non-parametric statistics are used for analysis. The samples with small number of items are treated

with nonparametric statistics because of the absence of normal distribution. These can be used even

for nominal data along with the ordinal data. Some of the usual non-parametric statistics include chi-

square, Spearman’s rank difference method of correlation, Kendall’s rank difference method and

Mann- Whitney U-test.

Assumptions of Non-parametric Statistics are as follows:

(1) We face many situations, where we cannot meet the assumptions and conditions, and thus, cannot

use parametric statistical procedures. In such situation, we are bound to apply non-parametric

statistics.

(2) If our sample is in the form of nominal or ordinal scale, and the distribution of sample is not

normally distributed, and also the sample size is very small; it is always advisable to make use of the

non-parametric tests for comparing samples and to make inferences or test the significance or

trustworthiness of the computed statistics. In other words, the use of non-parametric tests is

recommended in the following situations:

(i) Where sample size is quite small. If the size of the sample is as small as N=5 or N=6, the only

alternative is to make use of non-parametric tests.

(ii) When assumption like normality of the distribution of scores in the population are doubtful, we

use nonparametric tests.

(iii) When the measurement of data is available either in the form of ordinal or nominal scales or

when the data can be expressed in the form of ranks or in the shape of + signs or – signs and

classification like “good-bad”, etc. we use non-parametric statistics.

Advantages: Few advantages of using non-parametric statistics are as follows:

• Non-parametric methods can be used to analyse categorical (nominal scaling) data, rank (ordinal

scaling) data and interval (ratio scaling) data.

• Non-parametric methods are generally easy to apply and quick to compute when sample size is

small.

• Non-parametric methods require few assumptions but are very useful when the scale of

measurement is weaker than required for parametric methods. Hence, these methods are widely used

and yield a more general broad-based conclusion.

• Non-parametric methods provide an approximate solution to an exact problem whereas parametric

methods provide an exact solution to an approximate problem.

• Non-parametric methods provide solution to problems that do not require to make the assumption

that a population is distributed normally or any specific shape.

Limitations: Few major limitations of non-parametric statistic are follows:

Page 8: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

• Non-parametric methods should not be used when all the assumptions of the parametric methods can

be met. However, they are equally powerful when assumptions are met, when assumptions are not

met these may be more powerful.

• Non-parametric methods require more manual computational time when sample size gets larger.

• Table values for non-parametric statistics are not as readily available as of parametric methods.

• Non-parametric tests are usually not as widely used and not well known as parametric tests.

SECTION-B

Answer the following questions in about 400 words (wherever applicable) each.

Q1. Discuss type I and type II error and describe steps in setting up the level of significance.

Ans. When a statistical hypothesis is tested, there are four possible results:

(1) The hypothesis is true but out test rejects it.

(2) The hypothesis is false but our test accepts it.

(3) The hypothesis is true and our test accepts it.

(4) The hypothesis is false and our test rejects it.

Obviously, the first two possibilities lead to errors. If we reject a hypothesis when it should be

accepted (possibility No. 1), we say that a Type I error has been made. On the other hand, if we accept

a hypothesis when it should be rejected (possibility No. 2), we say that a Type II error has been made.

In either case, a wrong decision or error in judgment has occurred. The sizes of Type-I error and

Type-II error are denoted by α and β respectively. The usual practice in testing of hypothesis is to fix

α, the size of Type-I error and they try to obtain a criterion which minimises β, the size of Type-II

error.

Steps: The steps in setting up the level of significance are as follows:

(1) State the null hypothesis and the alternative hypothesis. (Note: The goal of inferential statistics is

to make general statements about the population by using sample data. Therefore in testing

hypothesis, we make our predictions about the population parameters).

(2) Set the criteria for a decision.

(3) Level of significance or alpha level for the hypothesis test: This is represented by μ which is the

probability used to define the very unlikely sample outcomes, if the null hypothesis is true.

Now suppose, a manufacturer produces some type of articles of good quality. A purchaser by chance

selects a sample randomly. It so happens that the sample contains many defective articles and it leads

the purchaser to reject the whole product. Now, the manufacturer suffers a loss even though he has

produced a good article of quality. Therefore, this Type-I error is called “producers risk”.

On the other hand, if we accept the entire lot on the basis of a sample and the lot is not really good,

the consumers are put to loss. Therefore, this Type-Il error is called the “consumers risk”.

Page 9: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Q2. Using Spearman’s rho for the following two sets of scores.

Ans.

Page 10: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Q3. With the help of Mann Whitney ‘U’ test find if significant difference exists between the

scores obtained on attitude towards health of individuals belonging to low Socio Economic

Status and high Socio Economic Status.

Ans. First rank the ratings from lowest to highest score of the both scores.

Page 11: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Q4. Describe partial correlation with suitable example.

Ans. A partial correlation between two variables is one that partials out or nullifies the effects of a

third variable (or a number of other variables) upon both the variables being correlated. Two

variables, A and B, are closely related. The correlation between them is partialled out, or controlled

for the influence of one or more variables is known as partial correlation. Therefore, when it is

assumed that some other variable is influencing the correlation between A and B, then the influence of

this variable(s) is partialled out for both A and B. Hence, it can be considered as a correlation between

two sets of residuals. Suppose a simple case of correlation between A and B is partialled out for C.

This can be represented as rAB.C which is read as correlation between A and B partialled out for C. The

correlation between A and B can be partialled out for more variables as well.

For example, the researcher is interested in computing the correlation between anxiety and academic

achievement controlled from intelligence. Then correlation between academic achievement (A) and

anxiety (B) will be controlled for Intelligence (C). This can be represented as:

rAcademic Achievement (A) Anxiety (B). Intelligence (C). To calculate the partial correlation (rp),we will need a data on

all three variables. The computational formula is as follows:

( )( )AB AC BC

p AB.C

AC BC

r r rr r

r r

−= =

− −2 2

1 1

The data of academic achievement, anxiety and intelligence is given below in Table. Here, the

academic achievement test, the anxiety scale and intelligence test is administered on ten students.

Table: Data of academic achievement, anxiety and intelligence for 10 students

Page 12: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

In order to compute the partial correlation between the academic achievement and anxiety partialled

out for Intelligence, we first need to compute the Pearson’s Product moment correlation coefficient

between all three variables.

The correlation between anxiety (B) and academic achievement (A) is – 0.369.

The correlation between intelligence (C) and academic achievement (A) is 0.918.

The correlation between anxiety (B) and intelligence (C) is – 0.245.

Now, we can calculate the partial correlation as:

( )( )

( )

( ) ( )( )AB AC BC

AB.C

AC BC

. . .r r rr

r r . .

.

.

− − × −−= =

− − − − −

−=

2 2 2 2

0 369 0 918 0 245

1 1 1 0 918 1 0 245

1441

0 385

=0.375

Q5. Compute Chi-square for the following data:

Ans.

Page 13: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006
Page 14: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

SECTION-C

Answer the following in about 50 words each.

Q1. Hypothesis testing

Ans. Hypothesis testing has a vital role in psychological measurements. By hypothesis, we mean the

tentative answer to any question. Hypothesis testing is a systematic procedure for deciding whether

the results of a research study, which examines a sample, support a particular theory or practical

innovation, which applies to a population. Hypothesis testing is the central theme in most psychology

research.

Q2. Measures of central tendency

Ans. Measures of central tendency are measures of the location of middle or center of a distribution.

The definition of “middle” or “center” is purposely left somewhat vague so that the term “central

tendency” can refer to a wide variety of measures. Measures of central tendency are the best way to

reduce a set of data and still retain part of the information. In other words, it is a way to summarise the

set with a single value.

Page 15: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Q3. Scatter diagram

Ans. Scatter diagram is a graphic method of finding out correlation between two variables. By this

method, direction of correlation can be ascertained. For constructing a scatter diagram, X-variable is

represented on X-axis and the Y-variable on Y-axis. Each pair of values of X and Y series is plotted

in two-dimensional space of X–Y. Thus, we get a scatter diagram by plotting all the pair of values.

Q4. Tetrachoric Correlation

Ans. Tetrachoric correlation is a correlation between two dichotomous variables that have underlying

continuous distribution. If the two variables are measured in a more refined way, then the continuous

distribution will result. For example, attitude to females and attitude towards liberalisation are two

variables to be correlated. Now, we simply measure them as having positive or negative attitude. So

we have 0 (negative attitude) and 1 (positive attitude) scores available on both the variables. Then the

correlation between these two variables can be computed using Tetrachoric correlation ( )tetr .

Q5. Multiple correlation

Ans. The multiple correlation coefficient (R) denotes a correlation of one variable with multiple other

variables. The multiple correlation coefficient is denoted as RA.BCD....K which denotes that A is

correlated with B, C, D upto k variables.

For example, we want to compute multiple correlation between A with B and C then it is expressed as

RA.BC. In this case, we create a linear combination of the B and C which is correlated with A.

Q6. Regression equation

Ans. Regression is the study of nature of relationship between the variables, so that one may be able

to predict the unknown value of one variable for a known value of another variable.

If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is

possible to predict values of one variable from the other. This observation leads to a very important

concept known as ‘Regression Analysis’.

Q7. Normal curve

Ans. The Normal Probability Curve is the ideal symmetrical frequency curve. It is supposed to be

based on the data of a population. In it, the measures are concentrated closely around the centre and

taper off from this central point or top to the left and right. There are very few measures at the low

score end of the scale; an increasing number upto a maximum at the middle position; and a

symmetrical falling-off towards the high score end of the scale. The curve exhibits almost perfect

bilateral symmetry. It is symmetrical about central altitude.

Q8. Degree of freedom

Ans. The number of degrees of freedom, usually denoted by the Greek Symbol—v (read as nu) can be

interpreted as the number of useful items of information generated by a sample of given size with

respect to the estimation of a given population parameter. Thus, a sample of size 1 generates one piece

of useful information if one is estimating the population mean but none if one is estimating the

population variance. In order to know about the variance, one need at least a sample of size n ≥ 2. The

number of degrees of freedom, in general, is the total number of observation minus the number of

independent constraints imposed on the observations.

Page 16: ASSIGNMENT REFERENCE MATERIAL (2017-18) M.P.C.-006

Q9. Measuring skewness

Ans. There are two methods to study the skewness in a distribution. These are as follows:

(1) Observation Method: There is a simple method of detecting the directions of skewness by the

inspection of frequency polygon prepared on the basis of the scores obtained regarding a trait of the

population or a sample drawn from a population.

(2) Statistical Method: To know the skewness in the distribution, we may also use the statistical

method. For the purpose, we use measures of central tendency, specifically mean and median values

and use the following formula.

( )k

Mean MedianS

−=

σ

3

Q10. Kruskal Wallis Analysis of Variance

Ans. The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA. There may be

cases where a researcher is not clear about the shape of the population. In this situation, the Kruskal-

Wallis test is a nonparametric alternative to one-way ANOVA. One-way ANOVA is based on the

assumptions of normality, independent groups and equal population variance. In order to perform one-

way ANOVA, it is essential that data is atleast interval scaled. On the other hand, Kruskal-Wallis test

can be performed on ordinal data and is not based on the normality assumption of the population.

Kruskal-Wallis test is based on the assumption of independency of groups. It is also based on the

assumption that individual items are selected randomly.