
NOTEBOOK contains a miscellaneous collection of items all designed to aid your study of biology. Whether dealing with creatures or concepts, evolution or exams, NOTEBOOK will help, inform and remind you of things that you should find useful.

Biological statistics part 3

So far in this series we have looked at types of variables, distributions, measures of central tendency and variability, and the Chi-squared test. This part looks at two remaining statistical techniques you need for A-level - the t-test and correlation.

COMPARING SMALL SAMPLES

The t-test was devised to overcome problems of dealing with small sample sizes and it is widely used in a variety of situations. It compares two sets of samples by referring to the means and the spread of the sample scores around the mean (that is, the standard deviation or variance). If you can calculate the means and variances of the samples then you can use the t-test to compare them, and to determine whether they are significantly different - if they come from the same or different populations. Generally, we can use this test to compare two sets of sample data, usually those from experimental and control groups. It is more accurate if the sample size is 30 or over. The value of t can be calculated from the formula:

$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$

In this formula (for samples A and B), $\bar{x}$ is the mean, $s^2$ is the variance (the sum of the squares of the differences from the mean divided by the number of observations), and $n$ is the number in the sample. Note that the standard deviations quoted in the example below correspond to dividing by $n - 1$ rather than $n$. Having obtained t you need to refer to a table to determine its meaning.

Example: looking at lymphocyte counts

A parasitologist was testing her hypothesis that the number of lymphocytes present in the blood of patients infected by a blood parasite is greater than that found in uninfected individuals. To make the calculations in this example easier to follow, only the counts from the blood of five individuals in each group are considered. The lymphocyte counts obtained from equal areas of blood on prepared films are shown in Table 1.

Group A (infected patients)        165, 170, 151, 164, 160
Group B (uninfected individuals)   150, 155, 152, 146, 152

Table 1

(Although counts are used, the t-test can be used since means and variances of the data can be calculated.) Are these two groups statistically different?

To find out, we first calculate the means and standard deviations for the two groups (see BIOLOGICAL SCIENCES REVIEW, Vol. 7, No. 4, pp. 38-41):

           Mean    Standard deviation
Group A    162     7.11
Group B    151     3.32
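The means and standard deviations above can be checked with a short Python sketch. It uses only the counts in Table 1; the function name `summarise` is an illustrative choice, not something from the article. It assumes the sample standard deviation (divisor n - 1), because that is the divisor that reproduces the quoted values of 7.11 and 3.32.

```python
from statistics import mean, stdev

group_a = [165, 170, 151, 164, 160]  # infected patients (Table 1)
group_b = [150, 155, 152, 146, 152]  # uninfected individuals (Table 1)


def summarise(counts):
    """Return (mean, sample standard deviation) for a list of counts.

    statistics.stdev divides the sum of squared differences by n - 1.
    """
    return mean(counts), stdev(counts)


mean_a, sd_a = summarise(group_a)
mean_b, sd_b = summarise(group_b)

print(f"Group A: mean = {mean_a:.1f}, SD = {sd_a:.2f}")
print(f"Group B: mean = {mean_b:.1f}, SD = {sd_b:.2f}")
```

With only five counts per group the choice of divisor matters: dividing by n instead of n - 1 would give noticeably smaller standard deviations.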

Next we substitute these values in the formula:

$$\bar{x}_A - \bar{x}_B = 162 - 151 = 11$$

(It does not matter if this value is negative, since it is only the difference between the means which is important.)

$$\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = \sqrt{\frac{7.11^2}{5} + \frac{3.32^2}{5}} = \sqrt{12.314} = 3.51$$

t is equal to 11 divided by 3.51 = 3.13.
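The same arithmetic can be written as a small Python sketch. The function name `t_statistic` and its argument order are illustrative assumptions; the inputs are the means, standard deviations and sample sizes from the worked example.

```python
from math import sqrt


def t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Compute t from the means, standard deviations and sizes of two samples."""
    # Denominator: square root of (variance / n) summed over both samples.
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Only the size of the difference matters, so take the absolute value.
    return abs(mean_a - mean_b) / standard_error


t = t_statistic(162, 7.11, 5, 151, 3.32, 5)
print(f"t = {t:.2f}")
```

Taking the absolute value mirrors the remark that a negative difference between the means does not matter.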

The t table giving p values for different degrees of freedom is shown in Table 2. It concentrates on the levels of p that deal with significant differences (that is, p<0.05). The number of degrees of freedom is (the number in sample A minus 1) plus (the number in sample B minus 1), that is:

$$(n_A - 1) + (n_B - 1)$$

or alternatively

$$(n_A + n_B) - 2$$

As with the chi-squared test, the table is entered on the left at the appropriate number of degrees of freedom. In this case it is (5 - 1) + (5 - 1) = 8. Moving across the table we find where 3.13 lies - between p = 0.05 and p = 0.01, so p<0.05. The difference between the two sets of data is therefore significant and we can be fairly confident that the difference is not due to chance but to some other factor. Looking back at the mean figures (Group A, 162 and Group B, 151) we can conclude that the infected patients actually have a lymphocyte count that differs from that of the uninfected people, so the parasitologist's hypothesis is supported by these data, although many more counts would normally be done.
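The table look-up described above can be sketched in Python. Only the df = 8 row of Table 2 is copied in, and the verbal labels follow the key that accompanies the table. The names `CRITICAL_T`, `degrees_of_freedom` and `smallest_p_reached` are illustrative assumptions, as is treating a value of t equal to a tabulated value as having reached that level.

```python
# p levels heading the columns of Table 2, from left to right.
P_LEVELS = [0.10, 0.05, 0.01, 0.001]

# Critical values of t, copied from the df = 8 row of Table 2.
CRITICAL_T = {8: [1.86, 2.31, 3.36, 5.04]}

# Verbal labels taken from the key that accompanies Table 2.
LABELS = {
    None: "not significant (p > 0.05)",
    0.10: "not significant (p > 0.05)",
    0.05: "significant (p < 0.05)",
    0.01: "highly significant (p < 0.01)",
    0.001: "very highly significant (p < 0.001)",
}


def degrees_of_freedom(n_a, n_b):
    """Degrees of freedom for two samples: (nA - 1) + (nB - 1)."""
    return (n_a - 1) + (n_b - 1)


def smallest_p_reached(t, df):
    """Move across the row for df; return the smallest p whose value t reaches."""
    reached = None
    for p, critical in zip(P_LEVELS, CRITICAL_T[df]):
        if abs(t) >= critical:
            reached = p
    return reached


df = degrees_of_freedom(5, 5)
p = smallest_p_reached(3.13, df)
print(df, LABELS[p])  # prints: 8 significant (p < 0.05)
```

Other rows of Table 2 could be added to `CRITICAL_T` in the same way if other sample sizes are needed.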

Degrees of           p values (decreasing value of p →)
freedom (df)     0.10      0.05      0.01      0.001

  1              6.31     12.71     63.66     636.60
  2              2.92      4.30      9.92      31.60
  3              2.35      3.18      5.84      12.92
  4              2.13      2.78      4.60       8.61
  5              2.02      2.57      4.03       6.87

  6              1.94      2.45      3.71       5.96
  7              1.89      2.36      3.50       5.41
  8              1.86      2.31      3.36       5.04
  9              1.83      2.26      3.25       4.78
 10              1.81      2.23      3.17       4.59

 12              1.78      2.18      3.05       4.32
 14              1.76      2.15      2.98       4.14
 16              1.75      2.12      2.92       4.02
 18              1.73      2.10      2.88       3.92
 20              1.72      2.09      2.85       3.85

 22              1.72      2.08      2.82       3.79
 24              1.71      2.06      2.80       3.74
 26              1.71      2.06      2.78       3.71
 28              1.70      2.05      2.76       3.67
 30              1.70      2.04      2.75       3.65

 40              1.68      2.02      2.70       3.55
 60              1.67      2.00      2.66       3.46
120              1.66      1.98      2.62       3.37
  ∞              1.64      1.96      2.58       3.29

Table 2 Table of t distribution

p value                              Significance
p is greater than 0.05 (p>0.05)      Not significant (NS)
p is less than 0.05 (p<0.05)         Significant (fairly confident)
p is less than 0.01 (p<0.01)         Highly significant (very confident)
p is less than 0.001 (p<0.001)       Very highly significant (almost certain)

MARCH 1997 ------------------------------------------------------------------------------------ 15


Self-test question 1

(a) If df = 6 and t = 2.94, what would be the value of p?
(b) If df = 30 and t = 4.57, what would be the value of p?
(c) What do these values of p mean?

Many statistics books and courses include use of what is known as the null hypothesis. In my opinion this only causes confusion and it is better to forget about it. However, if it is required for your course, details can be found in Garvin (1986) pages 91 and 93 (see Further Reading).

POSSIBLY RELATED VARIABLES

Often in biology questions arise concerning the relationship between two or more interdependent variables, where a change in one characteristic is matched by a change in another one. Examples are length and width (longer seeds are wider), or size and mass (larger shells are heavier). There are two questions that need to be answered regarding such an interdependent relationship between variables:

• How close is the relationship (correlation)?
• What is the form of the relationship (regression)?

We can answer these questions by plotting the data regarding the two variables as a scattergraph, in which each axis is occupied by one of the two variables, and each pair of values acts like the coordinates of a point on the grid and is represented by a dot. This results in dots spread over the graph grid. 'X's or circles surrounding the dots are only used when you intend to join the points up with a line. The points are not joined up by a line since each example is quite distinct and separate. Examination of the distribution of the dots can tell us if there is any relationship between the variables, how close it is and what form it takes.

The closer the dots come to lying on a straight line, the closer the relationship between the variables; the more scattered the dots the less close is the relationship. Such a relationship is known as correlation. If the dots slope up to the right, then it is a positive correlation; if they slope down to the right, then it is a negative correlation. If the dots do not seem to have any pattern then there is no correlation (see Figure 1).

[Figure 1: three scattergraphs showing positive correlation, no correlation and negative correlation]

If the pattern of dots indicates a possible relationship between the variables then we can draw a line through the dots which most closely fits all the points - such a line is known as the regression line or the line of best fit. There are a number of ways of drawing this line of best fit, but the simplest are:

(1) Place a transparent ruler on the scattergraph and move it around until you think that the edge of the ruler passes through the middle of the dots. Then draw a straight line through them.
(2) Draw two parallel lines which enclose most, if not all, of the dots. Then draw a straight line equidistant between your two parallel lines.

Self-test question 2

Using the scattergraph given in Figure 2, use methods 1 and 2 to obtain lines of best fit. Are they the same?

[Figure 2]

These two methods are straightforward and quick. However, if you want a more accurate regression line you can improve its accuracy as follows:

(3) Calculate the means of each of the two variables - $\bar{X}$ and $\bar{Y}$. Using these values as coordinates, plot the point ($\bar{X}$, $\bar{Y}$) with a distinct cross. This cross should lie on your line of best fit if you have drawn it accurately. You can adjust the line slightly if it is not, rotating it about this point.

There are more complex methods of obtaining the regression line - see Further Reading. Once you have obtained a regression line, you can, if you wish, develop an equation linking the two variables.

It is possible to determine if a particular correlation is significant or not, but the equation is quite complicated and you would normally use a computer programme to compute this for you. Details of the formula and table can be found in statistics texts.

Self-test question 3

A student wanted to find out if there was any relationship between the length and breadth of privet leaves. He measured 20 leaves carefully, making sure that he recorded the length and breadth for each leaf. His results were as follows:

Leaf   Length (mm)   Breadth (mm)      Leaf   Length (mm)   Breadth (mm)
 1     53            24                11     41            19
 2     31            16                12     40            21
 3     42            19                13     38            16
 4     46            26                14     55            24
 5     33            15                15     35            18
 6     39            21                16     35            19
 7     27            16                17     27            15
 8     33            16                18     22            13
 9     51            22                19     21            16
10     32            20                20     49            22

(a) Plot these data as a scattergraph.
(b) Is the correlation positive, negative, or is there no apparent correlation?
(c) Draw, if possible, a line of best fit through the points.

Answers to these questions can be found on p. 36.

Hopefully, after reading parts 1-3 of 'Biological statistics' and having answered the questions, statistics will be much more meaningful to you than when you started. You should be confident when tackling examination questions involving elements of statistics, and when analysing your data from project work.

FURTHER READING

Garvin, W. (1996) 'Biological statistics - part 2', Biological Sciences Review, Vol. 9, No. 3, pp. 7-9.
Garvin, W. (1995) 'Biological statistics - part 1', Biological Sciences Review, Vol. 7, No. 4, pp. 38-41.
Garvin, I.W. (1986) Skills in Advanced Biology, Vol. 1: Dealing with Data, Stanley Thornes.
Rowntree, D. (1981) Statistics without Tears, Penguin Books.

Wilbert Garvin

Wilbert Garvin is a Lecturer in Education (Biosciences) at the Queen's University Belfast. He is Director of the Northern Ireland Centre for School Biosciences and author of the Skills in Advanced Biology series of books. His main area of research is biotechnology education.
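Method (3) in the regression discussion rests on the idea that a line of best fit should pass through the point of means. The Python sketch below, using the privet leaf data from Self-test question 3, illustrates this. It assumes that ordinary least squares is one of the 'more complex methods' referred to (the article does not name one), and it uses Pearson's coefficient as one common way to express how close a correlation is. The function names are illustrative only.

```python
from math import sqrt
from statistics import mean

# Privet leaf measurements (mm), leaves 1 to 20 from Self-test question 3.
lengths = [53, 31, 42, 46, 33, 39, 27, 33, 51, 32,
           41, 40, 38, 55, 35, 35, 27, 22, 21, 49]
breadths = [24, 16, 19, 26, 15, 21, 16, 16, 22, 20,
            19, 21, 16, 24, 18, 19, 15, 13, 16, 22]


def sums_of_products(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    sxx, _, sxy = sums_of_products(xs, ys)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through the point of means.
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


def correlation_coefficient(xs, ys):
    """Pearson's r: +1 is a perfect positive correlation, -1 a perfect negative one."""
    sxx, syy, sxy = sums_of_products(xs, ys)
    return sxy / sqrt(sxx * syy)


x_bar, y_bar = mean(lengths), mean(breadths)
slope, intercept = least_squares_line(lengths, breadths)
r = correlation_coefficient(lengths, breadths)

print(f"Point of means: ({x_bar:.1f}, {y_bar:.1f})")
print(f"Line of best fit: breadth = {slope:.3f} * length + {intercept:.3f}")
print(f"r = {r:.3f}")
```

A line drawn by eye with method (1) or (2) can be compared against the point of means printed here, and rotated about that point if it misses.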

16 ---------------- -------------------- BIOLOGICAL SCIENCES REVIEW

