Word length and its semantic complexity

WORD LENGTH AND ITS SEMANTIC COMPLEXITY

Jaan Elts

The relationship between the semantic complexity of a word and its length is studied on 1356 nouns from Russian popular-scientific biology texts. The correlation between word length and abstractness was 0.96, and between word length and terminologicality 0.86. The mean length of nouns is a simple characteristic of their semantic complexity.

It is of vital importance that the readability of any written material should be matched to the reading abilities of the students to whom it will be presented (Richardson, Lock, 1993). One of the important factors influencing readability is the semantic complexity of the text. Teachers of biology know scientific terms and present them to their students in due course of teaching. But there is a danger of overemphasizing terminology in speech and in written text, so that it may become a teaching goal in its own right. Although technical language may have a vital role to play in science, students often find it difficult. This may be explained by the impersonal language register of the subjects taught at school, including science. The terms in textbooks do not belong to students' everyday vocabulary. As this terminology is alien to the student, it can act as a barrier to learning (Richardson, Lock, 1993).

If students misunderstand a text while reading it, they cannot resort to rephrasing or seek clarification the way it happens in a face-to-face situation. Students may think that they understand a word that they do not, or may be unsure of the meaning of a word that they actually understand well. Questions form a specific kind of school text. R. Thereadgold (1982) mentions that examination questions are usually written in a formal style with frequent use of passive, impersonal and imperative verbs. So, in addition to the problem caused by long and abstract words or terms, the formal syntax also has an important influence on the level of text comprehension.

Many scientific terms are derived from Greek and Latin words. These terms, for example polymorphonuclear granulocyte, may seem long and difficult, and students, therefore, may see the concepts that they represent as irrelevant (Evans, 1973; Merzyn, 1987). The percent incidence of Graeco-Latin words seems to correlate with the reading ages of texts. This may be because Graeco-Latin words are often used to represent more abstract ideas than words of native origin, and so their incidence gives an indication of the semantic complexity of texts. Another reason for using Graeco-Latin words in texts is that they have a high status and make the writing appear more academic. For both reasons, texts with a high incidence of Graeco-Latin words may be less readable than texts with a low incidence (Corson, 1982, in Richardson & Lock, 1993). On the other hand, R.C. Anderson and A. Davison (1988) conclude that long sentences and complex words in a text should reflect, or be correlated with, the complexity of the subject matter, but they should not directly make a text difficult. Earlier research (Mikk, Elts, 1993) has demonstrated that there is a high correlation between the length of a word and its semantic characteristics.

The purpose of this paper is to establish how the length of a word and its semantic characteristics are related. Special attention is paid to abstract terms, i.e. nouns in which both abstractness and terminologicality reach their maximal values.


Material and method

The study is an analysis of 48 Russian texts of about 1900 character spaces each, taken from popular-scientific books on biology. Only one or two texts were picked at random from each book to make the sample more varied and representative.

All the texts were computer-analysed. The texts were entered into a computer and analysed by the morphological analysis programs developed in Kiev by N.P. Dartschuk and her colleagues (Automatization..., 1984). Then T. Tanman determined the frequency of every analysed word in Russian on the basis of the Russian word frequency lists compiled at Moscow University by Buchstab and colleagues. The number of words in the paragraphs and the number of letters in the words were also established.

The degree of abstractness of every noun in the text and the role of terms were measured. The substantival abstractness of nouns was measured by grading the nouns in the text as follows (Mikk, 1974):
1 - concrete nouns designating things directly perceptible by the senses (e.g. man, stone);
2 - nouns designating phenomena and processes perceptible by the senses (e.g. rain, light);
3 - abstract nouns designating objects and notions not directly perceptible by the senses (e.g. evolution, cell).

To measure the terminologicality of texts, the nouns were grouped into three classes (Elts, 1992):
1 - everyday nouns that are not terms (e.g. plant, winter);
2 - substantival terms of everyday meaning (e.g. reproduction, biotechnology);
3 - terms that are not used in everyday speech, substantival professionalisms (e.g. replication, DNA).

The effectivity of the texts was determined experimentally at two Russian secondary schools in Estonia. The experiment was carried out by biology teachers who had been instructed beforehand how to conduct it. The students were asked to study the text independently for 15-20 minutes. Then they filled in a questionnaire and answered a set of open-ended questions measuring the text difficulty. The questionnaire measured the students' estimates of the difficulty and interest levels of the text. The students also reported how familiar the subject matter was to them and estimated how well they could understand and remember the text. They were also asked whether the words in the text were familiar, whether the extract was interesting enough to stimulate them to read the whole text, and whether the time given to study the text was sufficient.
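The noun codes on the two scales can be aggregated into text-level measures like those listed for Table 2 (R1, R2, A, T, PA, PT). Below is a minimal sketch, offered under several assumptions because the paper does not spell out its conventions. The names `CodedNoun` and `text_characteristics` are hypothetical. Means are taken over noun occurrences rather than over different nouns. An abstract term is taken to be a noun coded 3 on both scales. The "terms" counted for PT are taken to be nouns coded 2 or 3 on the terminologicality scale.

```python
from dataclasses import dataclass


@dataclass
class CodedNoun:
    """One noun occurrence with its codes on the two 1-3 scales (hypothetical structure)."""
    lemma: str
    abstractness: int       # 1 concrete, 2 perceptible phenomenon/process, 3 abstract
    terminologicality: int  # 1 everyday noun, 2 term of everyday meaning, 3 professional term


def text_characteristics(nouns: list[CodedNoun]) -> dict[str, float]:
    """Aggregate noun codes into text-level measures named after those in Table 2."""
    n = len(nouns)
    if n == 0:
        raise ValueError("the text contains no coded nouns")
    # Assumption: an abstract term (AT) is a noun coded 3 on both scales.
    abstract_terms = [x for x in nouns if x.abstractness == 3 and x.terminologicality == 3]
    return {
        "R1": len(abstract_terms),                                     # all AT occurrences
        "R2": len({x.lemma for x in abstract_terms}),                  # different AT
        "A": sum(x.abstractness for x in nouns) / n,                   # mean abstractness
        "T": sum(x.terminologicality for x in nouns) / n,              # mean terminologicality
        "PA": 100 * sum(x.abstractness == 3 for x in nouns) / n,       # % abstract words
        "PT": 100 * sum(x.terminologicality >= 2 for x in nouns) / n,  # % terms (assumed levels 2-3)
    }


if __name__ == "__main__":
    # Hypothetical, illustrative codes; not taken from the study's data.
    sample = [
        CodedNoun("stone", 1, 1),
        CodedNoun("rain", 2, 1),
        CodedNoun("replication", 3, 3),
        CodedNoun("replication", 3, 3),
    ]
    print(text_characteristics(sample))
```

Counting `lemma` values in a set is what separates R2 (different abstract terms) from R1 (all occurrences, repeats included).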

Results

The total number of different nouns in the biology texts was 1356. Word length ranged from 2 to 20 letters. Most words were shorter than ten letters (Fig. 1). Words of 14 or more letters constituted only 3.4 per cent of the total number of nouns in the texts. The mean length of nouns was 7.5 letters.
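The descriptive statistics above are the length distribution, the share of nouns with 14 or more letters and the mean length. The following minimal sketch shows how they can be derived from a list of nouns. It assumes that the nouns are already lemmatized and normalized. It also assumes that only letters count towards word length, so hyphens and symbols are ignored. Each different noun is counted once, as in the 1356-noun sample. The function names are hypothetical.

```python
from collections import Counter


def letter_count(word: str) -> int:
    """Word length in letters; hyphens, digits and other symbols are not counted."""
    return sum(ch.isalpha() for ch in word)


def length_profile(nouns: list[str], long_threshold: int = 14) -> tuple[Counter, float, float]:
    """Length distribution, mean length and percentage of long words over the different nouns."""
    distinct = set(nouns)  # assumption: each different noun is counted once
    if not distinct:
        raise ValueError("no nouns given")
    lengths = [letter_count(w) for w in distinct]
    distribution = Counter(lengths)  # word length in letters -> number of different nouns
    mean_length = sum(lengths) / len(lengths)
    long_percent = 100 * sum(n >= long_threshold for n in lengths) / len(lengths)
    return distribution, mean_length, long_percent


if __name__ == "__main__":
    dist, mean_len, long_pct = length_profile(["клетка", "эволюция", "rose-chafer", "rain"])
    print(sorted(dist.items()), mean_len, long_pct)
```

`str.isalpha` is true for Cyrillic letters as well as Latin ones, so the same function works for the Russian material.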

Fig. 1. The distribution of the noun length in the studied texts (all nouns = 100 %). Horizontal axis: word length in letters (2-20); vertical axis: percentage.

The correlation between the abstractness and the terminologicality of nouns was unexpectedly low, only 0.31. It is true that there are many abstract words in everyday speech that are not scientific terms (e.g. dream, idea), and, on the other hand, biologists use many concrete scientific terms (e.g. stamen, rose-chafer). Consequently, the abstractness of a word and its terminologicality appear to be independent characteristics, at least to some extent.
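The paper does not say how this correlation was computed. The following minimal sketch assumes the ordinary Pearson product-moment coefficient over paired noun codes. The function name `pearson_r` and the sample codes are hypothetical. The sample includes an abstract everyday word coded like "idea" (3, 1) and a concrete term coded like "stamen" (1, 3). Mismatches of this kind pull the coefficient down.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least two observations")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (abstractness, terminologicality) codes for six nouns.
    codes = [(1, 1), (2, 1), (3, 3), (3, 2), (3, 1), (1, 3)]
    print(pearson_r([a for a, _ in codes], [t for _, t in codes]))
```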

A comparison of nouns of different levels of semantic complexity (Figs. 2 and 3) reveals that, on average, words of higher semantic complexity tend to be longer than words of simpler semantic complexity.

The relationship between the word length and its level of abstractness was:

$$A = 1.11 + 0.09\,W, \qquad r = 0.96$$

where A is abstractness and W is word length in letters.

Fig. 2. The percentage of words of different abstractness levels in each word length group (the total number of words of the given length = 100 %). Horizontal axis: word length in letters (2-20).

Fig. 3. The percentage of words of different terminologicality levels in each word length group (the total number of words of the given length = 100 %). Horizontal axis: word length in letters.

The relationship between the word length and its average terminologicality was:

$$T = 0.89 + 0.044\,W, \qquad r = 0.86$$

where T is terminologicality and W is word length in letters.
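Linear equations of the form A = a + bW can be obtained by ordinary least squares. The paper does not state whether the fit used individual nouns or group means per word length. This minimal sketch therefore simply takes paired (word length, code) values. The function names `fit_line` and `explained_variance_percent` are hypothetical. For a simple linear fit, r squared is the share of variance accounted for: 0.96² ≈ 0.92 and 0.86² ≈ 0.74.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = a + b*x; returns intercept a, slope b and correlation r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need paired samples with at least two observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


def explained_variance_percent(r: float) -> float:
    """Percentage of the variance of y accounted for by the fitted line (100 * r**2)."""
    return 100 * r ** 2


if __name__ == "__main__":
    print(round(explained_variance_percent(0.96)), round(explained_variance_percent(0.86)))
```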

The first formula accounts for up to 92 per cent of the variance in abstractness, the second for up to 74 per cent of the variance in terminologicality.

What is the role of abstract terms in the build-up of text difficulty? When the different semantic levels of nouns are compared, the average length of abstract terms exceeds the word length at all the other semantic levels (Fig. 4). It is of interest that all words longer than 18 letters in these texts were abstract words (Fig. 5).

Fig. 4. Comparison of the average noun length of the words in the different semantic value groups: abstractness levels 1-3, terminologicality levels 1-3, and abstract terms.

Fig. 5. The percentage of abstract terms in the noun groups of different length. Horizontal axis: word length in letters (2-20).

Each semantic group forms a distinct peak on the word length scale. The average word lengths of these groups differ significantly, with the exception of the groups of high abstractness and high terminologicality levels (Table 1).

Table 1

Student's t-test values for the word length of words of different abstractness (A) and terminologicality (T) levels; A3T3 denotes abstract terms (* p < 0.05, ** p < 0.01, *** p < 0.001).

        A1       A2       A3       T1       T2       T3
A2      6.6***
A3      8.7***   3.1**
T1      5.8***   3.2**    7.8***
T2      2.4*     2.2*     4.6***   0.5
T3      6.9***   3.0**    0.8      4.1***   4.7***
A3T3    6.5***   3.6***   1.9      4.1***   5.1***   1.3
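The paper reports Student's t values without saying whether variances were pooled. This minimal sketch assumes the classical equal-variance (pooled) two-sample test on the word lengths of two semantic groups. The table lists magnitudes, which correspond to the absolute value of the t returned here. The function name `student_t` and the sample lengths are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Two-sample Student's t with pooled variance; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two words")
    df = nx + ny - 2
    # statistics.variance is the sample variance (divisor n - 1).
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


if __name__ == "__main__":
    # Hypothetical word lengths (letters) of nouns in two semantic groups.
    t, df = student_t([5, 6, 7, 6], [8, 9, 10, 9])
    print(abs(t), df)
```

The degrees of freedom are returned alongside t because the significance level depends on them.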

Table 2.

Correlations between the text effectivity and the characteristics of semantic complexity. The coefficients are printed without minus signs, leading zeros and decimal points. R1 - total number of abstract terms (AT) in the text, R2 - number of different AT, A - mean abstractness of nouns, T - mean terminologicality of nouns, PA - percentage of abstract words, PT - percentage of terms.

[Table 2 columns: R1, R2, A, T, PA, PT. Rows: text difficulty level, text interest level, text familiarity level, text comprehensibility, ease of text memorization, familiar words, subject difficulty level, subject interest level, sufficiency of learning time, subject familiarity level, cloze test, final test.]

From the facts above it can be assumed that the more complicated the semantic characteristics of nouns (abstractness and terminologicality) are, the more difficult the text is to comprehend. Consequently, abstract terms (AT) should hamper text comprehension most markedly. And yet the correlations between the semantic characteristics and the learning effectivity do not support this hypothesis (Table 2). This does not mean, however, that the influence of AT can be neglected: the correlation between the result of the final test and the number of different abstract terms in the text was -0.69. Normally only a certain portion of nouns in scientific texts are AT. This must have been of importance in the experiment too, as the average number of different AT per text was only 3.5 (maximum 15, SD = 3.2). On the other hand, some texts contained more AT than the average. For example, one text contained 31 AT (including repeated AT), which formed 35 per cent of all the nouns in the text. On average, each AT occurred only about once in a text of about 250 words (the repetition index was 1.1).
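A repetition index of 1.1 and a 35 per cent share of AT among nouns are simple ratios. The paper does not define the repetition index explicitly. This minimal sketch assumes it is the number of AT occurrences divided by the number of different AT. Both function names are hypothetical.

```python
def repetition_index(abstract_term_occurrences: list[str]) -> float:
    """AT occurrences per different AT (assumed definition of the repetition index)."""
    if not abstract_term_occurrences:
        raise ValueError("the text contains no abstract terms")
    return len(abstract_term_occurrences) / len(set(abstract_term_occurrences))


def percent_of_nouns(count: int, noun_total: int) -> float:
    """Share of a text's nouns, in per cent."""
    if noun_total <= 0:
        raise ValueError("the text must contain nouns")
    return 100 * count / noun_total


if __name__ == "__main__":
    # Hypothetical text: 10 different abstract terms, one of them used twice.
    occurrences = [f"term{i}" for i in range(10)] + ["term0"]
    print(repetition_index(occurrences))
```

An index close to 1 means that almost every abstract term appears only once, so readers rarely meet a new term a second time within the same extract.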

Discussion

School teachers know well that students do not like to use complex words when answering the teachers' questions or writing tests. They especially avoid long words, which are difficult to read and remember. Our experimental material suggests that word length and semantic complexity have a simultaneous effect on text comprehension, as they are highly correlated. Which of the two factors predominates in the process remains to be established.

The readability formula for biology texts has these two factors as its components (Elts, 1992). Word length measures the quantitative complexity of nouns, and semantic complexity (abstractness + terminologicality) measures their qualitative complexity. The understanding of texts is influenced by both of these complexities. As shown above, it is possible to assess the quantitative complexity of texts by measuring the length of the words and sentences in them. To assess the readability of texts more precisely, their qualitative complexity also has to be measured. Finding the simplest way of measuring text complexity needs further research.


References

Automatization of the analysis of scientific text. Kiev: Naukova dumka, 1984. 258 p. (in Russian)

Anderson R.C., Davison A. 1988. Conceptual and empirical bases of readability formulas. In: A. Davison and G.M. Green (eds.). Linguistic complexity and text comprehension. Hillsdale, NJ: Erlbaum, pp. 23-54.

Corson D.J. 1982. The Graeco-Latin instrument: a new measure of semantic complexity in oral and written English. Language and Speech, Vol. 25, No. 1, pp. 1-10.

Elts J. 1992. A readability formula for texts on biology. In: Psychological problems of reading. Theses of papers for the international scientific conference, Vilnius, 6-7 May. Vilnius, pp. 42-44.

Evans J.D. 1973. Towards a theory of technical communication. School Science Review 55 (191), pp. 233-241.

Merzyn 1987. The language of school science. International Journal of Science Education, Vol. 9, No. 4, pp. 483-489.

Mikk J. 1974. Metodika razrabotki formul citabelnosti (Methods of elaborating readability formulae). Sovetskaja pedagogika i skola (Soviet pedagogy and school) 9, pp. 78-163. (in Russian)

Mikk J., Elts J. 1993. Comparison of text on familiar or unfamiliar subject matter. In: Hrebicek L., Altmann G. (eds.). Quantitative text analysis. Trier: WVT Wissenschaftlicher Verlag Trier, pp. 223-233.

Richardson J., Lock R. 1993. The readability of selected A-level biology examination papers. Journal of Biological Education, Vol. 27, No. 3, pp. 205-212.

Thereadgold R. 1982. The problem of register in selected CSE examinations across the curriculum. Reading, Vol. 16, No. 3, pp. 169-L19.
