WORD LENGTH AND ITS SEMANTIC COMPLEXITY
Jaanus Elts
The relationship between the semantic complexity of a word and its length is studied on 1356 nouns from Russian popular-scientific biology texts. The correlation between the word length and its abstractness was 0.96, and between the word length and its terminologicality 0.86. The mean length of nouns is a simple characteristic of their semantic complexity.
It is of vital importance that the readability of any written material should be matched to the reading abilities of the students to whom it will be presented (Richardson, Lock, 1993). One of the important factors influencing readability is the semantic complexity of the text. Teachers of biology know scientific terms and present them to their students in the due course of teaching. But there is a danger of overemphasizing terminology in speech and in written text so that it may become a teaching goal in its own right. Although technical language may have a vital role to play in science, students often find it difficult. This may be explained by the impersonal language register of the subjects taught at school, including science. The terms in textbooks do not belong to students' everyday vocabulary. As this terminology is alien to the student, it can act as a barrier to learning (Richardson, Lock, 1993).
If students reading a text misunderstand it, they cannot resort to rephrasing or seek clarification the way it happens in a face-to-face situation. Students may think that they understand a word that they do not, or may be unsure of the meaning of a word that they actually understand well. Questions form a specific kind of school text. R. Thereadgold (1982) mentions that examination questions are usually written in a formal style with frequent use of passive, impersonal and imperative verbs. So, in addition to the problem caused by long and abstract words or terms, the formal syntax also has an important influence on the level of text comprehension.
Many scientific terms are derived from Greek and Latin words. These terms, for example polymorphonuclear granulocyte, may seem long and difficult, and students, therefore, may see the concepts that they represent as irrelevant (Evans, 1973; Merzyn, 1987). The percent incidence of Graeco-Latin words seems to correlate with the reading ages of texts. This may be because Graeco-Latin words are often used to represent more abstract ideas than words of native origin, and so their incidence gives an indication of the semantic complexity of texts. Another reason for using Graeco-Latin words in texts is that they have a high status and make the writing appear more academic. For both of these reasons texts with a high incidence of Graeco-Latin words may be less readable than texts with a low incidence (Corson, 1982: in Richardson & Lock, 1993). On the other hand, R.C. Anderson and A. Davison (1988) conclude that the presence of long sentences and complex words in a text should reflect or be correlated with the complexity of the subject matter, but they should not directly make a text difficult. Earlier research (Mikk, Elts 1993) has demonstrated that there is a high correlation between the word length and the semantic characteristics of the word.
The purpose of this paper is to establish how the word length and its semantic characteristics are related. Special attention is paid to abstract terms, that is, nouns for which both abstractness and terminologicality take their maximal values.
Material and method
The study is an analysis of 48 Russian texts of about 1900 character spaces each, taken from popular-scientific books on biology. Only one or two texts were picked at random from each book to make the sample more varied and representative.
All the texts were computer-analysed. The texts were entered into a computer and analysed by the programs of morphological analysis worked out in Kiev by N.P. Dartschuk and her colleagues (Automatization..., 1984). Then T. Tanman determined the incidence of every analysed word in Russian on the basis of Russian word incidence lists elaborated at Moscow University by Buchstab and colleagues. The number of words in the paragraphs and the number of letters in the words were also established.
The degree of abstractness of every noun in the text and the role of terms were measured. The substantival abstractness of nouns was measured by grading the nouns in the text as follows (Mikk, 1974):
1 - concrete nouns designating things directly perceptible by senses (e.g. man, stone);
2 - nouns designating phenomena and processes perceptible by senses (e.g. rain, light);
3 - abstract nouns designating objects and notions imperceptible directly by senses (e.g. evolution, cell).
To measure the terminologicality of texts the nouns were grouped into three classes (Elts, 1992):
1 - everyday nouns that are not terms (e.g. plant, winter);
2 - substantival terms of everyday meaning (e.g. reproduction, biotechnology);
3 - terms that are not used in everyday speech, substantival professionalisms (e.g. replication, DNA).
The experimental determination of the effectivity of the texts was carried out at 2 Russian secondary schools in Estonia. The experiment was conducted by biology teachers, who had been previously instructed how they should conduct it.
The students were asked to study the text independently for 15-20 minutes. Then they filled in a questionnaire and answered a set of open-ended questions to measure the text difficulty. The questionnaire measured the students' estimates of the difficulty and interest levels of the text. The students also provided information on the level of familiarity of the subject matter. They estimated how well they could understand and remember the text. The students were also asked if the words in the text were familiar, if the extract was interesting enough to stimulate them to read the whole text, and if the time given to them to study the text was sufficient for the purpose.
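The two three-point grading scales described above can be pictured as a small data structure. The following is only a minimal sketch: the type name GradedNoun, its fields, and the sample gradings are hypothetical, and English example words stand in for the Russian nouns of the study. The text grades each example word on one scale only, so the second grade given to each sample word here is an assumption made for illustration.

```python
from dataclasses import dataclass

# Grade descriptions paraphrased from the two three-point scales in the text.
ABSTRACTNESS = {
    1: "concrete noun, thing directly perceptible by senses",
    2: "phenomenon or process perceptible by senses",
    3: "abstract noun, not directly perceptible by senses",
}
TERMINOLOGICALITY = {
    1: "everyday noun that is not a term",
    2: "term of everyday meaning",
    3: "professionalism, not used in everyday speech",
}


@dataclass(frozen=True)
class GradedNoun:
    """A noun with its two semantic grades (hypothetical helper type)."""
    word: str
    abstractness: int
    terminologicality: int

    def __post_init__(self):
        # Reject grades outside the 1-3 range used by both scales.
        if self.abstractness not in ABSTRACTNESS:
            raise ValueError(f"abstractness must be 1, 2 or 3: {self.abstractness}")
        if self.terminologicality not in TERMINOLOGICALITY:
            raise ValueError(f"terminologicality must be 1, 2 or 3: {self.terminologicality}")

    @property
    def length(self) -> int:
        """Word length in letters."""
        return len(self.word)

    @property
    def is_abstract_term(self) -> bool:
        """Abstract term: both grades at their maximal value."""
        return self.abstractness == 3 and self.terminologicality == 3


# Assumed gradings for illustration only; they are not data from the study.
sample = [
    GradedNoun("stone", 1, 1),
    GradedNoun("rain", 2, 1),
    GradedNoun("replication", 3, 3),
]
abstract_terms = [noun.word for noun in sample if noun.is_abstract_term]
```

Keeping the two grades as separate fields mirrors the way the paper treats abstractness and terminologicality as distinct characteristics, with abstract terms being the nouns that score 3 on both.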
Results
The total number of different nouns in the biology texts was 1356. The word length was within the range of 2-20 letters. Most words were shorter than ten letters (Figure 1). The words of 14 and more letters constituted only 3.4 per cent of the total number of nouns in the texts. The mean length of nouns was 7.5 letters.
Fig. 1. The distribution of the noun length in the studied texts (all nouns = 100 %). Horizontal axis: word length in letters.
The correlation between the abstractness and the terminologicality of nouns was unexpectedly low, only 0.37. It is true that there are many abstract words in everyday speech that are not scientific terms (e.g. dream, idea), and, on the other hand, biologists use many concrete scientific terms (e.g. stamen, rose-chafer). Consequently, the word abstractness and its terminologicality appeared to be independent characteristics, at least to some extent.
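The descriptive figures reported for noun length (the percentage distribution, the mean length, and the share of nouns with 14 or more letters) could be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, the input is taken to be a list of distinct nouns, and the sample words are not from the studied texts.

```python
from collections import Counter


def length_distribution(nouns):
    """Percentage of nouns at each word length (all nouns = 100 %)."""
    counts = Counter(len(word) for word in nouns)
    total = len(nouns)
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


def mean_length(nouns):
    """Mean noun length in letters."""
    return sum(len(word) for word in nouns) / len(nouns)


def share_of_long_nouns(nouns, min_length=14):
    """Percentage of nouns with at least min_length letters."""
    long_count = sum(1 for word in nouns if len(word) >= min_length)
    return 100.0 * long_count / len(nouns)


# Hypothetical sample of distinct nouns, for illustration only.
sample_nouns = ["rain", "cell", "stamen", "evolution", "biotechnology"]
distribution = length_distribution(sample_nouns)
```

len() counts characters, so the same functions work unchanged on Cyrillic strings.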
The comparison of nouns of different levels of semantic complexity (Fig. 2 and 3) reveals that on the average words of higher semantic complexity tend to be longer than words of simpler semantic complexity.
The relationship between the word length and its level of abstractness was:
A = 1.17 + 0.09*W; r = 0.96
where: A - abstractness, W - word length in letters.
Fig. 2. The percentage of words of different abstractness level in each word length group (the total number of words of the length = 100 %). Horizontal axis: word length in letters.
Fig. 3. The percentage of words of different terminologicality level in each word length group (the total number of words of the length = 100 %). Horizontal axis: word length in letters.
The relationship between the word length and its average terminologicality was:
T = 0.89 + 0.044*W; r = 0.86
where: T - terminologicality, W - word length in letters.
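The two linear relationships above can be evaluated and reproduced in outline as follows. The sketch makes several assumptions: the default coefficients are the values as read from the two formulas and should be checked against the printed originals; the text does not say how the lines were fitted, so an ordinary least-squares fit to (word length, mean grade) pairs is shown as one possibility; and all function names are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares intercept and slope, plus Pearson r, for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


def predicted_abstractness(w, intercept=1.17, slope=0.09):
    """Abstractness predicted from word length w (coefficients as read from the text)."""
    return intercept + slope * w


def predicted_terminologicality(w, intercept=0.89, slope=0.044):
    """Terminologicality predicted from word length w (coefficients as read from the text)."""
    return intercept + slope * w


def explained_share(r):
    """Percentage of variance accounted for by a correlation coefficient r."""
    return 100.0 * r * r


# Made-up (length, mean grade) pairs lying exactly on a line, for illustration only.
intercept, slope, r = fit_line([2, 5, 8, 11], [1.2, 1.5, 1.8, 2.1])
```

Squaring the reported coefficients gives about 92 per cent for r = 0.96 and about 74 per cent for r = 0.86.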
The first of the formulas predicts up to 92 per cent of the reliable variation in abstractness, the second up to 74 per cent of the reliable variation in terminologicality.
What is the role of abstract terms in the build-up of the text difficulty? When the different semantic levels of nouns were compared, the average length of abstract terms exceeded the word length in all the other semantic levels (Fig. 4). It is of interest that all words longer than 18 letters in these texts were abstract words (Fig. 5).
Fig. 4. Comparison of the average noun length of the words in the different semantic value groups (abstractness levels 1-3, terminologicality levels 1-3, abstract terms).
Fig. 5. The percentage of abstract terms in the noun groups of different length. Horizontal axis: word length in letters.
All the different semantic groups are distributed to form distinct peaks on the word length scale. The average word lengths of these groups differ significantly with the exception of the groups of high abstractness and high terminologicality levels (Table 1).
Table 1
Student t-test for the word length for words of different abstractness (A) and terminologicality (T) (* - p = 0.05, ** - p = 0.01, *** - p = 0.001).
        A1      A2      A3      T1      T2      T3
A2      6.6***
A3      8.7***  3.1**
T1      5.8***  3.2**   7.8***
T2      2.4*    2.2*    4.6***  0.5
T3      6.9***  3.0**   0.8     4.1***  4.7***
A3T3    6.5***  3.6***  1.9     4.1***  5.1***  1.3
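A pairwise comparison of the kind summarised in Table 1 can be sketched as below. The assumptions are substantial: the text does not state which variant of Student's test was used, so the classical pooled-variance form is shown; the asterisks are assigned from large-sample two-sided critical values (1.960, 2.576, 3.291 for p = 0.05, 0.01, 0.001), which is an approximation suitable only for groups of many words; and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(lengths_a, lengths_b):
    """Student's t statistic for two independent groups, pooled variance."""
    n1, n2 = len(lengths_a), len(lengths_b)
    pooled_var = ((n1 - 1) * variance(lengths_a)
                  + (n2 - 1) * variance(lengths_b)) / (n1 + n2 - 2)
    return (mean(lengths_a) - mean(lengths_b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def stars(t):
    """Significance marks from large-sample two-sided critical values (assumed)."""
    t = abs(t)
    if t >= 3.291:
        return "***"
    if t >= 2.576:
        return "**"
    if t >= 1.960:
        return "*"
    return ""


# Made-up word lengths for two small groups, for illustration only.
t_value = pooled_t([4, 5, 6], [1, 2, 3])
```

Applied to the t values printed in the table, these thresholds give the same marks, for example two asterisks for 3.1 and none for 1.9.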
Table 2
Correlations between the text effectivity and the characteristics of semantic complexity. The coefficients are printed without minuses, zeros and points. R1 - total number of abstract terms (AT) in the text, R2 - number of different AT, A - mean abstractness of nouns, T - mean terminologicality of nouns, PA - percentage of abstract words, PT - percentage of terms.
Characteristic                  R1   R2   A    T    PA   PT
Text difficulty level           77   79   63   69   73   76
Text interest level             74   76   61   ?    70   71
Text familiarity level          ?    42   34   33   41   44
Text comprehensibility          77   79   62   69   71   76
Ease of text memorization       73   76   65   63   73   71
Familiar words                  76   81   63   72   73   77
Subject difficulty level        76   79   63   68   72   75
Subject interest level          72   73   57   ?    67   71
Sufficiency of learning time    45   47   34   64   39   49
Subject familiarity level       48   49   36   56   49   54
Cloze test                      ?    68   52   25   ?    67
Final test                      65   69   6    6    71   ?
From the facts above it can be assumed that the more complicated the semantic characteristics of nouns (abstractness and terminologicality) are, the more difficult the text is to comprehend. Consequently, abstract terms (AT) should most markedly hamper text comprehension. And yet the correlations between the semantic characteristics and the learning effectivity do not support the hypothesis (Table 2). It does not mean, however, that the influence of AT can be neglected. The correlation between the result of the final test and the number of different abstract terms in the text was -0.69. Normally only a certain portion of nouns are AT in scientific texts. This must have been of importance in the experiment too, as the average number of different AT per text was only 3.5 nouns (maximum was 15, STD = 3.2). On the other hand, it is to be remembered that in some texts there were more AT than the average. For example, in one text there were 31 AT (including repeated AT) and they formed 35 per cent of the total number of the nouns in the text! On the average the ATs occurred only once in the text of about 250 words (the repetition index was 1.1).
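The counts discussed above (total AT, different AT, their share among the nouns, and the repetition index) can be derived from a list of the nouns in a text. This sketch assumes that the repetition index is the total number of AT occurrences divided by the number of different AT, which the text does not define explicitly; the function name and the sample data are hypothetical.

```python
def abstract_term_stats(nouns, abstract_terms):
    """Counts of abstract terms (AT) among the noun occurrences of one text.

    nouns: every noun occurrence in the text, repeats included.
    abstract_terms: the set of nouns graded 3 on both scales.
    """
    at_occurrences = [noun for noun in nouns if noun in abstract_terms]
    total_at = len(at_occurrences)            # R1: total number of AT
    different_at = len(set(at_occurrences))   # R2: number of different AT
    return {
        "total_at": total_at,
        "different_at": different_at,
        # Assumed definition: occurrences per different abstract term.
        "repetition_index": total_at / different_at if different_at else 0.0,
        "percent_of_nouns": 100.0 * total_at / len(nouns) if nouns else 0.0,
    }


# Made-up noun list for illustration only.
stats = abstract_term_stats(
    ["cell", "gene", "gene", "plant", "stone"],
    {"cell", "gene"},
)
```

With the figures quoted for the densest text, 31 AT forming 35 per cent of the nouns implies roughly 31 / 0.35, or about 89 noun occurrences in that text.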
Discussion
School teachers know well that students do not like to use complex words while answering the teachers' questions or writing tests. They especially avoid long words, which are hard to read and remember. Our experimental material suggests that the word length and its semantic complexity have a simultaneous effect on text comprehension as they are highly correlated. Which of the two factors predominates in the process remains to be established.
The readability formula for biology texts has the two factors as its components (Elts, 1992). The word length measures the quantitative complexity of nouns and the semantic complexity (abstractness + terminologicality) measures the qualitative complexity of nouns. The understanding of texts is influenced by these quantitative and qualitative complexities. As shown above, it is possible to assess the quantitative complexity of texts by measuring the length of the words and sentences in them. To assess the readability of texts more precisely, their qualitative complexities are to be measured. The determination of the simplest way of measuring text complexity needs further research.
References
Automatization of the analysis of scientific text. Kiev: Naukova dumka, 1984. 258 p. (in Russian)
Anderson R.C., Davison A. 1988. Conceptual and empirical bases of readability formulas. In: A. Davison and G.M. Green (eds.). Linguistic complexity and text comprehension. Hillsdale, NJ, Erlbaum, pp. 23-54.
Corson D.J. 1982. The Graeco-Latin instrument: a new measure of semantic complexity in oral and written English. Language and Speech, Vol. 25, No 1, pp. 1-10.
Elts J. 1992. A readability formula for texts on biology. - Psychological problems of reading. Theses of papers for the international scientific conference, Vilnius, 6-7 May. Vilnius: 42-44.
Evans J.D. 1973. Towards a theory of technical communication. School Science Review 55 (191), pp. 233-241.
Merzyn 1987. The language of school science. International Journal of Science Education, Vol. 9, No 4, pp. 483-489.
Mikk J. 1974. Metodika razrabotki formul citabelnosti (Methods of elaborating of readability formulae). - Sovetskaja pedagogika i skola (Soviet pedagogy and school) 9, 78-163. (in Russian)
Mikk J., Elts J. 1993. Comparison of text on familiar or unfamiliar subject matter. Hrebicek L., Altmann G. (eds). Quantitative text analysis. Trier: WVT Wissenschaftlicher Verlag Trier, 1993. 223-233 p.
Richardson J., Lock R. 1993. The readability of selected A-level biology examination papers. Journal of Biological Education, Vol. 27, No 3, pp. 205-212.