

JOURNAL OF EDUCATIONAL MEASUREMENT VOLUME 23, NO. 4, WINTER 1986, pp. 377-386

MEASURING THE ORGANIZATIONAL ASPECTS OF WRITING ABILITY

STEPHEN L. BENTON
Kansas State University

and

KENNETH A. KIEWRA
Utah State University

The present study assessed the relationship among holistic writing ability, the Test of Standard Written English (TSWE), and the following tests of organizational ability: anagram solving, word reordering, sentence reordering, and paragraph assembly. Based upon a sample of 105 undergraduate students, the main findings were that writing ability, as measured by the holistic method of scoring, was significantly correlated with performance on the TSWE and the four tests of organizational ability. A composite score on all four organizational tests was found to have the highest zero-order correlation with the measure of writing ability. A stepwise regression analysis, with the measure of writing ability as the criterion, also indicated that the composite score explained a significant proportion of the variance beyond that explained by the TSWE. The results are discussed in terms of the Kintsch and van Dijk model of strategic discourse processing, which suggests that different organizational strategies operate at the levels of words, sentences, and paragraphs. It is concluded that tests assessing organizational strategies ought to be included in assessments of writing ability.

Teaching students to write well is one of the major goals of education. There are, however, many views of what effective writing is and, consequently, valid measures of writing ability are not yet well-defined.

Measures of writing ability fall into two broad categories: multiple-choice tests and writing samples. Multiple-choice tests usually require the examinee to locate errors or to choose appropriate phrasing based upon decisions related to grammar, usage, diction, and idiom. These tests have been criticized by linguists and educators. For example, the Conference on College Composition and Communication declared that multiple-choice measures of writing are narrowly focused and that scores on such tests are gross distortions of writing competence (Troyka, 1982). Nonetheless, these kinds of tests continue to be widely used.

Although multiple-choice tests of writing ability may be highly reliable, they lack construct validity because they do not require examinees to plan and order ideas and generate prose. Consequently, some have argued on behalf of tests that require actual writing samples. Numerous problems, however, are also associated with writing samples (Lloyd-Jones, 1982). The measurements provided by writing samples can be unreliable because the scoring of writing samples is subject to various effects that attenuate reliability, including context effects, which occur when essays are rated lower if preceded by good-quality essays and higher if preceded by poor-quality essays (Hales & Tokar, 1975). In addition, an individual's writing score may vary according to the nature of the writing topic and the idiosyncrasies of the essay rater. Consequently, multiple assessments, topics, and raters are usually recommended.

The challenge faced by measurement specialists is that of devising valid and reliable standardized measures of writing skills. An information-processing approach to the assessment of writing ability suggests a way of meeting this challenge. This approach is focused on the component processes underlying writing, processes that distinguish good from poor writers. The notion of creating tests that "capture functional dimensions of developing competence or expertise" contrasts with the more standard approach of building upon curriculum objectives that may be vague with regard to the development of expertise (Messick, 1984, p. 225).

Applications of the information-processing approach to the assessment of writing ability have revealed differences between good and poor writers in organizational ability. Specifically, good writers perform more effectively on tests involving anagram solving; word reordering within scrambled sentences; sentence reordering within scrambled paragraphs; and paragraph assembly, that is, the grouping of sentences into paragraphs (Benton, 1983; Benton, Kraft, Glover, & Plake, 1984). Good writers outperformed poor writers on these tests in both high school and college samples, when reading comprehension, reading speed, general knowledge, verbal ability, and achievement were controlled (Benton et al.). Such findings support models of writing that acknowledge the role of complex organizational skills in writing (Collins & Gentner, 1980; Hayes & Flower, 1980), and speak to the relevance of the information-processing approach to writing assessment.

A precedent for employing such measures of writing ability was set by Godshalk, Swineford, and Coffman (1966), who correlated a paragraph organization test, similar to the sentence reordering test used in the present study, with writing ability, holistically measured. Examinees in the Godshalk et al. study were required to reassemble several scrambled sentences into a coherent paragraph. It was found that the paragraph organization test was significantly related to writing ability. The zero-order correlation between these variables was, however, lower than correlations between writing ability and multiple-choice measures of usage, sentence correction, and error recognition. For this reason, the authors concluded that the paragraph organization test has limited validity as a measure of writing ability.

Such a conclusion, although warranted, should not invalidate further attempts to assess organizational ability in writing. It is notable that Godshalk et al. (1966) assessed organizational ability only at the paragraph level. Models of strategic discourse processing, however, posit that comprehension and production strategies operate at several distinct levels of discourse: (a) propositional strategies, which operate at the levels of words, clauses, and sentences; (b) local coherence strategies, which connect successive sentences; and (c) macrostrategies, which operate at the paragraph and overall text levels (e.g., van Dijk & Kintsch, 1983). Although their model was originally developed for discourse comprehension analysis, van Dijk and Kintsch argued that the basic mappings between surface structure expressions and semantic representations are the same for both comprehension and production. The model, therefore, seems appropriate for analyzing strategies employed during the writing process.

In the present study, several organizational tests were used to assess discourse production strategies at various levels: words, sentences, paragraphs, and overall text. More specifically, the study was an investigation of the validity of these tests in assessing holistic impressions of writing ability. Also, the study provided an assessment of the amount of variance in a holistic measure of writing ability that is explained by these tests.

METHOD

Subjects

One hundred five undergraduate students enrolled in an educational psychology course volunteered to participate in three activities for course credit.

Procedures

The students completed three activities on each of two occasions. On the first occasion, the participants wrote two essays, each within 10 minutes, on the following topics: "Describe an event that has had an impact on your development" and "If you had your adolescence to live over again, what would you change?" All essays were scored by two independent raters using the 6-point holistic scoring scale developed by Breland and Gaynor (1979). The interrater reliability of scoring (Pearson r) was 0.83. The parallel forms estimate of the measurement of writing ability (the correlation over examinees of their two essay scores) was 0.80. On the second occasion, 2 days after the first, the students returned and completed first the TSWE of the College Board (1983) in the required 30 minutes, and then the four organizational tests.
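For readers who wish to retrace the reliability computations, a minimal Python sketch follows. The score arrays are hypothetical stand-ins; the study's raw data are not reproduced here.

# Reliability computations for holistically scored essays.
import numpy as np

# Hypothetical 6-point holistic scores assigned by two independent raters.
rater1 = np.array([4, 3, 5, 2, 4, 6, 3, 5])
rater2 = np.array([4, 4, 5, 2, 3, 6, 3, 4])

# Interrater reliability: Pearson r between the two raters' scores.
interrater_r = np.corrcoef(rater1, rater2)[0, 1]

# Parallel forms estimate: correlation over examinees of their two essay scores.
essay1 = np.array([3, 4, 5, 2, 4, 5, 3, 4])  # scores on the first topic
essay2 = np.array([4, 4, 5, 3, 3, 5, 2, 4])  # scores on the second topic
parallel_forms_r = np.corrcoef(essay1, essay2)[0, 1]

print(f"interrater r = {interrater_r:.2f}, parallel forms r = {parallel_forms_r:.2f}")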

The four organizational tests were contained in a single booklet. The anagram-solving test consisted of twelve 5-letter anagrams selected from The Teacher's Word Book of 30,000 Words (Thorndike & Lorge, 1959). Four anagrams were derived from the list of words occurring at least once per 1,000,000 words, four from the list of words occurring at least once per 4,000,000 words, and four from the list of words that appeared 1,000 or more times in either the Lorge Magazine Count or the Lorge-Thorndike Semantic Count. The words were scrambled by randomly arranging the order of the letters. The 12 anagrams were printed on the same page of the test booklet, and were ordered according to increasing response latency as determined in previous research (Benton, 1983). Students were directed to unscramble each word as rapidly as they could, and to write the correct version of each word in the space provided. In some cases, more than one answer was correct; therefore subjects were informed that only one solution was necessary. Subjects were then told they had 4 1/2 minutes to solve all 12 anagrams.¹ An example with a solution, provided at the top of the page (and shown here), was then read by the experimenter:

Anagram: EUASB
Answer: ABUSE
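The scrambling procedure is simple to reproduce. In the Python sketch below, the retry loop that guards against returning the word unshuffled is our assumption; the authors report only that letter order was randomly arranged.

# Scramble a word by randomly arranging the order of its letters.
import random

def scramble(word: str) -> str:
    letters = list(word)
    while True:
        random.shuffle(letters)
        anagram = "".join(letters)
        if anagram != word:  # assumed safeguard: the item must differ from the solution
            return anagram

print(scramble("ABUSE"))  # e.g., EUASB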

The materials for the word reordering test were six 10-word sentences taken from an educational psychology textbook unfamiliar to the students. These sentences were scrambled by arranging the words randomly within each sentence. The six sentences were then printed on one page of the test booklet. The sentences were ordered according to response latency as determined in previous research (Benton et al., 1984). Students were directed to unscramble each sentence as rapidly as they could, and to write the correct version of each sentence in the space provided. In some cases, more than one answer was correct; therefore students were informed that only one answer was necessary. Subjects were then told that they had 11 minutes to unscramble all six sentences. The following example with a solution, provided at the top of the page, was then read by the experimenter:

Scrambled sentence: was urged or no come near forced to even one.
Correct version: No one was forced or even urged to come near.

The sentence reordering test employed six intact paragraphs (drawn from the same source as the items for the word reordering test); each paragraph contained from five to nine sentences. The six paragraphs contained sentences that were organized chronologically; that is, each paragraph described an event from beginning to end, sentence by sentence. Each scrambled paragraph appeared on a separate page of the test booklet. The paragraphs were ordered by response latency as established in previous research (Benton et al., 1984). The sentences of each paragraph were randomly ordered, with single spacing between the lines of the same sentence and double spacing between sentences. Students were directed to order the sentences chronologically by placing the correct order number for events in the blanks alongside each sentence. Occasionally, there was more than one correct solution; therefore it was specified that only one solution was necessary. Students were informed that when they finished one paragraph, they should go on to the next page until they reached a place that said "stop." Subjects were then told they had 10 minutes to complete all six paragraphs. The following example with a solution, provided at the top of the page, was then read by the experimenter:


8 Subsequently, each day that Hugh did a better job of putting the food in his mouth instead of elsewhere I rewarded him with peaches.
7 Hugh received no peaches.
1 Hugh had a great fondness for peaches.
3 I showed him the peaches he could expect and pointed out that he should put the food in his mouth, not on the floor.
5 I gave him the peaches.
2 I told him that he could have peaches for dessert if he did not mess his food up so much.
4 He did better, although liberal amounts of food still fell on the floor.
6 The next day Hugh was in an exuberant mood and scattered his vegetables far and wide.
9 He improved rapidly and was eventually willing to substitute other fruits for his reward.

¹ A pilot study with 28 undergraduates not involved in the current study was conducted to ascertain the best time limit for each of the four tasks. Originally, an arbitrary decision was made to use a time limit corresponding to the completion time of the 70th percentile from previous response latency data (Benton, 1983; Benton et al., 1984) for each of the four tasks. Under this constraint, a ceiling effect was observed in the pilot study across all four tasks. Consequently, a new criterion from the previous response latency data (the median) was selected, thereby resulting in the following time constraints: (a) anagram solving, 4 1/2 minutes; (b) word reordering, 11 minutes; (c) sentence reordering, 10 minutes; and (d) paragraph assembly, 8 minutes.

The paragraph assembly test employed six sets of three intact paragraphs taken from an essay by Bruning (1968). In the original essay, each paragraph contained one topic sentence and three subordinate sentences. For the current study, each set of three paragraphs appeared on a separate page of the test booklet. For each set of three paragraphs, the 12 sentences were listed in random order. The six sets of paragraphs were ordered by response latency as determined in previous research (Benton et al., 1984). Students were directed to group the sentences correctly on each page into three 4-sentence paragraphs by placing a letter (A, B, or C) in the blank before each sentence. (Unlike the sentence reordering test, the order of the sentences within a paragraph was irrelevant.) Students were further informed that when they finished one set of paragraphs they should go on to the next page until they reached a place that said "stop." They were then told they had 8 minutes to complete all six sets of three paragraphs. The following example with a solution, presented on the first page, was then read by the experimenter:

B There are only 450 miles of paved roads in Mala.
C The only non-military high official in Mala is the premier.
A Aluminum mining has been especially productive for the northern region.
A The economy of Northern Mala is based on mining.
B There is only one telephone for every 15,000 inhabitants of Mala.
C The cabinet of the premier must be approved by a panel of military officers.
A About two-thirds of the work force in the north are involved in mining.
C The government of Mala can be classified as a military dictatorship.
B There are only 300 miles of railways in the entire country.
A Mining of all types provides about 80 percent of the income in the northern region.
B Mala's communication system would probably rank as the worst of all African nations.
C Whoever controls the Malan army controls the country of Mala.

The anagram-solving, word-reordering, sentence-reordering, and paragraph-assembly tests were pilot tested with undergraduates not otherwise connected with the current study. The test-retest reliabilities were .85, .86, .90, and .87, respectively.


Examinee responses to the anagram and paragraph assembly tests were scored as correct or incorrect. Scores on the word-reordering and sentence-reordering tests were the total number of relative position errors, that is, the total number of misplaced words or sentences across all six word-reordering or sentence-reordering items. The TSWE was scored according to the formula: number of correct answers minus one-fourth the number of incorrect answers.
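A minimal Python sketch of these scoring rules may be helpful. The position-error function reflects one plausible reading of "misplaced," namely an item not in its keyed position; the authors do not spell out the operational definition.

# Scoring rules for the reordering tests and the TSWE.
def position_errors(response: list, key: list) -> int:
    # Count items that do not occupy their keyed position.
    return sum(1 for r, k in zip(response, key) if r != k)

def tswe_formula_score(num_correct: int, num_incorrect: int) -> float:
    # Number correct minus one-fourth the number incorrect; omitted items add nothing.
    return num_correct - num_incorrect / 4.0

print(position_errors([2, 1, 3, 4, 5], [1, 2, 3, 4, 5]))  # 2 misplaced sentences
print(tswe_formula_score(40, 8))                          # 38.0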

As a check on the degree to which the sample in this study conformed to the TSWE norming sample, TSWE raw scores were converted to TSWE standard scores, which range from 20 to 60 in the 1984 norms. The mean for the norming sample was 43 and the standard deviation was 11.0. For the present sample, the mean was 44.06, the standard deviation was 8.66, and the measure of skewness was -.51.
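This descriptive check is easy to reproduce; a Python sketch with hypothetical standard scores follows. Several skewness estimators exist and the authors do not say which they used; the bias-corrected estimator is assumed here.

# Descriptive comparison of a sample against published norms.
import numpy as np
from scipy.stats import skew

tswe_standard = np.array([44, 52, 38, 47, 55, 33, 49, 41, 46, 50])  # hypothetical

print(f"M = {tswe_standard.mean():.2f}")
print(f"SD = {tswe_standard.std(ddof=1):.2f}")              # sample standard deviation
print(f"skewness = {skew(tswe_standard, bias=False):.2f}")  # bias-corrected estimator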

RESULTS

Means and standard deviations for raw scores on all variables are presented in Table 1. The intercorrelations among the six predictors and the criterion are presented in Table 2. The negative correlations reveal that performance on the TSWE and writing improved as errors on the four tests of organizational ability declined. Specifically, significant correlations were observed between holistic writing ability and all six predictors.

Table 1

Means and Standard Deviations for Predictor and Criterion Measures

Test                   Maximum Score Possible       M       SD
Anagram solving                  12                3.28     2.12
Word reordering                  60               13.59    10.48
Sentence reordering              42               12.76     6.59
Paragraph assembly               72               21.50    17.53
Composite score                 186               51.12    28.39
TSWE                             50               30.02     8.86
Holistic writing                  6                3.85     1.06


Table 2

Intercorrelations Among Predictor and Criterion Variables (n = 105)

Task                       2      3      4      5      6      7
1. Holistic writing      -.37   -.46   -.21   -.40   -.49    .47
2. Anagram solving               .51    .20    .26    .47   -.47
3. Word reordering                      .44    .43    .77   -.59
4. Sentence reordering                         .33    .61   -.43
5. Paragraph assembly                                 .87   -.39
6. Composite                                                -.59
7. TSWE

Note. All coefficients are sufficiently large to reject H₀: ρ = 0.00 given α = 0.05.

A stepwise regression, with writing ability as the criterion, was then performed forcing the TSWE to enter on the first step and the composite score on the second step. This was done in an attempt to assess the variance explained by the organizational tests beyond that explained by the TSWE. The TSWE explained 22% of the variance in the criterion, F(1, 103) = 29.13, p < .0001; the composite score explained an additional 7% of variance, F(1, 102) = 10.13, p < .002. This combination of predictors explained 29% of the variance in the writing scores (R = .54).
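The hierarchical step reported above (TSWE forced in first, the composite second) can be expressed compactly. The Python sketch below, using statsmodels, simulates hypothetical data and computes the increment in R-squared and its F test; none of the numbers correspond to the study's data.

# Hierarchical regression: variance explained by the composite beyond the TSWE.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 105
tswe = rng.normal(30, 9, n)                     # hypothetical TSWE formula scores
composite = -0.5 * tswe + rng.normal(0, 20, n)  # hypothetical composite error scores
writing = 0.05 * tswe - 0.01 * composite + rng.normal(0, 1, n)

# Step 1: TSWE alone.  Step 2: TSWE plus the composite score.
m1 = sm.OLS(writing, sm.add_constant(tswe)).fit()
m2 = sm.OLS(writing, sm.add_constant(np.column_stack([tswe, composite]))).fit()

# F test for the increment: (delta R^2 / 1) / ((1 - R^2_full) / df_resid_full).
r2_change = m2.rsquared - m1.rsquared
f_change = r2_change / ((1 - m2.rsquared) / m2.df_resid)
print(f"R2 step 1 = {m1.rsquared:.3f}, delta R2 = {r2_change:.3f}, F = {f_change:.2f}")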

DISCUSSION

The results of the present study support the conclusion that performance on tests that primarily assess organizational ability is related to writing ability. Specifically, the composite organizational score explained a significant proportion of the variance in writing ability beyond that explained by the TSWE. More importantly, the composite score was associated with the highest zero-order correlation with the criterion.

The current study employed measures of organizational ability at various levels of discourse production, whereas previous research had focused on organization only at the paragraph level. Godshalk et al.'s (1966) finding that a paragraph-unscrambling test lacked concurrent validity was perhaps confirmed in that a similar test used in the present study (sentence reordering) also produced a relatively low, although significant, zero-order correlation with the criterion. It appears, then, that assessment at the lexical level (anagram solving), the sentence level (word reordering), and the text level (paragraph assembly) may be necessary for measuring organizational ability, because these measures were most strongly correlated with the criterion.

The organizational tests used in the present study assess strategies at the various levels of discourse production. The anagram-solving and the word-reordering tests are intended to tap the propositional strategies that are employed in word generation and clause formation. One propositional strategy that may be assessed by the anagram test is that of generating alternative meanings. The propositional strategies of semantic interpretation and contextual cueing might be tapped in the word-reordering test, which measures the writer's abilities to detect clause boundaries and to generate sentences. It has been suggested that writers merely activate phrase structure rules or transformational rules in performing such tasks (Miller & McKean, 1964). Alternatively, it can be argued that operations at this level of discourse require sentence parsing strategies that help the writer decide which words are linked together in a phrase or a clause. Consequently, the significant relationship observed between writing ability and performance on the word-reordering test may be due, at least in part, to the sentence parsing strategies employed in both.

The sentence-reordering test, on the other hand, would seem to require the use of local coherence strategies, which are involved in connecting successive sentences. The establishment of local coherence is strategic, because it is dependent upon context and is not merely dependent upon rules that define conditions for coherence. The writer, for example, produces coherent text by searching both memory and the text generated thus far for related facts and potential links. Hypotheses about coherent links, in fact, are made as the propositions themselves are formed, even though coherence is not established until several sentences have been written. To write coherent text, the writer must connect sentences in a manner that is sequential or hierarchical.

Finally, the paragraph assembly test is intended to assess macrostrategies that facilitate semantic inference. Semantic inference is influenced by prior knowledge, and different readers will derive unique inferences from the same text. Knowing this, writers must attempt to constrain this kind of personal variation in interpretation. They do so through textual signaling of the main theme or topic, such that sentences within the text share similar ideas. It would seem imperative, then, that good writers be able to differentiate between closely related concepts, such that only similar ideas are grouped together, a strategy also employed in the paragraph assembly test.

Overall, the current results extend the study of individual differences in information processing to the assessment of writing ability. It appears that tests can be designed to assess organizational ability at the levels of propositional strategies, local coherence strategies, and macrostrategies. Developing test items that capture such dimensions of expertise in writing may prove helpful in producing more valid measures of organizational ability in writing. This approach to measurement is closely allied with the long-standing specificity doctrine, which held that individual differences are found in specific knowledge and skills acquired through learning. From this perspective, then, it would appear that organizational skills can be specifically diagnosed.

REFERENCES

BENTON, S. L. (1983). Cognitive predictors of writing ability. Unpublished doctoral dissertation, University of Nebraska, Lincoln.

BENTON, S. L., KRAFT, R. G., GLOVER, J. A., & PLAKE, B. S. (1984). Cognitive capacity differences among writers. Journal of Educational Psychology, 76, 820-834.

BRELAND, H. M., & GAYNOR, J. L. (1979). A comparison of direct and indirect assessments of writing skill. Journal of Educational Measurement, 16, 119-127.

BRUNING, R. H. (1968). Effects of review and test-like events within the learning of prose materials. Journal of Educational Psychology, 59, 16-19.

College Board. (1983). The test of standard written English. Princeton, NJ: Author.

COLLINS, A., & GENTNER, D. (1980). A framework for a cognitive theory of writing. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 51-72). Hillsdale, NJ: Erlbaum.

GODSHALK, F. I., SWINEFORD, F., & COFFMAN, W. E. (1966). The measurement of writing ability. New York: College Entrance Examination Board.

HALES, L. W., & TOKAR, E. (1975). The effect of the quality of preceding responses on the grades assigned to subsequent responses to an essay question. Journal of Educational Measurement, 12, 115-117.

HAYES, J. R., & FLOWER, L. S. (1980). Identifying the organization of writing processes. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3-30). Hillsdale, NJ: Erlbaum.

LLOYD-JONES, R. (1982). Skepticism about test scores. In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Notes from the National Testing Network in Writing, October. New York: City University of New York, Instructional Resource Center.

MESSICK, S. (1984). The psychology of educational measurement. Journal of Educational Measurement, 21, 215-237.

MILLER, G. A., & McKEAN, K. O. (1964). A chronometric study of some relations between sentences. Quarterly Journal of Experimental Psychology, 16, 297-308.

THORNDIKE, E. L., & LORGE, I. (1959). The teacher's word book of 30,000 words. New York: Columbia University, Bureau of Publications.

TROYKA, L. Q. (1982). Looking back and moving forward. In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Notes from the National Testing Network in Writing, October. New York: City University of New York, Instructional Resource Center.

VAN DIJK, T. A., & KINTSCH, W. (1983). Strategies of discourse comprehension. New York: Academic Press.


AUTHORS

STEPHEN L. BENTON, Assistant Professor, Department of Administration and Foundations, College of Education, Kansas State University, Manhattan, KS 66506. Degrees: BA, MA, PhD, University of Nebraska. Specialization: Educational psychology.

KENNETH A. KIEWRA, Associate Professor, Department of Psychology, Utah State University, Logan, UT 84322-2810. Degree: PhD, Florida State University. Specialization: Cognitive aspects of autonomous learning.

AUTHOR NOTE

Preparation of this article was supported, in part, by a Faculty Research Awards Committee Grant from Kansas State University.
