
Written Communication 30(1) 3–35
© 2013 SAGE Publications
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0741088312466992
http://wcx.sagepub.com

1University of Maine, Orono, Maine

Corresponding Author: Dylan B. Dryer, Department of English, University of Maine, 304 Neville Hall, Orono, ME 04401. Email: [email protected]

Scaling Writing Ability: A Corpus-Driven Inquiry

Dylan B. Dryer1

Abstract

This analysis of 83 scoring rubrics and grade definitions from writing programs at U.S. public research universities captures the current state of the struggle to define and measure specific writing traits, and it enables an induction of the underlying theoretical construct of “academic writing” present at these writing programs. Findings suggest that writing specialists have managed to permeate U.S. first-year writing assessment with certain progressive assumptions about writing and writing instruction, but they also indicate critical areas for revision, given such documents’ critical gatekeeping role at postsecondary institutions. The study also raises a broader question about the difficulties of rhetorically constructing “writing ability” in a way that is consistent with the contextualist paradigm dominant in contemporary writing studies.

Keywords

writing instruction, English for academic purposes, measurement, genre, uptake, validity, institution

Researchers in writing studies have found writing standards operating in many sites, not just educational institutions: novice social workers internalizing the rules of case studies (Paré, 1993), new accountants struggling with the conflicting cultural conventions of the client letter (Devitt, 1991), an engineer revising an intern’s work while admitting not knowing why the standard was the way it was (Winsor, 2001, p. 18), citizens protesting a neighborhood development compelled to take up a genre system that changes the very frame of their protest (Turner, 2002, p. 300), among many others. Still, it is in educational institutions that most readers and writers first encounter a version of their language explicitly presented as official and legitimate (Bourdieu, 1991, p. 48); it is also true that those descriptions tend to rhetorically construct as general that which is particular and local (Balester, 2012; Poe, 2006; Slevin, 2001).



This study of a corpus of documents used to describe and assess academic writing in first-year composition (FYC) courses at U.S. public research universities maps their descriptions of specific features (or “traits”) of student writing across their descriptions of the ability levels (or “performance categories”) that students can achieve—or fail to achieve—in coursework designed to introduce them to the conventions of academic writing. For FYC teachers and their students, these documents mediate high-stakes moments in the ongoing maintenance of the “standard” in academic writing conventions. As will be seen, there is much to admire in these documents’ attempts to define writing traits and performance categories (especially when compared to scales developed before the emergence of writing studies), and this study finds evidence of progress toward a more valid construct of the complexities of writing than those that circulated during the “preprocess” and “static abstraction” eras. However, this analysis will also lend empirical support to concerns about construct validity in writing assessment, as granular analyses of the nouns, verbs, and modifiers that dominate each performance level of each trait find several questionable operating assumptions in the construction of academic writing ability at all levels (but especially at performances constructed as “below average”). It will also be clear that these documents routinely efface the labor of student writers and the role of readers who use these documents to arrive at consequential decisions about the students who produce that writing.

After some brief historical context, the assembly and coding of the corpus are described. Results from analyses of the resulting database are reported, analyses that have made it possible to induce the theoretical construct of academic writing that FYC programs in U.S. public research universities appear to be enacting. This is a baseline empirical analysis that locates and considers recurrent patterns of language choices and is intended to support efforts to design and field-test a critical-descriptive assessment vocabulary that can enable appraisals of student writing in U.S. FYC that are more consistent with the sociocultural/contextualist construct of writing that is dominant in the writing research community (Behizadeh & Engelhard, 2011). On those appraisals rest not only the well-documented social consequences of success or failure in FYC but also the construct of “legitimate” or “standard” writing that students will carry into their future classes, their workplaces, and their private and civic lives.



Writing That Scales Writing

Little is known about how scales themselves are composed, and few field-tested recommendations for scaling performance categories exist as of this writing. In the latest edition of Educational Measurement, the authors responsible for the chapter “Setting Performance Standards” concede that the topic has received little attention, and they emphasize the need for thoughtful research on meaningful performance category descriptions (Hambleton & Pitoniak, 2006, p. 453; see also Knoch, 2011, p. 90). Yet scale design is a critical factor in raters’ perception of writing ability, as evident in Mills and Jaeger’s (1998) study of the effect of changing performance category definitions. Raters who used a rubric that was specific about what students had to do to achieve a particular category scored more generously than those who did not, a finding that suggests that the specificity enhanced the raters’ ability to recognize and reward a broader range of approaches (Mills & Jaeger, 1998, pp. 82-83).

What is well known is that scales tend to rhetorically construct local performance categories as universal descriptions (Brindley, 1998, p. 144), likely because their function has always been to counteract readers’ varied responses to student writing. Rater-borne variance in scoring (i.e., poor interreader agreement on scores and single-rater “drift” over time) has been a serious concern since the first part of the 20th century (e.g., Thorndike, 1911; Hillegas, 1912). The rise of psychometrics and an increasing awareness of evaluative subjectivity (Connors, 1997, p. 152) have left hundreds of scales preserved in the amber of JSTOR that remain useful for throwing current assumptions about text evaluation into sharper relief. For example, the 1940 grade definitions to be found in “English 1: Aims and Methods,” a document recently unearthed1 at the University of Texas, Austin, employ a distinct rhetoric of connoisseurship (e.g., “naturalness of movement,” “doubtful proportions”; Figure 1) that grounds the appraisal in the reader’s taste.

A thorough history of the long move away from connoisseurship and toward more precise measurement of specific traits such as “organization” is not possible here (but for this, see Elliot, 2005). Space does permit a few glimpses from landmark studies necessary for context: In Factors in Judgments of Writing Ability, Diederich, French, and Carlton (1961) asked the readers they had assembled to score 300 student papers and “use whatever hunches, intuitions, or preferences you normally use in deciding one piece of writing is better than another” (p. 11). Only 6 years later, Follman and Anderson (1967) characterized a scale as a means to organize raters’ perceptions and direct their values (pp. 198-199), and by the time of S. W. Freedman’s (1981) study of sources of variance in scoring, Freedman could assume that some scale would be part of any scoring environment (pp. 250-252, 254). Yet those two decades also saw doubts surface about scales’ operation as independent variables in scoring. A. Freedman and Pringle (1980) bluntly characterized their scale as a “rhetorical instrument” (p. 315); two decades later, Lumley (2002) agreed that scales might help channel raters’ diverse reactions to texts into narrower, more usable statements but disagreed that those statements were necessarily valid (p. 268).



And scales are texts themselves. They are material artifacts of theoretical constructs—beliefs about writing and what it should look like (Hamp-Lyons, 2011, p. 3); as such, they too are subject to competing interpretations (Rezaei & Lovorn, 2010) and agendas (Columbini & McBride, 2012). As Turley and Gallagher (2008) suggest, their ubiquity moots many of the more abstract debates over their use, and given the scope of U.S. FYC (Crowley, 1998; Gere, 2009), it is critical to identify as precisely as possible the theoretical constructs of academic writing that operate there, a project for which corpus analysis is particularly well suited. Patterns writ large across this collection will reveal the beliefs about writing to which teachers and students are being asked to subscribe in U.S. universities in the first decade of the 21st century.

Figure 1. First-year composition grade definitions. From the University of Texas, Austin, 1940.



Method

Corpus analysis examines patterns of language choices in a preferably large collection of texts that have been produced under whatever conditions would be typical of that genre. Ideally, the collection is machine analyzed for consistency and reliability, and any conclusions drawn from the resulting data are grounded in theoretically coherent interpretive judgments (Biber, Conrad, & Reppen, 1998, pp. 4-5). In Lee’s (2008) taxonomy of approaches to corpus analysis, this study lies between the purely qualitative and intuitive corpus-informed approaches and the purely quantitative corpus-induced studies used in, say, machine learning (p. 88). There are fewer preconceptions operating here than in analyses of corpora that rely on close reading alone, but because the corpus was manually rendered analyzable through the coding decisions described below, it is most accurately described as corpus driven (pp. 89-91).

The Corpus

Text corpora are usually delimited by strict genre conventions (e.g., research article abstracts in the Journal of Pragmatics; Gillaerts & Van de Velde, 2010) or controlled-topic undergraduate essays (McNamara, Crossley, & McCarthy, 2010). But genres are usefully defined not by formal features but by the social actions they accomplish (Miller, 1984, p. 151). Though the individual documents in this corpus are diverse in form, each is an institutional response to the belief that readers’ connoisseurship of writing quality requires supplementing (or focusing or channeling, as the case may be) to reliably and fairly categorize first-year undergraduates’ performance of academic writing conventions. As will be seen, this collection contains critical assumptions that, like those of any other genre that has not been studied en masse, have gone unnoticed (Römer & Wulff, 2010, p. 102).


The scales in this corpus currently circulate among hundreds of teachers and thousands of students at many of the 166 U.S. public doctoral research, Research II, and Research I institutions.2 The sampling plan is limited to public research universities for three reasons: First, their large FYC programs must accommodate the diverse needs of comparatively heterogeneous cohorts of matriculating students; second, these programs are routinely staffed by contingent and/or novice instructors, necessitating additional guarantees of intersection consistency; third, the writing program administrators of these programs are likely to be trained compositionists. It is therefore assumed that the documents in this corpus represent the best-designed and most extensively field-tested examples of the genre to be found.

A review of these universities’ websites eliminated three institutions without FYC programs and six that did not provide contact information for their programs, if they had them. A personalized, signed letter was sent via USPS to the writing program administrators of the remaining 157 programs to describe the project, to offer a copy of the University of Maine’s English 101 scoring rubric as a good-faith gesture, and to explain that I would be following up with an e-mail. After two rounds of individualized e-mail, 101 writing program administrators responded (64%), with 72 sending either program definitions of letter grades or descriptions of performance categories for end-term portfolio assessment (or both), resulting in a total of 96 documents.3

To avoid overrepresentation, if a writing program administrator sent two scales from two writing courses (e.g., 101 and 102), only the former was used; if multiple scales for different assignments from the same course were sent, only the scale for the most heavily weighted assignment was used. These decisions culled the total to 83 documents that concern only introductory writing courses (or, in two cases, the most significant deliverable for such a course). Every geographic sector of the United States is represented; more important, the full range of public institutions is present, from the “public ivies” to those that are effectively open admission. Considering Knoch’s (2011, p. 82) and Haswell’s (1998, pp. 237, 245) emphasis on the importance of distinguishing among kinds of scales, it is important to note that the corpus contains no diagnostic scales—that is, those that are used for placement purposes. Instead, each is intended to assess proficiency. With only two exceptions, the entire corpus assumes native speaker status.4 Even so, this corpus dramatically expands preliminary attempts to collect and investigate the linguistic construction of writing performance, which have relied on documents freely available on the Internet (Balester, 2012; Jeffrey, 2009; Poe, 2006; Scott, 2005).


Corpus Coding

The corpus contains 32 documents that define letter grades and 51 scoring rubrics that describe performance categories on a numerical scale or with adjectives.5 As Table 1 illustrates, 20 of the grade definition documents have discrete definitions for the typical U.S. practice of evaluating work at an A, B, C, D, or F level; 8 collapse D and F into a single definition of nonpassing work; 4 define only C as a way to set a benchmark for minimum competency.

Unlike the 32 grade definitions, most of the 51 scoring rubrics describe program outcomes for an assessor who is not the instructor of record for the course (usually other teachers of the same or related courses). Unconstrained by the five-letter convention, they divide performance categories in many ways. As Table 2 shows, the two most common scales are a 5-point scale with a pass/fail break at 3/2 and a 4-point scale with a break at 2/1.

Like the grade definitions, the scoring rubrics are heavily weighted to pass (note that only two of the 6-point scales and only five of the 4-point scales are equally weighted).

Table 1. Distinctions Among Categories: Grade Definition Documents

n = 32    Passing Grades    Failing Grades
  20      A  B  C  D        F
   8      A  B  C           D/F (a)
   4      C (b)

a. One definition of “below passing work.”
b. Only C defined to establish “minimally passing work.”

Table 2. Distinctions Among Performance Categories: Scoring Rubric Documents

n = 51    Passing Scores    Failing Scores
   4      6 5 4 3           2 1
   2      5 4 3 2           1
  13      5 4 3             2 1
   2      6 5 4             3 2 1
  11      4 3 2             1
   1      3                 2 1
   7      3 2               1
   5      4 3               2 1
   5                        1


All documents were scrubbed of any information that identified their institutional origin and uploaded to NVivo 9 qualitative research software. While NVivo is not specifically designed for corpus analysis, it does enable electronic markup of documents, a certain level of autocoding, and matrix and word frequency queries, on which much of the following analyses rely. As explained in the next section, each document was coded for the levels (or “performance categories”) at which it was possible to rate students’ writing ability and then again for all the specific features (or “traits”) of academic writing that readers were asked to appraise. While much of the coding could be accomplished by simple text captures, some coding required more researcher judgment. The discussion below attempts to account for any sources of error these decisions may have introduced.
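The matrix and word frequency queries described here were run in NVivo; as a rough illustration of the same operation outside that software (not the study’s actual workflow), the sketch below assumes the coded segments are available as (performance category, trait, text) triples and tallies word frequencies at each intersection, the kind of counts later reported in Table 5.

    # Illustrative only: word-frequency tallies per performance-category/trait
    # intersection, approximating NVivo's matrix coding query. The "segments"
    # entries are invented examples, not corpus data.
    from collections import Counter
    import re

    segments = [
        ("superlative", "thesis", "The essay demonstrates a clear, insightful thesis."),
        ("inadequate", "grammar", "Frequent errors impede the reader's comprehension."),
    ]

    def tokens(text):
        """Lowercase word tokens with punctuation stripped."""
        return re.findall(r"[a-z']+", text.lower())

    cell_counts = {}  # (performance category, trait) -> Counter of word frequencies
    for category, trait, text in segments:
        cell_counts.setdefault((category, trait), Counter()).update(tokens(text))

    # A word-frequency query for one cell, e.g., language coded superlative + thesis:
    print(cell_counts[("superlative", "thesis")].most_common(10))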

Performance Category Coding

The documents that define grades of A and C or scores of 4 and 2 present few difficulties for reliable and accurate coding. But because not all scoring rubrics break the pass/fail continuum at the same place, they present three problems. First, as Table 2 indicates, two scoring rubrics distinguish among three levels of inability, and six distinguish among four levels of ability. The language in this handful of outlier categories proved so similar to that of the adjacent categories that it could be effectively merged into the top and bottom coding nodes, as shown in Table 3.

Table 3. Performance Category Distinctions: Scoring Rubrics

(Columns: Passing: Superlative, Middling good, Adequate. Failing: Inadequate, Severe fail. Each row shows one scale type whose points were distributed across these merged categories.)

6 5 4 3 2 1
5 4 3 2 1
5 4 3 2 1
4 3 2 1
6 5 4 3 2 1
4 3 2 1
3 2 1
3 2 1
1


Two problems remained: There is no equivalent in the portfolio scoring guides to a D, in the sense of “just passing” (since all that language is necessarily coded as adequate), nor is there anything in the grade definitions that will accommodate a “failing but not in the worst way.” In other words, some scoring rubrics ask raters to distinguish between two levels of failure, and most grading definitions allow for what might be called a grudging pass. These two problems have been resolved by creating six master performance categories—four passing and two failing:

Superlative: All grade A definitions and the top-most performance category (or categories, if there are more than three levels of passing work).

Middling good: All grade B definitions and performance categories characterizing passing work that lies below the superlative category but above baseline or adequate.

Adequate: All grade C definitions and all minimal performance categories.

Just passing: Definitions of the grade D. As explained above, this category cannot be merged with adequate, because that language is designed to construct minimally satisfactory outcomes, whereas the language of D is clearly dissatisfaction. A D is a rebuke, but it is not a final no and cannot be merged with inadequate.

Inadequate: This category contains all language from the portfolio scoring guides that describes either single categories of nonpassing work (e.g., a 1 on a 3-point scale) or the first of two layers of nonpassing work (as in Table 3). This should sharpen the distinction with the category of just passing.

Severe fail: This category contains all definitions of the grade F and the handful of distinctions reserved for performance levels worse than inadequate.

Two examples illustrate these distinctions. First, here are the superlative and just passing performance categories of Grading Scale 37 (the tightly parallel structure of the paragraphs is typical):

The “A” essay demonstrates the writer’s ability to address rhetorical situations in innovative, creative, and perceptive ways. The writing is more than above average; it is exceptional. The purpose is distinguished by some depth or breadth of insight; all support offered is interesting, relevant, and boldly thought-provoking. The organization is not only coherent but marked by appropriateness to the specific rhetorical situation, and the transitions show sophistication and originality. The writing exhibits finesse on the writer’s part in matters of style, diction, and usage. There are no grammatical errors.



The “D” essay indicates the writer’s ability to address rhetorical situations somewhat competently, but the writing contains weaknesses and/or errors that mark it as less than what is expected in one or more of the following ways: The purpose is confused or too general; the support offered is vague, unconvincing, inaccurate, irrelevant or too narrow in focus; the organization is confusing or unsuccessful; the style, voice or tone is inconsistent or inappropriate; the sentence structure is difficult to read or inappropriate. Numerous mechanical and grammatical errors hinder the readers’ ability to understand the text.

The D essay (or in this coding scheme, just passing) may be “confused,” “inconsistent,” and/or “inappropriate” but is just “less than what is expected”—not failing and therefore not codable as inadequate. As the second example, Figure 2 contains one of the eight traits that Assessment Rubric 42 asks readers to rate on a 5-point scale. Figure 2 should illuminate the distinctions among the coding categories of adequate, inadequate, and severe fail: Note the verb “lacks,” the adjectives “inappropriate” and “nonexistent,” and the phrases “no awareness” and “do not make sense”—not yet the lowest performance category but clearly not a passing score either.

Table 4. Trait Canonicity

Canonical Traits                              No. of Documents Referencing   Total References in the Corpus
Grammar (mechanics, conventions, usage)                    78                            270
Evidence (support, development)                            73                            340
Thesis (focus, purpose, argument)                          71                            272
Style (voice, tone, variety, paragraphing)                 67                            343
Organization, structure                                    59                            197
Critical thinking, analysis                                59                            258
Audience, rhetorical awareness                             53                            176
Assignment, engagement                                     48                            196
Creativity, originality                                    39                             74
Writing process, revision                                  16                             57


Apart from directions to readers, almost all the words in the corpus are codable under one of the six performance categories described above—the nine documents that describe only the requirements of minimally passing papers can be coded in their entirety as adequate; the definition of the characteristics of a B paper can be coded middling good (as can the category of 4 in the excerpt of the scale above), and so on.

Trait Coding

In the tradition of analytic scoring, “traits” are isolable elements of a performance of writing ability (i.e., as distinct from a “holistic” impression). After a list of traits named in each document was compiled, word frequency queries were used to test recurrent synonyms (see Table 4). These queries identified clusters of 10 codable master traits, 8 of which appeared in over half the documents in the corpus. There were many traits particular to only a few (or only one) of the documents (e.g., “familiarity with composing technology” or “grasp of the concepts of literary textual analysis”), which have been excluded here. It was also decided to exclude the “creativity” and “process” traits, since they appeared much less frequently than assignment/engagement, the rarest of the traits that appeared in at least half the documents in the corpus.6 Each document was then coded for any description of the 8 remaining traits.
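By way of illustration only (the synonym lists below are invented stand-ins, not the study’s coding scheme), a tally of this kind can be sketched in a few lines: variant trait names are collapsed into canonical traits, and each document is counted once per trait it references, as in the “No. of Documents Referencing” column of Table 4.

    # Hypothetical sketch: collapsing trait synonyms into canonical traits and
    # counting how many documents reference each one (cf. Table 4).
    import re

    CANONICAL = {  # assumed, abbreviated synonym lists for illustration
        "grammar": {"grammar", "mechanics", "conventions", "usage"},
        "evidence": {"evidence", "support", "development"},
        "thesis": {"thesis", "focus", "purpose", "argument"},
        "organization": {"organization", "structure"},
    }

    documents = {  # invented documents keyed by anonymized IDs
        "rubric_01": "The thesis is clear and the support is well integrated.",
        "grades_02": "Frequent errors in grammar and mechanics; weak structure.",
    }

    def traits_in(text):
        words = set(re.findall(r"[a-z]+", text.lower()))
        return {trait for trait, synonyms in CANONICAL.items() if words & synonyms}

    referencing = {trait: 0 for trait in CANONICAL}
    for text in documents.values():
        for trait in traits_in(text):
            referencing[trait] += 1

    print(referencing)  # each canonical trait mapped to its document count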

Organization and Coherence (scored 5 to 1)

5: Overall structure, organization, and paragraph construction are appropriate to the assignment and an academic audience. All ideas in the paper flow logically. Transitions show originality and sophistication.

4: Overall structure, organization, and paragraph construction are appropriate to the assignment and an academic audience, though may be less evident and understandable in some places. Most ideas in the paper flow logically. Transitions are adequate but may be unclear or missing at times.

3: Overall structure, organization, and paragraph construction are readable, somewhat appropriate to the assignment and an academic audience, though may be a bit awkward in some places. The paper does not always flow logically and make sense. Transitions are formulaic and may be few or weak.

2: Overall structure, organization, and paragraph construction are difficult to read, or inappropriate for audience. The paper lacks coherence, providing no discernable argument; ideas do not flow logically and do not make sense. Transitions are confusing or nonexistent.

1: The writing is very difficult to understand owing to major problems with organization and structure.

Figure 2. Performance categories, organization trait, Scoring Rubric 42.



Since most of the 51 scoring rubrics explicitly label traits for raters, they present no difficulties for reliable coding. (The row excerpted in Figure 2, for example, can simply be coded in its entirety as organization.) Some of the grade definition documents presented more of a challenge, since it was necessary to code those sections of the paragraphs or sentences that describe, for example, B- or D-level performance. To reuse the example from Grade Definition 37,

The “A” essay demonstrates the writer’s ability to address rhetorical situations in innovative, creative, and perceptive ways. The writing is more than above average; it is exceptional. The purpose is distinguished by some depth or breadth of insight; all support offered is interesting, relevant, and boldly thought-provoking. The organization is not only coherent but marked by appropriateness to the specific rhetorical situation, and the transitions show sophistication and originality. The writing exhibits finesse on the writer’s part in matters of style, diction, and usage. There are no grammatical errors.

Recall that this paragraph has already been coded in its entirety at the superlative performance category: The first sentence is coded audience but not creativity, since the adjectives “creative” and “innovative” are being used to scale audience itself. The second sentence is not coded, since it does not identify a trait of “writing” that is “exceptional.” The third sentence is coded thesis before the semicolon, evidence afterward. The fourth sentence requires some judgment: The descriptors before the “but” are part of organization, but the first half of the sentence that begins “The organization is” must also be coded audience, along with the phrase “marked by appropriateness to the specific rhetorical situation.” Descriptors after the conjunction and the fifth sentence are coded style; the final sentence is coded grammar.

As Table 5 illustrates, these two coding schemas have made it possible to see all of the language that appears in each of the 48 cells—that is, each intersection of the six performance categories and the eight canonical traits (e.g., every instance where audience awareness is appraised as superlative, where critical thinking is appraised as middling good, where style is appraised as severe fail).


Findings

The Pearson correlation coefficient was used to measure the degree of word similarity between any two collections of language coded at the above traits. Since NVivo is not a statistical analysis package and cannot generate p values, these correlations must be taken advisedly. Nonetheless, the resulting dendrogram provides a useful framework for ordering the presentation of the findings, and the close looks at the specific language choices found at each trait that follow below help validate the clustering in Figure 3.
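A rough equivalent of that similarity measure can be computed outside NVivo, where p values are available. In the sketch below (illustrative only; the trait_language strings are placeholders rather than corpus data), each trait’s coded language is reduced to a word-frequency profile over a shared vocabulary and Pearson’s r is computed between profiles; a full pairwise matrix of such correlations could then be fed to a hierarchical clustering routine such as scipy.cluster.hierarchy to produce a dendrogram like Figure 3.

    # Illustrative sketch: Pearson correlation between word-frequency profiles
    # of two coded traits. The trait_language strings below are placeholders.
    import re
    from collections import Counter
    from scipy.stats import pearsonr

    trait_language = {
        "style": "clear effective choices; sentence structure is appropriate and precise",
        "organization": "clear logical structure; ideas follow an effective, coherent plan",
    }

    def profile(text):
        """Word-frequency profile of one trait's coded language."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def word_similarity(trait_a, trait_b):
        """Pearson r (and p value) over the union vocabulary of two traits."""
        pa, pb = profile(trait_language[trait_a]), profile(trait_language[trait_b])
        vocab = sorted(set(pa) | set(pb))
        return pearsonr([pa[w] for w in vocab], [pb[w] for w in vocab])

    r, p = word_similarity("style", "organization")
    print(f"style ~ organization: r = {r:.3f}, p = {p:.3f}")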

The language collected under style and organization correlates at .589, thesis with organization at .527, and thesis with style at .437, stronger correlations than with any other trait. That the words that appear under thesis should be so similar to those codable under style and organization might help explain long-standing research findings that organization and “ideas” tend to disproportionately affect raters’ scoring decisions (Breland & Jones, 1982; Huot, 1990). As the close examination of the language in each trait reveals below, style is insufficiently disambiguated from organization, and neither has been made adequately distinct from thesis, failures that may contribute to widespread logical rating error. The following discussion will also shed some light on the strong linguistic similarity between critical thinking and support (here measured at .507) and on the reasons why the language grouped under the assignment, audience, and grammar traits was not as easily grouped with any other trait or subgroup of traits.

Table 5. Word Counts: Performance Category–Canonical Trait Intersections

Trait                          Superlative  Middling Good  Adequate  Just Passing  Inadequate  Severe Fail
Thesis, focus, purpose               490           507        608          118          222          307
Organization, structure              505           328        498          141          233          236
Style, voice, tone                 1,454           981      1,462          252          470          458
Support, evidence                    933           507        608          118          222          307
Critical thinking, analysis          596           554        685          185          288          283
Audience awareness                   357           252        300          112           96          157
Engagement, assignment               384           199        396          105           71          188
Grammar, conventions                 729           621      1,022          279          379          598

Figure 3. NVivo-generated Dendrogram: Traits Clustered by Word Similarity.



As Martin and White (2005) explain, evaluation is not simply a matter of selecting and assigning a value that exists in isolation; rather, it involves complex combinations of meaning (p. 159). In scaling, the relevant subsystem of appraisal is what Martin and White call “graduation”: the linguistic construction of intensity, amount, and prototypicality. To reveal the internal logic by which each trait is scaled, the highest-frequency nouns, modifiers, and verbs present at each cell in Table 5 are plotted in the matrixes that follow, in order of frequency. The ultimate aim is to synthesize these 8 matrices to induce the theoretical construct of writing that operates in U.S. FYC.

While the rhetoric of connoisseurship seen in Figure 1 is a decisive (and welcome) retreat, it will be seen that it has been replaced by a rhetoric of self-evidence. With one important exception, each of the following matrices shows that descriptive language diminishes with performance level, a tendency that is especially visible in the way that predicates such as “demonstrates,” “structures,” “establishes,” and “engages” dwindle to simple existential verbs, such as “is,” “are,” and “be.” The category of middling good is defined against that of superlative and is always grounded in exceptions to top rankings introduced by the word “but.” “But” yields to “may” in the adequate category, which is in turn defined against middling good. “May” is always displaced by “not” and “no” in inadequate and severe fail performance categories. In severe fail, the key verb in the superlative performance category tends to reappear in a “not” or “no” formation, leaving these levels characterized simply by negations of positive traits (lacks engagement, fails to think critically, has no thesis, etc.). Superlative and severe fail categories are rhetorically constituted by “prototypicality”—that is, the category is usually free of the disclaimers, qualifiers, or quantifiers that hedge mid-range appraisals (Martin & White, 2005, pp. 137-138).

The categories’ consistent top-down reliance on a rhetoric of “presence” explains the instability of the traits themselves in the failing performance categories. As will be seen, the two lowest categories typically contain words that refer to the students themselves, as projections of the students themselves fill vacuums left by underrepresented traits. For example, “ideas” become an issue in organization; as evident critical thinking wanes, readers are directed to assess students’ “understanding”; raters are asked to take “purpose” into consideration when they are contemplating a low score in audience awareness, and so on.7


Thesis, Organization, Style

Pearson’s r likely clustered these traits in Figure 3 because they all depend on the rhetorical construction of obviousness. A “thesis” may go by many names, but the data in the modifier row reflect a widespread belief that papers either have a “clear” one or not. Consequently, theses are scaled primarily by verbs—what students do with their theses.

Superlative theses are “demonstrated” and “established”; middling good ones are only “presented” or “made.” Synonyms shift as the performance categories struggle to describe papers that “may not have”/“lack”/“have no clear” theses. What inadequate and severely failing papers lack, the corpus suggests, is not a thesis but “ideas,” “evidence,” or “sense.” Modifiers such as “throughout” and verbs such as “develop” and “maintain” suggest that readers are being directed—perhaps confusingly—to consider both a thesis and the organizational and stylistic decisions that extend it.

Trait: Thesis, focus, purpose, argument

Verbs. Superlative: Demonstrates, provide, establishes, has, supports. Middling good: Presents, makes, establishes, argues, demonstrates. Adequate: Is, be, stated, establish, developed. Just passing: Be, is, maintain. Inadequate: Is, lack, have, are. Severe fail: Lack, present, does, contain.

Modifiers. Superlative: Clear, insightful, specific, original, appropriate, effective, consistent, compelling, maintained. Middling good: Clear, developed, defined, original, generally, fairly, thoughtful, throughout. Adequate: Clear, general, controlling, central, some. Just passing: Clear, interesting, inadequate, broad. Inadequate: Unclear, weak, identifiable, inconsistent, vague, throughout. Severe fail: Unclear, central, little, discernable.

Hedges. Superlative: —. Middling good: But, may. Adequate: May, but, not. Just passing: May, not. Inadequate: Not, may, but, no. Severe fail: No, not.

Synonyms. Superlative: Idea, position, writer, inquiry, claim, paper. Middling good: Paper, sense, writer, statement. Adequate: Claim, paper, statement, position, problem, balance, evidence. Just passing: Sense, situation. Inadequate: Idea, evidence, paper, points, analysis. Severe fail: Idea, point, direction, paper.


Readers are asked to scale organization by rewarding “effective/clear” “demonstration” of “ideas,” from which they can infer a “plan” that governs the arrangement of the document. The dominant synonym of “ideas” begins to yield to “coherence” at the adequate level; the failing performance categories are distinguished by what they “lack.” In other words, it is assumed that readers will find not “ideas” but a “pattern” of “problems.”8

Trait: Organization and structure

Verbs. Superlative: Demonstrates, are, use, follow, supports, completes, establishes, advance. Middling good: Follow, are, demonstrates, displays, control. Adequate: Follow, be, are, use, demonstrates, lack. Just passing: Be, are, consists, fail, missing. Inadequate: Are, lack, attempt, be, does, present, provide, used. Severe fail: Lacks, follow, be, has, impedes, seem, understand.

Modifiers. Superlative: Effective, clear, appropriate, coherent, well, strong. Middling good: Clear, logical, effective, appropriate, coherent, well, evident, somewhat. Adequate: Clear, overall, logical, occasional, ordered, fairly. Just passing: Deficient, confusing, missing, clearly, functional, marked. Inadequate: Limited, overall, inappropriate. Severe fail: Difficult, very, clear, illogical, poor, basic.

Hedges. Superlative: All. Middling good: But, some, may. Adequate: Some, somewhat, but, may, not. Just passing: May, no, not. Inadequate: But, not, much. Severe fail: No, may, but.

Synonyms. Superlative: Ideas, argument, paper, plan, development, conclusion. Middling good: Ideas, introduction, paper, conclusion, transitions, support. Adequate: Coherence, essay, reader, ideas, patterns, weakness, argument, topic. Just passing: Introductions, conclusions, ideas. Inadequate: Coherence, ideas, paragraphs, sentences, writer. Severe fail: Pattern, problems, development, essay, ideas, sense, comprehension.

Trait: Style, voice, tone, sentence variety, and paragraphing

Verbs. Superlative: Use, structures, shows, are. Middling good: Are, use, contain, is, demonstrates. Adequate: Are, be, use, show, contains. Just passing: Are, be, developed, linked. Inadequate: Is, are, uses, lack, contain. Severe fail: Be, are, is, lacks.

Modifiers. Superlative: Clear, effective, coherent, appropriate, developed, precise, logically. Middling good: Clear, effective, appropriate, logical, precise, most, smooth. Adequate: Appropriate, generally, awkward, correct, limited. Just passing: Missing, inappropriate, coherently, confusing, limited, inconsistent, very. Inadequate: Little, generally, inappropriate, overall, weak. Severe fail: Inappropriate, present, inadequate, logically, confusing.

Hedges. Superlative: —. Middling good: But, some. Adequate: May, but, not, some, might. Just passing: May, not. Inadequate: Not, may, but, some, often. Severe fail: Not, missing, no, may.

Synonyms. Superlative: Choice, essay, language, writer, diction, ideas. Middling good: Choice, reader, structure, paper, introduction, thesis. Adequate: Structure, word, expression, competence, choice, reader, ideas, essay. Just passing: Options, range, awareness, choice, material, topic. Inadequate: Choice, coherence, writer, structure, logic. Severe fail: Coherence, writing, essay, choice.


Performance category descriptions of style direct readers to consider students’ choices. While the verbiage of this trait is atypically sparse (i.e., writers “use” good choices, their structures “are” appropriate, or their paper “is” inappropriate), its modifiers suggest that “clarity” is an effect of stylistic decisions. Superlative and middling good stylistic “choices” are held to result in a text assessed as “clear,” a term that disappears when texts are merely adequate, after which a different valence of “appropriateness” emerges to assess poor decisions, some of which threaten to reduce the student’s text to “incoherence.” The close relationship between style and organization is most evident at these lowest performance categories, where the entire essay being appraised is a synecdoche for the traits themselves.

Support, Critical Thinking

Pearson’s r finds the language of these two traits more similar to each other than that of any other trait (see Figure 3), perhaps because the internal logic of the scaling suggests that students who provide more “details” or “reasons” or “examples” appear to be doing “analysis” or to be thinking more “critically.”

Trait: Support, evidence, development

Verbs. Superlative: Use, are, provides, presents, demonstrates, integrates. Middling good: Use, are, provide, explain, integrates. Adequate: Use, are, be, engage, incorporate, questions, provides, relate. Just passing: Are, be, present, use, rely. Inadequate: Is, are, use, lack, present, provide. Severe fail: Is, use, are, present.

Modifiers. Superlative: Appropriate, relevant, specific, clear, convincing, compelling, credible, strong, thorough. Middling good: Effective, appropriate, relevant, clearly, generally. Adequate: Sufficient, appropriate, major, relevant. Just passing: Irrelevant, insufficient, absent. Inadequate: Little, integrated, irrelevant, weakly, clear, inaccurate. Severe fail: Insufficient, irrelevant, little, inaccurate, incoherent.

Hedges. Superlative: —. Middling good: But, not, may. Adequate: May, some, but, not. Just passing: May, not, no. Inadequate: May, not, some, but. Severe fail: Not, no, missing.

Synonyms. Superlative: Details, ideas, examples, writer, reasons, argument, claim. Middling good: Ideas, details, examples, writer. Adequate: Ideas, details, analysis, writer, generalities. Just passing: Details, examples, analysis. Inadequate: Ideas, writer, generalizations, detail. Severe fail: Ideas, writer, content, detail, readings.


The support trait operates on the assumption that student writers either “use” evidence to support their “ideas” or that the evidence “may” be “irrelevant” (or there is too “little” of it) or it simply “is not” there. Yet the principal verb of “use” is inflected by the distinction between support that has been “integrated” and support that has only been “presented”—an interesting overlap with style. Where all performance categories direct readers to reward “ideas,” high scores in support direct the rater to value “details” and “examples.” The two failing performance categories invoke the “writer” as a source of “irrelevant” or “not integrated” ideas or of development that is “insufficient.”

Critical thinking, perhaps the most speculative and controversial trait measured in this corpus (Peckham, 2010), is effectively limited to the verb “demonstrate” to convey whether (and to what degree) a student paper can be said to be thinking critically. Here, synonyms do most of the work of scaling performance. These synonyms show that as performance levels diminish, the materials that the writer is working on gradually replace activities that the writer could be said to be doing. Superlative critical thinkers, in other words, are able to address “complexities”; middling good writers do “convincing” work demonstrating to readers that they have done “some” work with “issues”; and adequate critical thinkers’ “limited” critical thinking suggests competent “understanding.” But words for the objects of work appear increasingly after just passing. “Text” and “paper” displace “awareness” and “reasoning”; an inadequate or severely failing critical thinker is undone by “positions” and the “text”—which the corpus suggests she or he simply “does not” “understand.”

Trait: Critical thinking, academic analysis

Verbs. Superlative: Is, demonstrates, provides, shows, explores, offers. Middling good: Is, demonstrates, consider, show, engage, acknowledge, view. Adequate: Demonstrates, is, read, use. Just passing: Is, lack, remains. Inadequate: Be, has, demonstrates, establish, develop, see, suggest. Severe fail: Does, demonstrate, understand, analyze, display.

Modifiers. Superlative: Insightful, clear, thoughtfully, valid, opposing. Middling good: More, convincing, less, controlling, valid, logical. Adequate: Limited, competent, basic. Just passing: Flawed, minimal, excessive, seriously. Inadequate: Simplistic, weak, limited, superficial. Severe fail: Flawed, deeply, relevancy, little, logical.

Hedges. Superlative: —. Middling good: Some, but, may. Adequate: May, some, not. Just passing: May, not. Inadequate: May, not, some. Severe fail: Not, no, may, only.

Synonyms. Superlative: Complexities, issues, awareness, argument, reasoning, writer. Middling good: Writer, issues, ideas, awareness, reasoning, significance, topic, text, detail, argument. Adequate: Understanding, ideas, reasoning, text, questions, engagement, evidence, topic, writer. Just passing: Ideas, reasoning, summary, subject, text, writing, paper. Inadequate: Writer, positions, text, issues, topic, connection, ideas. Severe fail: Text, issue, positions, reasoning, ideas, writer, writing, subject.



Engagement

This trait is the first of the three that Pearson’s r was unable to definitively group with any other trait, finding critical thinking to be a distant rhetorical cousin at only .301. The corpus indicates that measures of student engagement are inseparable from the assignment to which they are responding, since the assignment produces the situation in which the student paper is written (Bawarshi, 2003, pp. 112-144); accordingly, performance categories in this trait pivot on a narrow set of verbs describing students’ uptake of the prompt. Adequate engagement may be “consistent,” this matrix suggests, but such students “follow” assignments rather than “addressing” them. As the synonyms indicate, it therefore follows that all but superlative writers are preoccupied with “requirements,” whereas the superlatively engaged student writer makes the assignment his or her “own”—that is, something “beyond” the assignment that called for it.

Like style, above, students might be interested to see that the corpus regards “length” as an effect rather than a quality to be pursued for its own sake—that is, a severely failing paper might be characterized as “short” or, in that characterization, might substitute “length” as a noun for “engagement.” Yet “long” is not the adjectival counterpart in superlative assessments; by “address[ing]” the assignment “thoughtfully,” “thoroughly,” or going “beyond” the “requirements” and into “depth,” student writers produce papers that are not short.

Trait: Engagement, assignment

Verbs. Superlative: Addresses, meets, fulfills, demonstrates, have, shows. Middling good: Fulfills, meets, engage, follows, attempt. Adequate: Followed, meets, fulfills, engage, has, addresses. Just passing: Attempts, respond, follow, does, address. Inadequate: Meet, attempt, shows, be. Severe fail: Address, fails, meet, falls, does.

Modifiers. Superlative: Thoughtfully, own, excellent, thoroughly, critical, beyond, depth. Middling good: Clearly, beyond, generally, rhetorical, mere. Adequate: Consistently, all, less, appropriate, beyond, some. Just passing: Directly, little, superficially, inadequately. Inadequate: Only, badly. Severe fail: Short, inappropriate, obvious, seriously, basic, brief, minimum.

Hedges. Superlative: —. Middling good: Not, but. Adequate: Not, some, but, may. Just passing: May, not. Inadequate: May, not. Severe fail: Not, no, might.

Synonyms. Superlative: Writer, writing, requirements, task, topic, work. Middling good: Requirements, author, expectations, format, competence, readings, level. Adequate: Requirements, task, issues, fashion, minimum. Just passing: Paper, issue. Inadequate: Prompt, tasks, writer, effort, credibility. Severe fail: Requirements, length, subject, minimum, expectations, prompt.



Audience

The “reader” (who is strongly present in this trait when she or he assesses it highly) responds well only to formal writing, although the documents in this corpus do not quite put it this directly. This reader will perceive “effective” and “appropriate” writing as formal writing, or, to put this one other way, a desire for formal writing should be what the superlative student “senses” would fit the rhetorical occasion. As this formality begins to slip (i.e., where writing is “generally appropriate” or “sometimes consistent”), the “but” and “may” open spaces for the adjective “informal” in middling good and adequate performances. Like the thesis trait’s rhetoric of purposeful control, poor writing “shows,” where good writing “demonstrates” and “addresses.” A writer’s informality is thus evidence that one has “little” or “no” sense of the “situation,” a manifestation of one’s “lack” of ability to accommodate this “reader.”


Trait: Audience, rhetorical awareness

Verbs. Superlative: Demonstrates, engages, attends, employs. Middling good: Demonstrates, shows, values, reflects, be, has. Adequate: Addressed, be, shows, has, contains. Just passing: Make, contain. Inadequate: Communicate, shows. Severe fail: Shows, lack, accommodate, demonstrate.

Modifiers. Superlative: Appropriate, effective, clear, sophisticated, strong, consistent. Middling good: Appropriate, clear, consistent, generally, informal, very, real. Adequate: Appropriate, consistent, most, different, informal, sometimes, generally. Just passing: Limited, little, prior. Inadequate: Inconsistent, weak, little. Severe fail: Little, informal.

Hedges. Superlative: —. Middling good: —. Adequate: But, some, may. Just passing: May, but, few, not. Inadequate: May, not, no. Severe fail: No, not, any.

Synonyms. Superlative: Reader, needs, sense, situation, purpose. Middling good: Sense, purpose, writer, consideration, respect, influence. Adequate: Sense, needs, reader, purpose, paper. Just passing: Sense, knowledge, accommodations, assumptions, writer. Inadequate: Purpose, genre, impact. Severe fail: Situation, sense, purpose.


Grammar

Grammar was probably stranded by Pearson’s r in Figure 3 because its presence-absence continuum is constructed in reverse. In all other traits, complexity and nuance diminish with performance category, while hedges and modifiers increase; here, the rhetorical construction of certainty increases with lower performance categories. Unlike style, good grammar is invisible. This logic is also evident in the role of the reader: the same rater who is delighted by superlative style or feels that the student has accurately assessed one’s audience does not appear until one appraises grammar as merely adequate (see also Table 6). The reader only begins to appear, in other words, in those performance categories that register one’s “difficulty” or where one’s reading is “impeded” or “distracted” by the writer’s lack of “control.”

Presence of the Student/Writer

Table 7 indicates the number of words at each intersection in which the student whose work is being assessed is the grammatical subject (e.g., “The author demonstrates the ability to integrate the material logically and responsibly into his/her argument” or “The student includes transitions or other devices to guide readers”). Agentive students disappear in lower performance categories. Where they are constructed as active agents in their writing, it is primarily as makers of decisions about style, use of evidence, and critical thinking. They are not routinely granted agentive control over their work on grammar, thesis, organization, or—curiously—their engagement with the assignment.
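The agent coding behind Tables 7 and 8 was done by hand; as a rough automated analogue only (assuming spaCy and its en_core_web_sm model are installed, and using invented term lists), the sketch below counts sentences whose grammatical subject names the student/writer versus those whose subject names the text.

    # Approximate, illustrative version of the agent coding (not the study's method):
    # count sentences where the student/writer vs. the text is the grammatical subject.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

    STUDENT_TERMS = {"student", "writer", "author"}   # hypothetical term lists
    TEXT_TERMS = {"paper", "essay", "portfolio", "writing", "text"}

    def agent_counts(rubric_text):
        """Return (student_as_subject, text_as_subject) sentence counts."""
        student, text_agent = 0, 0
        for sent in nlp(rubric_text).sents:
            subjects = {tok.lemma_.lower() for tok in sent
                        if tok.dep_ in ("nsubj", "nsubjpass")}
            if subjects & STUDENT_TERMS:
                student += 1
            if subjects & TEXT_TERMS:
                text_agent += 1
        return student, text_agent

    print(agent_counts("The writer integrates sources responsibly. "
                       "The paper displays a good grasp of grammar."))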

Textual Agency

It is instructive to compare Table 7 with Table 8, which shows that students’ texts are overwhelmingly the grammatical agent in the corpus (e.g., “This paper displays a good grasp of grammar” or “This portfolio shows a strong awareness of the reader’s need for context and clarity”).

Trait: Grammar, mechanics, conventions, and usage

Verbs. Superlative: Are, is, use, exhibits, demonstrates, undermines. Middling good: Is, has, are, do, conform, contain, exhibits. Adequate: Are, demonstrates, control, distract. Just passing: Contains, lack, demonstrate, marred. Inadequate: Exhibit, are, is, impede, distract, interfere. Severe fail: Contain, control, understand, are, lack, impede.

Modifiers. Superlative: Few, correct, free, appropriate, consistently. Middling good: Few, correct, serious, generally, very, occasional. Adequate: Major, generally, few. Just passing: Difficult, flawed, academic, consistent. Inadequate: Major, repeated. Severe fail: Serious, significant, inappropriate.

Hedges. Superlative: No, none. Middling good: But, not, may. Adequate: Not, some, may, but. Just passing: May, numerous, many. Inadequate: Frequent, may, many, not. Severe fail: May, frequent, no, not, numerous.

Synonyms. Superlative: Paper, standard, control, effectiveness, style. Middling good: Paper, structure, writing, variety, standard. Adequate: Paper, meaning, format, reader, structure. Just passing: Comprehension, diction, structure, proofreading, reader, discourse. Inadequate: Structure, style, text, reader, paper. Severe fail: Comprehension, paper, style, number, diction, reader.


Presence of the Teacher/Rater

The corpus is even more selective in the places where it mentions readers, scorers, or raters. It is not surprising to see them consistently in the trait of “audience/rhetorical awareness,” but they are disproportionately present when giving favorable assessments of style (e.g., “The reader was impressed by this paper’s lively tone”) and in their strong reactions to the presence of error. In many documents, this particular intersection was the only place readers appeared at all (e.g., “The reader was frequently confused by severe grammatical problems”). They are seldom seen in the act of appraising theses, support, organization, or critical thinking.

Table 6. Presence of Rater: Performance Category–Canonical Trait Intersections

Trait                          Superlative  Middling Good  Adequate  Just Passing  Inadequate  Severe Fail
Thesis, focus, purpose                 0            16          8           13            0           15
Organization, structure               36            32         86            5            0           20
Style, voice, tone                   133            97        129           19           64           52
Support, evidence                     19            23         12            0            0            6
Critical thinking, analysis           14             2         10           17            0            0
Audience awareness                    46            28         59           48           35           55
Engagement, assignment                20             0          8           13            0            0
Grammar, conventions                  54            66        505           52          122          254

Discussion: The Contemporary Theoretical Construct of Writing in U.S. FYC

Behizadeh and Engelhard (2011) argue that writing theory has had minimal influence on writing assessments, which they find laboring under formalistic constructs of writing that are at odds with the sociocultural/contextualist construct dominant in the contemporary writing research community (p. 206). It is worth pointing out that 25 of the 51 scoring rubrics are explicitly designed to assess portfolios or collections of student work and that many facilitate multiple reviews by different readers and contain protocols for adjudicating discrepant appraisals, evidence of a collective will to distance assessment of student writing from the formalist construct of connoisseurship that obtains, for example, in Figure 1. But Behizadeh and Engelhard’s findings can be put to a closer test: With maps of the lexical frequencies and the internal logic of each canonical trait in this corpus, key words from each trait can be used to articulate the relationships among them. That is, it is now possible to induce the theoretical construct of “writing” operating in the FYC programs that contributed documents to this corpus (Figure 4).


Table 7. Student as Agent: Performance Category–Canonical Trait Intersections

Trait                          Superlative  Middling Good  Adequate  Just Passing  Inadequate  Severe Fail
Thesis, focus, purpose                23            21         22            0           10            4
Organization, structure               40            10         51            0           39            0
Style, voice, tone                   148            42        158            0           54            0
Support, evidence                    124            75        145            0           53           37
Critical thinking, analysis          112            96        218            9           85           22
Audience awareness                    48             2         43            0            0            0
Engagement, assignment                36             8         61            0            9            0
Grammar, conventions                  58            12         73            0            7           17

Table 8. Text as Agent: Performance Category–Canonical Trait Intersections

Trait                          Superlative  Middling Good  Adequate  Just Passing  Inadequate  Severe Fail
Thesis, focus, purpose               349           393        452          112          166          244
Organization, structure              341           258        400          140          140          211
Style, voice, tone                 1,093           839      1,048          268          251          327
Support, evidence                    608           580        634          315          226          367
Critical thinking, analysis          339           392        352          175           96          181
Audience awareness                   247           212        172          112           37          113
Engagement, assignment               332           194        330          101           58          156
Grammar, conventions                 430           438        607          253          177          434


As noted in the introduction, there are several important indicators of progress to be emphasized. First, the language patterns in style, organization, and thesis indicate that assessors of student writing are encouraged to appraise students’ ideas in a more nuanced and valid way than the 1940 spectrum of “original” to “childish” from the University of Texas, Austin. Second, the corpus reflects (if spottily) the understanding that readers in different contexts have different needs and that part of what makes specific instances of writing superlative or adequate are the specific conventions and expectations operating in that context, as opposed to suggesting to students and teachers, as was the case in Figure 1, that these categories stand in for all “good writing” everywhere. Third, significant progress on the historically fraught question of “error” is also evident. Not only is it expected that surface error is inevitable (note that the most frequent modifier in superlative grammar is not “none” but “few”), but it is also routinely suggested to students and teachers that there are kinds or degrees of error. That is, the errors that are “serious” or “significant” are such because they “interfere,” “distract,” or “impede” readers’ understanding and not because they violate some existential category of “correctness.”


Third, significant progress on the historically fraught question of "error" is also evident. Not only is it expected that surface error is inevitable (note that the most frequent modifier in superlative grammar is not "none" but "few"), but it is also routinely suggested to students and teachers that there are kinds or degrees of error. That is, the errors that are "serious" or "significant" are such because they "interfere," "distract," or "impede" readers' understanding and not because they violate some existential category of "correctness."

However, the only way to articulate the construct mapped in Figure 4 in a way that is consistent with the findings in Tables 6 to 8 is for "writing" to be the agent of the sentence, with the student/writer deferred and the role of the reader in producing the appraisal largely effaced. Readers produce scores, not papers, but five of the eight canonical traits are left heavily reliant on the modifier "appropriate" to justify appraisals with no explicit reference to or description of the putative addressee (Martin & White, 2005, p. 95). In all traits, at all performance levels, readers' experiences of the texts are presented as intrinsic qualities of those texts.9 By suppressing the role of the reader, the documents in this corpus suppress the uncomfortable fact that—outside of a handful of studies on this effect—raters, students, and writing program administrators have no way to compare the appraisal that their scale constructed with a judgment that might have been made had a different scale been used (Lumley, 2002, p. 268). Here, Behizadeh and Engelhard's point is well taken: The ubiquitous framing of what is local, temporal, and contingent as if it were generalizable, ahistorical, and definitive is inconsistent with what is generally known to be true about the embeddedness of writing in complex social systems of activity (Prior & Shipka, 2003; Prior, 2006; Roozen, 2009, 2010).

Figure 4. Contemporary construct of “academic writing” in U.S. first-year composition.


As the induced construct itself suggests, however, this corpus does not routinely frame the writing conventions it rewards as existing in just such a context. Of 261 instances of the word "writing" in the corpus, only 34 are explicitly modified with words such as "essay" or "academic," even though it is specifically within an academic activity system that these documents require students to produce writing and readers to assess it.
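A comparison like the 261/34 count above amounts to a simple concordance check; the sketch below illustrates one way such a check could be performed. The modifier list and tokenization are assumptions for the example, not the procedure used in this study.

```python
# Rough concordance check: locate each instance of "writing" and test whether
# an immediately preceding word marks it as a specific kind of writing.
import re

MODIFIERS = {"academic", "essay", "essayistic", "college", "expository"}

def classify_writing_tokens(text):
    """Return (modified, unmodified) counts for the word 'writing'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    modified = unmodified = 0
    for i, token in enumerate(tokens):
        if token != "writing":
            continue
        if i > 0 and tokens[i - 1] in MODIFIERS:
            modified += 1
        else:
            unmodified += 1
    return modified, unmodified

# Example: classify_writing_tokens("Academic writing rewards writing that ...")
# returns (1, 1): one modified instance, one unmodified.
```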

Writers themselves are dynamic parts of those systems. Although "to fail" often means "to be unable to," it means (at least some of the time) "to choose not to." Apart from two instances of "ignore" (as in "ignores evidence that contradicts the claim"), however, this corpus contains no agentive verbs of failure; there are no instances, for example, of any version of the verbs "refuse," "decline," "object," "resist," "oppose," "subvert," "parody," "mock," or "satirize." It is questionable that these documents construct students as agents only when they are engaged in the kinds of performances that they guide readers to recognize as conventional. In eliding students' potential agency in "failing" to meet standards, the documents in this corpus present their criteria and performance categories as uncomplicated means to an ideologically neutral end.

While obviously not the intention of the documents that constitute this corpus, the systematic underrepresentation of students denies their intellectual labor; the systematic underrepresentation of teachers' and raters' involvement with the scores or grades they produce denies their evaluative expertise and their role in the process. The heavy weighting of the entire corpus toward higher performance categories, both in simple word count and in richness of description, constrains the theoretical construct to an announcement of its most favored conventions. The impoverished language in the lower performance categories is ironic in this context, since the rhetoric of absence and negation that operates at those levels does little to scaffold teachers' understanding of the causes and complexities of writing appraised at those levels or to provide opportunities for these readers and writers to recognize themselves as agents able to do things differently next time.

Since the language used to assess these traits and performance categories will inevitably wash back into teachers' and students' everyday rhetorical constructions of what counts as good writing and of writing development more generally (Rothermel, 2006; Scott, 2005), the corpus contains at least 227 missed opportunities to emphasize the situatedness of the students' writing (i.e., why they might want to write like this and for what ends), the local nature of the scoring and grading (i.e., why the readers make the decisions they do and by what authority), and the specific construct of the writing valued by the assessment (i.e., what this kind of writing is, why it matters and to whom, and what it is and is not good for).



These silences have implications beyond the teaching and assessing of FYC. Discoveries of regularity and repetition in a corpus of related texts reveal the socially ratified preferred practice of collectivities (Flowerdew, 2008, p. 115; Hyland, 2010, pp. 163-164). The documents in this corpus stabilize uptakes of student papers within local activity systems by constructing operating assumptions about how writing should be valued, both in assessment scenarios and in the language they provide for everyday interactions with students and colleagues as they work toward the values these documents construct (the A, the score of 6, the "pass" decision).

Certainly the corpus, as Figure 4 suggests, constructs as ideal the conventions of essayistic expository prose. This will not surprise some, given U.S. composition's typical housing in departments of English literature, but the point is that the corpus's lack of self-consciousness about the uses, limitations, and site specificity of these conventions may be working against writers' ability to negotiate transitions to other local genres. It is possible to imagine a student who learns to honor this construct, for example, but is disserved by applying it in a discipline or workplace in which collaborative and distributed composing practices (Spinuzzi, 2010; Winsor, 2001) or a deferred thesis (Braddock, 1974) is conventional, or in a culture that prefers a tacit organizational scheme with implied transitions (Englander, 2009) or citation by allusion (Canagarajah, 2002; Pennycook, 1996).

The limitations of what a single course in writing instruction can accomplish have been well described (North, 2011); this discussion should not be understood as a call to increase the accountability or the expectations for courses like those that contributed documents to this corpus. But I do suggest that the rhetorical construction of this corpus works against the teaching and learning of academic writing as operating in a specific sociocultural context. Writers' difficulties in adopting different writing skills and techniques have repeatedly been documented by both educational researchers (e.g., Huang, 2010; Rounsaville, Goldberg, & Bawarshi, 2008; Sperling & Freedman, 1987; Wardle, 2009; Yancey, 2010) and workplace researchers (e.g., Angouri & Harwood, 2008; Beaufort, 2007; Schryer, 1993; Smart, 2000). Such difficulties point to an overgeneralized and brittle theoretical construct of writing that does not easily support adaptive repurposing.

While progressive in many respects, the language patterns in this corpus do not yet consistently scaffold a durable and flexible theoretical construct of "writing" for the writers whose work they help assess.


While the designers of the scales in this corpus likely feel that an overt acknowledgment of the contingency of the traits they are mandated to assess would constitute a validity threat to an "objective" appraisal of writing, writers' experience of shifting standards and rationales for the many appraisals they have already received by the time they enter FYC has likely convinced them of the subjectivity of reader response. More to the point, this contingency will simply be a fact of life in writing in the disciplines (Abasi, Akbari, & Graves, 2006; Russell & Yanez, 2003), postgraduate writing (Paré, Starke-Meyerring, & McAlpine, 2009), and workplaces (Dias, Freedman, Medway, & Paré, 1999) to come. Scales that rhetorically construct trait descriptions and performance categories in ways that are consistent with (rather than in defiance of) the fact that any reader's appraisal is embedded in cultural and material contexts could help furnish postsecondary writers with more robust constructs of "writing" for the contradictions and transitions to come. A longitudinal study to empirically test this conjecture on a revised version of any of the 83 documents in this corpus would be complex to administer, control, and measure but is feasible and, given the stakes, overdue.

Acknowledgments

Thanks first to colleagues all over the United States who volunteered their documents for this study, and thanks to Ryan Roderick for his help locating their names and mailing addresses. Many thanks to Pat Burnes and Norbert Elliot for thoughtful feedback on earlier versions of this manuscript and to Mya Poe, Tiane Donahue, and David Russell for their interest in this project. I am obliged to Christina Haas and to two anonymous reviewers whose supportive critiques considerably strengthened the analysis.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

1. Thanks to Susan "George" Schorn, this document is available at http://comppile.org/site/archives.htm.

2. A list was generated by setting the Carnegie Foundation classification filters (classifications.carnegiefoundation.org) to public institutions in the DRU, R2, and R1 categories.

3. No writing program administrator refused (at least not in writing) to send documents. Seven programs were working on a grading scale or scoring rubric at the time of data collection (or hoped to begin work soon); at 10 institutions, instructors develop their own scales after training; and 12 use neither grading definitions nor scoring rubrics (although it was not always clear whether this was a philosophical position or simply a statement of fact).



4. The two exceptions are Grading Scale 31, which advises assessors that “because certain types of errors persist in L2/English Varieties writing even at an advanced level, some accommodation of L2/English Varieties features is appropriate,” and Scoring Rubric 35, which advises assessors that “papers written by ESL students may be permitted a few additional errors.”

5. For example, “excellent,” “good,” “adequate,” and “poor.” For coding purposes, all adjectival categories have been converted to numbers (e.g., an assessment rubric with four performance categories of distinctive, skillful, competent, and ineffective is a 4-point scale with three passing categories and one failing).
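For concreteness, a minimal sketch (in Python, not part of the study) of the conversion this note describes, using the illustrative labels given above:

```python
# Map a rubric's ordered adjectival categories (best first) onto descending
# scale points and record how many points count as passing. The labels are
# those given in the note; any real rubric would supply its own.
def to_scale(categories, passing_count):
    """Return the scale points implied by an ordered list of category labels."""
    points = len(categories)
    scale = {label: points - rank for rank, label in enumerate(categories)}
    return {"points": points, "passing": passing_count, "scale": scale}

print(to_scale(["distinctive", "skillful", "competent", "ineffective"], 3))
# -> {'points': 4, 'passing': 3, 'scale': {'distinctive': 4, ..., 'ineffective': 1}}
```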

6. As one reviewer noted, the frequency and intensity of references to these traits in the corpus roughly reverses the order of importance routinely assigned to these traits by writing teachers.

7. This finding amplifies Poe's (2006) close reading of 21 rubrics used in then-current statewide assessments, in which she uncovered an "implicit assumption" operating at the lowest scores "that there is something wrong with students' reasoning abilities" (p. 18).

8. Further research is needed to uncover whether the “ideas” that serve as synonyms for organization and structure are sufficiently differentiated from those “ideas” that readers are being directed to assess as the thesis.

9. To be sure, the corpus occasionally directs readers to reward writing that they find “unexpected,” “surprising,” or “risky,” yet these modifiers could as justifiably be applied to, say, grammar in the category of severe fail. Given what is known about the covariance of surface error with task unfamiliarity or complexity (Haswell, 1988, 1991; Lunsford & Lunsford, 2008), errors could well be an effect of an attempt to take just such “risky” or “unexpected” moves. Moreover, this rhetorical reframing of “originality” as “risk” might inadvertently offer middle- and upper-class native speakers an additional advantage in intuiting which “risks” could be safely taken.

References

Angouri, A., & Harwood, N. (2008). This is too formal for us . . . : A case study of variation in the written products of a multinational corporation. Journal of Business and Technical Communication, 22(1), 38-64.

Balester, V. (2012). How writing rubrics fail: Toward a multicultural model. In M. Poe & A. Inoue (Eds.), Race and writing assessment (pp. 63-77). New York: Lang.

Bawarshi, A. (2003). Genre and the invention of the writer. Logan: Utah State University Press.


Beaufort, A. (2007). Writing in the professions. In C. Bazerman (Ed.), Handbook of research in writing: History, society, school, individual, text (pp. 221-236). New York: Erlbaum.

Behizadeh, N., & Engelhard, G. (2011). Historical view of the influences of measurement and writing theories on the practice of writing assessment in the United States. Assessing Writing, 16, 189-211.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge, England: Cambridge University Press.

Bourdieu, P. (1991). Language and symbolic power. Cambridge, MA: Harvard University Press.

Braddock, R. (1974). The frequency and placement of topic sentences in expository prose. Research in the Teaching of English, 8(3), 287-302.

Breland, H. M., & Jones, R. J. (1982). Perceptions of writing skill (CEEB Research Report No. 82-4). Princeton, NJ: ETS.

Brindley, G. (1998). Describing language development? Rating scales and SLA. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 112-140). Cambridge, England: Cambridge University Press.

Canagarajah, S. (2002). A geopolitics of academic writing. Pittsburgh, PA: University of Pittsburgh Press.

Columbini, C. B., & McBride, M. (2012). “Storming and norming”: Exploring the value of group development models in addressing conflict in communal writing assessment. Assessing Writing, 17(4), 191-207.

Connors, R. J. (1997). Composition-rhetoric: Backgrounds, theory, and pedagogy. Pittsburgh, PA: University of Pittsburgh Press.

Crowley, S. (1998). Composition in the university: Historical and polemical essays. Pittsburgh, PA: University of Pittsburgh Press.

Devitt, A. (1991). Writing genres. Carbondale: Southern Illinois University Press.

Dias, P., Freedman, A., Medway, P., & Paré, A. (1999). Worlds apart: Acting and writing in academic and workplace contexts. Mahwah, NJ: Erlbaum.

Diederich, P. E., French, J. W., & Carlton, S. T. (1961). Factors in judgments of writing ability (CEEB Research Bulletin No. 61-15). Princeton, NJ: ETS.

Elliot, N. (2005). On a scale: A social history of writing assessment in America. New York: Lang.

Englander, K. (2009). Transformation of the identities of nonnative English speaking scientists. Journal of Language, Identity and Education, 8, 35-53.

Flowerdew, L. (2008). Corpora and context in professional writing. In V. K. Bhatia, J. Flowerdew, & R. H. Jones (Eds.), Advances in discourse studies (pp. 115-127). Oxford, England: Routledge.


Follman, J., & Anderson, J. (1967). An investigation of the reliability of five procedures for grading English themes. Research in the Teaching of English, 1(2), 190-200.

Freedman, A., & Pringle, I. (1980). Writing in the college years: Some indices of growth. College Composition and Communication, 31(3), 311-324.

Freedman, S. W. (1981). Influences on evaluators of expository essays: Beyond the text. Research in the Teaching of English, 15(3), 245-255.

Gere, A. (2009). Initial report on a survey of CCCC members. East Lansing: Michigan State University, Squire Office of Policy Research.

Gillaerts, P., & Van de Velde, F. (2010). Interactional metadiscourse in research article abstracts. Journal of English for Academic Purposes, 9, 128-139.

Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433-470). Westport, CT: American Council on Education.

Hamp-Lyons, L. (2011). Writing assessment: Shifting issues, new tools, enduring questions. Assessing Writing, 16(1), 3-5.

Haswell, R. (1988). Error and change in college students' writing. Written Communication, 5, 470-499.

Haswell, R. (1991). Gaining ground in college writing: Tales of development and interpretation. Dallas, TX: Southern Methodist University Press.

Haswell, R. (1998). Rubrics, prototypes, and exemplars: Categorization theory and systems of writing placement. Assessing Writing, 5(2), 231-268.

Hillegas, M. B. (1912). A scale for the measurement of quality in English composition by young people. Teachers College Record, 13(4), 1-55.

Huang, J. C. (2010). Publishing and learning writing for publication in English: Perspectives of NNES PhD students in science. Journal of English for Academic Purposes, 9(1), 33-44.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research, 60(2), 237-263.

Hyland, K. (2010). Community and individuality: Performing identity in applied linguistics. Written Communication, 27, 159-188.

Jeffery, J. V. (2009). Constructs of writing proficiency in US state and national writing assessments: Exploring variability. Assessing Writing, 14, 3-24.

Knoch, U. (2011). Rating scales for diagnostic assessment of writing: What should they look like and where should the criteria come from? Assessing Writing, 16, 81-96.

Lee, D. Y. W. (2008). Corpora and discourse analysis: New ways of doing old things. In V. K. Bhatia, J. Flowerdew & R. H. Jones (Eds.), Advances in discourse studies (pp. 86-99). Oxford, England: Routledge.


Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246-276.

Lunsford, A. A., & Lunsford, K. J. (2008). Mistakes are a fact of life: A national comparative study. College Composition and Communication, 59(4), 781-806.

Martin, J. R., & White, P. R. R. (2005). The language of evaluation: Appraisal in English. London: Palgrave Macmillan.

McNamara, D. S., Crossley, S., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57-86.

Miller, C. R. (1984). Genre as social action. Quarterly Journal of Speech, 70, 151-167.

Mills, C. N., & Jaeger, R. M. (1998). Creating descriptions of desired student achievement when setting performance standards. In L. Hansche (Ed.), Handbook for the development of performance standards: Meeting the requirements of Title I (pp. 73-85). Washington, DC: Council of Chief State School Officers.

North, S. M. (2011). On the place of writing in higher education (and why it doesn’t include composition). In L. Massey & R. C. Gebhardt (Eds.), The changing of knowledge in composition: Contemporary perspectives (pp. 194-210). Logan: Utah State University Press.

Paré, A. (1993). Discourse regulations and the production of knowledge. In R. Spilka (Ed.), Writing in the workplace: New research perspectives (pp. 111-123). Carbondale: Southern Illinois University Press.

Paré, A., Starke-Meyerring, D., & McAlpine, L. (2009). The dissertation as multi-genre: Many readers, many readings. In C. Bazerman, A. Bonini, & D. Figueiredo (Eds.), Genre in a changing world (pp. 179-194). Fort Collins, CO: WAC Clearinghouse.

Peckham, I. (2010). Going north, thinking west: The intersections of social class, critical thinking, and politicized writing instruction. Logan: Utah State University Press.

Pennycook, A. (1996). Borrowing others' words: Text, ownership, memory, and plagiarism. TESOL Quarterly, 30(2), 201-230.

Poe, M. (2006). Race, representation, and writing assessment: Racial stereotypes and the construction of identity in writing assessments. Unpublished dissertation, University of Massachusetts, Amherst.

Prior, P. (2006). A sociocultural theory of writing. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 54-66). New York: Guilford Press.

Prior, P., & Shipka, J. (2003). Chronotopic lamination: Tracing the contours of literate activity. In C. Bazerman & D. R. Russell (Eds.), Writing selves/writing societies: Research from activity perspectives (pp. 180-238). Fort Collins, CO: WAC Clearinghouse.


Rezaei, A. R., & Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assessing Writing, 15(1), 18-39.

Römer, U., & Wulff, S. (2010). Applying corpus methods to written academic texts: Explorations of MICUSP. Journal of Writing Research, 2(2), 99-127.

Roozen, K. (2009). "Fan-fic-ing" English studies: A case study exploring the interplay of vernacular literacies and disciplinary engagement. Research in the Teaching of English, 44(2), 136-169.

Roozen, K. (2010). Tracing trajectories of practice: Repurposing in one student's developing disciplinary writing processes. Written Communication, 27(3), 318-354.

Rothermel, B. A. (2006). Automated writing instruction: Computer-assisted or computer-driven pedagogies? In P. F. Ericsson & R. Haswell (Eds.), Machine scoring of student essays: Truth and consequences (pp. 199-210). Logan: Utah State University Press.

Rounsaville, A., Goldberg, R., & Bawarshi, A. (2008). From incomes to outcomes: FYW students' prior genre knowledge, meta-cognition, and the question of transfer. WPA: Writing Program Administration, 32(1-2), 97-112.

Russell, D. R., & Yanez, A. (2003). "Big picture people rarely become historians": Genre systems and the contradictions of general education. In C. Bazerman & D. R. Russell (Eds.), Writing selves/writing societies: Research from activity perspectives (pp. 331-362). Fort Collins, CO: WAC Clearinghouse.

Schryer, C. (1993). Records as genre. Written Communication, 10(2), 200-234.

Scott, T. (2005). Creating the subject of portfolios: Reflective writing and the conveyance of institutional prerogatives. Written Communication, 22(1), 3-35.

Slevin, J. (2001). Engaging intellectual work: The faculty's role in assessment. College English, 63(3), 288-305.

Smart, G. (2000). Reinventing expertise: Experienced writers in the workplace encounter a new genre. In P. Dias & A. Paré (Eds.), Transitions: Writing in academic and workplace settings (pp. 223-252). Cresskill, NJ: Hampton Press.

Sperling, M., & Freedman, S. W. (1987). A good girl writes like a good girl: Written response to student writing. Written Communication, 4(4), 343-369.

Spinuzzi, C. (2010). Secret sauce and snake oil: Writing monthly reports in a highly contingent environment. Written Communication, 27(4), 363-409.

Thorndike, E. L. (1911). A scale for merit in English writing by young people. Journal of Educational Psychology, 2, 361-368.

Turley, E., & Gallagher, C. (2008). On the uses of rubrics: Reframing the great rubric debate. English Journal, 97(4), 87-92.

Turner, S. (2002). Texts and the institutions of municipal government: The power of texts in the public process of land development. Studies in Cultures, Organizations, and Societies, 7, 297-325.


Wardle, E. (2009). “Mutt genres” and the goal of FYC: Can we help students write the genres of the university? College Composition and Communication, 60(4), 765-789.

Winsor, D. (2001). Learning to do knowledge work in systems of distributed cognition. Journal of Business and Technical Communication, 15(1), 5-28.

Yancey, K. B. (2010). Responding forward. In P. Sullivan, H. Tinberg, & S. Blau (Eds.), What is "college-level" writing? Volume 2: Assignments, readings and student writing samples (pp. 300-311). Urbana, IL: National Council of Teachers of English.

Bio

Dylan B. Dryer is assistant professor of composition studies at the University of Maine. His interests include the textual organization of civic institutions and construct validity in writing assessment. He is currently working on the problem of consequential validity in the teaching of writing teachers.
