
Understanding Intermediate-Level Speakers’ Strengths and Weaknesses: An Examination of OPIc Tests From Korean Learners of English

Troy L. Cox
Brigham Young University

Abstract: This study profiled Intermediate-level learners in terms of their linguistic characteristics and performance on different proficiency tasks. A stratified random sample of 300 Korean learners of English who had received holistic ratings of Intermediate Low (IL), Intermediate Mid (IM), or Intermediate High (IH) on the Oral Proficiency Interview-computerized (OPIc), 100 at each sublevel, was analyzed by trained ACTFL raters to determine what the learners needed in order to progress to the next higher sublevel. The findings indicate that while ILs minimally met all the linguistic characteristics required of the Intermediate level, to move to the IM sublevel they needed to improve the quantity and quality of all the linguistic characteristics they employed and to improve their mastery of the types and variety of questions they could use when performing Intermediate tasks. In contrast, IMs demonstrated a pattern of strength when completing Intermediate tasks, but to move to the IH sublevel they needed to improve their ability to perform all Advanced-level tasks, especially in terms of accuracy when using paragraph-length discourse. Similar to the IMs, for the IHs to move to the Advanced Low sublevel, they needed to improve their accuracy with paragraph-length discourse and expand their content mastery beyond the autobiographical.

Key words: English as a foreign/second language, oral proficiency

Introduction
In a recent audit that the author conducted of oral proficiency test results from a large university, it was discovered that a single student had taken either the Oral Proficiency Interview (OPI) or the Oral Proficiency Interview-computerized (OPIc) nine times over a 3-year period. Further examination revealed that this student, a language teaching minor who needed a rating of Advanced Low for instructor licensure, was languishing at the Intermediate level. After an initial OPI score of Intermediate High (IH) in 2013, the next four tests resulted in ratings of Intermediate Mid (IM), while the final four tests were rated IH. Reaching the Advanced level is critical for those pursuing teaching licensure (Brooks & Darhower, 2014; Chambless, 2012), and this student’s lack of progress toward higher proficiency on the ACTFL scale is a real-world example of why it is important to understand the characteristics of speech and the types of tasks that are required to progress through the three sublevels of the Intermediate level and into the Advanced range. However, because the rating is holistic, the specific aspects of a test taker’s performance that prevent that person from being rated at the next adjacent level are neither documented nor reported with the final rating. Thus, there can be a disconnect between the rating that examinees see, the information that instructors provide to students about the assessment and the rating system, and what raters attend to when assigning ratings. The purpose of this study was to examine information that is not traditionally available to either test takers or instructors, so as to provide more detailed profiles of speakers who received the same proficiency rating within the Intermediate range and to determine how a test taker’s skills along four linear axes (function, text type, content, and accuracy) contributed to their final, global rating.

Troy L. Cox (PhD, Brigham Young University) is Associate Director of Research and Assessment, Center for Language Studies, Brigham Young University, Provo, Utah.
Foreign Language Annals, Vol. 50, Iss. 1, pp. 84–113. © 2017 by American Council on the Teaching of Foreign Languages.
DOI: 10.1111/flan.12258

84 SPRING 2017

Background
The ACTFL defines proficiency as the “ability to use a language to communicate meaningful information in a spontaneous interaction, and in a manner acceptable and appropriate to native speakers of the language” (ACTFL, 2012d, p. 4). The proficiency guidelines (ACTFL, 2012c) have long been represented as an inverted pyramid, which illustrates that language learning is not linear; rather, the progression from one level to the next is best represented as a pattern of geometric growth. In the inverted pyramid, the area of the Novice and Intermediate tiers is much smaller than that of the higher levels. However, the skills that are acquired at those levels form the structural foundation upon which the higher levels are built. For example, while the ability to narrate in the past is a critical characteristic of Advanced-level communication, language learners usually first learn to report events that have taken place in strings of sentences using the simple past. However, the learner who does not develop the ability to use paragraph-length discourse will not be able to progress beyond the Intermediate level (ACTFL, 2012a). Communicative habits that seem to convey meaning appropriately but that are not corrected and extended become ingrained and thus impede progress into and beyond the Advanced level. These fossilized errors in essence become faulty girders and beams that are incapable of supporting the increasing communicative weight when learners are required to carry out more sophisticated functions and address more robust and varied content. Thus, understanding the developmental stages through which learners progress is vital in assisting students in their language-learning journey, both within a particular level and from one level into the next.

Although the ACTFL guidelines were introduced in 1982 (Liskin-Gasparro, 2003) and the ACTFL recently certified its 1,000th OPI tester worldwide (ACTFL, 2016), many foreign language educators are likely still unclear about how exactly to use the guidelines to improve student learning outcomes. While there are more than 4,000 institutions of higher education and more than 35,000 high schools across the United States (U.S. Department of Education, 2016a, 2016b), only a fraction of these secondary and postsecondary institutions (1 in every 390) has certified personnel to assist in assessment and lead proficiency-oriented curricular revisions. Even though some institutions have instituted curriculum-wide training in proficiency assessment (Brooks & Darhower, 2014; Gouoni & Feyten, 1999), many foreign language educators must rely on written descriptions of the scale with little understanding of how the descriptors relate to actual language production. As a result, a large segment of the foreign language education community is left with an understanding of the guidelines that is cursory at best, or reduced to certain grammatical forms at worst.

Foreign Language Annals • VOL. 50, NO. 1 85

In simple terms, each major proficiency level (Novice, Intermediate, Advanced, Superior, and Distinguished) is defined as a confluence of four domains: function, text type, content, and accuracy. These features are defined in more detail in Table 1.

While learners may progress in a linear way on each of these characteristics, conjoint mastery of multiple linear characteristics is necessary for movement through one major level and into the next. A rating at the next higher level can be obtained only through sustained performance of the lower levels (ACTFL, 2012a; Clifford, 2016). Before a rating can be awarded, the speaker must demonstrate a sustained level or “floor” of performance across tasks, text type, content, and accuracy (see Table 2) as well as a breakdown level or “ceiling” at which the examinee can no longer sustain performance in one or more of the four domains (ACTFL, 2012a). For examinees in the Intermediate range, the floor is the ability to create with language in sentence-length utterances that demonstrate control over the content that is needed in daily life; the ceiling is the ability to use paragraph-length discourse to narrate and describe topics of personal and community interest in all major time frames. While an Intermediate-level speaker may exhibit some characteristics of Advanced-level proficiency in certain topic domains or with certain linguistic features, he or she is unable to sustain this level of performance across the requisite range of topics, tasks, or linguistic features with the required level of accuracy and thus does not demonstrate Advanced-level ability.

As shown in Table 2, the fundamental difference between speech that is rated at any one of the three Intermediate sublevels (Low, Mid, High) lies in the quality and quantity of the examinee’s language when engaged in at-level tasks (Clifford, 2016). The Low sublevel is indicative of a speaker who just barely demonstrates competence when performing the tasks for the major level. Meanwhile, a rating at the Mid sublevel indicates that the speaker fulfills all the requirements of the major level with sufficient quantity and quality of language across the assessment criteria. There is no doubt that the examinee can perform the functions of that major level; indeed, the response is much more substantial than that of a speaker at the Low sublevel. The High sublevel rating indicates that the speaker demonstrates a robust ability to meet the criteria for the proficiency level in question and also attempts and successfully executes some of the tasks of the next higher (adjacent) major level, in this case Advanced, often but not always meeting that level’s expectations for text type, content, and accuracy. Thus, a rating of IH indicates that the speaker exhibits Advanced-level performance most, but not all, of the time, either by exhibiting all the traits of the Advanced level in certain topic domains and not others, or by exhibiting some Advanced features, such as text type and fluency, but not others, such as pronunciation or grammatical accuracy.

While a number of studies have looked at the validity of the OPI and the use of its scale in oral proficiency testing (Dandonoli & Henning, 1990; Halleck, 1996; Surface & Dierdorff, 2003; Thompson, 1995, 1996), little empirical research has specifically sought to document examinees’ strengths and weaknesses at each sublevel within a major level, primarily because the single holistic rating of the speech sample as a whole obscures exactly what such a rating means and on which dimensions a test taker showed strength or weakness. Thus, while the small percentage of instructors who have received formal OPI training can intuit why their students may have received a particular score, the large number of instructors who are less familiar with the scale may:

• fail to understand the conjunctive nature of a proficiency rating (Clifford, 2016),

• overestimate their own students’ abilities (Levine & Haus, 1987), or

• confound the performance of rehearsed material with proficiency (Cox, Bown, & Burdis, 2015).

TABLE 1

Speech Characteristics Analyzed by Area of Focus

| Area of Focus | Characteristic of Speech | Description |
|---|---|---|
| Function | Focus on topic/task | The degree to which the examinee completed the task presented as defined by the major level |
| Text type | Text length | The extent to which the amount of language completed the function of the task (words and phrases, sentences, strings of sentences, or connected paragraphs) |
| Text type | Discourse organization | The extent to which the text was organized appropriately and used appropriate cohesive markers to organize speech |
| Content | Vocabulary use | The quantity and quality of lexicon needed to accomplish the task appropriately |
| Accuracy/comprehensibility expectations | Fluency | The extent to which the rate of speech, length of runs, pauses, and other timing features affected the comprehensibility of the message for a native listener |
| Accuracy/comprehensibility expectations | Pronunciation | The extent to which individual words and phrases were articulated in a way that was comprehensible to the listener |
| Accuracy/comprehensibility expectations | Grammatical/structural accuracy | The degree of control of the grammar/syntax needed to accomplish the task in a way that was comprehensible to the listener |

In an attempt to help instructors understand differences in levels, Liskin-Gasparro (1996) analyzed the communication strategies of IH and Advanced Low (AL) Spanish speakers and found that the AL speakers used a broader range of communicative strategies; however, she did not analyze other aspects of the interview samples and did not look at the differences among the Intermediate sublevels. Apart from this study, most of the research has focused on learners’ expected proficiency outcomes at particular points in their program of study (Carroll, 1967; Chambless, 2012; Glisan & Foltz, 1998; Gouoni & Feyten, 1999) or has compared learners’ results on the two alternate forms of the assessment, the OPIc and the OPI (Surface, Poncheri, & Bhavsar, 2008; SWA Consulting, 2009; Thompson, Cox, & Knapp, 2016). In contrast, this study sought to determine how the scale is operationalized. That is, the purpose of the study was to look into the black box, so to speak, of the Intermediate level to find empirical data and identify the patterns of linguistic strengths and weaknesses of IL, IM, and IH speakers when they carried out different types of tasks. The study addressed the following questions:

1. What are the most common linguistic features of speakers at each Intermediate sublevel (IL, IM, IH)? Which characteristics prevent speakers from being rated at the next higher sublevel?

2. How well do speakers at each Intermediate sublevel (IL, IM, IH) perform on different task types that operationalize the criteria of Intermediate and Advanced proficiency? Which task types prevent speakers from being rated at the next higher sublevel?

Method
To answer the research questions, the author analyzed data from a study that CREDU (a subsidiary of Samsung) had commissioned the ACTFL to conduct, and then wrote a technical report (Cox, 2015). The data from that report form the basis for the current article. To determine commonly manifested Intermediate-level speech characteristics and discover what prevents a test taker from being rated at the next higher adjacent level, experienced ACTFL raters were recruited to analyze existing assessment data from the OPIc.

Raters
Nine raters selected from the ACTFL’s certified OPIc rater pool were recruited: seven were certified ACTFL OPI testers, six were certified ACTFL OPI trainers, three were part of the original OPIc development team, and eight were members of the OPIc Quality Assurance team. Using trained raters ensured that the feedback came from experts who know the scale intimately. Although this approach is open to the criticism of confirmation bias (raters’ feedback could have been based on the language in the descriptors rather than on unbiased observation), this risk was deemed acceptable (1) because of the lack of research into what trained raters think as they rate speech samples, and (2) because only highly experienced raters can provide the necessary analysis of the internal mechanisms that result in ratings across the Intermediate range. While raters typically score OPIcs holistically (ACTFL, 2012b), for this study the raters scored the tests analytically: they examined specific linguistic features and tasks, analyzed in depth the difficulty of the types of tasks associated with the functions of the Intermediate and Advanced levels, and determined the extent to which different speech features were present in those level-specific tasks at each of the two major levels (Intermediate and Advanced). To gather qualitative data, raters also had the opportunity to comment on the specific tasks and speech samples that they rated.

TABLE 2

Floor and Ceiling Performance of Intermediate Speakers

| Level | Floor (or Intermediate Criteria) | Ceiling (or Advanced Criteria) |
|---|---|---|
| Function | Create with language; participate in simple conversations; ask and answer questions | Narrate and describe in major time frames (past, present, and future); linguistically negotiate situations with complications |
| Text type | Sentences | Paragraphs |
| Content | Self; daily life | Self; daily life; nonautobiographical topics; topics of general interest |
| Accuracy | Understood by people accustomed to speaking to nonnative speakers | Can be understood without confusion by monolinguals not accustomed to speaking to nonnative speakers |

Examinee Data
To control for the influence of examinees’ native language on their target language performance, the study was limited to Korean-speaking adults who were learning English, who were taking the English OPIc, and who were at different OPI levels. All exams were chosen from the existing pool of OPIc assessments taken by Korean test takers. To meet the selection criteria, each assessment had to have been previously double- or triple-rated by raters who had been in exact agreement on the sublevel awarded (e.g., all raters independently rated an examinee as IM). From the list of double- and triple-rated exams, stratified random sampling was used to select 100 exams at each sublevel (Low, Mid, and High), for a total of 300 exams.
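The sampling step above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used to draw the sample: the record layout (an exam ID paired with the list of sublevels its original raters awarded), the function name `stratified_sample`, and the optional seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(exams, per_level, levels=("IL", "IM", "IH"), seed=None):
    """Draw `per_level` exams at random from each sublevel stratum.

    `exams` is an iterable of (exam_id, ratings) pairs, where `ratings`
    lists the sublevel each original rater awarded (hypothetical layout).
    """
    strata = defaultdict(list)
    for exam_id, ratings in exams:
        # Eligible only if double- or triple-rated with exact agreement.
        if len(ratings) in (2, 3) and len(set(ratings)) == 1:
            strata[ratings[0]].append(exam_id)

    rng = random.Random(seed)  # seeded so that a draw can be reproduced
    # rng.sample draws without replacement; it raises ValueError if a
    # stratum holds fewer than `per_level` eligible exams.
    return {level: rng.sample(strata[level], per_level) for level in levels}
```

Sampling within each stratum, rather than from the pooled list, is what guarantees equal group sizes (100 per sublevel here) regardless of how common each sublevel is in the full OPIc pool.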

Design
A connected design was used, in which all the raters analyzed a common subset of examinees and tasks from existing OPIcs as a way to verify that all the raters were applying the same criteria. A single rater then analyzed each of the remaining examinees/tasks to allow for the broadest possible survey of examinee response types.
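A connected design can be sketched as an assignment plan in which a shared linking set goes to every rater and each remaining item is rated exactly once. The round-robin dealing, the size of the linking set, and the function name `connected_assignment` are illustrative assumptions; the study does not report how the shared subset was chosen or how the remaining work was divided.

```python
def connected_assignment(items, raters, n_anchor):
    """Sketch of a connected rating design (hypothetical helper).

    The first `n_anchor` items form a linking set rated by every rater;
    the remaining items are dealt out round-robin so each is rated once.
    """
    anchor, rest = list(items[:n_anchor]), list(items[n_anchor:])
    plan = {rater: list(anchor) for rater in raters}
    for i, item in enumerate(rest):
        plan[raters[i % len(raters)]].append(item)
    return plan
```

The linking set is what allows rater consistency to be checked; the single-rater remainder trades redundancy for coverage of more examinees.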

To answer the first research question, the approach differed for each of the three sublevels (IL, IM, and IH).

• For IL speakers, five Intermediate tasks and one Advanced task were analyzed. The objective of examining more Intermediate tasks at this level was to determine in which domains the ILs needed to improve both the quantity and quality of their responses and which functions test takers needed to improve in order to reach IM.

• For IM speakers, four Intermediate tasks and three Advanced tasks were analyzed. The Intermediate tasks provided a basis of comparison between the ILs’ threshold performance and the IMs’ strong performance at the Intermediate level. The Advanced tasks provided direct information on what the examinees needed to do to move up the scale to the IH rating and beyond. It is important to note that a speaker moves from IM to IH primarily by improving his or her performance on Advanced-level tasks, although performance on Intermediate-level tasks improves as well.

• For IH speakers, six Advanced tasks were analyzed. A High rating indicates evidence of performance of all Advanced task types most of the time, yet an inability to sustain that performance. Therefore, the most useful information for determining the linguistic features of IH speakers would come from an analysis of learners’ performance on Advanced-level tasks.

To answer the second research question, a subset of task types was selected for detailed analysis from among the 15 items on each form of the OPIc (Novice High to IM, or IM to Advanced). This reduced the time needed to analyze the profile of any individual examinee and thus allowed a broader sampling of examinees from which generalizations could be drawn. Because each form was individually tailored to the examinee, it was not possible to analyze items at the question level; however, because each question represented a specific task type (see Table 3), the speech samples represented in all of the interviews were fundamentally equivalent.
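As a consistency check on the design, the task counts in Table 3 can be transcribed into a small lookup table and totaled by sublevel. In the sketch below (the names `TASK_PLAN` and `tasks_per_sublevel` are hypothetical), the column placement of each count follows the reconstruction of Table 3, and the totals reproduce the 6, 7, and 6 tasks and the Intermediate/Advanced splits described for IL, IM, and IH speakers.

```python
# Transcription of Table 3: (task level, task type) -> tasks analyzed per sublevel.
TASK_PLAN = {
    ("Intermediate", "Talk about thing or place"): {"IL": 2, "IM": 1},
    ("Intermediate", "Talk about activity or routine"): {"IL": 1, "IM": 1},
    ("Intermediate", "Ask questions"): {"IL": 1, "IM": 1},
    ("Intermediate", "Intermediate role-play"): {"IL": 1, "IM": 1},
    ("Advanced", "Past description"): {"IL": 1, "IM": 1, "IH": 1},
    ("Advanced", "Past narration"): {"IM": 1, "IH": 1},
    ("Advanced", "Advanced role-play"): {"IM": 1, "IH": 1},
    ("Advanced", "Role-play follow-up: narration/description"): {"IH": 1},
    ("Advanced", "Narration/description: context beyond personal"): {"IH": 1},
    ("Advanced", "Report current event"): {"IH": 1},
}


def tasks_per_sublevel(plan):
    """Count analyzed tasks per sublevel, split by task level."""
    totals = {}
    for (task_level, _task_type), counts in plan.items():
        for sublevel, n in counts.items():
            row = totals.setdefault(sublevel, {"Intermediate": 0, "Advanced": 0})
            row[task_level] += n
    return totals


for sublevel, row in tasks_per_sublevel(TASK_PLAN).items():
    print(sublevel, row, sum(row.values()))
```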

Procedure
The raters were each assigned 30 examinees, 10 from each sublevel (IL, IM, and IH), and used the rubric shown in Figure 1. For each examinee’s speech sample, raters were asked to analyze six or seven individual tasks and were instructed to listen to each task twice. On the first listening, they assessed the response holistically on a five-point scale ranging from 1, “does not meet expectations” (e.g., total breakdown, i.e., the examinee could not produce any language at the intended level), to 5, “exceeds expectations” (e.g., produced language far above the intended task difficulty level). On the second listening, the raters identified any characteristics that would help explain their holistic assessment of the response. For example, a task that received a global assessment of “almost meets expectations” might include weakness in pronunciation, strength in vocabulary use, and moderate ability with the other skills. A second rubric, adapted from one that had previously been employed in a heritage language study (Swender, Martin, Rivera-Martinez, & Kagan, 2014), included a comment field so that raters could note issues (e.g., technical difficulties, other speech characteristics) that were not easily addressed with the five-point scale and offer any comments that would add information to the rating they had awarded.
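The two-pass rating record can be modeled as a holistic score plus optional per-feature scores. In this sketch, the labels for scale points 2 to 4 are taken from the Findings section, a feature left unrated (as raters sometimes did when a response sounded memorized) is stored as `None` and skipped when averaging, and the class and function names are hypothetical rather than part of any ACTFL tool.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from statistics import mean
from typing import Dict, List, Optional


class Expectation(IntEnum):
    """Five-point scale used for the holistic and feature ratings."""
    DOES_NOT_MEET = 1
    ALMOST_MEETS = 2
    MINIMALLY_MEETS = 3
    FULLY_MEETS = 4
    EXCEEDS = 5


@dataclass
class TaskRating:
    holistic: Expectation  # first listening: overall judgment of the task
    # Second listening: per-feature scores; None when a rater skipped a
    # feature (e.g., because the response sounded memorized).
    features: Dict[str, Optional[int]] = field(default_factory=dict)
    comment: str = ""


def feature_mean(ratings: List[TaskRating], feature: str) -> float:
    """Mean of one feature, skipping tasks where it was not rated."""
    scores = [r.features.get(feature) for r in ratings]
    return mean(s for s in scores if s is not None)


def highest_threshold_reached(score: float) -> Expectation:
    """Highest scale point that a mean score reaches or exceeds."""
    if not 1 <= score <= 5:
        raise ValueError("score must lie on the 1-5 scale")
    return max(level for level in Expectation if score >= level)
```

Skipping unrated features explains why the N for individual features can differ from the N for the overall rating in the results tables; for example, `highest_threshold_reached(2.78)` returns `Expectation.ALMOST_MEETS`, matching the reading of the IL overall mean below the “minimally meets” threshold.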

Data Analysis
This study was guided by two primary research questions. The first investigated the most common linguistic features of examinees at the Intermediate level by task level (Intermediate or Advanced). The second investigated the way in which task type (see Table 3) affected examinee performance at the Intermediate level by task level (Intermediate or Advanced). These questions were answered by examining the mean overall rating (see Figure 1) by task level. For the first question, the means of the different linguistic features (e.g., fluency, pronunciation; see Table 1) were compared, and 95% confidence intervals (95% CIs) were calculated and graphed to determine how the features differed. For the second question, the means of the task types (“talk about thing or place,” “talk about activity or routine,” etc.; see Table 3) were compared, with 95% CIs calculated and graphed as well. A 95% CI is an interval estimate of a population parameter, here a population mean, and is generally represented by an error bar (I) with a dot or line in the middle indicating the sample mean. Examining the length of the error bars and the overlap among variables of interest provided a visual representation of the differences among the variables and their effect sizes. Where there was no overlap between error bars, the means were statistically different from one another. Where there was total overlap, there might not be any difference between the variables.

TABLE 3

Descriptions of Task Type Analyzed by Intermediate Sublevel

| Task Level | Task Type Description | IL | IM | IH |
|---|---|---|---|---|
| Intermediate | Talk about thing or place | 2 | 1 | |
| Intermediate | Talk about activity or routine | 1 | 1 | |
| Intermediate | Ask questions | 1 | 1 | |
| Intermediate | Intermediate role-play | 1 | 1 | |
| Advanced | Past description | 1 | 1 | 1 |
| Advanced | Past narration | | 1 | 1 |
| Advanced | Advanced role-play (situation with a complication) | | 1 | 1 |
| Advanced | Role-play follow-up: narration/description | | | 1 |
| Advanced | Narration/description: context beyond personal | | | 1 |
| Advanced | Report current event | | | 1 |
| Total tasks analyzed | | 6 | 7 | 6 |

FIGURE 1

Holistic Assessment Grid Example
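The confidence-interval reasoning above can be made concrete. The article does not state how its intervals were computed, so the sketch below assumes the common normal approximation, mean ± 1.96 × SD/√N; under that assumption it reproduces, after rounding, some of the reported intervals (e.g., the Advanced-task “Overall” interval of [1.08, 1.26] in Table 4, from mean = 1.17, SD = 0.48, N = 107). The helper `compare_intervals` (a hypothetical name) classifies two intervals the way the error bars are read in this study. Non-overlapping 95% CIs do imply a significant difference, but partially overlapping intervals do not by themselves rule one out.

```python
from math import sqrt
from statistics import NormalDist


def ci95(mean, sd, n):
    """Normal-approximation 95% CI for a mean (assumed method)."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sd / sqrt(n)
    return (mean - half_width, mean + half_width)


def compare_intervals(a, b):
    """Classify two (low, high) intervals by how their error bars overlap."""
    lo_a, hi_a = a
    lo_b, hi_b = b
    if hi_a < lo_b or hi_b < lo_a:
        return "no overlap"  # means differ significantly
    if (lo_a <= lo_b and hi_b <= hi_a) or (lo_b <= lo_a and hi_a <= hi_b):
        return "total overlap"  # one interval lies inside the other
    return "partial overlap"  # inconclusive from the bars alone


# IL speakers, Advanced task, "Overall" row of Table 4
low, high = ci95(1.17, 0.48, 107)
print(f"[{low:.2f}, {high:.2f}]")  # [1.08, 1.26]

# IL fluency: Intermediate tasks vs. the Advanced task (Table 4)
print(compare_intervals((2.92, 3.02), (1.91, 2.33)))  # no overlap
```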

Findings

IL Speakers
By definition, IL speakers can at minimum accomplish Intermediate-level functions but are not expected to successfully perform the functions that are required to complete Advanced-level tasks. In the analysis of the samples rated IL, a rating of 3 was indicative of minimally meeting the requirements. For the five Intermediate-level tasks, the mean rating of overall task performance was 2.78 (see Table 4), just below that threshold.

Linguistic Features
In examining the linguistic features that contributed to overall task performance on the Intermediate tasks, raters found that two of the features did not meet the required threshold (a score of 3): “fluency” and “focus on topic and task.” For the one Advanced-level task, the overall mean was 1.17, and all of the linguistic features except fluency (2.12) were scored between the criteria “does not meet” (a score of 1) and “almost meets” (a score of 2). A MANOVA showed that all of the linguistic features of the Intermediate-level tasks were statistically different from those of the Advanced-level tasks, F(1, 622) = 26.86, p < 0.0001; Wilks’ Λ = 0.79.

Figure 2 presents the means as well as the 95% CIs, represented by error bars (I). When the speakers were performing Intermediate-level tasks, performances across the seven categories all clustered around the “minimally meets” threshold of 3. With a difference of 0.32 between the highest (“pronunciation”) and lowest (“focus on topic and task”) categories, the profile was relatively even. With the Advanced-level task, scores in only one domain (“fluency”) exceeded the “almost meets” threshold (a score of 2). With a difference of 0.89 between the highest-rated domain (“fluency”) and the lowest (“focus on topic and task”), the profile was more disparate.
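The “highest minus lowest” comparisons used to judge whether a profile was even or disparate can be computed directly from the feature means. The sketch below uses the IL means from Table 4; the function and variable names are hypothetical, and because the study does not specify a numerical cutoff for calling a profile “even,” none is imposed here.

```python
def profile_spread(feature_means):
    """Return (highest feature, lowest feature, rounded difference)."""
    high = max(feature_means, key=feature_means.get)
    low = min(feature_means, key=feature_means.get)
    return high, low, round(feature_means[high] - feature_means[low], 2)


# Mean ratings of IL speakers by linguistic feature (Table 4)
IL_INTERMEDIATE = {
    "focus on topic and task": 2.87,
    "length": 3.06,
    "discourse organization": 3.03,
    "vocabulary use": 3.12,
    "fluency": 2.97,
    "pronunciation": 3.19,
    "grammatical/structural": 3.02,
}
IL_ADVANCED = {
    "focus on topic and task": 1.23,
    "length": 1.34,
    "discourse organization": 1.69,
    "vocabulary use": 1.50,
    "fluency": 2.12,
    "pronunciation": 1.30,
    "grammatical/structural": 1.97,
}

print(profile_spread(IL_INTERMEDIATE))  # ('pronunciation', 'focus on topic and task', 0.32)
print(profile_spread(IL_ADVANCED))      # ('fluency', 'focus on topic and task', 0.89)
```

The same function applied to the IM means in Table 6 gives the 0.22 and 0.70 differences reported for IM speakers.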

This indicates that the profile of an IL speaker was one in which the ratings on the different categories clustered around the “minimally meets” threshold for Intermediate-level tasks. With the Advanced-level task, the developmental profile was less even. “Focus on topic and task” and “pronunciation” were the weakest areas and were statistically equivalent. The strongest areas were “fluency” and “grammatical/structural,” followed by “discourse organization,” with “vocabulary use” and “text type (length)” falling between these and the two weakest areas.

Task Types
To determine the effect of task type on overall performance and on the holistic final rating, the mean of examinees’ overall performance was examined by task type (see Table 5). None of the Intermediate-level tasks reached the “minimally meets” requirement (a score of 3). Among the Intermediate-level tasks, “intermediate role-play” had the lowest mean (mean = 2.62, SD = 0.90), and “talk about activity or routine” had the highest mean (mean = 2.99, SD = 0.86). The Advanced-level task, “past description,” scored substantially lower than the Intermediate-level tasks (mean = 1.17, SD = 0.45).

TABLE 4

Speech Criteria Rating on IL Sample

Int. = Intermediate tasks (n = 5); Adv. = Advanced tasks (n = 1).

| Speech criterion | Int. N | Int. Mean | Int. SD | Int. 95% CI | Adv. N | Adv. Mean | Adv. SD | Adv. 95% CI |
|---|---|---|---|---|---|---|---|---|
| Overall | 544 | 2.78 | 0.78 | [2.73, 2.83] | 107 | 1.17 | 0.48 | [1.08, 1.26] |
| Function: Focus on topic and task | 540 | 2.87 | 1.03 | [2.78, 2.96] | 107 | 1.23 | 0.45 | [1.15, 1.31] |
| Text type: Length | 541 | 3.06 | 0.72 | [3.00, 3.12] | 107 | 1.34 | 0.57 | [1.23, 1.45] |
| Text type: Discourse organization | 535 | 3.03 | 0.73 | [2.97, 3.09] | 107 | 1.69 | 0.85 | [1.53, 1.85] |
| Content: Vocabulary use | 539 | 3.12 | 0.65 | [3.07, 3.17] | 105 | 1.50 | 0.72 | [1.36, 1.64] |
| Accuracy: Fluency | 541 | 2.97 | 0.65 | [2.92, 3.02] | 105 | 2.12 | 1.09 | [1.91, 2.33] |
| Accuracy: Pronunciation | 539 | 3.19 | 0.63 | [3.14, 3.24] | 107 | 1.30 | 0.57 | [1.19, 1.41] |
| Accuracy: Grammatical/structural | 541 | 3.02 | 0.71 | [2.96, 3.08] | 107 | 1.97 | 1.13 | [1.76, 2.18] |

Note: In some instances, raters provided an overall rating but, when there was evidence of memorized material, did not rate the individual linguistic features. This is further discussed in the qualitative section of this article.

In Figure 3, the means as well as the 95% CIs, represented by error bars (I), showed some interesting trends. At the Intermediate level, the performances across the tasks were not significantly different from one another, as demonstrated by the error bars; however, “intermediate role-play” and “ask questions” did appear to be more difficult than the other three Intermediate-level tasks.

FIGURE 2

Holistic Assessment of IL Linguistic Characteristics by Intermediate and Advanced Task Level

TABLE 5

Overall Mean of IL Speakers on Task Type

| Task Type | Task Level | N | Mean | SD | 95% CI |
|---|---|---|---|---|---|
| Intermediate role-play | Intermediate | 107 | 2.62 | 0.90 | [2.44, 2.80] |
| Talk about thing or place (two prompts) | Intermediate | 225 | 2.88 | 0.72 | [2.79, 2.97] |
| Talk about activity or routine | Intermediate | 106 | 2.99 | 0.86 | [2.83, 3.15] |
| Ask questions | Intermediate | 106 | 2.80 | 0.80 | [2.64, 2.96] |
| Past description | Advanced | 107 | 1.17 | 0.45 | [1.09, 1.25] |

Qualitative Comments
There were approximately 286 comments on the overall performance of the IL speakers. Many of the comments confirmed what was observed in the quantitative analysis (the need to improve fluency, accuracy, etc.); however, one trend that emerged was the effect that rehearsed material, or canned/memorized responses, had on raters’ ability to assess the speech sample validly. In nine instances, raters gave a holistic rating of “does not meet” but then used the comments field to explain why they did not provide numerical ratings for some of the other linguistic features.

Approximately 20% of the rater comments noted that the responses to these specific tasks sounded scripted or rehearsed. This could be an artifact of analyzing Korean examinees, among whom memorization is often employed as a test preparation strategy. While this strategy may be effective for tests of content, sheer memorization and recitation of responses is a feature of the Novice level of oral proficiency and therefore would not yield sufficient language for an examinee to be rated higher than Novice. Official ACTFL rating protocols require raters to listen to the entire speech sample (ACTFL, 2012a), whereas raters in this study evaluated individual tasks; in either case, when a single task type is learned and rehearsed, the response does not provide evidence of an examinee’s spontaneous, productive speech.

IM Speakers
By definition, IM speakers fully meet the requirements needed to accomplish the functions of Intermediate-level tasks but are not able to sustain the functions that are required at the Advanced level. This was found to be the case for the IM speakers in this study. For the four Intermediate-level tasks, the mean rating for overall performance was 3.59, an indication that test takers exceeded the “minimally meets” threshold and were approaching the “fully meets” level (see Table 6).

FIGURE 3

Holistic Assessment of IL Speakers on Intermediate and Advanced Task Type

Linguistic Features
In examining the linguistic features that contributed to the overall task performance scores on the Intermediate-level tasks, raters found that all of the features exceeded the “minimally meets” threshold (a score of 3), though none reached the “fully meets” threshold (a score of 4). For the three Advanced-level tasks, the overall mean was 1.57, with three of the linguistic features (“focus on topic and task,” “vocabulary use,” and “pronunciation”) exceeding the “almost meets” criterion of 2.

Figure 4 presents the means as well as the 95% CIs, represented by error bars (I). When the speakers were performing the Intermediate-level tasks, their performance across the seven categories exceeded the “minimally meets” threshold of 3 but did not reach the “fully meets” threshold of 4. With a difference of just 0.22 between the means for the highest (“vocabulary use”) and lowest (“fluency”) characteristics, the profile was relatively even. With the Advanced-level tasks, none of the categories met the “minimally meets” threshold, and with a difference of 0.70 between the means for the highest (“focus on topic and task”) and lowest (“grammatical/structural”) domains, the profile was more disparate.

This indicates that the profile of an IM speaker was one in which the different categories easily exceeded the “minimally meets” threshold for Intermediate-level tasks. With the Advanced-level tasks, the developmental profile across domains was less even. “Focus on topic and task,” “pronunciation,” and “vocabulary use” were the strongest areas and were statistically equivalent. The weakest areas were “grammatical/structural,” “length,” and “discourse organization.”

TABLE 6

Holistic Rating of IM Speakers on Task by Speech Characteristic

                                      Intermediate Tasks (n = 4)         Advanced Tasks (n = 3)
Characteristic                        N    Mean  SD    95% CI            N    Mean  SD    95% CI
Overall                               440  3.59  0.73  [3.52, 3.65]      334  1.57  0.65  [1.50, 1.64]
Function: focus on topic and task     421  3.60  0.82  [3.52, 3.68]      347  2.39  1.07  [2.28, 2.50]
Text type: length                     422  3.71  0.57  [3.66, 3.76]      345  1.73  0.74  [1.65, 1.81]
Text type: discourse organization     422  3.66  0.61  [3.60, 3.72]      347  1.80  0.80  [1.72, 1.88]
Content: vocabulary use               420  3.74  0.53  [3.69, 3.79]      343  2.25  0.90  [2.15, 2.35]
Accuracy: fluency                     421  3.52  0.60  [3.46, 3.58]      345  1.97  0.89  [1.88, 2.06]
Accuracy: pronunciation               421  3.68  0.55  [3.63, 3.73]      346  2.37  1.05  [2.26, 2.48]
Accuracy: grammatical/structural      422  3.59  0.62  [3.53, 3.65]      345  1.69  0.75  [1.61, 1.77]

Note: In some instances, raters provided an overall rating but, when there was evidence of memorized material, did not rate the individual linguistic features. This is discussed further in the qualitative section of this paper.
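The article does not state how its 95% CIs were computed. As a minimal sketch, assuming a normal approximation (mean ± 1.96 × SD/√N), the intervals in Table 6 can be recomputed from the reported N, mean, and SD; for the rows checked below, the result matches the reported bounds after rounding to two decimals. The function name `normal_ci` is hypothetical and not part of any ACTFL or OPIc tool, and not every table in the article follows this formula (several Table 9 intervals do not), so this is one plausible calculation rather than the author's procedure.

```python
import math


def normal_ci(mean: float, sd: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a mean.

    Assumes the interval is mean +/- z * SD / sqrt(N); z = 1.96 gives a 95% CI.
    """
    if n <= 0:
        raise ValueError("n must be a positive count of ratings")
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# (label, N, mean, SD) copied from Table 6 (IM speakers)
rows = [
    ("Intermediate tasks, focus on topic and task", 421, 3.60, 0.82),
    ("Advanced tasks, overall", 334, 1.57, 0.65),
]

for label, n, mean, sd in rows:
    low, high = normal_ci(mean, sd, n)
    print(f"{label}: [{low:.2f}, {high:.2f}]")
```

Run as written, this prints [3.52, 3.68] and [1.50, 1.64], the bounds reported in Table 6 for those two rows.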

Task Types

To determine the effect of the task type on performance, the mean of overall performance was examined by task type (see Table 7). Among the Intermediate-level tasks, “talk about activity or routine” had the lowest mean (mean = 3.35, SD = 0.85), and “ask questions” had the highest mean (mean = 3.61, SD = 0.71). With the Advanced-level tasks, all scores fell below the “minimally meets” requirement level, with the lowest mean being “past description” (mean = 1.49, SD = 0.62) and the highest being “advanced role-play” (mean = 2.46, SD = 0.68).

In Figure 5, the means and their 95% CIs, shown as error bars, reveal some interesting trends. With the Intermediate-level tasks, the error bars for the different tasks overlapped, indicating that the performances were not significantly different. With the Advanced-level tasks, however, test takers’ scores on “advanced role-play” were significantly higher than their scores on the other two Advanced-level tasks. This may be because resolving situations with complications at the Advanced level can often be accomplished without paragraph-length discourse.

FIGURE 4

Holistic Assessment of IM Linguistic Characteristics by Intermediate and Advanced Task Level
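The statements that the Intermediate-level task scores “were not significantly different” and that “advanced role-play” scored “significantly higher” rest on whether the error bars overlap. A minimal sketch of that comparison, using the 95% CIs reported in Table 7 (the dictionary and function names are hypothetical), lists the task pairs whose intervals do not overlap. Non-overlapping 95% CIs are a conservative signal of a difference, while overlapping intervals do not by themselves rule one out, so the check mirrors a visual reading of the figure rather than a formal significance test.

```python
from itertools import combinations

# Reported 95% CIs (lower, upper) for IM speakers, copied from Table 7
TABLE_7_CIS: dict[str, tuple[float, float]] = {
    "Intermediate role-play": (3.46, 3.74),
    "Talk about activity or routine": (3.19, 3.51),
    "Talk about thing or place": (3.38, 3.66),
    "Ask questions": (3.47, 3.75),
    "Past description": (1.37, 1.61),
    "Past narration": (1.43, 1.63),
    "Advanced role-play": (2.32, 2.60),
}


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping_pairs(cis: dict[str, tuple[float, float]]) -> list[tuple[str, str]]:
    """Task pairs whose intervals do not overlap, in table order."""
    return [
        (first, second)
        for first, second in combinations(cis, 2)
        if not intervals_overlap(cis[first], cis[second])
    ]


for first, second in non_overlapping_pairs(TABLE_7_CIS):
    print(f"{first} vs. {second}")
```

Applied to Table 7, the check flags every Intermediate/Advanced pairing and “advanced role-play” against both “past description” and “past narration,” while no two Intermediate-level tasks are flagged.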

Qualitative Analysis of IM Speakers

There were approximately 216 comments on the overall performance of the IM speakers. Many of the comments confirmed what was observed in the quantitative analysis: test takers provided both good quantity and quality of speech when completing the Intermediate-level tasks, but on Advanced-level tasks there was still a need for improvement in all areas (e.g., in accuracy, text type, and discourse organization). One trend that also emerged with the IM speakers was the role that “rehearsed material” or “canned/memorized responses” played. With the IL speakers, approximately 20% of the comments indicated that test takers’ responses to these specific tasks sounded rehearsed; with the IM speakers, however, the rate was much lower: only 12% (26 in total) were considered by raters to constitute instances of rehearsed material. As noted in the IL discussion, while memorization may be an effective strategy for tests of content, memorizing and then reciting such responses is a feature of the Novice level.

TABLE 7

Overall Mean of IM Speakers on Task Type

Task Type                        Task Level     N    Mean  SD    95% CI
Intermediate role-play           Intermediate   111  3.60  0.72  [3.46, 3.74]
Talk about activity or routine   Intermediate   112  3.35  0.85  [3.19, 3.51]
Talk about thing or place        Intermediate   111  3.52  0.76  [3.38, 3.66]
Ask questions                    Intermediate   111  3.61  0.71  [3.47, 3.75]
Past description                 Advanced       111  1.49  0.62  [1.37, 1.61]
Past narration                   Advanced       143  1.53  0.61  [1.43, 1.63]
Advanced role-play               Advanced       105  2.46  0.68  [2.32, 2.60]

FIGURE 5

Holistic Ratings of IM Speakers on Intermediate and Advanced Task Types

IH Speakers

By definition, IH speakers fully meet the requirements that are needed to accomplish the functions that are assessed by Intermediate-level tasks (research question 1) and are able to successfully meet the functions and other criteria of the Advanced level most of the time (research question 2). Because the High sublevel is primarily defined in terms of performance at the next higher major level, only Advanced-level tasks were analyzed. While it would have been interesting to analyze performance on some of the Intermediate-level tasks as a point of comparison, that was beyond the scope of this study. For the six Advanced-level tasks, the mean of the raters’ overall task performance scores was 2.13 (see Table 8).

Linguistic Features

As noted earlier, a holistic assessment of 3 was indicative of minimally meeting the requirements. As shown in Table 8, the IH speakers’ mean ratings did not reach that minimum in any of the seven categories, with the lowest mean for “length” (mean = 2.34, SD = 0.69) and the highest for “focus on topic and task” (mean = 2.85, SD = 0.92).

Figure 6 presents the means along with their 95% CIs, shown as error bars. When these speakers were performing the Advanced-level tasks, their mean ratings in all seven categories clustered between the thresholds of 2 and 3. With a difference of 0.51 between the means for the highest domain (“focus on topic and task”) and the lowest (“length”), their profile was relatively even, indicating that an IH speaker develops evenly across the required set of expectations but does not yet meet expectations at the next level. “Focus on topic and task” and “pronunciation” were the strongest areas and were statistically equivalent. The weakest areas were “grammatical/structural,” “text type (length),” and “discourse organization,” with “fluency” and “vocabulary use” just slightly higher than these three.

TABLE 8

Holistic Rating of IH Speakers on Task by Speech Characteristic

                                      Advanced Tasks (n = 6)
Characteristic                        N    Mean  SD    95% CI
Overall                               622  2.13  0.65  [2.08, 2.18]
Function: focus on topic and task     594  2.85  0.92  [2.78, 2.92]
Text type: length                     592  2.34  0.70  [2.28, 2.40]
Text type: discourse organization     594  2.38  0.71  [2.32, 2.44]
Content: vocabulary use               588  2.64  0.68  [2.59, 2.69]
Accuracy: fluency                     595  2.46  0.77  [2.40, 2.52]
Accuracy: pronunciation               594  2.76  0.83  [2.69, 2.83]
Accuracy: grammatical/structural      592  2.38  0.71  [2.32, 2.44]

Task Types

To determine the effect of the task type on performance, the mean of the raters’ scores of overall performance was examined by task type (see Table 9). Mean scores for all of the Advanced-level tasks fell below the “minimally meets” requirement level, with the lowest mean for “current event” (mean = 1.90, SD = 0.66) and the highest for “advanced role-play” (mean = 2.46, SD = 0.68).

In Figure 7, the means and their 95% CIs, shown as error bars, reveal some interesting trends. Among the Advanced-level tasks, “advanced role-play” and “past narration” had the highest scores, indicating that these were the easiest tasks for the examinees. The next easiest were “past description” and “role-play follow-up.” The most difficult were “narration/description beyond personal” and “current event.”

Qualitative Analysis of IH Speakers

There were approximately 41 comments on the overall performance of the IH speakers. Many of the comments confirmed what had been learned from the quantitative analysis: for Advanced-level tasks, there was still a need for improvement in all areas (e.g., accuracy, text type, and discourse organization). Regarding task type, as with the IM speakers, the raters found that “advanced role-play” was more easily performed successfully than the other Advanced-level tasks, probably because it often does not require paragraph-level speech. This observation was also supported by the quantitative analysis.

FIGURE 6

Holistic Ratings of IH Speakers on Advanced-Level Tasks

TABLE 9

Overall Mean of IH Speakers on Task Type

Task Type                                Task Level   N    Mean  SD    95% CI
Past description                         Advanced     108  2.11  0.56  [1.93, 2.29]
Past narration                           Advanced     73   2.36  0.63  [2.22, 2.50]
Advanced role-play                       Advanced     105  2.46  0.68  [2.30, 2.62]
Role-play follow-up                      Advanced     106  2.10  0.58  [1.96, 2.24]
Narration/description beyond personal    Advanced     105  2.01  0.64  [1.85, 2.17]
Current event                            Advanced     105  1.90  0.66  [1.82, 1.98]

FIGURE 7

Holistic Ratings of IH Speakers on Advanced Task Types

Discussion

The purpose of this research project was to provide empirical data on the profiles of examinees who were rated at the Intermediate level and, in this way, to offer an initial roadmap for helping students progress through the Intermediate level to the Advanced level of proficiency. Furthermore, since OPIc data were used, this study is the first to examine the impact of the different task types that are required at the Intermediate and Advanced levels.

Speech Characteristics

The first research question examined the linguistic characteristics at each of the sublevels. To answer that question, it is necessary to parse the speakers’ performance on Intermediate- and Advanced-level tasks.

Intermediate-Level Tasks

The only way to understand what IL speakers need to do to improve to the IM sublevel is to examine the change in linguistic characteristics between IL and IM speakers when they performed Intermediate-level tasks. Figure 8 shows the means and 95% CI error bars for the linguistic characteristics on the Intermediate-level tasks that were rated: five for IL speakers and four for IM speakers.¹ While both groups of learners could meet the linguistic demands of the Intermediate level, the IM speakers were stronger in all areas. This indicates that IM speakers performed Intermediate-level functions with ease and provides empirical evidence that the quantity and quality of the language produced increase between the sublevels.

As would be expected, the IL speakers’ speech samples averaged near the “minimally meets” threshold (a score of 3), while the IM speakers’ speech samples demonstrated their ability to perform all of the functions that are associated with the Intermediate level using both good quantity and quality of language. Even though all speech samples in this study had originally been double- or triple-rated, it is notable that at the IL sublevel, the samples were rerated just below the “minimally meets” border rating of 3. Some might argue that this is evidence that OPIc scoring has a compensatory element: the failure to minimally meet the requirements of any single task can be compensated for by stronger performance on other tasks. Thus, the whole speech sample could be rated more highly than some of its individual parts, although this pattern may also be an artifact of the use of rehearsed speech.

FIGURE 8

Linguistic Characteristic Ratings of IL and IM Speakers on Intermediate-Level Tasks

Table 10 lists, in mean rank order, the characteristics that IL speakers must improve on when moving toward the IM sublevel. Given that rehearsed material could be subsumed under “focus on topic and task,” it is not surprising that this characteristic was the lowest for the IL speakers. Rather than memorizing responses, students must understand that they need to be able to spontaneously (1) create with language, (2) perform simple transactions, and (3) ask and answer questions, and that they should practice tailoring responses to different circumstances rather than switching to the autopilot of rehearsed material. While most speakers have a collection of anecdotes that they share in conversations, test takers called attention to their inability to create with language when they offered glibly fluent, memorized responses that did not address the question and were not adapted for the audience.

This issue may be more endemic to the OPIc than to the OPI in that it is difficult for OPIc raters to determine whether speech is being spontaneously created or simply recited from memory. For example, when an interviewee struggles to create with the language (e.g., “This uh uh question uh about school uh uh very uh interest. . .”) and then transitions to a more fluid response (e.g., “Built in the 1940s, the school I attended was part of the Art Deco movement in which. . .”), an OPI interviewer can interrupt the soliloquy by asking follow-up and clarification questions that guide the conversation in a new direction. With OPIcs, however, raters must listen for telltale signs of rehearsed responses and then exclude that sample as evidence that the examinee can create with the language. The opportunity cost of using rehearsed material is that there are fewer chances for an examinee to show what can be produced spontaneously. These results indicate that, rather than helping examinees to be rated at a higher level, the uneven juxtaposition of rehearsed material with spontaneous language is a hallmark of the IL sublevel.

TABLE 10

Linguistic Characteristic on Intermediate Task From Weakest to Strongest

Mean Rank Order   IL                                   IM
7th               Function: focus on topic and task    Accuracy: fluency
6th               Accuracy: fluency                    Accuracy: grammatical/structural
5th               Accuracy: grammatical/structural     Function: focus on topic and task
4th               Text type: discourse organization    Text type: discourse organization
3rd               Text type: length                    Accuracy: pronunciation
2nd               Content: vocabulary use              Text type: length
1st               Accuracy: pronunciation              Content: vocabulary use

In addition, to be rated at the next higher sublevel, IL speakers also need to reduce disfluencies in sentence-level discourse to increase both the quantity and quality of their speech. Disfluencies often arise as learners search for words and self-correct grammatical errors, an indication that the recall of vocabulary and grammatical structures has not yet been automatized. For learners to progress from conceptual control, which often entails conscious effort to produce forms, to full control, in which production is automatized, they must engage in abundant and varied conversational language practice. The benefit of varied conversational practice is that it allows learners to practice recombining and repurposing rehearsed and memorized material from the Novice level as well as appropriately adjusting and adapting it to new circumstances. Engaged conversational practice over a wide range of personal topics should help IL speakers develop the ease and fluency that are needed to progress to the IM sublevel.

Advanced-Level Tasks

To understand what IM and IH speakers need to do to move up to the next sublevel, one needs to examine the linguistic characteristics of speakers who performed Advanced-level tasks. Figure 9 shows the means and 95% CI error bars for the linguistic characteristics on the Advanced-level tasks that were rated (one for IL speakers, three for IM speakers, and six for IH speakers). None of the groups successfully met the linguistic demands of the Advanced level; however, the higher the sublevel, the stronger the performance on each linguistic characteristic. Thus, progression toward the Advanced level requires systematic improvement across all linguistic characteristics.

FIGURE 9

Linguistic Characteristic Ratings of IL, IM, and IH Speakers on Advanced-Level Tasks

To move to the next higher sublevel, IM and IH speakers need to show progress toward accomplishing Advanced-level functions. The speech characteristic areas that were found to be the weakest for both of these groups, and thus in need of the most improvement, were “grammatical/structural,” “length,” and “discourse organization” (see Table 11).
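The rank orders in Tables 10 through 13 follow from sorting mean ratings. As a minimal sketch, the Advanced-level means for IM speakers (Table 6) and IH speakers (Table 8) can be sorted from weakest to strongest to reproduce the two columns of Table 11, and the gap between the highest and lowest means gives the 0.70 (IM) and 0.51 (IH) spreads reported in the results. The function names are hypothetical. Note that the IH means for “discourse organization” and “grammatical/structural” are tied at 2.38, so their relative order cannot be recovered from two-decimal means; Python’s stable sort simply keeps the dictionary order, which here happens to match the table.

```python
# Advanced-level mean ratings by linguistic characteristic
IM_ADVANCED = {  # Table 6, IM speakers
    "Function: focus on topic and task": 2.39,
    "Text type: length": 1.73,
    "Text type: discourse organization": 1.80,
    "Content: vocabulary use": 2.25,
    "Accuracy: fluency": 1.97,
    "Accuracy: pronunciation": 2.37,
    "Accuracy: grammatical/structural": 1.69,
}
IH_ADVANCED = {  # Table 8, IH speakers
    "Function: focus on topic and task": 2.85,
    "Text type: length": 2.34,
    "Text type: discourse organization": 2.38,
    "Content: vocabulary use": 2.64,
    "Accuracy: fluency": 2.46,
    "Accuracy: pronunciation": 2.76,
    "Accuracy: grammatical/structural": 2.38,
}


def weakest_to_strongest(means: dict[str, float]) -> list[str]:
    """Characteristics ordered from lowest to highest mean (stable for ties)."""
    return sorted(means, key=lambda name: means[name])


def spread(means: dict[str, float]) -> float:
    """Difference between the highest and lowest mean rating."""
    return max(means.values()) - min(means.values())


def ordinal(k: int) -> str:
    """English ordinal for the small ranks used here (1st to 7th)."""
    return {1: "1st", 2: "2nd", 3: "3rd"}.get(k, f"{k}th")


for group, means in (("IM", IM_ADVANCED), ("IH", IH_ADVANCED)):
    ranked = weakest_to_strongest(means)
    print(f"{group}: spread between highest and lowest mean = {spread(means):.2f}")
    # Table 11 labels the weakest characteristic "7th" and the strongest "1st"
    for position, name in enumerate(ranked):
        print(f"  {ordinal(len(ranked) - position)}: {name}")
```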

Speakers must move beyond simple sentences to perform the functions that are required at the Advanced level. Thus, as IM and IH speakers engage in descriptions and narrations, sentence complexity will naturally increase. In the case of English, this will involve moving toward complex sentences with embedded clauses (e.g., “The girl over there wearing the red sweater is my cousin”) and moving from partial to full control of tense and aspect when narrating or describing in different time frames (e.g., “When I was walking to school this morning, I ran into my cousin”). The text type progresses along the continuum from “sentences” to “strings of sentences” until it develops into paragraphs with discourse markers (e.g., first, next, then, however). The function of detailed description and narration cannot be attained without increasing length and organizational markers. Thus, grammatical/structural accuracy, length, and organization (the three characteristics that IM and IH speakers must work on) are all interrelated.

Just as IL speakers need to enlarge their language base as well as adapt and transfer it to new and varied contexts, IM and IH speakers must develop greater breadth and accuracy; in addition, they must fundamentally reconfigure their speech habits, adding another floor on top of the Intermediate-level girders that were mentioned at the beginning of the article. They need to move beyond conversational exchanges and practice carrying out Advanced-level functions. While IM speakers would likely benefit from drawing content from familiar, autobiographical domains and adding more complexity and length to their utterances, IH speakers may benefit from moving beyond the autobiographical by acquiring more content domains. Advanced-level speakers are often compared to news reporters: they can describe the setting and narrate the details of stories over a wide range of topics. Thus, IH speakers would benefit from opportunities to practice sharing content by describing settings and narrating stories in many different domains. The ability to describe or narrate in all time frames requires speakers to use enough language (text type) to paint a verbal picture (discourse organization and vocabulary) with enough precision (accuracy) that a monolingual listener can visualize the scene.

TABLE 11

Linguistic Characteristic on Advanced Task From Weakest to Strongest

Mean Rank Order   IM                                   IH
7th               Accuracy: grammatical/structural     Text type: length
6th               Text type: length                    Text type: discourse organization
5th               Text type: discourse organization    Accuracy: grammatical/structural
4th               Accuracy: fluency                    Accuracy: fluency
3rd               Content: vocabulary use              Content: vocabulary use
2nd               Accuracy: pronunciation              Accuracy: pronunciation
1st               Function: focus on topic and task    Function: focus on topic and task

Once again, a speaker’s inability to fulfill these functions at the Advanced level may result from the failure to use enough language, to organize it meaningfully, or to offer enough precision to communicate without causing confusion or misunderstanding. Thus, concentrating on different aspects of the four construct axes can help learners gain the deliberate practice (Ericsson, 2006) needed for incremental growth. While this type of growth is often best achieved during intensive immersion experiences like study abroad (Pearson, Fonseca-Greber, & Foell, 2006), it can also be facilitated by instructors who assign vocabulary development and grammar learning as out-of-class work and deliberately allocate a very high percentage of class time to extended communication activities. For example, when focusing on past narration (an Advanced-level function), learners could work on spontaneously producing detailed paragraph-length discourse using a variety of sentence structures and connecting devices without penalty for grammatical errors. Then, to improve accuracy, learners could work at the sentence level to correct their recorded speech. The inverse (creating a written base text and then spontaneously enhancing and elaborating on it by adding detail and content and varying the sentence structure) would also help learners focus on improving their language along all four of the required dimensions.

Task Type Difficulty

The second research question examined the difficulty of the Intermediate- and Advanced-level tasks at each of the sublevels. To answer this question, it is necessary to parse performance by major level.

Intermediate-Level Tasks

The only way to understand what IL speakers need to do to improve to the IM sublevel is to examine the performance differences between IL and IM speakers on the different Intermediate-level tasks. Figure 10 shows the means and 95% CI error bars for the holistic ratings on the four Intermediate-level task types that were rated. The IL speakers were at or just under the “minimally meets” threshold of 3, while the IM speakers could clearly perform the different Intermediate-level tasks. As with the linguistic characteristics, the IL speakers were expected to reach the threshold; that their ratings hovered at or just below it could be another instance in which the whole was greater than the sum of its parts, or it could be due to many of the responses being rehearsed and thus not ratable.

It is important to point out that the ordering of task difficulty differed between the IL and IM speakers (see Table 12). The IM speakers’ strengths were performing “intermediate role-play” and “ask questions,” both of which required transactional language. Yet those same tasks were the most difficult for the IL speakers. Engaging in role-plays is not part of everyday, spontaneous conversation; however, “intermediate role-play” in the OPIc was designed to allow examinees to demonstrate their ability to handle simple transactions or social situations (e.g., make a purchase, accept or propose an invitation) that are not readily elicited through a conversational format. Because the successful completion of “intermediate role-play” required the speaker to ask questions, it is not surprising that “ask questions” was the other function most in need of improvement for IL speakers. Thus, for IL speakers to move to the next sublevel, improving the ability to ask questions spontaneously in interactional contexts must take priority while they also work on the linguistic features that have already been discussed.

FIGURE 10

Holistic Ratings of IL and IM Speakers on Intermediate-Level Tasks

TABLE 12

Ordering of Intermediate Tasks From Weakest to Strongest

Mean Rank Order   IL                                IM
4th               Intermediate role-play            Talk about activity or routine
3rd               Ask questions                     Talk about thing or place
2nd               Talk about thing or place         Intermediate role-play
1st               Talk about activity or routine    Ask questions

Advanced-Level Tasks

To understand what IM and IH speakers need to do to move up a sublevel, one needs to examine how their performance on Advanced-level tasks differs. Figure 11 shows the means and 95% CI error bars for the Advanced-level tasks (one for IL speakers, three for IM speakers, and six for IH speakers). None of the groups successfully performed the Advanced-level tasks; however, the higher the sublevel, the stronger the performance.

The easiest task for both IM and IH speakers was “advanced role-play” (see Table 13). While “intermediate role-play” was designed to elicit the language that is needed for simple conversational exchanges, the role-play at the Advanced level added a complication and placed the transaction in a more formal setting. This required that examinees use more precise language and actively negotiate with the interlocutor. As noted above, the “advanced role-play” required test takers to add new girders to their linguistic structure. However, since such encounters are still transactional even at the Advanced level, paragraph-length discourse may not be needed to accomplish the task. The ranking of “role-play follow-up” for the IH speakers was also somewhat surprising given the relative ease with which they performed the role-play itself. The purpose of the follow-up was to provide another opportunity for examinees to describe or narrate a personal instance in which they had experienced something similar to what was in the role-play, and that does require paragraph-length discourse. This finding suggests that resolving complicated situations may be the first trait that is acquired in the progression toward the Advanced level, whereas description and narration within the same topic domain are more difficult.

FIGURE 11

Holistic Ratings of IL, IM, and IH Speakers on Advanced-Level Tasks

That “past narration” was easier than “past description” was somewhat unexpected. “Past narration” typically requires greater command of grammatical/structural accuracy, which would intuitively seem more difficult than offering a detailed description. It could be that autobiographical narrations “sound” better to raters because their rhetorical structure differs from that of a description. Furthermore, it would have been interesting to examine how IM speakers responded to the other Advanced-level tasks, such as “current event,” to see whether the ordering of all tasks was the same for IM and IH speakers. Clearly, more research must be conducted to explore this phenomenon.

When certified testers attempt to gather evidence of Advanced language proficiency, they often employ a three-pronged strategy in which examinees are (1) asked to describe a setting or situation; (2) asked to elaborate on, clarify, or expand the same topic; and (3) prompted to relate the story from the outset to the conclusion (Swender & Vicars, 2012). A similar strategy could be used in practice with IM and IH speakers, who could try to describe, elaborate, and narrate in all the major time frames, moving from topics that are personal to those that are more general. As the most difficult tasks for IH speakers were “narration/description beyond personal” and “current event,” it appears that the increased cognitive load of discussing general issues spontaneously may be affecting their linguistic control. Thus, having students read or listen to authentic material and then asking them to describe, elaborate on, and recount the story should give them content they can incorporate and use as they gain the language skills and build the girders that are needed to move through the Intermediate sublevels to the Advanced level.

TABLE 13

Ordering of Advanced Tasks From Weakest to Strongest

Mean Rank Order   IM                    IH
6th               n/a                   Current event
5th               n/a                   Narration/description beyond personal
4th               n/a                   Role-play follow-up
3rd               Past description      Past description
2nd               Past narration        Past narration
1st               Advanced role-play    Advanced role-play

Limitations and Future Directions

While this study identified common patterns of language growth, some issues must still be taken into consideration. First, human performance is variable, and not every learner will follow the same path through the Intermediate sublevels into the Advanced level. Thus, while this research reports general trends, it would not be surprising for individual exceptions to occur. Second, using trained raters has both strengths and weaknesses. One strength is that it ensures that those doing the rating understand the scale well and know what to look for. However, that familiarity could be a weakness, as it may lead to confirmation bias in which raters use circular logic to justify their ratings. Thus, a rater listening to an examinee who is already known to be an IL speaker may look only for evidence to support that rating rather than rating the task on its own merits; however, this risk does not seem to invalidate the insights that were gained into test takers’ linguistic characteristics and the way in which task types differentiate among proficiency levels. Finally, since this study was conducted with Korean speakers learning English, it is unknown to what extent these findings would generalize to other languages and learners.

Conclusion

Understanding the stages that learners go through as they progress through the Intermediate range into the Advanced level of the ACTFL proficiency framework has received very little attention. Fortunately, the OPIc offers a clearer view of what may be happening as learners progress through that major level, as both linguistic characteristics and task types can be analyzed.

To return to the student who languished at the Intermediate level for 3 years across nine attempts to demonstrate Advanced-level proficiency, it might have been helpful if her instructors had intervened and helped her specifically target the different linguistic areas and task types at each developmental sublevel. For example, when she started as an IL speaker, she could have been instructed to work on adapting her memorized language to different circumstances and on the spontaneous back-and-forth that characterizes transactional language, as well as on asking and answering questions about personal experiences and daily life contexts. As she developed into an IM speaker, the transactional language that used to be a weakness should have become a strength, and she could practice moving beyond simple sentences to more complex strings of sentences. Recording and transcribing what she said could provide the foundation for learning how to combine simple sentences using subordination and how to enrich the content by adding detail. For example, an instructor could ask her to elaborate on what she was speaking about by telling her that for every person (or object) she mentioned (e.g., a cousin), she needed to think of three traits (e.g., physical description, hobbies, occupation) that she could incorporate into the description. Repeatedly rerecording more structurally complex versions that were also richer in content could help the learner establish patterns of more complex grammar use, transition from shorter to longer text types, and increasingly incorporate nonautobiographical and general content. Such continued practice across a variety of contexts would force rehearsed material to be adapted and allow her to confirm that she could create with the language as she emerged into the IH sublevel.

As an IH speaker, this learner should be using Advanced language most of the time, although she would be unable to sustain it. The learning approach that allowed her to move into the IH sublevel would also allow her to move to the AL sublevel, with a few caveats. Successful communication at the Advanced level requires that learners demonstrate the ability to create with language in longer text types, using discourse markers and showing automatized fluency that incorporates more complex grammatical structures. Because Advanced-level speech requires that entirely new girders be built in the learner’s speech paradigm, it is often difficult to reach the necessary level of fluency without abundant opportunities to produce paragraph-length discourse. Thus, since classroom time alone is typically insufficient, other opportunities for extensive language practice must be incorporated. These could include study abroad, foreign language housing, speaking partners, or technological solutions that allow consistent partnering with native speakers.

Offering feedback on grammatical and structural errors that cause confusion or misunderstanding is also essential. Having this learner record and transcribe a response, circle and identify the errors that she notices, and then rerecord herself might encourage her to notice and correct errors that could otherwise fossilize and thus limit the conjoint progression that is needed to reach the next major level. Once again, the learner should seek to communicate with a level of automaticity that would lead to increased fluency and help her move beyond the purely autobiographical into narration and description that extend beyond the personal frame to include topics of general interest as well as current events. As both language learners and instructors come to understand the necessity of simultaneous and interrelated, or conjoint, development in function, text type, content, and accuracy, they can structure learning so as to scaffold performance on each of these dimensions and help learners progress more easily through the Intermediate level into the Advanced range.

Note

1. IH speakers did not have any Intermediate tasks analyzed.

Acknowledgments

This article was based on a research report written for ACTFL, and the original members of that research team (Elvira Swender, Cynthia Martin, and Danielle Tezcan) provided valuable assistance throughout. I am extremely grateful for their generosity and friendship.

References

ACTFL. (2012a). Oral proficiency interviewfamiliarization manual. Alexandria, VA:Author.

ACTFL. (2012b). Oral proficiency interviewcomputerized familiarization manual. Alexan-dria, VA: Author.

ACTFL. (2012c). Proficiency guidelines 2012.Alexandria, VA: Author.

ACTFL. (2012d). Performance descriptors2012. Alexandria, VA: Author.

ACTFL. (2016). ACTFL achieves milestone of1,000 certified ACTFL OPI testers. RetrievedFebruary 21, 2017, from https://www.actfl.org/news/press-releases/actfl-achieves-milestone-1000-certified-actfl-opi-testers

Brooks, F. B., & Darhower, M. A. (2014). It takes a department! A study of the culture of proficiency in three successful foreign language teacher education programs. Foreign Language Annals, 47, 592–613.

Carroll, J. B. (1967). Foreign language proficiency levels attained by language majors near graduation from college. Foreign Language Annals, 1, 131–151.

Chambless, K. S. (2012). Teachers’ oral proficiency in the target language: Research on its role in language teaching and learning. Foreign Language Annals [Supplement], 45, s141–s162.

Clifford, R. (2016). A rationale for criterion-referenced proficiency testing. Foreign Language Annals, 49, 224–234.

Cox, T. (2015). Findings of the ACTFL-CREDU research project: Linguistic profiles of Korean speakers of English. White paper submitted to ACTFL.

Cox, T. L., Bown, J., & Burdis, J. (2015). Exploring proficiency-based vs. performance-based items with elicited imitation assessment. Foreign Language Annals, 48, 350–371.

Dandonoli, P., & Henning, G. (1990). An investigation of the construct validity of the ACTFL proficiency guidelines and oral interview procedure. Foreign Language Annals, 23, 11–22.

Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. The Cambridge Handbook of Expertise and Expert Performance, 38, 685–705.

Gouoni, J. M., & Feyten, C. M. (1999). Effects of the ACTFL OPI-type training on student performance, instructional methods, and classroom materials in the secondary foreign language classroom. Foreign Language Annals, 32, 189–200.

Glisan, E. W., & Foltz, D. A. (1998). Assessing students’ oral proficiency in an outcome-based curriculum: Student performance and teacher intuitions. Modern Language Journal, 82, 1–18.

Halleck, G. B. (1996). Interrater reliability of the OPI: Using academic trainee raters. Foreign Language Annals, 29, 223–238.

Levine, M. G., & Haus, G. J. (1987). The accuracy of teacher judgment of the oral proficiency of high school foreign language students. Foreign Language Annals, 20, 45–50.

Liskin-Gasparro, J. E. (1996). Circumlocution, communication strategies, and the ACTFL proficiency guidelines: An analysis of student discourse. Foreign Language Annals, 29, 317–330.

Liskin-Gasparro, J. E. (2003). The ACTFL proficiency guidelines and the oral proficiency interview: A brief history and analysis of their survival. Foreign Language Annals, 36, 483–490.

Pearson, L., Fonseca-Greber, B., & Foell, K. (2006). Advanced proficiency for foreign language teacher candidates: What can we do to help them achieve this goal? Foreign Language Annals, 39, 507–519.

Surface, E. A., & Dierdorff, E. C. (2003). Reliability and the ACTFL oral proficiency interview: Reporting indices of interrater consistency and agreement for 19 languages. Foreign Language Annals, 36, 507–519.

Surface, E., Poncheri, R., & Bhavsar, K. (2008). Two studies investigating the reliability and validity of the English ACTFL OPIc with Korean test takers: The ACTFL OPIc validation project technical report. Retrieved August 15, 2015, from http://www.languagetesting.com/wp-content/uploads/2013/08/ACTFL-OPIc-English-Validation-2008.pdf

SWA Consulting Inc. (2009). Brief reliability report 5: Test-retest reliability and absolute agreement rates of English ACTFL OPIc proficiency ratings for double and single rated tests within a sample of Korean test takers. Raleigh, NC: Author.

Swender, E., Martin, C. L., Rivera-Martinez, M., & Kagan, O. E. (2014). Exploring oral proficiency profiles of heritage speakers of Russian and Spanish. Foreign Language Annals, 47, 423–446.

Swender, E., & Vicars, R. (2012). Oral proficiency interview training manual. Alexandria, VA: ACTFL.

Thompson, I. (1995). A study of interrater reliability of the ACTFL oral proficiency interview in five European languages: Data from ESL, French, German, Russian, and Spanish. Foreign Language Annals, 28, 407–422.

Thompson, I. (1996). Assessing foreign language skills: Data from Russian. Modern Language Journal, 80, 47–65.

Thompson, G. L., Cox, T. L., & Knapp, N. (2016). Comparing the OPI and the OPIc: The effect of test method on oral proficiency scores and student preference. Foreign Language Annals, 49, 79–92.

U.S. Department of Education. (2016a, December 16). Degree-granting institutions and branches. Retrieved December 15, 2016, from http://nces.ed.gov//programs/digest/d02/dt244.asp

U.S. Department of Education. (2016b, December 16). High school facts at a glance. Retrieved December 15, 2016, from http://www2.ed.gov/about/offices/list/ovae/pi/hs/hsfacts.html

Submitted November 4, 2016

Accepted January 20, 2017

