
Grammatical gender via lexical statistics: The case …faculty.wcas.northwestern.edu/~maw962/docs/walter-spanloans.pdf · Grammatical gender via lexical statistics: ... Spanish nouns


Grammatical gender via lexical statistics: The case of Arabic-to-Spanish loanwords

Mary Ann Walter Massachusetts Institute of Technology

Allomorphy in the Spanish definite article results in ambiguity with respect to grammatical gender for some unfamiliar Spanish nouns. I examine the phonological form and morphological gender status of loanwords into Spanish from Arabic in order to shed light on how underlying representations (URs) are determined in cases of ambiguity. I attribute variation in the quality of final epenthetic vowels in such loanwords to an effect of lexical statistics on choice of noun gender. Speakers know morphological/statistical properties of the lexicon, and use them when grammar is uninformative. Other theories of UR selection, such as lexicon optimization, markedness-driven selection, and random selection, fail to account for the observed pattern.

1. Introduction

Underlying representations of linguistic structures, as posited by linguists, may be underdetermined by the evidence available to speakers. In such cases, which form do speakers select, and how do they decide? One common assumption is that they assume maximal transparency and employ Lexicon Optimization, so that the simplest possible mapping relationship obtains between underlying and output forms. In this scenario, speakers minimize faithfulness violations (in optimality-theoretic terms) or the application of phonological rules (in derivational terms). This assumption has played an important role in theories of learnability, as well as in grammatical analysis. Another possibility is that, all else being equal, speakers select the least marked of possible URs (minimization of markedness violations, in OT terms). A third is that speakers simply ‘guess’ so that selection is distributed randomly among possible forms. Finally, speakers may guess in a more informed way, so that the distribution of such forms reflects patterns already present in the lexicon.

Models for UR Selection

1. Lexicon Optimization
2. Markedness
3. Random guessing
4. Informed guessing

Table 1: Models for UR Selection

Consider how these possibilities play out in a case of final devoicing. For hypothetical output forms such as at and op, the corresponding UR may be either /ad/ or /at/, for the first; /ob/ or /op/, for the second. Morphological alternations, such as suffixation of vowel-initial suffixes, typically decide the issue. When such information is lacking in the language, however, or has not yet been encountered by the speaker, other means must be employed. In the remainder of this section, I outline predictions of the above models in this scenario, and relevant previous experimental results. In Section 2, I introduce another test case, involving a Spanish morphological alternation in grammatical gender. I discuss the bearing of this alternation on the morphophonological adaptation of a class of Arabic loanwords into Spanish in Section 3, and conclude by sketching out an analysis favoring the informed-guess UR selection model.

1.1 Final devoicing and UR selection models

Lexicon Optimization predicts consistent selection of underlyingly voiceless final consonants, to maximize identity between UR and output. Markedness may make different predictions for different segments, depending on the ones involved. The existence of emergence-of-the-unmarked effects provides some evidence for the influence of markedness considerations where grammars are otherwise uninformative. If language-specific patterns were found in UR choice, however, this would militate against a strong role for universal markedness considerations in their selection. Random guessing would result in an even distribution between the two alternatives. The result of informed guessing is unclear with a hypothetical example such as this one. However, numerous studies show that speakers prefer nonce words that conform to distributional patterns of their native lexicons in rating tasks (e.g. Zimmer 1969, for Turkish harmony). This suggests that they might also be guided by these preferences when assigning URs.

UR Predictions: Final Devoicing

Model            Output    Input
1. Lex. Opt.     at, op    100% at, op
2. Markedness    at, op    100% at, ob?
3. Rand. guess   at, op    50% ad/50% at; 50% ob/50% op
4. Inf. guess    at, op    language-dependent

Table 2: UR Predictions: Final Devoicing
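The predictions in Table 2 can be restated as a small Python sketch, offered only as an illustration. The function name, the model labels, and both parameters are hypothetical. The markedness prediction is treated as segment-dependent (Table 2 itself marks it as uncertain), and the lexical share passed to the informed-guessing model is a placeholder rather than a measured value.

```python
def predicted_voiced_share(model, lexicon_voiced_share=None, unmarked_is_voiced=False):
    """Predicted share of voiced URs for a word-final devoiced consonant.

    model: one of the four labels below (hypothetical names).
    lexicon_voiced_share: share of voiced final URs among comparable
        words of the lexicon (needed only for informed guessing).
    unmarked_is_voiced: assumption about which voicing value counts as
        unmarked for the segment at hand (markedness model only).
    """
    if model == "lexicon optimization":
        # The UR is identical to the voiceless output.
        return 0.0
    if model == "markedness":
        # Always the unmarked value, whichever that is for this segment.
        return 1.0 if unmarked_is_voiced else 0.0
    if model == "random guessing":
        return 0.5
    if model == "informed guessing":
        if lexicon_voiced_share is None:
            raise ValueError("informed guessing needs a lexical statistic")
        # Choices mirror the distribution already present in the lexicon.
        return lexicon_voiced_share
    raise ValueError(f"unknown model: {model}")


# Placeholder lexical statistic, for illustration only.
print(predicted_voiced_share("informed guessing", lexicon_voiced_share=0.3))
```

Only the fourth model takes the lexicon as an input, which is why it alone predicts language-specific results.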

Precisely this sort of case has been investigated for two different final-devoicing languages, Dutch (Ernestus and Baayen 2003) and Turkish (Nevins and Yolcu-Kamali 2005). In both experiments, speakers of the language in question were prompted with finally-devoiced forms of nonce words, and asked to produce an inflected form of the nonce stem in which a vowel-initial suffix induces a voicing alternation. The task forces a choice of UR and reveals the choice, since the elicited form includes the underlying voicing specification of the segment in question.

For neither language did UR selection fall exceptionlessly into the voiceless category, as predicted by the Lexicon Optimization account. Thus in at least some cases, speakers posit URs that differ from surface outputs, even when it is unnecessary for them to do so.

Neither were UR selections evenly distributed between voiced and voiceless specifications. Thus the random guessing strategy appears not to be active. Rather, the percentage of voiced final consonant URs varied according to the place of articulation of that consonant. For each language, those percentages roughly reflect the relative frequency of UR voiced final consonants in the lexicon. Moreover, the place effect was modulated in Turkish by an effect of syllable count. In elicited nonce forms, as in the lexicon of Turkish, final consonants of polysyllabic words are much more likely to be underlyingly voiced than voiceless.

In Dutch, on the other hand, syllable count plays no role. The language-specificity of this effect undermines a markedness account. In addition, Dutch velar fricatives are highly likely to be classified as underlyingly voiced, a choice that is difficult to reconcile with a segmental markedness account given standard assumptions about voicing of velars.

The pair of studies outlined above focuses on segmental properties of speech sounds. In what follows, I pursue a similar approach to morphological properties of lexical items. Specifically, I investigate the predictions of these four models with respect to grammatical gender assignment in Spanish nouns borrowed from Arabic. I argue that a phonologically-driven allomorphic alternation in the Spanish definite article results in gender ambiguity, which is resolved in accordance with lexical statistics.

2. Spanish grammatical gender

Spanish nouns may be of either feminine or masculine gender. Feminine gender is often associated with a suffixal –a vowel (though a feminine noun may bear no suffix, or some other typically feminine suffix such as abstract –dad). Masculine gender may be indicated with no suffix or with –o. Articles and adjectives agree with the noun, as shown below:

(1) a. la    profesor-a  guap-a
       the-F professor-F goodlooking-F
    b. el    profesor  guap-o
       the-M professor goodlooking-M
    ‘the goodlooking professor’

However, the singular definite article is subject to a productive hiatus-resolving restriction of the following type (with certain lexical exceptions and cyclic effects). When the noun begins with a stressed /a/ vowel, and the definite article immediately precedes it, the masculine form of the article is used, regardless of the noun’s grammatical gender (Harris 1987).

Singular definite article allomorphy: la (F) → el (M) / ___ [N á…

The examples below demonstrate the alternation.

(2) a. el    água                   *la água
       the-M water
       ‘the water’
    b. el    água  suci-a           *el água sucio
       the-M water dirty-F
       ‘the dirty water’
    c. la    mism-a água            *el misma água, *el mismo água
       the-F same-F water
       ‘the same water’

Adjectives agree with the underlying gender, and a sequence of final /a/ + initial stressed /a/ is still permitted when the preceding word is not the article (i.e. not within the prosodic word). This alternation assumed its current form in the 1500s, before which it applied more generally to initial vowels, regardless of stress (Penny 2000).
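The allomorphy rule for modern Spanish can be sketched as a short Python function. This is a simplification for illustration: the function and parameter names are hypothetical, and the lexical exceptions and cyclic effects mentioned above are ignored.

```python
def definite_article(gender, starts_with_stressed_a, article_adjacent=True):
    """Singular definite article for a noun (modern Spanish, simplified).

    gender: underlying gender of the noun, "F" or "M".
    starts_with_stressed_a: True if the noun begins with stressed /a/.
    article_adjacent: True if the article immediately precedes the noun.
    """
    if gender == "M":
        return "el"
    # Feminine nouns take 'el' only when the article is adjacent to a
    # noun beginning with stressed /a/; otherwise they take 'la'.
    if starts_with_stressed_a and article_adjacent:
        return "el"
    return "la"


print(definite_article("F", True))          # el água
print(definite_article("F", True, False))   # la misma água
print(definite_article("F", False))         # la profesora
```

The ambiguity at issue follows directly: for a noun with stressed initial /a/ and an adjacent article, the function returns el for both genders, so the article alone does not reveal the underlying gender.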

Note that the use of the masculine article in these cases results in a sort of ambiguity similar to that seen above with respect to final devoicing. In the absence of previous knowledge of a noun’s grammatical gender, or a disambiguating gender suffix on the noun that is not always present, a speaker cannot know whether a given noun form is underlyingly masculine or feminine when used with the definite article. Below I consider /a/-initial loanwords from Arabic as a test case for the four models of UR selection.

If Lexicon Optimization decides, then all such nouns would be assigned masculine gender, which is transparently compatible with the form of the definite article that must be used in such cases. Markedness also predicts the consistent use of masculine gender, since masculine is generally considered to be the morphological default. A strategy of random guessing would result in 50% of ambiguous cases being classed as masculine and 50% as feminine.

The prediction of the fourth approach, informed guessing, depends on the preexisting distribution of gender in the Spanish lexicon. At first glance, this appears to be indistinguishable from that predicted by the random guessing approach. In the Lexesp corpus of modern Spanish (Sebastián et al. 2000), noun types are equally divided with respect to gender, 50% masculine and 50% feminine.

However, there are two problems with using these percentages. First, considering only /a/-initial words may yield a different result. When the set of Lexesp nouns considered is restricted to those beginning with /a/ (stressed or unstressed; n=1141), the percentage which bears (underlying) feminine gender drops to 41%. This is a more informative number than the overall distribution in the lexicon.

Second, the proportions may have been different at the time the words in question were borrowed. Recall that the source of the loanwords in question is the Arabic language, which was subject to the closest contact with Spanish in the centuries following the Arab conquests of much of the Iberian peninsula (8th century and thereafter). This linguistic coexistence diminished over time and was abruptly curtailed with the fall of the final Muslim principality of Granada in 1492, and the subsequent expulsion/persecution of the Arabic-speaking Spanish community. Most loans from Arabic into Spanish, then, predate this time. (They also, therefore, predate the restriction of the article allomorphy to stressed initial /a/.)

The following table gives the absolute numbers of /a/-initial nouns found in my searches of the Davies historical online corpus of Spanish. The percentage of these forms which bear (underlyingly) feminine gender is also given.

Gender of /a/-initial Spanish words

        1200s   1300s   1400s
N       154     119     198
%F      40      49      46

Table 3: Gender of /a/-initial Spanish words

Note that this percentage remains relatively stable in the 40-50% range throughout the relevant time period, and that these percentages themselves are quite close to the one established for modern Spanish using the Lexesp corpus (41% feminine).

3. Arabic loanwords

Arabic is the source of a large number of Spanish lexical items, and at one time even more so, with the percentage of the Spanish lexicon of Arabic origin reaching at least 5% (Viguera Molins 2002). Despite large-scale borrowing and the centuries of linguistic coexistence between the two languages, however, the degree of bilingualism among borrowers of Arabic words appears to have been small. One piece of evidence for this conclusion is that Arabic words were often borrowed intact into Spanish with the prefixal definite article /al/ still attached to the head noun (Odisho 1997), rather than as the bare stem that a more informed borrower might be expected to adopt. This circumstance also means that a very high percentage of the noun borrowings begin with /a/, and should therefore be subject to the gender/article ambiguity detailed above.

In addition, lack of borrowers’ fluency in Arabic should minimize the influence of these nouns’ gender in Arabic itself on borrower behavior. Being unaware of the Arabic gender leaves the field clear for them to use one of the four strategies outlined in the introduction in making their decision. Another relevant property of these loanwords is the nature of coda repairs. Arabic is relatively liberal in terms of permissible word-final consonants/clusters, especially in comparison with Spanish. The typical repair in Spanish for these forms is to epenthesize a final /e/ vowel (cf. Ar. duff → Sp. adufe. Transcription is in standard written orthography for Spanish, and phonemic transcription of the Standard Arabic form for Arabic, largely according to the Encyclopedia of Islam system but with emphatic consonants capitalized rather than underdotted). Another, less frequent possibility is for the Arabic segment to be borrowed as one that is a permissible Spanish coda (cf. Ar. rabaD → Sp. arrabal; such transpositions occasionally occur even with phonotactically licit final consonants, especially between /l/ and /r/).

3.1 Corpus procedure

As a first step, a corpus was compiled of all the /a/-initial Spanish nouns of Arabic origin that are identified as such in the etymological dictionary of Corominas and Pascual (1997). This results in a set of 453 candidate nouns, given in the appendix with glosses for crucial forms. As Corominas and Pascual (henceforth CP) do not usually specify grammatical gender, another step was necessary to establish this. To do so, searches were carried out for each noun in the Davies historical online corpus of Spanish (attempts in additional dictionaries, including the Lexesp corpus, were abandoned after showing a much lower inclusion rate of the items in question, many of which are archaic).

Because the Davies corpus does not tag nouns for gender, but does provide the context in which items appear in the corpus texts, underlying grammatical gender was inferred from this context. Gender was considered unambiguously established only for nouns which cooccurred with an agreeing item other than the definite article (like the adjective of example 2b above) or with a non-adjacent definite article (as in example 2c). Alternatively, a noun with unstressed initial /a/ and an attestation later than 1500 was also considered sufficient if attested only with the definite article, since for later Spanish the article does disambiguate in these cases. This resulted in a set of 245 items.
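The classification criteria just described can be sketched in Python as follows. This is an illustrative reconstruction, not the procedure actually used: the `Attestation` record, its field names, and the function name are hypothetical, and conflicting evidence across attestations is not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attestation:
    """One corpus hit for a noun (hypothetical record layout)."""
    year: int
    # Gender shown by an agreeing item other than the adjacent definite
    # article: an adjective (as in 2b) or a non-adjacent article (as in 2c).
    agreeing_gender: Optional[str] = None   # "F", "M", or None
    # Form of an immediately preceding definite article, if any.
    adjacent_article: Optional[str] = None  # "el", "la", or None


def infer_gender(attestations: List[Attestation],
                 initial_a_stressed: bool) -> Optional[str]:
    """Return "F", "M", or None if gender is not unambiguously established."""
    # Criterion 1: agreement on something other than the adjacent article.
    for att in attestations:
        if att.agreeing_gender in ("F", "M"):
            return att.agreeing_gender
    # Criterion 2: unstressed initial /a/ and an attestation later than
    # 1500, where the adjacent article is itself informative.
    if not initial_a_stressed:
        for att in attestations:
            if att.year > 1500 and att.adjacent_article in ("el", "la"):
                return "M" if att.adjacent_article == "el" else "F"
    return None


hits = [Attestation(year=1350, adjacent_article="el")]
print(infer_gender(hits, initial_a_stressed=False))  # None: article is ambiguous before 1500
```

Nouns for which the function returns None correspond to the items left over for classification by dictionary.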

Those items which remained were then classified as masculine or feminine using the dictionary of the Real Academia Española (henceforth RAE). Twenty-four remained indeterminate, leaving a set of 438 nouns. In what follows I first consider the Davies set, which is determined by actual corpus examples, then the Davies+RAE sample.

3.2 Corpus gender distribution

Of the restricted Davies set, 100 nouns can be identified as underlyingly feminine in grammatical gender, and 145 as masculine. The percentage feminine of the set is 41%, in perfect accord with the percentages documented in the text above for modern Spanish and in Table 3 for the contemporaneous lexicon. Within the set of feminine nouns, all but four forms end in a final /a/ vowel in Spanish. The four exceptions include two forms that end in a potentially pluralizing /s/, and one with the usual epenthetic /e/ (azumbre ‘liquid measure’). The fourth form involves an unexpected change from the Arabic form to a segmental sequence homophonous with the Spanish feminine suffix –ion (Ar. siyuur → Sp. acion ‘saddle piece from which stirrup hangs’). For those which do end in a Spanish /a/, the majority also show a final /a/ in Arabic. Ten exceptions exist, which are listed below in Table 4.

Exceptions to Arabic final /a/

Spanish      Arabic      Gloss
azotea       suTayH      building covering; human head
atalaya      Talaayic    tower or elevated place
arracada     qarraT      earring
almarada     mixraz      iron weapon point or needle
alhóndiga    funduq      market, warehouse
alforja      xurj        type of carrying bag
alcarraza    karraaz     cooling container
alcaparra    kabar       type of plant
aduana       diwaan      customs
adárgama     darmak      type of flour

Table 4: Exceptions to Arabic final /a/.

These ten examples are all cases in which an epenthetic final /e/ is expected, yet an /a/ vowel surfaces instead. No obvious semantic femininity or other commonality unites them. The Arabic gender of the forms is also highly disparate – one is unknown, one a collective, three feminine, and five ‘broken.’ Thus even if borrowers knew the gender of the Arabic form – unlikely, given the apparent ignorance of Arabic displayed by article retention – it does not appear to play a deterministic role in adaptation. In most cases, a straightforward relationship exists in which a final /a/ in Arabic implies feminine gender in Spanish, and where the final /a/, which indicates feminine gender, is retained. Of the exceptions, most involve substitution of a feminizing /a/ suffix for the phonologically expected /e/, which brings the form into line with canonical/prototypical expression of feminine gender. One additional form does so via modification to another feminine suffix, closer in phonological form to the original Arabic one than /a/ is (-ion).

The set of nouns (n=145) which are borrowed as masculine into Spanish show a far greater variety in their phonological form at word end. Many end in /e/ (n=52), while others end in phonotactically licit consonants (primarily /n, r, l, s/; n=76). Two of these (ajimez, alamar) have a final /a/ in Arabic, which has been unexpectedly lost. Four more Arabic final /a/ words were borrowed directly as such and are attested as masculine in Spanish in the Davies corpus (adafina, albacea, almádena, almofalla; two are human-male, though all four are classed as feminine in the Real Academia dictionary).

A fifth final /a/ form (álgebra, Arabic jabr) is now canonically masculine but continued to fluctuate in gender until the 20th century (in the Davies corpus, 10 feminine attestations and 17 masculine through 1900). Finally, twelve forms, four of which are human-male, added a final /o/ in Spanish, making them appear more canonically masculine (algarivo, algavaro, abalorio, alborozo, almuédano, asesino, azulejo, aljemifao, abelmosco, albarazo, ?albérchigo, alfónsigo, almarjo).

As expected for a default category, then, the masculine forms show greater variation and are subject to less modification than the feminine exceptional ones. Those few which do have a feminine-associated final /a/ may lose it, though not necessarily. Rarely, the canonically masculine /o/ suffix is substituted (for Ø), though not to the same extent as the feminine /a/ for /e/, and for a quite restricted semantic range (human-male).

The primary finding, then, is the use of final /a/ in place of /e/ in a selected number of feminine loanwords. Though the number of words involved is not large, it represents roughly 10% of the feminine loanword set.

Let us now consider the superset of 438 words with the RAE additions included. Of these nouns, 175 are feminine and 263 are masculine, so that the overall percentage feminine is 40% -- consistent with what has gone before. The following forms may be added to our set of feminized words.

Exceptions to Arabic final /a/

Spanish      Arabic      Gloss
ajaquefa     shiqaaf     upper part of a building; iron decoration
albenda      band        patterned white hanging cloth
alharma      Harmal      type of plant
almijara     ma’jal      oil deposit
atafea       TafaH       indulgence to excess
almartaga    martak      ?

Table 5: Exceptions to Arabic final /a/ (RAE).

Two additional forms take final epenthetic /e/ but are nonetheless classed as feminine (duqaaq → adutaque, xazzaaj → azache). Note that in one case (alharma) the final /a/ is obtained by final consonant deletion rather than atypical epenthesis. The result is still the application of otherwise unexpected phonological changes, with the result of a final /a/ and feminine gender assignment. The observation holds for both the smaller and larger sets of loanwords. What motivates this pattern?

3.3 Motivating the exceptions

One possibility is that /a/ is the epenthetic vowel of Spanish. It surfaces straightforwardly in the set of ten feminine words, but is dispreferred for masculine ones because of its independent status as a feminine gender morpheme. The masculine final-epenthesized forms, then, are the ones which have undergone modification, from final /a/ to /e/.

This explanation seems unlikely, given the appearance of /e/ as the epenthetic vowel in non-final environments in Spanish in which it could not be associated with feminine gender (e.g. the onset cluster of escuela ‘school’). In addition, /a/ is an unlikely choice of epenthetic vowel given the inventory of Spanish, since shorter vowels are often favored for this purpose, and low vowels tend to be relatively long (Lehiste 1970).

A purely phonological explanation might rely on some sort of vowel harmony process that results in /a/ instead of /e/, especially as vowel harmony is known to operate to some extent in modern Andalusian Spanish (for tense/lax; Zubizarreta 1979). This too seems unlikely, since in the set of ten, /a/ cooccurs with each of the other four vowel phonemes of Spanish, including those which disagree in height and backness (alforja, azotea, aduana, alhóndiga). In addition, the masculine set of /e/-final words includes many stems with only /a/ vowels, where nonetheless epenthetic final /e/ is attested (n=12). While short non-high vowels in Arabic are known to be subject to variation in quality depending on the presence of neighboring guttural consonants, and this variation is reflected in loanwords, it too is not present in this data set in a way that could explain our set of exceptions, which includes forms both with and without gutturals.

A final option is that /a/ is used in place of /e/ in some cases, so that the percentage of feminine nouns in this subset of the lexicon resembles that in the lexicon as a whole. To put it another way, this ensures that the borrowing of these forms does not lead to a change in the distribution of grammatical gender in the Spanish lexicon.

Now consider the gender distribution in the (Davies) loanword set if these exceptions did not exist, or were borrowed in the expected way phonologically and then assigned masculine gender. If the former, the resulting set of loanwords (n=234) would fall to 38% feminine; if the latter, 36%. Both fall below the percentages of feminine nouns seen in the contemporary Spanish lexicon (41%) and in the centuries of heaviest borrowing (40, 49 and 46%). As we have seen, however, including the exceptions puts the percentage precisely at the lexical target of 41%. In the larger RAE set as well, without unexpected feminization, the percentage of feminine items drops to 36% rather than remaining at 40%. Again, shifting some forms into the feminine class is necessary in order to maintain the lexical ratio of 40+ percent feminine. I claim that the motivation for the exceptional forms observed is precisely to shift these percentages into the range previously existing in the lexicon.

Let us now formalize the intuitive account laid out above, using the following optimality-theoretic constraints:

(3) OT Constraints:
    NOCODA: Forms should not surface with consonant(s) in the syllable coda.
    *F: Forms should not surface with feminine grammatical gender.
    *M: Forms should not surface with masculine grammatical gender.
    F=/a/: Feminine grammatical gender and final /a/ should be associated.

Now consider a hypothetical borrowing of the form /atab/, and its incorporation into Spanish via the following tableau. The input is the form heard as an Arabic production by a Spanish speaker/borrower, while the output is both the form produced in Spanish and the one subsequently stored as the Spanish UR.

/atab/        NOCODA   *F (59%)   *M (41%)   F=/a/
  atab-F/M    *!
  atabe-F              *!                    *
→ atabe-M                         *
  ataba-F              *!
  ataba-M                         *          *!

Table 6: Final /e/ epenthesis and masculine gender.

The first candidate is immediately ruled out by the high-ranking NOCODA constraint, regardless of which gender it is assigned. The second violates the markedness constraint against feminine gender, as well as the constraint mandating an association between gender and form such that feminine forms should surface with final /a/, and final /a/ forms should be feminine. Note that the two markedness constraints *F and *M are ranked stochastically, in the sense of Boersma and Hayes (2001), in a way that reflects the lexicon as already acquired. This ranking ensures that for some items entering the language, feminine gender will be assigned instead of the masculine default. In such cases, the F=/a/ constraint becomes decisive, and we see a surface form that differs in both gender and final vowel from the output preferred above, as shown in the following tableau.

/atab/        NOCODA   *M    *F    F=/a/
  atab-F/M    *!
  atabe-F                    *     *!
  atabe-M              *!
→ ataba-F                    *
  ataba-M              *!          *

Table 7: Final /a/ epenthesis and feminine gender.

In order for the form to take hold, I assume that the computation for each item is performed only once for each speaker (possibly this is enforced by a form of the USE-LISTED constraint proposed by Zuraw 2000), and that the result must then be propagated via its status as the first (possibly only) adaptation of the form, or via the social influence of that particular speaker.
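The two tableaux can be checked mechanically with a minimal Python sketch. Several simplifications are assumptions of the sketch rather than part of the analysis: the order of *F and *M is sampled directly with a fixed probability (reading the 59%/41% annotation in Table 6 as the probability that *F outranks *M), whereas Boersma and Hayes (2001) derive such probabilities from noisy ranking values; NOCODA is held at the top and F=/a/ at the bottom; and all function names are hypothetical.

```python
import random

# Violation profiles copied from Tables 6 and 7 (unlisted cells are 0).
CANDIDATES = {
    "atab-F/M": {"NOCODA": 1},
    "atabe-F": {"*F": 1, "F=/a/": 1},
    "atabe-M": {"*M": 1},
    "ataba-F": {"*F": 1},
    "ataba-M": {"*M": 1, "F=/a/": 1},
}


def evaluate(ranking, candidates=CANDIDATES):
    """Strict-domination evaluation: filter candidates constraint by constraint."""
    survivors = list(candidates)
    for constraint in ranking:
        fewest = min(candidates[c].get(constraint, 0) for c in survivors)
        survivors = [c for c in survivors
                     if candidates[c].get(constraint, 0) == fewest]
    return survivors[0]


def sample_ranking(p_star_f_outranks=0.59, rng=random):
    """Pick the relative order of *F and *M for one evaluation."""
    if rng.random() < p_star_f_outranks:
        return ("NOCODA", "*F", "*M", "F=/a/")   # Table 6
    return ("NOCODA", "*M", "*F", "F=/a/")       # Table 7


def simulated_feminine_share(n_items, seed=0):
    """Share of simulated loanwords that surface as ataba-F."""
    rng = random.Random(seed)
    winners = [evaluate(sample_ranking(rng=rng)) for _ in range(n_items)]
    return winners.count("ataba-F") / n_items


print(evaluate(("NOCODA", "*F", "*M", "F=/a/")))  # atabe-M
print(evaluate(("NOCODA", "*M", "*F", "F=/a/")))  # ataba-F
print(simulated_feminine_share(10_000))
```

Under these assumptions only atabe-M and ataba-F ever win, and over many simulated borrowings the feminine share approaches the 41% at which *M outranks *F.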

I leave aside the additional constraints involved in selecting the epenthetic vowel, as well as cases of final /a/ in which epenthesis is not otherwise necessary (e.g. almarada, alcaparra, alcarraza, aduana; to get these forms, F=/a/ must also outrank DEP-/a/).

In addition, note that the change involved in the epenthetic vowel alternation is minimal in both featural and acoustic terms: /e/ → /a/ necessitates a change of only one phonological feature ([-low] to [+low]), and the two segments are neighbors in acoustic space. (A strong implementation of the similarity factor might hold that gender-switching to final /a/ should occur only when an epenthetic final vowel is independently necessitated by a phonotactically unacceptable final consonant in the Arabic form; however, we see that this does not in fact hold, since four to five of the crucial ten forms end in phonotactically fine consonants). The special case of siyuur → acion also bears on this point, since the form surfaces as such and not as the alternative aciora. Intuitively, the change to the final consonant is more minimal than epenthesizing a (phonotactically unnecessary) final vowel. I abstract away from this complication, which is surely relevant.

Finally, although this account relies crucially on the (stochastic) ranking of a pair of markedness constraints, I argue that it falls into the informed-guessing model of UR-selection, rather than being a purely markedness-driven one. We may assume that in the language acquisition process undergone by Spanish speakers, the two constraints were originally ranked equally highly. This ranking then shifted stochastically as items of the Spanish lexicon were acquired. The ranking at the moment of Arabic loanword importation, then, reflects the pre-existing lexicon of Spanish as known by the speaker at that time. It is independent of language-internal or cross-linguistic markedness considerations other than Spanish lexical statistics.

4. Conclusions

Spanish article allomorphy presents speakers with ambiguity regarding grammatical gender for some unfamiliar items. In loanword adaptation, speakers do not assign gender randomly, nor do they uniformly assign a single (masculine) gender in all cases, as both transparency/lexicon optimization and markedness considerations would predict. These results join a growing body of evidence that lexicon optimization as classically formulated cannot capture observed patterns of lexicon formation, and that it may be contravened by knowledge of lexical statistics and alternations (Vaux and Nevins 2005 and citations therein). Unlike Vaux and Nevins, however, I do not conclude that the apparent influence of lexical statistics (and of factors other than LO) on UR selection means that such selection lies outside the grammar. The alternative is to incorporate probabilistic mechanisms into the grammar, as in the solution sketched above.

This mechanism brings gender distribution in the borrowed lexicon into line with that of the native lexicon, and leaves the distribution in the lexicon as a whole unchanged. In this way, lexical statistics drive gender assignment.

References

Boersma, P. & B. Hayes. (2001). Empirical Tests of the Gradual Learning Algorithm. Linguistic Inquiry 32:45-86.

Corominas, J. & J.A. Pascual. (1997). Diccionario Crítico Etimológico Castellano e Hispánico. Madrid: Gredos.

Davies, M. http://www.corpusdelespanol.org.

Ernestus, M. & H. Baayen. (2003). Predicting the Unpredictable: Interpreting Neutralized Segments in Dutch. Language 79.1:5-38.

Harris, J. (1987). Disagreement Rules, Referral Rules, and the Spanish Feminine Article el. Journal of Linguistics 23:177-183.

Lehiste, I. (1970). Suprasegmentals. Cambridge: MIT Press.

Nevins, A. & B. Yolcu-Kamali. (2005). Constructing Underlying Representations based on Informed Guesses: Turkish Evidence. Talk given at the meeting of the Central Eurasian Studies Society: Boston.

Odisho, E.Y. (1997). ‘al’-Prefixed Arabic Loanwords in Spanish: Linguistic Implications. Zeitschrift für Arabische Linguistik 33:89-99.

Penny, R. (2000). Variation and Change in Spanish. Cambridge: Cambridge University Press.

Real Academia Española. (2001). Diccionario de la lengua española. 22nd edition. Madrid. http://www.rae.es.

Sebastián, N., F. Cuetos, M. A. Martí & M. F. Carreiras. (2000). LEXESP: Léxico Informatizado del Español. Edición en CD-ROM. Barcelona: Edicions de la Universitat de Barcelona (Colleccions Vàries, 14).

Vaux, B. & A. Nevins. (2005). Formal and Empirical Arguments for Morpheme Structure Constraints. Talk given at the annual meeting of the Linguistic Society of America, Oakland, CA.

Viguera Molíns, M.J. (2002). Lengua Árabe y Lenguas Románicas. Revista de Filología Románica 19:45-54.

Zimmer, K.E. (1969). Psychological Correlates of Some Turkish Morpheme Structure Conditions. Language 45.2:309-321.

Zubizarreta, M.-L. (1979). Vowel Harmony in Andalusian Spanish. In K. Safir (ed.), Papers on Syllable Structure, Metrical Structure, and Harmony Processes. Cambridge, MA: MIT Working Papers in Linguistics.

Zuraw, K. (2000). Exceptions and Regularities in Phonology. UCLA dissertation.

Department of Linguistics and Philosophy E39-245 MIT 77 Massachusetts Avenue Cambridge, MA 02139 USA


Appendix

Transcriptional notes: The Spanish forms are given in standard orthography. Arabic forms are given roughly phonemically, with capital letters for pharyngealized segments. Some segments are written as digraphs (when not hyphenated): ‘sh’ as in English ‘ship’, ‘dh’ as in ‘this’, ‘th’ as in ‘thick’, and ‘gh’ for the dorsal voiced fricative. Finally, ‘c’ represents the pharyngeal glide.

| Spanish gender | Gender source | Spanish | Arabic | Gloss (of Spanish) |
|---|---|---|---|---|
| m | RAE | albérchigo |  | apricot |
| m | CP | asesino | Hashshaashii | assassin |
| m | CP | azulejo | zulayj | blue tilework |
| f | CP | azotea | suTayH; RAE -a | building covering; human head |
| m | RAE | ataire | daa'ira | circle |
| f | CP | alcarraza | karraaz; RAE -a | cooling container |
| f | CP | aduana | diwaan | customs |
| f | CP | arracada | qarraT; RAE raqqaada | earring |
| m | CP | albacea | waSiiya | executor |
| m | CP | alborozo | buruuz | great pleasure or disorder |
| f | RAE | atafea | TafaH; RAE w/ Ar. /a/ suffix | indulgence to excess |
| m | RAE | arrejaque | rashaaqa | iron fishing trident; type of bird |
| m | CP | almádena | maTana | iron tool to break stones |
| f | CP | almarada | mixraz | iron weapon point or needle |
| m | CP | adafina | dafiina | Jewish type of jar to keep Sabbath food in |
| m | RAE | albarazo | baraS | leprosy |
| f | CP | azumbre | thumn | liquid measure |
| m | RAE | atarraga | tarraaqa | little hammer |
| f | CP | alhóndiga | funduq | market, warehouse |
| m | CP | almuédano | mu'adhdhin | muezzin |
| f | RAE | almijara | ma'jal; RAE maybe mish'ala | oil deposit |
| m | CP | aljemifao | jamiic Hawayj | peddler of small items |
| m | CP | almofalla | maHalla | people or place of war |
| f | RAE | albengala | benkaala | placename (Bengal) |
| m | RAE | alquezar | qaSaara; RAE qiSaar | river water catchment |
| f | CP | ación | siyuur | saddle piece from which stirrup hangs |
| m | RAE | adúcar | maybe Hadduuqa | silk thread |
| m | CP | abalorio | billauri (adj) | small ornament |
| f | CP | atalaya | Talaayic | tower or elevated place |
| f | CP | alforja | xurj | type of carrying bag |
| f | CP | adárgama | darmak; RAE -a | type of flour |
| m | RAE | algavaro | ghawwaar | type of insect |
| m | RAE | abelmosco | Habb al-musk | type of plant |
| f | CP | alcaparra | kabar; RAE kapparra from Lat/Gk | type of plant |
| f | RAE | alharma | Harmal | type of plant |
| m | RAE | almarjo | marj | type of plant |
| m | RAE | alfónsigo | fustaq | type of tree |
| f | RAE | ajaquefa | shiqaaf | upper part of a building; iron decoration |
| f | RAE | albenda | band | white hanging cloth decorated with patterns and figures |
| m | RAE | ademe | dicma | wooden guard in mines |
| m | RAE | abitaque | Tabaq |  |
| amb. Ant.? | RAE | acebibe | zabiib |  |
| m | CP | acebuche | zabbuuj |  |
| m | RAE | aceche | zaaj |  |
| f | RAE | aceifa | Saa'ifa |  |
| m | CP | aceite | zayt |  |
| m | RAE | aceituní | zaytuunii |  |
| f | CP | acelga | silqa |  |
| f | CP | acémila | zaamila |  |
| m | CP | acemite | samiid |  |
| f | CP | aceña | saaniya |  |
| f | CP | acequía | saaqiya |  |
| f | RAE | acerola | zacruura |  |
| m | CP | acetre | saTl |  |
| m | CP | acial | ziyaar |  |
| m | CP | acíbar | Sibar |  |
| verb | RAE | acicalar | Saqal |  |
| m | CP | acicate | sikkaat |  |
| m | CP | acimut | sumuut |  |
| m | CP | acirate | SiraaT |  |
| m | CP | adalid | daliil |  |
| f | RAE | adaraja | daraja |  |
| f | CP | adarga | darqa |  |
| m | CP | adarme | dirham |  |
| m | CP | adarve | darb |  |
| f | RAE | adaza | daqsa |  |
| f | RAE | adefera | Dafiira |  |
| f | RAE | adehala | daxaala |  |
| f | CP | adelfa | difla |  |
|  |  | ademna | dimna |  |
| f | RAE | adivas | di'ba |  |
| m | RAE | adive | dhi'b |  |
| m | CP | adobe | Tuub |  |
| m | CP | adoquín | dukkaan |  |
| m | RAE | ador | dawr |  |
| m | CP | aduar | dawwaar |  |
| m | CP | adufe | duff |  |
| adverb | RAE | adunia | dunyaa |  |
| f | RAE | adutaque | duqaaq |  |
| m | RAE | aguajaque | wushshaq |  |
| f | RAE | ajabeba | shabbaaba |  |
| f | RAE | ajaraca | sharaka |  |
| m | CP | ajarafe |  |  |
| f | CP | ajedrea | shaTriya |  |
| m | CP | ajedrez | shitranj |  |
| m | CP | ajimez | shimaasa |  |
| m | RAE | ajomate | jummaat |  |
| m | RAE | ajonjoli | juljulaan |  |
| f | CP | ajorca | shurka |  |
| m | CP | ajuar | shuwaar |  |
| f | CP | alacena | xazaana |  |
| m | CP | alacrán | caqrab |  |
| m | RAE | aladar | cidhaar |  |
| m | RAE | aladroque | azraq |  |
| f | RAE | alafia | caafiya |  |
| f | RAE | alahilca | cilqa |  |
| m | RAE | alajor | cushuur |  |
| m | CP | alamar | camaara |  |
| m | CP | alambique | anbiiq |  |
| m | RAE | alambor |  |  |
| m | RAE | alamín | amiin |  |
| m | RAE | alamud | camuud |  |
| f | RAE | alaqueca | caqiiqa |  |
| m | CP | alarde | carD |  |
| m | CP | alarife | cariif |  |
| f | RAE | alaroza | caruusa |  |
| m | RAE | alatar | caTTaar |  |
| m | CP | alazán | azcar |  |
| m | RAE | alazor | cuSfur |  |
| f | CP | albacora | baakuura |  |
|  |  | albadén | ? |  |
| f | CP | albahaca | Habaqa |  |
| f | CP | albaida | bayDa |  |
| m | CP | albalá | baraa'a |  |
| m | CP | albañal | ballaaca |  |
| f | CP | albanega | baniiqa |  |
| m | CP | albañil | bannaa' |  |
| f | CP | albarda | bardaca |  |
| m | CP | albardán | bardaan |  |
| m | RAE | albardín | bardii |  |
| m | CP | albaricoque | birquuq |  |
| f | CP | albarrada | barraada |  |
| m | RAE | albarrán | barraanii |  |
| m | CP | albarraz | Habb al-ra's |  |
| m | CP | albayalde | bayaaD |  |
| m | CP | albéitar | bayTar |  |
| f | CP | alberca | birka |  |
| m | RAE | albihar | bihaar |  |
| f | RAE | albitana | bitaana |  |
| m | CP | albogue | buuq |  |
| f | RAE | alboheza | xubbayza |  |
| m | RAE | albohol | Hubuul |  |
| f | CP | albóndiga | bunduqa |  |
| f | RAE | albórbola | walwala |  |
| f | CP | albornía | burniiya |  |
| m | RAE | albornoz | burnuus |  |
| f | CP | alboronía | buuraaniiya |  |
| m | CP | alboroque | buruuk |  |
| m | RAE | albotín | buTm |  |
| f | CP | albricias | bishaara |  |
| f | RAE | albudeca | buTTayxa |  |
| f | CP | albufera | buHayra |  |
| m | CP | albur | buurii |  |
| f | CP | alcabala | qabaala |  |
| m | RAE | alcabor | qabw |  |
| f | RAE | alcabtea | qabtiya |  |
| m | CP | alcacer | qaSl |  |
| f | CP | alcachofa | xarshuufa |  |
|  |  | alcadafe | qadaH |  |
| m | RAE | alcáfar | kafal |  |
| m | RAE | alcahaz | qafaS |  |
| m | CP | alcahuete | qawwaad |  |
| f | CP | alcaicería | qaysaariya |  |
| m | CP | alcaide | qaa'id |  |
| m | CP | alcalde | qaaDi |  |
| m | CP | álcali | qily |  |
| m | CP | alcaller | qallaal |  |
| m | RAE | alcamiz | xamiis |  |
| f | CP | alcamonías | kammuuniya |  |
| f | CP | alcancía | kanziya |  |
| f | CP | alcándara | kandara |  |
| f | RAE | alcandía | qaTniya |  |
| f | CP | alcandora | qanduura |  |
| f | RAE | alcanería | qannaariya |  |
| m | CP | alcanfor | kaafuur |  |
| f | RAE | alcántara | qanTara |  |
| m | CP | alcaraván | karawaan |  |
| f | CP | alcaravea | karawia |  |
| f | CP | alcarceña | karsanna |  |
| m | RAE | alcartaz | qarTaas |  |
| m | RAE | alcatenes | kattaan |  |
| f | CP | alcatifa | qaTiifa |  |
| m | CP | alcatraz | ghaTTaas |  |
| m | CP | alcaucil | qabSiil |  |
| f | CP | alcavera | qabiila |  |
| f | CP | alcazaba | qaSaba |  |
| m | CP | alcázar | qaSr |  |
| f | CP | alcoba | qubba |  |
| f | CP | alcohela | kuHayla |  |
| m | CP | alcohol | kuHl |  |
| f | RAE | alcolla | qulla |  |
| m | CP | alcor | quur |  |
| f | CP | alcora | kura |  |
| m | CP | alcorque | qurq |  |
| f | CP | alcorza | qurSa |  |
| m | RAE | alcrebite | kibriit |  |
| m | RAE | alcribís | qawaadiis |  |
| f | RAE | alcubilla | kuuba |  |
| f | CP | alcurnia | kunya |  |
| f | CP | alcuza | kuuza |  |
| m | CP | alcuzcuz | kuskus |  |
| f | CP | aldaba | Dabba |  |
| f | CP | aldea | Dayca |  |
| f | RAE | aldiza | diisa |  |
|  |  | aldrán | Da'n |  |
| f | RAE | alejija | dashiisha |  |
| m | CP | alerce | arz |  |
| f | RAE | aletría | iTriya |  |
| m | RAE | aleve | cayb |  |
| f | RAE | alfaba | Habba |  |
| f | RAE | alfadía | hadiiya |  |
| f | RAE | alfaguara | fawwaara |  |
| m | CP | alfajeme | Hajjaam |  |
| m | CP | alfajor | Hashw |  |
| f | CP | alfalfa | faSfaSa |  |
|  |  | alfalia | ghaaliya |  |
| m | CP | alfaneque | fanaak |  |
| m | CP | alfanje | xanjar |  |
| m | RAE | alfaque | fakk |  |
| m | CP | alfaqueque | fakkaak |  |
| m | CP | alfaquí | faqiih |  |
| m | RAE | alfaquín | Hakiim |  |
| m | CP | alfar | faxxaar |  |
| m | RAE | alfaraz | faras |  |
| f | CP | alfarda | farDa |  |
| m | RAE | alfareme | Haraam |  |
| m | RAE | alfarje | Hajar |  |
| m | CP | alfayate | xayyaaT |  |
| m | RAE | alfazaque | fassaas |  |
| m | CP | alféizar | fas-ha |  |
| m | CP | alfeñique | faanid |  |
| f | CP | alferecía | faarisiyya/faalijiiya |  |
| m | CP | alférez | faarisiyya/faalijiiya |  |
| m | RAE | alferraz | farraas |  |
| m | RAE | alficoz | fuqquus |  |
| m | CP | alfil | fiil |  |
| m | CP | alfiler | xilaal |  |
| m | RAE | alfitete | fitaat |  |
| m | CP | alfolí | hury |  |
| f | CP | alfombra | xumra |  |
| f | CP | alforza | Huzza |  |
| m | CP | alfoz | Hawz |  |
| f | RAE | algaba | ghaaba |  |
| f | RAE | algaida | ghayDa |  |
| m | RAE | algar | ghaar |  |
| f | CP | algara | ghaara |  |
| f | CP | algarabía | carabiya |  |
| adj | RAE | algarivo | ghariib |  |
| f | CP | algarrada | carraada |  |
| f | CP | algarroba | xarruuba |  |
| f | CP | algazara | ghazaara |  |
| m | RAE | algazul | ghaasuul |  |
| m | CP | álgebra | jabr |  |
| m | CP | algodón | quTn |  |
| f | CP | algorfa | ghurfa |  |
| m | CP | alguacil | waziir |  |
|  |  | alhadida | Hadiida |  |
|  |  | alhaita | xayT |  |
| f | CP | alhaja | Haaja |  |
| m | RAE | alhamar | Hanbal |  |
| m | RAE | alhamel | Hammaal |  |
| m | RAE | alhandal | HanZal |  |
| f | RAE | alhanía | Haniya |  |
| f | CP | alharaca | Haraka |  |
| f | RAE | alhavara | Huwwaara |  |
| m | CP | alhelí | xayri |  |
|  |  | alhema | Hima |  |
| f | CP | alheña | Hinna |  |
| f | RAE | alholva | Hulba |  |
|  |  | alhorma | Hurma |  |
| m | RAE | alhorre | Hurr |  |
| f | CP | alhucema | xuzaama |  |
| f | RAE | alhuceña | xushayna |  |
| f | RAE | alhurreca | Hurrayqa |  |
| m | RAE | aliacán | yaraqaan |  |
| m | RAE | alicante | from alacran |  |
| m | CP | alicates | laqqaaT |  |
| f | CP | alidada | ciDaada |  |
| m | RAE | alifafe | nafax, liHaaf |  |
| m | RAE | alijar | jishaar |  |
| m | RAE | alinde | hind |  |
| m | RAE | alioj | yashb |  |
| m | RAE | alizace | isaas |  |
| m | RAE | alizar | izaar |  |
| f | CP | aljaba | jacba |  |
| m | RAE | aljabibe | jabbaab |  |
| f | CP | aljama | jamaaca |  |
| f | RAE | aljamía | cajamiiya |  |
| m | RAE | aljaraz | jaras |  |
| f | RAE | aljarfa | jarraafa |  |
| m | CP | aljibe | jubb |  |
| m | CP | aljófar | jawhar |  |
| f | CP | aljofifa | jaffiifa |  |
| f | RAE | aljuma | jumma |  |
| f | RAE | alloza | lawza |  |
| m | CP | almacén | maxzan |  |
| f | RAE | almacería | maSriya |  |
| f | CP | almáciga | maSTakaa |  |
| m | RAE | almadén | macdin |  |
| f | CP | almadía | macdiya |  |
| f | CP | almadraba | maDraba |  |
| m | CP | almadraque | maTraH |  |
| m | CP | almagre | maghra |  |
| m | CP | almaizar | mi'zar |  |
| m | RAE | almajar | micjar |  |
| f | CP | almalafa | milHafa |  |
| f | RAE | almanaca | mixnaqa |  |
| m | CP | almanaque | manaax |  |
| m | RAE | almancebe | manSib |  |
| m | RAE | almarbate | mirbaT |  |
| m | RAE | almarrá | miHlaaj |  |
| f | RAE | almarraja | mirashsha |  |
| f | RAE | almártaga | martaca, martak |  |
| m | RAE | almatriche | maTriish |  |
| m | RAE | almatroque | maTruuH |  |
|  |  | almayal | mayyaar |  |
| f | CP | almazara | macSara |  |
| m | RAE | almazarrón | miSr; RAE almagra, rust |  |
| f | CP | almea | mayca |  |
| f | CP | almejía | maHshiya |  |
| f | CP | almenara | manaara, manhar |  |
| m | RAE | almez | mays |  |
| m | CP | almíbar | miiba |  |
| m | RAE | almicantarat | muqanTaraat |  |
|  |  | almifor | mifarr |  |
| m | RAE | almijar | manshar |  |
| m | RAE | almimbar | minbar |  |
| m | CP | alminar | manaara, manhar |  |
| m | CP | almirante | amiir |  |
| m | CP | almirez | mihraas |  |
| m | RAE | almizate | muusaT |  |
| m | CP | almizcle | misk |  |
| m | RAE | almocadén | muqaddam |  |
| m | CP | almocafre | mukaffir |  |
| m | RAE | almocárabe | muqarbaS |  |
|  |  | almocati | muxxaat |  |
| m | RAE | almocrebe | murakkib |  |
| m | RAE | almocrí | muqri |  |
| m | RAE | almodón | madhuun |  |
| m | CP | almodrote | madruus |  |
| m | CP | almófar | mighfar |  |
| f | CP | almofía | muxfiya |  |
| m | RAE | almofrej | mafraash |  |
| m | CP | almogávar | mughaawir |  |
| f | CP | almohada | mixadda |  |
| adj | RAE | almohade | muwaHHid |  |
| f | CP | almohaza | miHassa |  |
| f | RAE | almojábana | mughabbana |  |
| m | CP | almojarife | mushrif |  |
| m | RAE | almojatre | mushaadir |  |
| f | RAE | almona | mu'na |  |
| f | CP | almoneda | munaada |  |
| m | RAE | almoraduj | mardaquush |  |
|  |  | almorávid | muraabiT |  |
| m | CP | almotacén | muHtasib |  |
|  |  | almotalefe | mustaHlaf |  |
| f | RAE | almozalla | muSalla |  |
| m | CP | almud | mudd |  |
| f | CP | almudena | mudayyina |  |
| f | RAE | almunia | munya |  |
| m | CP | aloque | xaluuqi |  |
| m | RAE | aloquín | waqii |  |
| m | CP | alquequenje | kaakanj |  |
| f | CP | alquería | qarya |  |
| m | CP | alquerque | qirq |  |
| m | RAE | alquez | qaysaariya; RAE qisT, no a |  |
| f | CP | alquibla | qibla |  |
| m | CP | alquicel | kisaa' |  |
| f | CP | alquimia | kiimiyaa' |  |
| m | CP | alquinal | qinaac |  |
| f | CP | alquitara | qaTTaara |  |
| f | RAE | alquitira | kathiira |  |
| m | CP | alquitrán | qiTraan |  |
| f | RAE | altabaca | Tabbaaqa |  |
| f | RAE | altamía | Tacaamiiya |  |
| m | CP | altramuz | turmus |  |
| f | CP | alubia | lubya |  |
| m | CP | aludel | uthaal |  |
|  |  | aluneb | cunnaab |  |
| f | RAE | ama | amma (umm) |  |
| m | CP | ámbar | canbar |  |
| f | RAE | añacea | nazaaha |  |
| f | RAE | anafaga | nafaqa |  |
| m | CP | anafe | naafix |  |
|  |  | añafea | nafaaya |  |
| m | CP | añafil | nafiir |  |
| m | CP | anaquel | naqqaal |  |
| m | RAE | añazme | nazm |  |
| m | RAE | andaraje | daraj |  |
| f | RAE | andorra | ghanduura |  |
| f | CP | anea | naaya |  |
| m | RAE | anejir | nashiid |  |
| f | RAE | aniaga | nafaqa |  |
| m | CP | añil | niil |  |
| f | RAE | anúteba | nudba |  |
| m | CP | arancel | anzaal? |  |
| m | RAE | arar | carcar |  |
| m | CP | arcaduz | qaaduus |  |
| m | RAE | argamandel | xirqa mandiil |  |
| f | RAE | argamula | Hamuula, Haluum |  |
| adj | RAE | argel | rijl |  |
| f | CP | argolla | ghulla |  |
| m | RAE | arimez | cimaad? |  |
| m | RAE | arjoran | arjuwaan |  |
| m | RAE | arrabá | rabaac |  |
| m | CP | arrabal | rabaD |  |
| m | RAE | arraclán | from alacran |  |
| m | CP | arrayán | rayHaan |  |
| m | CP | arrecife | raSiif |  |
| m | CP | arrelde | raTl |  |
| m | CP | arriaz | ri'aas |  |
| m | RAE | arricés | rizaaz |  |
| m | RAE | arrocabe | rukkaab |  |
| m | CP | arrope | rubb |  |
| f | RAE | artanita | carTaniithaa |  |
| m | CP | atabal | Tabl |  |
| m | CP | ataharre | thafar |  |
| f | RAE | atahorma | taafurma |  |
| m | CP | ataifor | Tayfuur |  |
| m | CP | atanor | tannuur |  |
| f | RAE | atanquía | tanqiya |  |
| f | CP | atarazana | daar aS-Sinaaca |  |
| f | CP | atarjea | tajriya |  |
| f | CP | atarraya | TarraaHa |  |
| m | CP | ataúd | taabuut |  |
| f | CP | ataujía | tawshiya |  |
| m | CP | ataurique | tawriiq |  |
| m | RAE | atifle | athaafii |  |
| f | RAE | atijara | tijaara |  |
| m | RAE | atincar | tinkaar |  |
|  |  | atoque | Tawq |  |
| m | CP | atun | tun |  |
| f | CP | atutía | tuutiyaa' |  |
| m | CP | auge | awj |  |
| m | CP | azabache | zabaj/sabaj |  |
| m | CP | azacán | saqqaa' |  |
| f | CP | azacaya | siqaaya |  |
| f | RAE | azache | xazzaaj |  |
| m | CP | azafate | safaT |  |
| m | CP | azafrán | zacfaraan |  |
| f | CP | azagaya | Berber zaghaaya |  |
| m | CP | azahar | zahr |  |
| m | RAE | azalá | Salaa |  |
| f | RAE | azamboa | zanbuuca |  |
| m | RAE | azaque | zakaa |  |
| f | RAE | azaquefa | ? |  |
| m | CP | azar | zahr |  |
| m | RAE | azarbe | sarb |  |
| m | CP | azarcón | zarquun |  |
| f | RAE | azarja | saaraja |  |
| m | RAE | azarnefe | zirnix |  |
|  |  | azarote | anzaruut |  |
| m | CP | azófar | Sufr |  |
| f | RAE | azofra | suxra |  |
| m | CP | azogue | zaa'uq |  |
| m | CP | azor | suur |  |
| m | CP | azote | sawT |  |
| m | CP | azúcar | sukkar |  |
| f | CP | azucena | suusana |  |
| m | CP | azud | sudd |  |
| f | RAE | azufaifa | zufayzafa |  |
| m | CP | azul | laazaward |  |
| m | RAE | azúmbar | sunbul |  |