relativised minimality

CHAPTER 3

Minimality

3.1 Relativized Minimality

Relativized Minimality (RM) captures the intuition that a local structural relation is one that must be satisfied in the smallest possible environment in which it can be satisfied. The original definition, from Rizzi (1990), is given in (2).

(1) . . . X . . . Z . . . Y . . .

(2) Relativized Minimality: X α-governs Y iff there is no Z such that

i. Z is a typical potential α-governor for Y,

ii. Z c-commands Y and does not c-command X.1

iii. α-governors: heads, A Spec, A′ Spec.

What is crucial for the purposes of our discussion is the actual implementation of this definition, in particular that of typical potential α-governor. In other words, the focus will be on the definition of what counts as a potential intervener. As shown in (2-iii) above, the different types of α-governors the system cares about are heads vs. specifiers, and in the latter case A vs. A′. Empirically, (2) provides a unified account of: Wh-islands (Huang, 1982), Superraising, Head Movement Constraints (Travis, 1984; Chomsky, 1986; Baker, 1988), Pseudo-Opacity effects (Obenauer, 1976, 1984), and Inner Islands (Ross, 1983). How RM accounts for these effects is briefly shown in examples (3)-(7).

1 Note that intervention for minimality can and does exceed strict c-command and is commonly found in simple linear order. Gapping is the case at stake: John sells books; Mary buys records and Bill V newspapers (Rizzi 2004a, ex. 9, attributed to Koster 1978), where the elided V can only be to buy.

• Wh-islands in (3) are banned as instances of A′ over A′ movement:

(3) a. *Howj do you wonder [whati [Andrea could sing ti tj ]]?

b. How do you think [ t’ [ Andrea could sing this song t]]?

• In Superraising, long A movement of the non-local subject (Juma in (4)) is blocked by an intervening subject hosted in another A Specifier:

(4) a. *Juma seems that it is likely [t to win].
b. It seems that Juma is likely [t to win].

• Example (5) illustrates the Head Movement Constraint: in Aux inversion the non-local head have cannot skip any other intervening head on its path.

(5) a. They have left.
b. Have they left t?
c. They could have left.
d. Could they t have left?
e. *Have they could t left?

• Pseudo-Opacity effects: examples (6)a-b show that the wh quantifier combien in the Spec of an NP can undergo A′ movement either alone (6)b (pace the Left Branch Constraint), or by pied-piping the whole NP. The first option however is not available if an adverbial quantifier intervenes (6)d. Here beaucoup is taken to occupy an A′ Specifier, and thus it qualifies as an intervener for A′ movement.

(6) a. [Combien de livres] a-t-il consulté t
[How-many of books] has-he consult.past t

b. Combien a-t-il consulté [t de livres]
How-many has-he consult.past [t of books]

How many books did he consult?

c. [Combien de livres] a-t-il beaucoup consulté t
[How-many of books] has-he a-lot consult.past t

d. *Combien a-t-il beaucoup consulté [t de livres]
How-many has-he a-lot consult.past [t of books]

How many books did he consult a lot?

• Inner Islands, exemplified in (7), are explained given the assumption that negation qualifies as a potential A′ binder.2 Extraction of the adjunct in the presence of negation is thus excluded, because the antecedent will not be able to govern its trace.

(7) a. It is for this reason that I believe that Alexis was elected t

b. *It is for this reason that I don’t believe that Alexis was elected t
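The way definition (2) rules out (3a), (4a) and (5e) can be made concrete with a minimal sketch. It rests on simplifying assumptions that are not part of Rizzi's formulation: a structure is flattened into a list ordered so that each element c-commands everything to its right, and each position is tagged by hand as a head, an A specifier or an A′ specifier. The names `Position`, `interveners` and `alpha_governs` are hypothetical labels chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    label: str
    kind: str  # "head", "A" (A specifier), "A'" (A-bar specifier) or "trace"


def interveners(structure, x_index, y_index):
    """Return every Z strictly between X and Y that is of the same kind as X.

    Assumption: list order encodes c-command, so any Z between X and Y
    c-commands Y and does not c-command X, as clause (2-ii) requires.
    """
    x = structure[x_index]
    return [z for z in structure[x_index + 1:y_index] if z.kind == x.kind]


def alpha_governs(structure, x_index, y_index):
    """X alpha-governs Y iff no typical potential alpha-governor intervenes."""
    return not interveners(structure, x_index, y_index)


# (3a) *How do you wonder [what [Andrea could sing t t]]?
wh_island = [
    Position("how", "A'"), Position("you", "A"),
    Position("what", "A'"), Position("Andrea", "A"), Position("t_how", "trace"),
]
# (4a) *Juma seems that it is likely [t to win].
superraising = [Position("Juma", "A"), Position("it", "A"), Position("t", "trace")]
# (5e) *Have they could t left?
hmc_violation = [
    Position("have", "head"), Position("they", "A"),
    Position("could", "head"), Position("t", "trace"),
]
# (5b) Have they left t?
aux_inversion = [Position("have", "head"), Position("they", "A"), Position("t", "trace")]

print(alpha_governs(wh_island, 0, 4))      # blocked by "what"
print(alpha_governs(superraising, 0, 2))   # blocked by "it"
print(alpha_governs(hmc_violation, 0, 3))  # blocked by "could"
print(alpha_governs(aux_inversion, 0, 2))  # no intervening head
```

The sketch deliberately ignores everything except the typing of interveners, which is the part of the definition the following sections revise.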

3.1.1 Argument/Adjuncts asymmetries

Example (8) exemplifies the phenomenon (first noted by Huang 1982) of argument/adjunct asymmetry with respect to extraction from Weak Islands. The classical distinction states that arguments can extract but adjuncts cannot.

(8) a. Which songi do you wonder [howj [Andrea could sing ti tj ]]?
b. *Howj do you wonder [[which song]i [Andrea could sing ti tj ]]?

A classical way to account for this distinction is to refer to the Empty Category Principle (ECP), which requires empty elements to be (properly) governed (9). That is, empty elements must be selected by a lexical category or locally governed by their antecedent (for details, see the original analysis in Huang 1982; Lasnik and Saito 1984, 1992).

(9) ecp: Every trace must be properly governed.
Proper Government: α properly governs β if:

i. α θ-governs β, or

ii. α antecedent governs β.

2 See Frampton (1991) and below for critical discussion about the status of Adverbs and Negation as A′ Specifiers.

In this line of analysis, arguments, which are lexically selected by a head, are theta governed and thus satisfy the first clause, while adjuncts, which are not lexically selected (and thus lack theta government), need to satisfy the second clause of the ECP, i.e. antecedent government.

Under this view, successful extraction of the moved object in (8-a) is explained because, even if the trace of which song is not antecedent governed, it is theta governed by the verb. The trace of the adjunct in (8-b), on the other hand, does not fall under either of the clauses in (9): [i.] it is not lexically selected by the verb, thus it is not theta marked; and [ii.] it is not governed by its antecedent, given that the latter occupies a position too far away from it. Non-local movement of adjuncts is thus correctly predicted to be problematic.

Rizzi, capitalizing on work by Guglielmo Cinque (1990) and Ileana Comorovski (1989), shows however that the argumental status of the moved element alone is not sufficient to allow felicitous extraction out of Weak Islands. The relevant cases here are non-referential elements such as lexically selected adverbs, measure phrases, idiom chunks . . . Rizzi shows that some additional properties, i.e. referentiality and D-linking (in the sense of Pesetsky 1987), are needed. See Cinque (1990); Comorovski (1989) for more detailed discussion.

Rizzi (1990) notices that there are cases of bona fide arguments that behave like adjuncts when it comes to extraction from weak islands. The example in (10) serves to illustrate this point.

(10) What did the cannonball woman weigh t ?

Given the ambiguity of the verb weigh between an agentive and a stative reading, the question in (10) can be felicitously interpreted as constructed on either a theme, as in (11-a), or a measure phrase, as in (11-b).

(11) a. The hiker weighed apples/his backpack before checking in/the arguments against his hypothesis.

b. The hiker weighed 120 kilos.

The crucial observation here is that, as (12) illustrates, only the reading in (11-a) survives extraction out of weak islands.

(12) What did Remedios wonder whether the hiker weighed t ?


An explanation of these facts based on the argument/adjunct distinction fails to make the correct predictions, given that both 120 kilos and apples/backpack/arguments qualify as arguments.

Rizzi proposes to draw a line between thematic roles assigned to referential arguments (those arguments which refer to a participant in the event), dubbed referential θ-roles, and thematic roles assigned to non-referential quasi-arguments (expressions which, though lexically selected, do not participate in the event described by the predicate). Rizzi further assumes that only referential arguments are assigned a referential index at D-Structure, which they are able to carry along when moved. The availability of a referential index opens up the possibility, for traces of referential arguments only, to enter both a binding (13) and a government relation with their antecedents. This option is not available to traces of quasi-arguments, given that they lack a referential index, which is a necessary condition for a binding relation to be established.

(13) Binding: x binds y iff:

i. x c-commands y, and

ii. x and y share the same referential index.

Taking the availability of binding to be the crucial factor opens the way for a straightforward explanation of the facts about extraction out of weak islands. In fact, we know independently that pronominal binding does not respect locality restrictions, as shown in (14).

(14) a. [Every professor]i wondered whether John weighed all the arguments against hisi hypothesis.

b. [No professor]i can predict how many students will choose himi (as supervisor).

c. [Which politician]i appointed the journalist who supported himi?

Moreover, Rizzi (1990), following Jaeggli (1982) and along the lines discussed in Rizzi (1986), proposes that the ECP must be restated in a conjunctive form, in order to satisfy the two requirements imposed on null elements. These two requirements are formal licensing, “which characterizes the formal environment in which the null element can be found, and a principle of identification, which recovers some contentive property of the null element on the basis of its immediate structural environment” (Rizzi, 1990, p.32). The conjoined ECP is presented in (15), from Rizzi (1990, ex. 11, p.32).

(15) A nonpronominal empty category must be

i. properly head-governed (Formal licensing)

ii. Theta-governed, or antecedent-governed (identification)

Given the formulation in (15), traces of manner adverbials (and those of measure phrases, idiom chunks...), given their non-referential nature, cannot be co-indexed with their antecedent: their only option for extraction is thus via successive-cyclic movement as in (16).

(16) a. How do you think [t’ that [Berit will sing the song t]]
b. *How do you wonder [which song [Berit will sing t t]]

In (16-a) each trace is governed by its antecedent, but not in (16-b), where CP is occupied by the intervening which song, which qualifies as a potential bearer of the relevant relation.

Referential arguments (see (8-a)), on the other hand, besides having the possibility to move successive-cyclically, also have the additional option of undergoing long-distance movement. This is so because their referential nature allows them to carry a referential index and thus to be bound by their antecedent.3 Since binding is not constrained by locality principles (it can hold across strong and weak islands), referential/D-linked arguments can violate the ECP and move over weak island inducers.

In short, the argument/adjunct asymmetry reduces to the possibility for referential D-linked arguments to undergo either successive-cyclic movement or, in cases like (8), long-distance movement (binding), while adjuncts (not being referential) are restricted to the successive-cyclic option.4 We will come back to this issue in 3.1.7.

3.1.2 Conceptual/Empirical Problems and solutions

In the years following the publication of Rizzi (1990), larger sets of data were taken into account and specific empirical problems were addressed, mostly maintaining the underlying insight of that seminal work. We will review some of the empirical problems in sections 3.1.3 and 3.1.7. On top of these empirical issues there are other conceptual problems which are tied to recent changes in the principles and parameters framework (Chomsky, 1993, 1995, and much related work).5 Any theory of locality (and for that matter any theory of single aspects of a broader theory) is formulated in a particular framework and makes full use of the theoretical objects, tools and explanations available in that framework. Moving to a different framework requires adjusting the theory, taking into account possible changes in the status of those theoretical objects. Of course this adjustment is a complex process, and sometimes the specific theory may indicate new directions in the development of the new framework. It might also be the case that no adjustment is possible and a new theory of those specific phenomena has to be formulated.

3 On conceptual and empirical problems with referentiality as used here see Frampton (1991).

4 This account also provided a simple explanation for the better extractability of temporal and locative adjuncts (not θ-selected but still having referential properties) with respect to manner adjuncts.

The Government and Binding theory, which served as a framework for RM, has been gradually abandoned in recent years, while a new Minimalist perspective has been establishing its primacy in the principles and parameters framework (see Chomsky 1993, 1995 and related work). A discussion of the reasons behind this change lies beyond our present interest. Nevertheless, it is important for us to understand how compatible RM is with the minimalist ideas that are shaping the field and what adjustments are needed in view of this change.

From a minimalist perspective, RM seems to be the ideal explanation of complex issues such as locality. Rizzi states:

“RM can be intuitively construed as an economy principle in that it severely limits the portion of structure within which a given local relation is computed: elements trying to enter into a local relation are ‘short sighted’, so to speak, in that they can only see as far as the first potential bearer of the relevant relation. The principle reduces ambiguity in a number of cases: whenever two elements compete for entering into a given local relation with a third element, the closest always wins. So, whatever its precise implementation, RM has desirable properties and appears to be a natural principle of mental computation. It is the kind of principle that we may expect to hold across cognitive domains: if locality is relevant at all for other kinds of mental computation, we may well expect it to hold in a similar form: you must go for the closest potential bearer of a given local relation.” (Rizzi, 2004a, p.224).

5 The division made here between empirical and conceptual issues is biased by the particular choice of presentation one makes. Many conceptual issues lie behind questions addressed in sections 3.1.3 and 3.1.7, such as the right characterization of e.g. D-linking or specificity, or the necessity to make reference to classes of features rather than single features among the same class.


A quick survey of the minimalist literature will show that some version of RM is indeed generally assumed (for an overview of derivational approaches to locality in minimalism see Fitzpatrick 2002). On the other hand, working minimalism got rid of some of the theoretical objects RM used. These old tools (the Empty Category Principle, government and referential indices are the most important examples) have not always been replaced by new ones with similar characteristics. Establishing the place of the old theory in the new framework is a complex though necessary task when one foresees the underlying harmony between the two.

In the following sections we will mostly concentrate on some empirical problems raised by RM as formulated in the original work and consider how more recent approaches deal with those problems. In section 3.1.3 we will consider problems for the treatment of Island effects in terms of the A/A′ distinction; we will follow Rizzi (2004a) and adopt a feature-class based approach that allows deriving the empirical patterns. In section 3.1.6 we will address the issue of extraction from Weak Islands and the problems raised by the referentiality approach, and we will adopt Starke (2001)’s feature geometric model, extending the feature-class approach previously introduced. These modifications on the one side solve important empirical problems and on the other allow replacing some of the tools (such as referential indices) used in previous formulations of the theory. These changes result in a clearer status of RM theory in the new minimalist framework.

3.1.3 Locality and the Left periphery

A substantial amount of evidence, accumulated over the years following the publication of RM, shows that the distinctions made by the principle, as formulated in Rizzi (1990), are too coarse and that the resulting picture is too restrictive. A short list of the relevant facts is provided below.

• Not all elements moved to an A′ specifier are subject to RM effects: e.g. wh-phrases with special interpretive properties (D-linking, specificity, . . . )6 are not, as shown in (17-a,b).

• Not all adverbs affect wh-movement in the same way (17-d,e).

• Not all intervening A′ specifiers trigger minimality effects on A′ chains. For example, an intervening left-dislocated phrase determines only a mild degradation on both argument and adjunct (and on both referential/non referential) movement in Italian (17-f,g).

6 As will be shown below, this pattern exceeds the classical argument-adjunct asymmetry discussed earlier.

(17) a. ?Which problem do you wonder how to solve <which problem>?

b. *How do you wonder which problem to solve <how>?

c. Combien de livres a-t-il attentivement consulté?
How-many of books have-he carefully consulted?

d. Combien a-t-il attentivement consulté [<combien> de livres]?
How-many have-he carefully consulted [t of books]?
‘How many books did he carefully consult?’

e. *Combien a-t-il beaucoup consulté [<combien> de livres]?
How-many have-he a-lot consulted [t of books]?
‘How many books did he consult a lot?’

f. Non so a chi credi che, questa storia, dovremmo raccontare <questa storia> <a chi>.
Not know.1stsing to whom think.2ndsing that, this story, should tell.1stpl <this story> <to who>.
‘I don’t know to whom you think that we should tell this story’.

g. ?Non so come pensi che, a Matteo, gli dovremmo parlare <a Matteo> <come>.
Not know.1stsing how think.2ndsing that, to Matteo, cl. should.1stpl talk <to Matteo> <how>
‘I don’t know how you think that, to Matteo, we should talk to him’.

The examples above clearly show that a definition of intervention based on the traditional distinctions (heads vs. specifiers and, in the latter class, A vs. A′) is too restrictive and under-generates.

Chomsky’s (1995) Minimal Link Condition provides a different view, which capitalizes on the idea that syntactic movement is driven by the necessity to check morphosyntactic features.

(18) minimal link condition: K attracts a only if there is no b, b closer to K than a, such that K attracts b. (Chomsky, 1995, p.311)


However, as Rizzi (2004a) points out, an approach along the lines of Chomsky’s Minimal Link Condition is too permissive. The Minimal Link Condition in fact defines intervention in terms of total identity of feature structure. However, quantificational adverbs and negation are different in their featural make-up from wh-elements and yet they both block wh-movement (see (17-e) and the inner island examples in (7) respectively).
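The over-permissiveness just described can be illustrated with a small sketch of intervention under feature identity. This is only an illustration under stated assumptions: the function name `mlc_blockers` is hypothetical, each element is reduced to a bare set of feature labels, and the labels assigned to beaucoup and to negation ("measure", "neg") are chosen for the example rather than taken from Chomsky (1995).

```python
def mlc_blockers(attracted_feature, path):
    """Return the labels of elements on the path that bear exactly the attracted feature.

    `path` lists (label, features) pairs for the elements that are closer
    to the attracting head K than the intended target of attraction.
    """
    return [label for label, features in path if attracted_feature in features]


# (17-b): "which problem" carries the same feature as the moving "how".
print(mlc_blockers("wh", [("which problem", {"wh"})]))

# (17-e): beaucoup is featurally distinct from a wh-element, so identity-based
# intervention finds no blocker, although the sentence is ungrammatical.
print(mlc_blockers("wh", [("beaucoup", {"measure"})]))

# (7-b): the same holds for negation in the inner island case.
print(mlc_blockers("wh", [("n't", {"neg"})]))
```

Under these assumptions only the first call reports an intervener, which is the gap the feature-class formulation is meant to close.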

3.1.4 Feature classes

Summing up, the original formulation based on the A/A′ distinction is too restrictive and under-generates. As (17) shows, this formulation blocks the construction of well formed chains. On the other hand, a solution centered on the distinction between single features is clearly too permissive: it over-generates, allowing non well formed syntactic objects to be built. Rizzi (2004a) shows that the correct pattern seems to emerge once we recognize the existence of natural classes of syntactic features. The solution he proposes makes fundamental use of a sophisticated approach to the structure of the clause along the lines of the recent Cartographic Approach. The Cartographic Approach is the attempt to draw maps of syntactic configurations as detailed and precise as possible (see Belletti 2004b; Cinque 1999, 2002; Rizzi 1997, 2004b). Rizzi shows that the cartographic studies offer a series of positions, which we can continue to define as A′ for convenience, but which can provide us with the required more fine grained distinctions.

(19) Force Top* Int Top Focus Mod* Top* Fin IP (Rizzi, 1997, 2004a)

Each of these positions can be defined by the particular set of morphosyntactic features that can occupy it. Such features can be cataloged by referring to the ‘class’ they belong to:

(20) a. Argumental: person, gender, number, Case.
b. Quantificational: Wh-, Neg, measure, focus . . .
c. Modifiers: evaluative, epistemic, Neg, frequentative, celerative, measure, manner . . .
d. Topic.

Formulating minimality in terms of this classification allows us to avoid the excessive freedom of movement generated by the Minimal Link Condition on the one hand, and the extreme restriction generated by the simple A/A′ distinction on the other. (21) is a formalization of this idea (taken from Rizzi 2004a, p.).

(21) minimal configuration: Y is in a Minimal Configuration (MC) with X iff there is no Z such that

i. Z is of the same structural type as X, and

ii. Z intervenes between X and Y.

→ “Same structural type” = Spec licensed by features of the same class in (20).

Given the above formulation, we expect RM effects only between features that belong to the same class, but not among features that belong to different classes. This formulation solves the problems listed in (17) and several other problematic facts listed below.

• Only quantificational adverbs affect wh-movement of adjuncts (22). This is predicted by the definition in (21) since these elements belong to the same Quantificational class. On the other hand, adverbials such as attentivement in (22-b), which do not have any intrinsic quantificational property, belong to the class of Modifiers. As predicted by (21), this class does not interfere with Quantificational movement.7

(22) a. *Combien a-t-il beaucoup consulté [<combien> de livres]?
How-many has-he a-lot consult.past [<how-many> of books]?
‘How many books did he consult a lot?’

b. Combien a-t-il attentivement consulté [<combien> de livres]?
How-many have-he carefully consult.past [<how-many> of books]?
‘How many books did he carefully consult?’

• All intervening adverbs block simple (non-focal) preposing of an adverb to the left periphery.8 Preposing would require moving a member of the Modifier class over another member of the same class.

(23) a. I tecnici hanno (probabilmente) risolto rapidamente il problema.
The technicians have probably resolved rapidly the problem.

b. *Rapidamente, i tecnici hanno probabilmente risolto il problema.
Rapidly, the technicians have probably resolved the problem.

7 Here and below, capitals are used to refer to classes of features, small caps are used for single features.

8 Rizzi cites Koster (1978) who discusses similar data in Dutch.

• As pointed out by Cinque (1999), blocking by a higher adverb does not occur if a lower adverb movement targets a focus position. Crucially, we must look at the relation which is currently being built: focus movement qualifies as Quantificational and as such cannot be blocked by Modifiers.9 (24) illustrates this point in German and (25) in Italian.

(24) SEHR OFT hat Karl Marie wahrscheinlich <SEHR OFT> gesehen.
VERY OFTEN has Karl Marie probably seen

(25) RAPIDAMENTE i tecnici hanno probabilmente <RAPIDAMENTE> risolto il problema (non lentamente).
RAPIDLY the technicians have probably solved the problem (not slowly)

• Negation belongs to both the Quantificational and the Modifiers class. It is predicted that it will block both simple adverb preposing and movement of an adverb to a focus position.

(26) a. Rapidamente, i tecnici (*non) hanno risolto il problema.
Rapidly, the technicians have (not) solved the problem.

b. RAPIDAMENTE i tecnici (*non) hanno risolto il problema.
RAPIDLY the technicians have (not) solved the problem.

9 Alternatively one can think that the presence of a focus feature on the moving adverb qualifies it as a member of the Quantificational class. See also Szendroi (2001); Reinhart (2006) for a very different view on Topic, Focus and the Syntax-Phonology Interface. See also the recent work by Slioussar (2007) on the topic.


• However, if the adverb has topical properties (having been mentioned in the preceding discourse) neither intervening adverbs nor negation block its movement.

(27) A: Credo che i tecnici abbiano rapidamente risolto entrambi i problemi.
‘I believe that the technicians have rapidly solved both problems’.

B: Ti sbagli: Rapidamente, i tecnici hanno probabilmente risolto IL PRIMO PROBLEMA, ma non il secondo, che era più difficile.
‘You are wrong: rapidly, the technicians have probably solved the first problem, but not the second, which was more difficult’.

In sum, the characterization of RM given in (21) reshapes minimality by putting classes of features at the center of the explanation. Given this definition, we have been able to address most of the empirical problems listed in (17).
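The predictions reviewed in this section can be checked mechanically with a minimal sketch of (21) over the classes in (20). Several things here are assumptions made for illustration only: the function names, the placeholder feature "topic" standing in for the Topic class (for which (20) lists no individual features), and the assignment of single features to particular adverbs (beaucoup as measure, attentivement as manner, rapidamente as celerative, probabilmente as epistemic).

```python
# Feature classes as listed in (20); "topic" is a placeholder member of Topic.
FEATURE_CLASSES = {
    "Argumental": {"person", "gender", "number", "Case"},
    "Quantificational": {"Wh", "Neg", "measure", "focus"},
    "Modifier": {"evaluative", "epistemic", "Neg", "frequentative",
                 "celerative", "measure", "manner"},
    "Topic": {"topic"},
}


def classes_of(feature):
    """Return the set of classes in (20) that a feature belongs to."""
    return {name for name, members in FEATURE_CLASSES.items() if feature in members}


def in_minimal_configuration(moving_feature, intervening_features):
    """True iff no intervener shares a class with the feature driving the relation."""
    moving_classes = classes_of(moving_feature)
    return not any(moving_classes & classes_of(z) for z in intervening_features)


# (22-a): wh-movement across beaucoup (assumed: measure) is blocked.
print(in_minimal_configuration("Wh", ["measure"]))
# (22-b): wh-movement across attentivement (assumed: manner) is fine.
print(in_minimal_configuration("Wh", ["manner"]))
# (23-b): simple preposing of rapidamente across probabilmente is blocked.
print(in_minimal_configuration("celerative", ["epistemic"]))
# (25): focus movement across probabilmente is fine.
print(in_minimal_configuration("focus", ["epistemic"]))
# (26): negation blocks both preposing and focus movement.
print(in_minimal_configuration("celerative", ["Neg"]),
      in_minimal_configuration("focus", ["Neg"]))
# (27): a topical adverb crosses both an adverb and negation.
print(in_minimal_configuration("topic", ["epistemic", "Neg"]))
```

Because Neg and measure are listed in two classes in (20), they intervene for both Quantificational and Modifier relations, which is how the double blocking effect of negation falls out in this sketch.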

3.1.5 More asymmetries in extraction

The problematic cases considered so far could easily be dealt with once the relevant level of application of RM was identified. The case of the argument/adjunct (referential/non-referential) asymmetry considered above, however, cannot be easily handled in these terms. As we have seen above, Huang (1982) and Lasnik and Saito (1984, 1992) take the relevant distinction to be that between arguments and adjuncts, where only the latter are claimed to be sensitive to Weak Islands (WI), essentially for reasons related to the ECP. Ross (1983), Kroch (1989), Comorovski (1989), Rizzi (1990) and Cinque (1990), on the basis of examples like those in (28), claim that argumenthood by itself is not a sufficient property and identify the ‘something more’ with referentiality and D-linking.

(28) a. *What didn’t John say that the fish weigh <what>? (Ross)
b. *The fish weigh.
c. *How did John ask whether to behave <how>? (Rizzi)
d. *John behaved.

According to the interpretation in Rizzi (1990), amount and manner phrases can be arguments but they do not participate in the event in any way, thus they lack a referential θ-role. Locative and temporal modifiers, on the other hand, are not arguments but can be considered at least partially referential, since events occur in a specific place and time. This is what makes them partially extractable from Weak Islands.

(29) a. ?Where did Bill ask whether to park the car <where>
b. ?When did Bill ask whether to fix the car <when>

Cinque (1990) refines the distinction with the observation that, in addition to referentiality, D-linking (in the sense of Pesetsky 1987) is also required for successful extraction from Weak Islands (eWI).

(30) a. *How many problems are you wondering whether to solve <how many problems> before dark?

b. How many problems from this list are you wondering whether to solve <how many problems> before dark?

As Starke (2001) noticed, there seems to be a general consensus that elements that are successfully extractable from WI have ‘something more’ with respect to non extractable elements. The special property is identified with case or DPhood in Manzini (1992); Rizzi (2000), while in Frampton (1990), Szabolcsi and Zwarts (1993), Cresti (1995), Dobrovie-Sorin (1994), the distinction is argued to be the one between individual/non individual status of the trace (i.e. having semantic type <e>). Szabolcsi and Zwarts’s (1997) approach is a possible exception to this rule, even if Starke noticed that in this account too an additional property of a good extractee is identified, namely the richness of its internal semantic structure.

Starke capitalized on this observation and proposed a feature geometric account of these phenomena. The importance of this solution is that it allows the derivation of the problematic examples without enriching the minimality principle itself.

3.1.6 Starke’s feature trees approach

Starke (2001) is an important step toward a unified approach to locality in syntax. Starke’s main aim is to provide a unified analysis of Weak Islands, extraction out of weak islands and Strong Islands under Relativized Minimality. As he points out:

“. . . despite long-standing efforts to produce a unified theory of these phenomena, every current approach treats them with a disjunction of tools. A typical situation is that weak islands are explained by a version of Relativized Minimality (Minimal Link Condition, Attract Closest, etc.), extractions out of weak islands by some form of binding relationships, and strong islands by a version of ‘barriers’.”

The novelty of his approach, which allows him to capture the three cases under RM, lies in the intuition that this simple constraint might operate on morphosyntactic feature trees rather than on simple unorganized bundles of features.

Successful extraction from Weak Islands (the traditional argument/adjunct asymmetry), in particular, requires referring to different mechanisms not reducible to RM itself (see Rizzi 2000). Deriving the relevant patterns from the same basic principle without making reference to any additional mechanisms, like coindexation and binding (no matter how natural or logical), would clearly be preferable in terms of economy of the theory.

Starke’s claim is that indeed we do not need to add anything to the principle itself but rather we should proceed with a closer investigation of the structure of the data the principle operates on. Proceeding in this way we might find out that the cases of apparent violation of the principle constitute in fact further evidence in favor of the principle itself. To illustrate the idea let’s take a look at (31):

(31) *C . . . C . . . C

(31) illustrates the point made in the preceding section: movement triggered by a feature α of a category C is blocked by the presence of an intervening element of the same class C. We can thus follow Starke in his notation:

(32) *α . . . α . . . α

We know that this configuration will be ruled out by RM. Let’s imagine now that the class C has a subclass SC and that we can identify a feature β, such that β ∈ SC. Given that SC is a subclass of C, β will be a member of both class C and SC. Let’s follow Starke again, rewriting β as αβ for convenience. Now consider the configuration in (33) and its instance in (34) below:

(33) αβ . . . α . . . αβ


Since αβ in (33) is a member of both C and SC it is able to move as a member of the C or the SC class. Although this particular configuration makes the C movement unavailable because of the intervention of α (another member of the C class), αβ will still be able to move as a member of the SC class. Crucially, no minimality effect arises. Starke observes that in extraction from WI we are dealing with something like the abstract configuration in (33). That is to say, the role played by the additional feature β is the same role played in extraction from WI by the ‘extra property’ (discussed above) distinguishing the moving wh-phrase from the intervening wh-element (34).

(34) [Which car]αβ do you wonder whetherα to fix <which car>αβ? (movement as SC)

The situation is reversed in (35). Here intervention of αβ blocks movement of α. α, in fact, can only move as a member of the C class and the intervening αβ also belongs to this class (being a member of SC, a subclass of C).

(35) *α . . . αβ . . . α

This is the case of unsuccessful extraction from WI, illustrated in (36).

(36) [How]α do you wonder which carαβ to fix <how>α? (×: movement as C blocked)
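The contrast between (33)/(34) and (35)/(36) can be sketched in a few lines. The encoding is an assumption made for illustration, not Starke's own formalization: each element is represented by the set of classes it belongs to (α as {C}, αβ as {C, SC}), and the hypothetical function `can_extract` asks whether the mover has at least one class that no intervener shares.

```python
def can_extract(mover_classes, interveners):
    """True iff the mover can move as a member of some class free of interveners.

    `mover_classes` is the set of classes the moving element belongs to;
    `interveners` is a list of such sets, one per intervening element.
    """
    return any(
        all(cls not in intervener for intervener in interveners)
        for cls in mover_classes
    )


ALPHA = {"C"}             # a plain member of class C
ALPHA_BETA = {"C", "SC"}  # a member of the subclass SC, hence also of C

# (32) *α ... α ... α
print(can_extract(ALPHA, [ALPHA]))
# (33)/(34) αβ ... α ... αβ: movement as SC is still available
print(can_extract(ALPHA_BETA, [ALPHA]))
# (35)/(36) *α ... αβ ... α: the only option, movement as C, is blocked
print(can_extract(ALPHA, [ALPHA_BETA]))
```

Nothing is added to the minimality check itself in this sketch; the asymmetry comes entirely from the richer representation of the elements it applies to.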

Starke claims that the additional property distinguishing between extractable and non-extractable elements is specificity, and, more importantly, that there is a subset/superset relation between the Quantifier class (the class of normal wh-elements, as in Rizzi 2004a) and the SpecificQ class (a subset of the Quantifier class, represented by those elements belonging to this class but having the additional property of being specific in a sense to be made more explicit below). Starke uses a feature geometric approach similar to those developed for phonology by Clements (1985); Sagey (1986) and for morphology by Ritter and Harley (1998); Harley and Ritter (2002b,a). Starke’s feature tree is illustrated in (37). The nodes in the tree stand for Classes of features (φ: person, gender and number features).


(37) [Feature tree; node labels: Quantifier, SpecificQ, M, A[φ]]

In the following pages we will follow the basic arguments that lead tothe postulation of exactly these subclasses.

3.1.7 Extraction from Weak Islands

Weak Islands differ from Strong Islands in that extraction is totally banned from the latter but marginally possible from the former. It is important to stress that the weak/strong labeling of these syntactic environments is not related to a degree of ungrammaticality.

As already mentioned above, the question of what makes extraction out of WI partially possible, or the issue of the right characterization of the distinction between elements that can or cannot survive eWI, has received different answers. The set of phenomena that goes under the WI label has increased considerably over the years and, more importantly, the detailed observation of the different behavior of various kinds of elements in the WI environment has sharpened the characterization of which properties are to be considered essential in characterizing eWI.10

Starke capitalizes on the general consensus that elements that are successfully extractable from WI have ‘something more’ with respect to non-extractable elements. The issue of course becomes more complex when it comes to the definition of this additional property.

Starke takes the ‘additional property’ allowing an element to extract from a WI to be specificity, intended as a special case of existential presupposition. Let’s consider some examples11 to show how this special property interacts with Q-type interveners in eWI. Consider (38).

(38) Giorgos and Anca love the TV series South Park; Giorgos, however, is luckier than Anca in that he gets to watch all the episodes way ahead of her; for this reason he gets to know all the newest jokes from the series before Anca. Berit, on the contrary, doesn’t like South Park and doesn’t want to know anything about it. One day Berit tells Giorgos that apparently Anca discovered some new joke about the series. At this point Giorgos says: ‘I wonder what Anca discovered!’. After this, Berit asks:
a. . . . and what do you think that Anca discovered?
b. #. . . and what do you wonder whether Anca discovered?

10 For a comprehensive critical review of the WI literature see Szabolcsi (2002).

11 All examples in this section are adapted from Starke’s original work.

(I will follow Starke in using # to indicate a sentence which is grammatical but inappropriate in a given context.) Crucially, a small variation in the context makes the eWI perfectly felicitous:

(39) Suppose that after hearing Berit’s comment on Anca’s new discovery Giorgos exclaims: ‘I wonder what Anca discovered . . . could it be. . . ?’ and stops in the middle of the sentence. And now Berit asks:
a. so? what do you think that Anca discovered?
b. so? what do you wonder whether Anca discovered?

The (b) examples in (38) and (39) show that eWI is possible only if there are reasons to believe that there exists some entity which the interlocutor has in mind as a referent for the wh-phrase. This requirement is not necessary for grammatical extraction in the (a) examples.

Consider now (40), also adapted from Starke.

(40) Berit puts a lot of fantasy in her cooking and it is customary to eat food from every country when she cooks. Tonight we are going to have dinner at her place. I know that you do not have any idea about what Berit will cook tonight, and that you are curious, so while we bike there I ask:
a. what do you hope that Berit will cook tonight?
b. #what do you wonder whether Berit will cook tonight?

Starke also notes the interesting fact that inserting the two questions in (40) into a cleft makes them uniformly odd (given the same context):

(41) a. #what is it that you hope that Berit cooked?
b. #what is it that you wonder whether Berit cooked?

(41-a) is not a weak island; nevertheless, the semantics of clefts (which requires existential presupposition of the clefted element) is not compatible with a context that does not allow a specific reading of the wh-element.


Again, the (b) example in (41) can be made felicitous if I have any reason to believe that you have something in mind about Berit’s new recipe. Starke takes these facts to show that the same property (specificity) underlies eWI and the acceptability of cleft sentences in this kind of context. This property turns an element of the Quantifier class into a member of the SpecificQ subclass and makes it extractable over other Q interveners.

The existential presupposition can generally take wide scope (42-a) or narrow scope (42-b) with respect to an intervening epistemic predicate, as shown in (42).

(42) a. what is it that you think that Anca discovered?
b. what do you think it is that Anca discovered?

wh in-situ

To show that the relevant property allowing eWI is indeed specificity, Starke brings French wh in-situ into the picture. In French, wh in-situ is a grammatical option:

(43) a. tu  as   dit  qu’il   a   mangé quoi?
        you have said that-he has eaten what?
        What did you say that he has eaten?

     b. tu  crois   qu’elle  s’appelle comment?
        you believe that-she is-called how?
        What do you think her name is?

French wh in-situ differs from echo questions in intonation and interpretation (for different approaches to wh in-situ see Chang (1997); Cheng (1991, 2003a,b); Cheng and Rooryck (2003); Reinhart (1998); Poletto and Pollock (2000), a.o.). We will come back to wh in-situ in the following chapter; our main interest in the present discussion is in the interaction of wh in-situ with syntactic islands. Strangely enough, wh-phrases in-situ can occur inside strong islands, as shown in (44) (from now on I will follow Starke in using ‘situ-wh’ as a shorthand for ‘wh-phrase in-situ’).

(44) a. tu  crois   qu’elle  a   dit  ça   pour inciter Jean à  séduire qui?
        you believe that-she has said this to   incite  J    to seduce  whom?

     b. tu  crois qu’ils    vont inviter ceux  qui  ont  fait quoi?
        you think that-they will invite  those that have made what?

(44-a) shows the grammaticality of situ-wh inside an adverbial clause; (44-b) shows that situ-wh can occur inside a Complex NP Island.

The crucial point here is that the presence of a situ-wh in a WI context leads to sharp ungrammaticality (45), unless a special intonation is added, which triggers the presuppositional interpretation necessary in eWI.

(45) #tu  crois qu’elle  a   pas fait quoi?
      you think that-she has not done what?

Example (46) shows that the relevant factor making eWI possible is specificity and not range.12

(46) a. *Tu  aimerais   avoir   cette/ma photo   de qui?
         You would-like to-have this/my  picture of whom?

     b. Tu  aimerais   avoir   une des    photos   de qui?
        You would-like to-have one of-the pictures of whom?

In (46-a) situ-wh is blocked by an intervening specific NP; (46-b), however, where an NP expressing range intervenes, is perfectly grammatical.13

12 The possibility of including both range and specificity by further extending the feature tree is not pursued in Starke (2001). It does not seem too unnatural to think that they can also be thought of in terms of a subset/superset relation, where specificity, of course, would be a subset of range.

13 Valentina Bianchi (p.c.) points out that the facts in (46) can be interpreted as intervention effects under a dominance-based approach but not under c-command. For this and other reasons (quite independent from the present issue) the former approach is adopted by Starke. As (45) and (i-a,b) (taken from Mathieu 1999, 2002) show, the same effects are found under c-command. Nevertheless, as Marjo van Koppen (p.c.) points out, these examples can also be explained under dominance, while (46) cannot be explained by c-command.

(i) a. *Tu  n’a      pas vu  qui?
        You neg-have not see who?
        Who didn’t you see?

    b. *La  fille ne  dort   pas sur qui?
        The girl  neg sleeps not on  who?
        Whom didn’t the girl sleep on?


From the data shown above Starke concludes that situ-wh undergoes pure Q movement (situ-wh phrases are blocked by a Q intervener unless they have an additional property) and that the property involved in eWI is indeed specificity and not range. This, he claims, also provides a natural explanation for Definite Islands. As (47) shows, wh-movement out of a definite NP generates a much sharper degradation than extraction from an indefinite NP.

(47) a. who did you want to buy a picture of?
b. ?*who did you want to buy the picture of?

Starke capitalizes on Enç (1991), who shows that specificity is again the crucial factor at stake here. What is generally referred to as the ‘definiteness’ effect is in fact an effect of specificity: as (48) shows, non-specific definites do not trigger the effect while specific indefinites do.

(48) a. who did they announce the birth of?
b. which film did you miss the first part of?
c. ?*who did you want to buy a picture of?

In sum, there are good reasons to support Starke’s proposal. In particular, specificity seems to be the relevant property at the base of the asymmetric behavior of different elements in extraction from WI. As will become clear below, Starke’s account is of primary relevance for the present work because many of the intuitions expressed here mirror in a non-trivial sense those explored in his work. Both accounts try to extend the set of phenomena that fall under principles of anti-identity. Starke shows the positive effects on extractability of the representation, through dedicated features, of rich discourse/context information. In the following section I investigate the negative effects on extractability that arise from the underspecification of the representation of those same features.

3.2 Generalized Minimality

Given the revised formulation of RM introduced above, it follows that the possibility to form a chain over an intervening element will depend on the nature and the number of features activated in a given syntactic representation. Starting from the familiar configuration in (49), we know that no relation can be built between X and Y if Z (an element c-commanding Y and c-commanded by X) is of the same structural type as X.

(49) X . . . Z . . . Y

Following Rizzi (1990, 2004a) and Starke (2001), we have given a definition of ‘same structural type’ in terms of the particular type or class of the set defined by the particular morphosyntactic features associated with it. Rizzi (2004a) provides evidence for the partition of the system into at least four classes: Argumental, Quantificational, Modifier, Topic.

Minimality effects arise only in the presence of an intervening element whose feature set belongs to the same class the probe belongs to. We can thus rewrite the schema in (49) as in (50).

(50) (α, β, γ, δ)Class∆ (α, β, γ)ClassΓ (α, β, γ, δ)Class∆

X . . . Z . . . Y (relation ∆ between X and Y)

In (50) a particular set of morphosyntactic features (represented with Greek letters) is associated with every node. Given this configuration, RM should permit the formation of an abstract relation ∆ between X and Y. The presence of the feature δ suffices for RM to ‘see’ the difference between X and Z and therefore to authorize the movement of Y over Z. Recall from the discussion in the previous chapter that a difference in a single feature of the same class (e.g. a mismatch in gender or number features) is not enough to avoid a minimality effect unless that feature introduces a change of class. It is necessary, then, to think of δ as the distinctive feature of the particular head within the relation under consideration. To illustrate, the distinctive feature could be a wh-feature in the relevant head of the CP layer, whose presence suffices to imply a change of class of the relevant set from Argumental to Quantificational.

(51) (α, β, γ, wh-)ClassQ (α, β, γ)ClassA (α, β, γ, wh-)ClassQ

X . . . Z . . . Y (relation Q between X and Y)

By changing the nature and number of the features associated with a particular node in the syntactic tree, we should expect a variation in the legitimacy of chain formation, especially if such modifications imply a change of class in the sense defined above.


(52) (α, β, γ)ClassA (α, β, γ)ClassA (α, β, γ)ClassA

*X . . . Z . . . Y

If for any reason the wh-feature is missing (as in (52)), X and Y would no longer be in a local configuration: its absence in fact turns X into a member of the Argumental class, and thus Z qualifies as a potential intervener.
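The schemas in (50)-(52) can be paraphrased as a short sketch. It assumes, purely for illustration, that the class of a node is read off a single distinctive feature (here `wh`) and that Z intervenes whenever it has the same class as X. The names `node_class` and `intervenes`, as well as the gender labels, are hypothetical, and the Greek-letter features are placeholders.

```python
# Illustrative sketch of (50)-(52); class assignment via one distinctive
# feature is a simplifying assumption, not a full theory of feature classes.

def node_class(features: set) -> str:
    """Quantificational if the distinctive wh feature is present, else Argumental."""
    return "Q" if "wh" in features else "A"


def intervenes(x: set, z: set) -> bool:
    """Z is a potential intervener for X when both belong to the same class."""
    return node_class(x) == node_class(z)


base = {"alpha", "beta", "gamma"}

# (51): X carries wh and Z does not, so Z does not intervene.
print(intervenes(base | {"wh"}, base))
# (52): wh is missing, so X falls into the Argumental class and Z intervenes.
print(intervenes(base, base))
# A gender mismatch alone does not change class, so Z still intervenes.
print(intervenes(base | {"fem"}, base | {"masc"}))
```

The third call reflects the point recalled above: only a feature that changes the class of the set can void a minimality effect.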

3.2.1 Processing derived structural deficit

In the last section of Chapter 2 I hypothesized that a (temporary or permanent) reduction of processing capacities can lead to an underspecification of the morphosyntactic feature sets normally associated with the elements in the syntactic tree.

(53) Feature underspecification: agrammatic aphasics cannot represent the full array of morphosyntactic features associated with syntactic categories. Underspecification selectively targets scope-discourse features; that is, features at the edge of nominal, verbal and clausal domains.

Selective minimality effects can be expected to arise as a natural consequence of this underspecification. The main consequence of this assumption is that comprehension patterns in Broca’s aphasia can be thought of as the consequence of underspecification, that is, an impoverishment in the number and quality of morphosyntactic features in the syntactic representation. Underspecification can in turn give rise to selective minimality effects.

The representation of an object cleft in normal adult speakers is schematized in (54).14 RM authorizes the formation of the relevant chains between the moved NPs and their traces in virtue of the difference between the feature set associated with the subject and that associated with the object NPs.15

14 Non-crucial details are omitted; indices are used for explanatory purposes only.

15 As Martin Everaert (p.c.) pointed out to me, theta specification is irrelevant for minimality; otherwise an object would always be different from a subject and no minimality effect would ever occur. In these and the following examples theta specification is used for explanatory purposes only, specifically to show that, because of RM, theta assignment fails.


(54) (D,N,θ2,φs,acc,wh)ClassQ (D,N,θ1,φs,nom)ClassA (D,N,θ2,φs,acc,wh)ClassQ

It is the boyi [whoi [the girl]j [ <the girl>j kissed <who>i]]

The presence of the wh-feature defines the object <who> as a member of a class distinct from the one to which the subject <the girl> belongs. The former belongs to the Operator class while the latter belongs to the Argumental class.

In (55) the representation of the same structure by an agrammatic aphasic is schematized.

(55) (N,θ?,φs,. . . )ClassA (N,θ?,φs,. . . )ClassA

It is the boyi [whoi [the girl]j [<. . .>? kissed <. . .>?]] ×

The impoverishment of the set of features, and more specifically the absence of the wh-feature, leads RM to block chain formation. It is consequently impossible to assign the correct theta-role to each argument, which leads to poor comprehension.16

This analysis predicts a different pattern with subject relatives, which are in fact correctly interpreted by agrammatic patients. In these structures no DP intervenes between the moved constituent and its trace, hence no RM effects are expected to arise (56).

(56) It is the boyi [whoi [ <the boy>i loved the girl]]
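The way (53) feeds RM in (54)-(56) can be sketched as follows. This is a minimal illustration under stated assumptions: scope-discourse features are represented by the single label `wh`, class membership is decided by that label alone, and only that label is removed under underspecification. The helper names (`underspecify`, `feature_class`, `chain_blocked`) are hypothetical, and theta roles are left out, in line with footnote 15.

```python
# Toy rendering of (53)-(56). The feature labels and helper names are
# illustrative assumptions; only the logic of the argument is modelled.

SCOPE_DISCOURSE = {"wh"}  # features assumed to be vulnerable in agrammatism


def underspecify(features: set) -> set:
    """Drop scope-discourse features, as hypothesised in (53)."""
    return features - SCOPE_DISCOURSE


def feature_class(features: set) -> str:
    """Q if any scope-discourse feature is present, otherwise A."""
    return "Q" if features & SCOPE_DISCOURSE else "A"


def chain_blocked(mover: set, interveners: list) -> bool:
    """A chain is blocked if some intervener shares the mover's class."""
    return any(feature_class(z) == feature_class(mover) for z in interveners)


who_object = {"D", "N", "phi", "acc", "wh"}   # moved object in (54)
who_subject = {"D", "N", "phi", "nom", "wh"}  # moved subject in (56)
the_girl = {"D", "N", "phi", "nom"}           # intervening subject in (54)

# Object cleft, full representation (54): the chain is licensed.
print(chain_blocked(who_object, [the_girl]))
# Object cleft, underspecified representation (55): the chain is blocked.
print(chain_blocked(underspecify(who_object), [underspecify(the_girl)]))
# Subject cleft (56): nothing intervenes, so underspecification is harmless.
print(chain_blocked(underspecify(who_subject), []))
```

The sketch makes the selectivity visible: the same loss of features is harmful only when an element of the Argumental class sits between the moved constituent and its trace.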

Crucially, underspecification does not always have to generate comprehension (or production)17 difficulties. Even an underspecified representation of a subject cleft allows us to form the relevant chain and to recover the thematic information, since no potential binders intervene between the moved element and its trace. Only under certain precisely definable conditions will underspecification give rise to minimality effects: for example, minimality effects arise in structures that require movement of a DP over another DP, or more generally in structures that establish a dependency over a potential intervener. Here ‘potential intervener’ loses the connotations it carries in the treatment of standard minimality effects (describing an element of the ‘same structural type’) to acquire a more literal interpretation of potentiality: an element which qualifies as an intervener under certain conditions (i.e. in case of underspecification).

16 See below for a more detailed review of comprehension patterns in agrammatism and for additional discussion of these points.

17 For a discussion of similar effects in production see Garraffa and Grillo (2008) and the discussion in Chapter 6.

The argument proposed here is essentially the inverse of the argument discussed in Starke (2001).

(57) αβ . . . α . . . αβ

The standard (unimpaired) representation of object relative clauses or clefts is like (57): a Q feature, authorized by special discourse conditions, such as in the case of the boy that the girl kissed, requires construction of a context including a set of boys. Licensing of this Q feature makes the object NP different from the intervening subject NP and derives the absence of minimality effects.

In the examples from Starke considered above, a rich contextual background was required for the licensing of specificity marking on the moving element. Satisfaction of these discourse conditions is required not only for the representation of specificity but also for other operations that involve the syntax-discourse interface. When the sentence is presented in the absence of context, or when there is a mismatch between the context and the value encoded at the interface, (re-)construction of the context and/or activation of the relevant feature will have a higher processing cost with respect to cases in which the syntactic representation matches the context (see Crain and Steedman 1985 for detailed discussion of this issue). Following our present hypothesis, the representation of the morphosyntactic features encoding this interface information is compromised in agrammatism. This means that we have to take the legitimate representation in (57) and consider the consequences of the inactivation (or late activation) of the β feature. A delay in the activation of the β feature generates the configuration in (58), which represents a minimality violation.

(58) *α . . . α . . . α

Clearly there is no agrammaticality in the way the anti-identity principle applies in the underspecified example in (58). RM always applies in the same ‘grammatical’ way; it could hardly be otherwise for a principle of such generality. The problem lies in how the principle is ‘fed’ in the case of potential intervention (in the sense defined above). To illustrate, if the whole array of features is correctly activated at the moment in which chains are computed, then no problems are expected to arise; if however (to paraphrase Starke) the feature tree loses one leaf, then (a)grammatical minimality effects should be expected.18

Minimality and Complexity

The generalized minimality approach shares the intuition of much previous work in the psycholinguistic tradition. The idea that non-local relations are more difficult to compute goes back to Frazier’s (1987) Active Filler Strategy and was carefully developed, and combined with Fodor’s (1979) Superstrategy, by Marica De Vincenzi (1991) in her Minimal Chain Principle. The same idea was recently reformulated from a different perspective in Gibson’s (1998) Dependency Locality Theory (DLT). The relation between the present approach and these predecessors should be the object of more intensive work in the future. Despite their similarity, it should already be possible to identify a number of differences between the present proposal and, e.g., the DLT. On the other hand, I take the similarity with the Minimal Chain Principle to be much deeper. In this context I would like to briefly consider a problematic issue raised by the DLT which might be solved by the present approach. Gibson develops a unified account of several complexity factors including multiple center embedding and subject/object asymmetries. According to the DLT, sentence comprehension requires the use of (at least) two components of computational resources: (i) structural integration, needed to connect an input word into the structure being processed, and (ii) structural storage, which keeps track of the incomplete structural dependencies that are being built.

These costs should increase with any increment of the linear distance, and with the insertion of new discourse referents between an antecedent and its gap. However, as Gibson himself recognizes, “Once past a certain distance, there is no noticeable complexity difference between different distance predicted category realizations, and this point occurs well before memory overload” (Gibson, 1998, p. 30).

(59) a. Luke gave the beautiful pendant that he had seen in the jewelry store window to his wife.

b. Luke gave the beautiful pendant that he had seen in the jewelry store window next to a watch with a diamond wristband to his wife.

18 It is an important question whether, given an underspecified feature set, differences that were not relevant in the normal case are taken into account to derive a correct representation. In other words, the question is whether the system adapts to the new situation and tries to make the most of what it has at its disposal (the impoverished feature sets), looking at distinctions inside the same class (e.g. different person or number marking, if any are available) to tell the probe apart from the intervener. I will further consider this point in Chapter 6, where I discuss the effects of a mismatch in animacy on production of wh-movement.

(60) a. Andrea sent the friend who recommended the real estate agent who found the great apartment a present.

b. Andrea sent the friend who recommended the real estate agent who found the great apartment which was only $500 per month a present.

The facts illustrated in examples (59) and (60) (originally ex. 30/31 in Gibson, 1998) constitute a serious problem for his distance-based analysis. According to his calculations, the memory cost associated with these structures should exceed the processor’s computational resources. Nevertheless, although (60-a) and (60-b) are more difficult to process than (59-a) and (59-b), “all these examples are much more processable than a multiply center-embedded example like (61)”.

(61) #The administrator who the intern who the nurse supervised hadbothered lost the medical reports.

Gibson assumes that this problem can be solved with the assumption that the memory cost function heads asymptotically toward a maximal complexity, i.e. that it is not linear. It seems fair to say that this assumption clashes with his whole system of assumptions, which predicts a linear increment of processing cost. From the minimality perspective adopted here, however, these results are expected. Consider the case of (59-a). Successful extraction of the object DP the beautiful pendant in (59-a) requires activation of the full array of morphosyntactic features associated with it. As we have assumed throughout this work, this activation has a cost (which generally cannot be paid by agrammatic aphasics). Once the cost is paid, and in particular once the cost of activating the relative feature is paid, it is possible to distinguish the moving DP (which belongs to the Operator class) from any intervening element that does not belong to this class. From this perspective the computational cost of (59-b) is exactly the same as that of (59-a), since nothing crucial changes between the two structures: the new intervening elements all belong to the Argumental class, and their addition neither affects the feature structure of the moving element nor causes a minimality effect. The non-linear increment of complexity in these cases can thus be easily predicted.

Notice that this explanation does not exclude that additional memory costs might be associated with an increase in the distance between an antecedent and its gap. The important point is that this constitutes a complexity factor which should be distinguished from the complexity associated with representing a fully fledged feature structure.19 More generally, and more to the point of the present discussion of aphasia, it is clear that several factors are responsible for the overall complexity level of a sentence: agrammatics, however, are not equally sensitive to all of them. In the following chapters I present and discuss empirical support for this claim.
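The contrast between the two complexity factors can be sketched as follows. This is a deliberately crude illustration and not Gibson’s cost function: the distance-based measure simply counts intervening elements, and the minimality check assumes that only an intervener of the mover’s class matters. All names, and the class labels assigned to the intervening material, are hypothetical.

```python
# Crude contrast between a distance-based cost and a minimality check.
# Neither function is Gibson's (1998) actual metric; both are assumptions.

def distance_cost(interveners: list) -> int:
    """Grows by one unit with every element between antecedent and gap."""
    return len(interveners)


def minimality_violation(mover_class: str, interveners: list) -> bool:
    """True only if some intervener belongs to the mover's class."""
    return mover_class in interveners


shorter = ["A", "A"]             # argumental material, as in (59-a)
longer = shorter + ["A", "A"]    # extra argumental material, as in (59-b)

# The distance-based cost keeps growing with the added material ...
print(distance_cost(shorter), distance_cost(longer))
# ... while the minimality status of the Operator-class mover is unchanged.
print(minimality_violation("Op", shorter), minimality_violation("Op", longer))
```

The two measures are kept as separate functions on purpose, to mirror the claim that distance and feature-based intervention are distinct complexity factors.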

19 Notice also that the present hypothesis has nothing to say about multiple center embedding; see Sadeh-Leicht (2007) for a recent critique of the DLT and for an analysis of multiple center embedding structures as Strong Islands, which opens up important connections with the present hypothesis, especially considering Starke’s (2001) treatment of Strong Islands under RM.
