Concepts: Conceptualizing the world
• Cognition represents things “under a description” – it represents them as something. (we say it “conceptualizes” the world).
• What do concepts represent? i.e. what determines the content of concepts?
– Complex concepts get their content from their constituents (and the way they are put together)
– Simple (basic) concepts get their content from…..?
Several possible forms of representation of a Necker Cube
Conceptualizing and “parsing”; “picking out” things-in-the-world
• The most basic cognitive operation involves the picking out that occurs in the formulation of predicate-argument structures of judgments. This is closely related to figure-ground differentiation.
• The predicate-argument structure logically precedes judgments (or thoughts about the perceived world) since judgments presuppose that what judgments are about has been picked out.
• A better way to put this is that the arguments of perceptual predicates P(x,y,z,…) must be bound to things in the world in order for the judgment to have perceptual content.
How do we “pick out” things that will serve as the arguments of visual predicates?
• Simple empirical examples: attentional selection; multiple-attentional selection. Picking out is different from discrimination. Pick out the n’th element to the right of a grating, or pick out elements forming a line or a square from a random texture pattern or wallpaper tessellation. (Intriligator examples; Trick example of subitizing.)
• “Visual routine” examples from Ullman. (Collinearity)
• Picking out or individuating-plus-indexing is one of the most basic functions of perception.
A central thesis of this course is that “picking out” is preconceptual and not mediated by the prior detection and encoding of any visual properties!
Example of the need for multiple individuation of visual tokens
Subitizing (and when it fails)
Visual Indexing theory provides an account of the relation between subitizing and counting:
• In subitizing only active indexes need be counted; there is no need to consult the display, so patterns and locations should not matter.
• The effect of cue validity is strong in the counting range but weak in the subitizing range: subitizing always occurs in the n < 4 range, even with invalid location cues.
There is no subitizing when focal attention is required to individuate items (e.g., connected-points, embedded-rectangles, conjunction-feature targets)
Individuation and subitizing
The “subitizing” phenomenon only occurs with figures that can be preattentively individuated (as on the right).
Subitizing graphs for concentric and nonconcentric squares
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
Another case in which items that require attention to individuate cannot be subitized
• Can subitize: count all black squares.
• Cannot subitize: count all squares on the S-grid.
Other “visual routines” that require objects to be picked out
Individuating is different from discriminating
Assumptions sketched so far
• Primitive individuation is preconceptual.
• When visual properties are detected, they are detected as properties-of-individuated-things. We call these individuals Primitive Visual Objects. The relation between PVOs and bona fide physical objects or “Spelke Objects” will be discussed later.
• Only individuated visual objects, either primitive or complex, can be the subject of visual judgments or the subject of motor commands.
• The mechanism for individuation and argument-binding is called a Visual Index or FINST.
Another argument for preconceptual individuation and indexing: Incremental
construction of perceptual representations
• Percepts do not come into existence all at once; they are built up over time (with or without saccadic eye movements).
• This is an example of one of several related phenomena that give rise to a “correspondence problem”:
– When P(x) is encoded we need to assign this new property to the appropriate previously-encoded object.
– Without a preconceptual reference we would be faced with an intractable matching problem
What is the evidence for (or against) the assumption that individuation does not involve the prior detection of some property?
The strongest contender for the role of mediating property in picking out and binding token individuals (PVOs) is location.
The pro-location argument
Treisman’s Feature Integration Theory and the “binding problem” assumes that a shared location across feature maps provides the means for accomplishing property-conjunction binding: two properties are conjoined if they have the same location.
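The location-based binding idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Treisman’s actual model; the feature maps and locations are hypothetical toy data.

```python
# Toy sketch of location-based feature binding: a color and a shape
# are conjoined into one object only if they share a location across
# the two feature maps. All maps and locations are made up.

color_map = {(1, 2): "red", (4, 4): "green"}                      # location -> color
shape_map = {(1, 2): "circle", (4, 4): "square", (0, 0): "triangle"}  # location -> shape

def bind_by_location(color_map, shape_map):
    """Conjoin a color and a shape only when they occupy the same location."""
    return {
        loc: (color_map[loc], shape_map[loc])
        for loc in color_map
        if loc in shape_map
    }

print(bind_by_location(color_map, shape_map))
# {(1, 2): ('red', 'circle'), (4, 4): ('green', 'square')}
```

The unmatched triangle at (0, 0) is left unbound, which is the situation in which, on this account, conjunction errors could arise.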
Balint syndrome patients are very poor at spatial tasks and are also very poor at finding property conjunctions. They suffer from conjunction illusions (CI).
A number of people, including Mary-Jo Nissen, have explicitly tested the location-mediation assumption.
Evidence for the priority of location in property encoding
• Nissen (1985) measured the probability of correctly reporting shape and location, given a color cue, P(S & L | C). If retrieving shape and location given a color cue depends on using the color cue to first find the location and then using the location to find the shape, then we should have:
P(S & L | C) = P(L | C) × P(S | L)
Nissen estimated P(S | L) from other data and concluded that the above relation did hold.
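The mediation relation can be checked with made-up numbers. The probabilities below are hypothetical, purely to illustrate the arithmetic; they are not Nissen’s data.

```python
# Toy check of Nissen's location-mediation relation. If shape report
# given a color cue is mediated by location, the joint probability is
# the product of the two component probabilities. Values are made up.

p_L_given_C = 0.8   # P(L | C): probability of finding the location from the color cue
p_S_given_L = 0.7   # P(S | L): probability of finding the shape from that location

# Predicted joint probability under location mediation:
p_SL_given_C = round(p_L_given_C * p_S_given_L, 2)
print(p_SL_given_C)  # 0.56
```

If the observed P(S & L | C) matched this product, as Nissen concluded it did, that would support the location-mediation account.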
• Pashler (1998, pp. 98-99) reviewed the literature on feature encoding in search and concluded that location is special because “… when an observer detects a target defined by other dimensions, this provides information about the location of the stimulus being detected.” Location is also special in that errors in reporting the identity of cued letters tend to consist of reports of nearby letters, and when the task is to report a letter of a particular color or shape together with as many other letters as possible, the latter tend to be nearby letters (Snyder, 1972).
But….Although there is a great deal of evidence, such as that described earlier, for the priority of location in accessing features, in (almost) every case “location” is confounded with individuality because objects have fixed locations. Being a different individual usually entails being at a different location.
• Consequently the mediation might be by individual rather than by location.1
There are two possible ways to control for the possible mediation of location:
1. Use moving objects
2. Use objects that are not defined spatially
1 This overstates the case. There is clearly some location-focusing of attention since we can look at different parts of a display. But this may be a second-order effect.
1. Moving objects
• Priming across moving objects (“reviewing object files”)
• Inhibition of Return
• Multiple Object Tracking (MOT)
2. Spatially coincident objects
• Tracking in “feature space”
Inhibition-of-return is object-based
Klein (2000) showed that when an item is attended and attention shifts away from it, then after about 300 ms – 900 ms, it takes longer to shift attention back to that item than to shift to a new item. This is called inhibition of return (IOR) and is thought to be useful for searching and foraging.
Tipper, Driver & Weaver (1991) showed that IOR is largely object-based by showing that if the object that was attended moves, IOR moves with it.
Typical Inhibition-of-return experiment
Inhibition of return moves with the object that is inhibited
Inhibition of return aids “foraging” and search
Object File Theory(Kahneman, Treisman & Gibbs, 1992)
• Information is encoded and stored in files that are specific to particular individual objects.
• When an object is encountered, an attempt is made to solve the correspondence problem and assign it to an existing object file, based mainly on spatiotemporal properties.
• When an assignment to an existing object file succeeds, the information in the existing file is first reviewed and is used as the default properties of that object. Thus there is a processing benefit for recognizing those properties that are listed in the object file for that object.
Kahneman & Treisman’s study of “Object File” priming
Demonstration of the Object File display: Positive prime
Demonstration of the Object File display: Negative prime
Multiple-Object Tracking
The MOT paradigm was not invented primarily to test the thesis that objects can be selected based on their individuality rather than on their featural (or local) properties. But it has turned out to be particularly useful for that purpose because it appears to provide an illustration of selection and re-selection (tracking or individuality-maintenance) that uses only the continuing history of an enduring object as the basis for its execution. The reason is that the only thing that defines the targets to be tracked is their history as individuals over time.
Try to think of what other basis there might be for tracking the targets in MOT.
A possible location-based tracking algorithm
1. While the targets are visually distinct, scan attention to each target in turn and encode its location on a list.
2. When targets begin to move, check the n’th position in the list and go to the location encoded there: Loc(n)
3. Find the closest element to Loc(n).
4. Update the actual location of the element found in #3 in position n in the list: this becomes the new value of Loc(n).
5. Move attention to the location encoded in next list position, Loc(n+1).
6. Repeat from #3 until the elements stop moving.
7. Report elements whose locations are on the list.
Use of the above algorithm assumes (1) focal attention is required to encode locations (i.e., encoding is not parallel), (2) focal attention is unitary and has to be scanned continuously from location to location. It assumes no encoding (or dwell) time at each element.
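The steps above can be sketched as a minimal Python routine. The positions and frames are hypothetical toy data, and real displays update continuously rather than in discrete frames; this is a sketch of the algorithm’s logic, not a model of attention.

```python
import math

# Sketch of the serial, location-based tracking algorithm described
# above: store target locations, then on each display update visit
# each stored location in turn and replace it with the location of
# the nearest element. All coordinates are made-up toy data.

def nearest(point, elements):
    """Step 3: find the element closest to a stored location Loc(n)."""
    return min(elements, key=lambda e: math.dist(point, e))

def serial_track(stored_locs, frames):
    """Steps 1-7: serially visit and update each stored location."""
    locs = list(stored_locs)        # Steps 1-2: locations encoded on a list
    for elements in frames:         # each display update
        for n in range(len(locs)):  # Steps 3-5: visit list positions in turn
            locs[n] = nearest(locs[n], elements)  # Step 4: new Loc(n)
    return locs                     # Step 7: report tracked locations

# Two targets and one frame containing them (slightly moved) plus a distractor:
print(serial_track([(0, 0), (10, 10)],
                   [[(0.5, 0), (10, 10.5), (5, 5)]]))
# [(0.5, 0), (10, 10.5)]
```

The failure mode is visible in the nearest-neighbor step: if a distractor drifts closer to a stored location than the true target before that list position is revisited, the algorithm silently switches to the distractor, which is why attention-movement speed limits predicted performance.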
Predicted performance for the serial tracking algorithm as a function of the speed of movement of attention
A new argument against location being the basis for access to properties
• It might be argued that location is being recorded somehow (e.g., in parallel) and that it therefore might be used to track.
• But Blaser, Pylyshyn & Holcombe, 2000, showed that one can track dynamic patterns that are superimposed over the same fixed location.
How does one explain the capacity to track in MOT?
Assume that individuating and maintaining individuality is a primitive operation of the encapsulated early vision system.
This is one of the basic ideas behind the FINST Visual Index Theory.
Some Assumptions of the Visual Indexing Theory (FINST Theory)
1) Early vision processes segment the visual field into feature-clusters automatically and in parallel. The ensuing clusters are ones that tend to be reliably associated with distinct token individuals in the distal scene. The distal counterparts of these clusters are referred to as Primitive Visual Objects (or sometimes FINGs, or things indexed by FINSTs), indicating our provisional assumption that these clusters typically correspond to the proximal projections of physical objects in the world.
2) The clusters are activated (also in parallel) to a degree that depends on such properties as their distinctiveness within a local spatiotemporal neighborhood, including distinctiveness due to sudden onsets.
3) Based on their degree of activation, these clusters compete for a finite pool of internal Indexes (FINSTs). These indexes are assigned in parallel and in a stimulus-driven manner. Since the supply of Indexes is limited (to about 4 or 5), this is a resource-constrained process.
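Assumptions 1–3 can be pictured as a simple competition for a limited pool of indexes. The cluster names and activation values below are hypothetical, and ranking by activation is only one simple way to realize the competition.

```python
# Toy sketch of assumptions 1-3: feature clusters compete, by degree
# of activation, for a limited pool of FINST indexes (about 4). The
# most activated clusters win. All values are made up.

N_INDEXES = 4  # the limited FINST pool

def assign_indexes(activations, n_indexes=N_INDEXES):
    """Return the ids of the clusters that capture an index,
    most activated first; the rest go unindexed."""
    ranked = sorted(activations, key=activations.get, reverse=True)
    return ranked[:n_indexes]

clusters = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.8, "e": 0.5, "f": 0.95}
print(assign_indexes(clusters))  # ['f', 'a', 'd', 'c']
```

Clusters "e" and "b" lose the competition, which is the sense in which indexing is a resource-constrained, stimulus-driven process.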
Assumptions of FINST Theory (continued)
4) Although assignment of indexes is primarily stimulus-driven, there are certain restricted ways in which cognition can influence this process. One way is by scanning focal attention until an object with specified properties is located, at which time an index may spontaneously get assigned to it.
5) An index keeps being bound to the same visual object as that object changes its properties and its location on the retina (within certain constraints). In fact this is what makes it the “same” visual object. On the assumption that Primitive Visual Objects are reliably associated with prototypical real distal objects, the indexes can then functionally "point to" objects in a scene without identifying what is being pointed to — serving like the demonstratives "this" or "that".
Assumptions of FINST Theory (continued)
6) It is an empirical question what kinds of patterns can be indexed. It appears that they need not be spatially punctate. Current evidence suggests that the onset of a new visual object is an index-grabbing event. Perhaps the appearance of a new object within focal attention while the latter is being scanned is another such event (thus allowing for some cognitive control of index assignment, as in assumption 4).
7) Only indexed tokens can enter into subsequent cognitive processing: e.g., relational properties like INSIDE(x,y), PART-OF(x,y), ABOVE(x,y), COLLINEAR(x,y,z),... can only be encoded if tokens corresponding to x, y, z,... are bound by indexes.
8) Only indexed tokens can be the object of an action, such as moving focal attention to it – except when the attention movement is guided by strategies such as moving in a certain direction, which do not make reference to a target object.
Subset-search (Burkell & Pylyshyn, 1998)
Three-cue “single feature” (or popout) subset search
Three-cue “conjunction feature” subset search
Tracking without Keeping Track
A puzzle about tracking!
(or is it?)
We have argued that, under certain assumptions about the movement of unitary focal attention, tracking could not be accomplished by encoding and storing target locations, and serially visiting the stored locations. Since each target’s location (and trajectory) is the only unique property that distinguishes a target from a distractor, this raises the question of how a target can be tracked. What target property allows it to be tracked?
We have proposed that MOT is accomplished by a primitive mechanism that directly keeps track of an object’s individuality, as such. The mechanism is called a Visual Index (or FINST).
What is at issue in the present context is one crucial aspect of the theory. We call this assumption:
More on how tracking is accomplished
“The Internal Name Assumption”
A logical requisite for tracking
Whatever the exact nature of the mechanism used for tracking in MOT, it must be able to keep track of the individuality or enduring objecthood of particular token items. But the only thing that makes object Xn(t) a target is that it traces its history back to an object that at the outset was visibly a target. In other words, Xn is a target if it satisfies the following recursive definition:
(1) Xn(0) is visually identified as a target
(2) If Xn(t) is a target, then Xn(t+Δt) is a target
But this means that there must be a mechanism for determining whether the second (recursive) step holds – i.e., whether an object Xn(t) is the same-individual-object as Xn(t−Δt). This is known as the correspondence problem.
Solving this correspondence problem is equivalent to assigning a distinct internal name n to each target object.
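One way to picture the recursion is nearest-neighbor correspondence matching, which carries each internal name forward one time step. All positions here are toy data, and nearest-neighbor matching is only an illustrative stand-in, not the proposed mechanism (indeed, the slides argue against location-based schemes).

```python
import math

# Sketch of the recursive definition: an object at time t is a target
# iff it corresponds to a target at t - dt, bottoming out in the
# objects visibly marked at t = 0. Solving the correspondence at each
# step is what carries the internal name (X1, X2, ...) forward.
# Nearest-neighbor matching and all coordinates are hypothetical.

def correspond(prev_named, curr_positions):
    """Map each previously named object to its nearest current position,
    i.e. carry each internal name forward one time step."""
    return {
        name: min(curr_positions, key=lambda p: math.dist(pos, p))
        for name, pos in prev_named.items()
    }

# Step (1): objects initially and visibly identified as targets, named:
named = {"X1": (0.0, 0.0), "X2": (10.0, 0.0)}
# Step (2): one display update (the recursion applied once), with a distractor:
named = correspond(named, [(0.2, 0.0), (9.8, 0.3), (5.0, 5.0)])
print(named)  # {'X1': (0.2, 0.0), 'X2': (9.8, 0.3)}
```

Whatever replaces the nearest-neighbor rule, any successful tracker must compute some such mapping, and the mapping just is an assignment of a distinct internal name to each target.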
Successful tracking of a particular target implies that a unique internal identifier has been (temporarily) assigned to that target
If a distinct internal name (say n1, n2, n3, or n4) has been assigned to each target, it would then be possible to assign overt labels to the targets by merely learning a list of pairs: n1-r1, n2-r2, n3-r3, n4-r4, where ri are external labels or overt responses.
Thus if a target is initially given a unique label, and if that target is successfully tracked, an observer should be able to report its label – simply as a consequence of having tracked it.
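The pairing argument amounts to a simple lookup: once the internal names are stable, reporting an overt label is just retrieval through the learned name–label pairs. The names n1…n4 and labels r1…r4 below are the hypothetical ones from the text.

```python
# Sketch of the internal-name argument: if tracking assigns a stable
# internal name to each target, then reporting a target's overt label
# reduces to looking it up in the learned list of pairs n_i - r_i.
# The names and labels are the hypothetical ones from the slides.

learned_pairs = {"n1": "r1", "n2": "r2", "n3": "r3", "n4": "r4"}

def report_label(tracked_internal_name):
    """Report the overt label of a successfully tracked target."""
    return learned_pairs[tracked_internal_name]

print(report_label("n3"))  # r3
```

So if tracking alone preserved distinct internal names and the four pairs were remembered, ID performance should track right along with tracking performance; the data below show it does not.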
When you track several objects in MOT, do you automatically know which one is which?
Notice that if you had assigned an internal label to each object, and if you remembered the 4 responses associated with the 4 distinct labels, then you should be able to correctly recall the labels associated with the successfully tracked objects.
Open research question
[Graph: “ID & Tracking” – percentage correct (0–100%) vs. trial duration (2, 5, 10 s) for four conditions: Track Corners, Track Names, ID Corners, ID Names.]
[Graph: “Tracking & ID Only” – percent correct vs. trial duration (2, 5, 10 s) for three conditions: Tracking Corners Only, Tracking Names Only, ID Names Only.]
[Graph: “Avg ID Performance when tracking 4 objects” – percent correct IDing vs. trial duration (2, 5, 10 seconds) for ID Names and ID Corners, on trials with tracking at 100%.]
What does this mean?
What are some possible reasons why an object’s ID is not retained as well as its membership in the category “target”?
Some possible reasons for these results
● Unique identifiers were initially assigned to each individual object but the observer forgot some of them
☺If an identifier is forgotten or lost then that particular object ceases to be tracked, which should decrease both tracking and ID performance.
● The set of targets is tracked as a whole rather than as individual objects. To do the MOT task, all you need is to keep track of the target set, not the individual objects forming the set.
☺Because objects move independently, tracking them as a whole logically requires that each object be individually tracked at some level of the system.
It might be argued that all one needs is to tag each target object with the same tag (say T) and keep track of which objects have this tag. If one did not give a different name to each individual target object then it might explain why one fails to distinguish among the T-tagged objects.
The problem with this way of putting it is that object tagging is a metaphor, since one cannot actually tag the display. Instead one must keep a mental tag on the targets. But this still requires that one solve the correspondence problem for each target from one instant to another. One must still be able to specify that this particular object on the screen now is the same object as some particular object that was on the screen an instant earlier, and that is tantamount to giving each target object a distinct label.
Why require a label rather than a simple tag?
Another possibility: The “local name” alternative
Although the tracking system assigns an internal name or ID to individual objects, the names are not made available to the cognitive system — they act like local variables in a programming language. If they are not passed outside the early vision module then they cannot be used to select a response.
If identifiers are not available outside the module, then why is ID performance as high as it is?
Performance in this experiment depends not only on the availability of identifiers used in tracking. The Visual Index (FINST) theory claims that what we have called identifiers are actually indexes and they can be used to direct focal attention to indexed objects. If that is so, then indexes can also be used to visit objects serially, to order them, and generally to adopt one of many serial mnemonic strategies for encoding and recalling ID information. Subjects who are particularly good trackers can and do use various mnemonic strategies to obtain higher ID scores.
Some additional possible reasons
● Errors in tracking arise primarily through switching identifiers among objects…?
☺This has a certain face validity since it does feel like one loses the ID of targets when they come close to one another. But since all the objects move randomly and independently, there is actually a 30% higher probability that a target ID will be switched with a nontarget ID (thus lowering tracking performance) than that a target ID will switch with another target ID.
☺But we now have data showing that this is in fact a likely reason: the “interaction score” between name-switched items is higher than between non-switched items, suggesting that names switch when items come close together.
• The “interaction score” between targets and nontargets that have exchanged identities (resulting in a tracking error) is no higher than the interaction score between a randomly selected target-nontarget pair. This suggests that tracking failures, unlike ID failures, are not due to items coming too close together. Surprising and still under investigation!
Relation to work on infants’ sensitivity to the numerosity of objects:
Alan Leslie’s “Object Indexes”
Infants as young as 4 months of age show surprise (longer looking time) when they watch two things they have seen together being placed behind a screen and then the screen is lifted to reveal only one thing. Below 10 months of age they are in general not surprised when the screen is lifted to reveal two things that are different from the ones they saw being placed behind the screen.
In some cases, infants (age 12 months) use the difference in color of the objects they see one-at-a-time to infer their numerosity, but they do not appear to record the colors and use them to identify the objects that are revealed when the screen is lifted.
How does indexing differ from selective attention?
• Several indexes can be available simultaneously
• Assignment of indexes is primarily data-driven
• Indexes are object-based and remain with the object as it moves (even when it moves briefly behind an occluding surface)
• Indexes provide the following functions:
– They individuate token objects
– They allow direct access (which may be serial or parallel)
– Indexed objects are picked out qua individuals – they need not be specified by any object properties, nor is it mandatory that any of their properties be encoded
– Indexes bind multiple objects to elements of representations
Why do we need indexes?
• Indexical or demonstrative reference is ineliminable in cognitive science (cf. Perry).
• They are needed in order to specify where focal attention is assigned
• They are needed to execute visual routines
• All arguments to a visual predicate or operation must be bound to objects through indexes before they can be evaluated
• To allow the environment to be used instead of memory in carrying out tasks (using external memory in arithmetic <Van Lehn>, in recognition-by-parts <Biederman>, in copying patterns <Ballard>).
• To provide a facility for cross-modal binding (arguments of motor commands). In general, to support “situated” vision for action.
• To avoid having to assume a metrical image to explain trans-saccadic integration and visual stability (and many imagery phenomena).
John Perry gives the following nice example of how the decision to take a particular course of action can arise from the realization that a particular object in a description and a particular thing one sees are one and the same object.
The author of the book, Hiker’s Guide to the Desolation Wilderness stands in the wilderness beside Gilmore Lake, looking at the Mt. Tallac trail as it leaves the lake and climbs the mountain. He desires to leave the wilderness. He believes that the best way out from Gilmore Lake is to follow the Mt. Tallac trail up the mountain … But he doesn’t move. He is lost. He is not sure whether he is standing beside Gilmore Lake, looking at Mt. Tallac, or beside Clyde Lake, looking at the Maggie peaks. Then he begins to move along the Mt. Tallac trail. If asked, he would have to explain the crucial change in his beliefs in this way: “I came to believe that this is the Mt. Tallac trail and that is Gilmore Lake”. (Perry, 1979, p 4)
The point of this example is that in order to understand and explain the action of the lost author it is essential to use demonstrative terms such as this and that. Without a way to directly pick out the referent of a descriptive term and to link the perceived object token with its cognitive representation, people would not be able to act on their knowledge, and we, the theorists, would not be able to explain people’s actions.
Ballard, Hayhoe et al.’s proposal for a “deictic strategy”
Observers appear to use their direction-of-gaze as their reference point in encoding patterns and would prefer to make more eye movements rather than devote extra effort to memorizing a simple pattern.
Ballard, Hayhoe, Pook & Rao (1997): “deictic codes” and “embodied cognition”
Schematic illustration of the connection between FINSTs, cognitive representations, and the world
A more pictorial illustration of Visual Indexes as direct causal reference or demonstratives
“Situated vision” in Artificial intelligence:How indexicals make reference more efficient