

British Journal of Psychology (2011), 102, 943–958. © 2011 The British Psychological Society


www.wileyonlinelibrary.com

Mental representations of familiar faces

A. Mike Burton1*, Rob Jenkins2 and Stefan R. Schweinberger3

1 School of Psychology, University of Aberdeen, UK
2 University of Glasgow, UK
3 Friedrich Schiller University, Jena, Germany

The Bruce and Young (1986) framework makes a number of important distinctions between the types of representation needed to recognize a familiar face. Here, we return to these, focussing particularly on face recognition units. We argue that such representations need to incorporate idiosyncratic within-person variability, asking questions such as ‘What counts as a picture of Harrison Ford?’. We describe a mechanism for achieving this, and discuss the relation between image variability and episodic face memories, in the context of behavioural and neurophysiological data.

In this paper, we will address the issue of representation in face recognition. Bruce and Young (1986) go to some lengths to emphasize the importance of thinking clearly about representation in their classic paper. They are careful to lay out the importance of different types of codes involved in face processing, and link their theorizing to the earlier work of David Marr (Marr, 1982; Marr & Nishihara, 1978). In some ways, this emphasis reflects the concerns of the time. In 1986, there was great enthusiasm for artificial intelligence accounts of perception and cognition, and the revival of connectionism was poised to influence the thinking of many researchers. Neuropsychological evidence was key to constraining cognitive models, and guided the Bruce and Young model in important ways, but during that period, cerebral localization was not the popular topic that it is once more today.

The recent expansion in research on brain imaging has to some extent shifted the focus away from what is represented in face recognition, but here we wish to return to this topic, arguing that it has lost none of its fundamental importance in the ensuing 25 years. However, we will do so in the contemporary scientific context. We will examine some of the representational issues laid out in the Bruce and Young model, focussing particularly on the notion of face recognition units (FRUs), and drawing on modelling, behavioural, and neurophysiological evidence.

Bruce and Young make an important distinction between pictorial codes and structural codes. Pictorial codes are formed for any visual pattern or picture but are not simply iconic representations. Instead, they incorporate information that is integrated over successive fixations to form a representation that is a little more general. These codes ‘may contain details of the lighting, grain and flaws in a photograph, as well as capturing the static pose and expression portrayed’ (Bruce & Young, ibid., p. 307). The representations can be used, for example, to make a recognition memory judgement: did you see this picture before? Such judgements do not rely on identical retinal images – the viewer does not need to fixate an image in the same place to solve the task. However, performance is based on information highly specific to the image viewed.

*Correspondence should be addressed to Mike Burton, School of Psychology, University of Aberdeen, Aberdeen AB24 2UB, UK (e-mail: [email protected]).

DOI: 10.1111/j.2044-8295.2011.02039.x

Structural codes, on the other hand, are the more abstract representations that mediate everyday recognition of familiar faces. The now-famous diagram of the face recognition model contains a layer of representations known as ‘face recognition units’, and these are described as follows:

‘Each face recognition unit contains stored structural codes describing one of the faces known to a person. When a face is seen, the strength of the recognition unit’s signal to the cognitive system will be at a level dependent on the degree of resemblance between its stored description and the input provided by structural encoding’. (Bruce & Young, pp. 311–312).

and

‘A face recognition unit will respond when any view of the appropriate person’s face is seen, but will not respond at all to his or her voice or name’. (p. 313).

So, this description captures a representation that is entirely visual, but not tied to a particular instance of a viewed face. This allows Bruce and Young to separate out processes of recognizing a person by face (FRU) or by other routes (putative voice or name recognition units), and to allow a more general, modality-independent level of classification beyond these: the person identity nodes (PINs). These PINs mediate access to semantic codes specific to that individual – they support the recognition of a person rather than recognition of a face.

How might we begin to capture the nature of the FRUs and their connections to other representations? The embedding of these units in the more general model of face recognition demonstrates their purpose but does not give any clues about how they might work. In fact, previous models had employed such units (Ellis, 1986; Hay & Young, 1982), as influenced by the notion of logogens in contemporary models of reading (Morton, 1969, 1979). However, their operation represents a major problem. Note that for those interested in automatic face recognition, understanding FRUs is the entire challenge: if one can find a representation that is triggered by any view of someone’s face, and only that person’s face, then most issues of automatic recognition are solved. To date, they have not been (e.g., Phillips et al., 2010). In the next section, we will consider this problem.

Within-person facial variability

Figure 1 shows 14 different images of the same person’s face. Notice that these pictures do not vary across most of the dimensions typically included in litanies of difficulties for face recognition: all the photos are taken in similar poses, in good lighting conditions, and within a relatively short period of time (about 5 years). Nevertheless, they vary considerably, partly due to variability in the person (haircut, weight, expression), and partly due to changes in the image-capture conditions (camera, lighting). The problem for face recognition is to understand how a representation (FRU) could be built that would accept all these images as the same person, while admitting no images of another person. Even with the relatively constrained images shown here, this is a problem that continues to elude both psychologists and engineers interested in providing practical face recognition systems.

Figure 1. Fourteen photos of one person, with their average at the centre.

What properties would an adequate representation have? One clue to their nature is the interaction between performance and familiarity. Over the past decade, it has become clear that our abilities with face tasks such as matching two different photos of a person are surprisingly poor for unfamiliar faces (e.g., Bruce et al., 1999; Bruce, Henderson, Newman, & Burton, 2001; Henderson, Bruce, & Burton, 2001) and surprisingly good for familiar faces (e.g., Burton, Wilson, Cowan, & Bruce, 1999). Indeed, a sequence of papers by Clutterbuck and Johnston (2002, 2004, 2005) suggests that the association is so strong that one’s ability to match different photos of the same person is an excellent predictive index of one’s familiarity with a face. So, it appears that to conform to the ‘ideal’ recognizer above (recognizing all and only the photos of a single person) the seen person must be highly familiar. The question then becomes, what happens during the process of familiarization that allows for such performance improvements? We take this to mean that face recognition involves a transition during learning from relying largely on pictorial codes, to relying largely on structural codes.

Figure 2. The average of 48 different images of Harrison Ford.

In recent years, we have been exploring the potential of ‘face averages’ as a suitable representation for familiar face recognition (Burton, Jenkins, Hancock, & White, 2005; Jenkins & Burton, 2008, in press). The centre of Figure 1 shows the average of the other photos, in the sense of morphing all faces together. Experiments with a number of different matching techniques show that this is a good representation for recognizing new photos of a person: in short, the pairwise similarity between a novel photo and a person’s average is generally higher than the similarity between individual photos of that person, for all ways of measuring similarity we have tested. This means that if one is to match a photo to a single image of a person (e.g., in a computer-based face recognition system), it is preferable to match it to an average image, rather than to another photograph. In terms of human capabilities, we have proposed that the process of learning a new face corresponds to the successive refinement of the person’s average, as new examples are added through experience of seeing that person. In this way, we have suggested that the average, or ‘canonical’, image of a person’s face may work as a structural code, in essence an FRU. Figure 2 shows an example of this for a familiar face.
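As a rough illustration of why matching to an average can beat matching to individual exemplars, the following Python sketch treats aligned face images as flattened vectors and compares cosine similarities. Everything here is a synthetic stand-in (random vectors rather than real images), and the choice of cosine similarity is our own assumption, not the authors' published method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for aligned face images: one 'identity' vector plus
# per-photo variation (lighting, expression, camera), flattened to 1-D.
identity = rng.normal(size=256)
photos = np.stack([identity + rng.normal(scale=0.8, size=256) for _ in range(14)])

average = photos.mean(axis=0)  # the 'face average' representation

def similarity(a, b):
    """Cosine similarity between two flattened images."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A novel photo of the same person.
novel = identity + rng.normal(scale=0.8, size=256)

sim_to_average = similarity(novel, average)
sim_to_exemplars = np.mean([similarity(novel, p) for p in photos])

# Averaging cancels photo-specific noise, so the average is the better
# single image to match against.
print(sim_to_average > sim_to_exemplars)
```

Because the photo-specific noise shrinks in the mean, the average sits closer to every new photo of the person than any single exemplar does, which is the intuition behind refining the average as new examples arrive.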

The notion of an averaging process is very appealing, because it suggests a way of eliminating information irrelevant to identity from one’s representation of a face. All photos contain some properties that are due to environmental factors such as lighting direction, and averaging gives a way of eliminating these, because their effect will simply cancel out across many photos. Note that this proposal is directly in contrast to some engineering approaches, in which such variability is explicitly modelled, and then factored out (e.g., Georghiades, Belhumeur, & Kriegman, 2001) – an approach that has so far proved unable in practice to cope with the huge variability in unconstrained image sets.


Despite some success with this proposal, we would like here to suggest that the representation using averages is insufficient. In common with other approaches, it treats variations of a particular object (here, a face) as being ‘noise’, at best inconvenient deviations from a ‘true’ representation. The problem of recognition then becomes how to strip away this irrelevant information, this noise, in order to make the true match. There is an alternative to this notion. It is possible that the variation in different manifestations of the same object (face) could form part of one’s representation. It is possible that the variability is actually informative, and should be embraced, embedded in the representation, and not eliminated prior to a matching process. Below, we will offer a practical way to do this. However, we should pause to note that the suggestion exists in the literature already, in a paper by Bruce (1994).

‘Moreover, given that the task of any such statistical system is to distinguish the “significant” variations between individuals from irrelevant variation within individuals, experience of variations of, say, expression within an individual may actually help rather than hinder the encoding process, just as an increase in the sample size in any analysis makes it more likely to reveal genuine differences between samples. This raises the interesting possibility (cf. Bruce, Doyle, Dench, & Burton, 1991) that non-rigid variations created by expressive movements of the face may not actually make face recognition a more difficult problem than other kinds of (within-class) object recognition, as we have always assumed, but may actually facilitate discrimination within a class of objects that all share the same overall structure’. (Bruce, 1994, p. 24).

By characterizing face recognition as a problem of statistical inference, this proposal encourages one explicitly to consider variability of facial images within a person. To do this, we have applied principal components analysis (PCA) to images of the same person’s face. PCA is a very common statistical technique in face recognition (Kirby & Sirovich, 1990; Turk & Pentland, 1991; Zhao, Chellappa, Phillips, & Rosenfeld, 2003) and is almost always applied in order to elucidate the dimensions along which different faces vary. However, we have found it informative to use the same technique to extract dimensions of variation within a person. Using this technique with many photos of the same person makes it possible to extract dimensions corresponding to all the ways in which photos of that person can vary – not only variations due to short- or long-term changes in the person (e.g., expression and age) but also superficial variations due to camera position, setting, and lighting. It is precisely these dimensions that make automatic face recognition so difficult, and so understanding how they vary with respect to a particular face is very useful.

Statistical analysis of faces always relies on some separate treatment of shape and texture (e.g., Craw, 1995; Troje & Bülthoff, 1995). This is a form of normalization that allows one to average together disparate images in a principled way. We achieve this by dropping a grid marking key points onto each face image (e.g., positions of corners of mouth, eyes, etc.). All images are then morphed to the same shape. This allows us to combine and compare all the images (because eyes now align with eyes, mouths with mouths, and so on). It also allows us to consider variations in shape, because the ‘starting-point’ grid for each morph shows the unique shape of that particular image. For full methodological details of this procedure, see Burton et al. (2005).
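The normalization step can be caricatured in code. A full implementation would piecewise-warp pixels between landmark grids (see Burton et al., 2005, for the actual procedure); the sketch below only aligns the landmark geometry itself, using a hypothetical five-point grid and a least-squares affine fit, to show how disparate ‘photos’ become comparable once key points are brought into register. All coordinates and noise scales are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five hypothetical landmarks (eye corners, nose tip, mouth corners), (x, y).
base = np.array([[30, 40], [70, 40], [50, 60], [35, 80], [65, 80]], dtype=float)

def random_view(landmarks, rng):
    """Simulate a photo: small rotation, scale, translation, plus jitter."""
    angle = rng.normal(scale=0.1)
    scale = 1 + rng.normal(scale=0.05)
    c, s = np.cos(angle), np.sin(angle)
    R = scale * np.array([[c, -s], [s, c]])
    shift = rng.normal(scale=1.0, size=2)
    jitter = rng.normal(scale=0.5, size=landmarks.shape)
    return landmarks @ R.T + shift + jitter

views = [random_view(base, rng) for _ in range(10)]
mean_shape = np.mean(views, axis=0)

def align_to(shape, target):
    """Least-squares affine map taking this view's grid onto the target grid."""
    X = np.hstack([shape, np.ones((len(shape), 1))])  # homogeneous coords
    params, *_ = np.linalg.lstsq(X, target, rcond=None)
    return X @ params

aligned = [align_to(v, mean_shape) for v in views]
residual = np.mean([np.linalg.norm(a - mean_shape) for a in aligned])
raw = np.mean([np.linalg.norm(v - mean_shape) for v in views])

# After alignment, each grid sits much closer to the common shape.
print(residual < raw)
```

Once every image is brought to the common shape, eyes align with eyes and mouths with mouths, so textures can be averaged and compared pixel by pixel, and the discarded ‘starting-point’ grids retain each image’s unique shape for separate analysis.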

To explore variations of pictures of particular individuals, we collected multiple photographs of well-known celebrities. Here, we illustrate the procedure with the example of the actor Harrison Ford. Using Google Images, we selected the first 48 images returned by a search for Harrison Ford, constrained such that the face was not occluded, and the pose was roughly frontal (as with Figure 1). For copyright reasons, we cannot illustrate the starting images, though the reader will be able to replicate the range of photos produced simply by repeating the search with any search engine. We dropped a standard grid, marking key facial points, onto each of these images, and morphed them to a standard shape. We then examined the statistical properties of Harrison Ford’s ‘texture’ (the 48 standard-shape images) by applying PCA to these standardized images. Separately, we examined the statistical properties of his face ‘shape’ (the 48 different starting grids) by applying PCA to the constituent xy-coordinates. This allows us to examine how pictures of Harrison Ford vary, in a formal statistical sense.

Figure 3. Texture components derived from a PCA on 48 photos of Harrison Ford. Columns show the first five components (left to right), with values z = +1 above and z = −1 below. Each texture has been mapped to the average Harrison Ford shape (Figure 2).

Figure 3 shows the first five dimensions of variability in the texture of Harrison Ford pictures. It is clear that early components tend to code superficial variations such as direction of light (dimension 1) and overall image colouration (dimension 3). Our experience is that this is a reliable finding. This is particularly interesting because, by definition, early components capture the most variance, meaning that the biggest differences between images of the same face are typically not due to changes in the face itself, but due to changes in the world (camera and ambient lighting conditions).

Figure 4 shows the first five dimensions of shape in the Harrison Ford images. PCA is performed on the point-locations within each grid. To view these, we simply map the average Harrison Ford texture onto grids representing positive and negative coefficients (+1 and −1 SD) on the derived dimensions. Once again, early components (capturing most variability) correspond to changes in the world, rather than changes in the person. For example, dimension 1 is simple left–right rotation. The clarity of this is quite striking, since we had no intention of varying viewpoint in any systematic way in the input images. By the time we reach dimension 4, there are clear person-specific dimensions of variability, in this case a rather clear expressive change.

Having performed initial PCA on shape and texture, it is then possible to extract statistical associations between these. It has been clear for some time that variability in shape and texture are not independent (e.g., Hancock, Burton, & Bruce, 1996). For example, if someone smiles broadly, this will be coded explicitly in a representation of shape (e.g., raising of the corners of the mouth), but will also be captured in textural changes (e.g., exposure of white teeth), even when the textures conform to a standard shape. To explore this inter-dependence between shape and texture, we can next perform a second-order PCA that takes as its inputs the images’ coefficients on the original shape and texture dimensions. This allows us, for the first time, to extract the principal dimensions of variability within a face, where those dimensions incorporate both shape and texture. The aim here is to generate a representation that captures not only the average image of a particular face, but a range of variability around the average that captures all its possible variants. We could think of this representation as describing the whole distribution of possible images of the person. As with any distribution, one needs to incorporate information about spread, as well as central tendency.

Figure 4. Shape components derived from a PCA on 48 photos of Harrison Ford. Columns show the first five components (left to right), with values z = +1 above and z = −1 below. The average texture (Figure 2) has been mapped to these shapes in each case.
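The second-order step can be sketched as follows: each image is scored on first-order shape and texture dimensions, the scores are standardized and concatenated, and PCA is run again. In this toy example a hidden ‘smile’ factor drives one shape dimension and one texture dimension together, and the leading joint component loads on both. The data, the factor, and the loadings are synthetic assumptions, not the published analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 48

# Hypothetical first-order coefficients: each photo scored on five shape
# and five texture dimensions. A hidden 'smile' factor drives shape
# dimension 0 (mouth corners) and texture dimension 1 (teeth) together.
smile = rng.normal(size=n)
shape_coeffs = rng.normal(size=(n, 5))
shape_coeffs[:, 0] += 2.0 * smile
texture_coeffs = rng.normal(size=(n, 5))
texture_coeffs[:, 1] += 2.0 * smile

# Standardize each dimension, then concatenate shape and texture scores.
joint = np.hstack([shape_coeffs, texture_coeffs])
joint = (joint - joint.mean(axis=0)) / joint.std(axis=0)

# Second-order PCA: dimensions of variation spanning both codes.
_, _, Vt = np.linalg.svd(joint, full_matrices=False)
first = Vt[0]

# The leading joint component loads on both the shape dimension and the
# texture dimension that the smile factor drives (indices 0 and 5 + 1).
print(abs(first[0]) > 0.3 and abs(first[6]) > 0.3)
```

The leading joint dimension thus captures the shape-texture covariation (a ‘smile’ axis) that neither first-order analysis could express on its own.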

Figure 5 provides an illustration of a representation of Harrison Ford derived in this way. None of these are photographs. Instead, each is a combination of statistical dimensions derived from a set of photos of Harrison Ford. None of the originals look like these images, but they all lie within the space carved out by dimensions of variation around the Harrison Ford average. If this statistical description is correct, it perfectly describes the range of images that count as Harrison Ford. This representation is truly statistical: its structure is constrained by the photos used to derive it, and its utility is bounded by these. So, for example, viewers who have known Harrison Ford through his films will have a space derived from images sampled over that period of his life. His family will have a wider range of images, and a corresponding representation that is more generalizable than that of cinema-goers. In this way, we propose that the notion of an FRU does not correspond to some gold-standard, canonical picture of a person, but is tied to exposure: the representation truly corresponds to the experience of the beholder. Importantly, the representation is best thought of as a region, not a point, and the variability of the input is a vital part of the derivation of this representation, not inconvenient noise.

Figure 5. New images of Harrison Ford, created by assigning novel values to dimensions derived from PCA of 48 images.

Prototypes and exemplars: Recognition of people and events

The proposal outlined here, that recognition of familiar faces involves an understanding of person-specific variance, could be implemented in a number of different ways. In describing face ‘averages’, we have tended previously to use the language of prototypes. However, consideration of variability lends itself just as easily to theoretical accounts based on collections of instances. In the general category learning literature, a long-running debate between those favouring prototype accounts and those favouring instance-based accounts has not been resolved. In short, it seems possible to take any account based on one of these approaches and reformulate it in the other (e.g., see Zaki, Nosofsky, Stanton, & Cohen, 2003). In implementation, such differences tend to converge over large numbers. So, in the face recognition case, one has experienced thousands of encounters with some personally familiar faces (e.g., family or colleagues). A prototype-with-variance account could very simply be implemented through descriptive statistics based on these encounters, as described above, or equally well by storing all encounters separately. Both produce clear advantages for well-learned faces, the first because previous experience has scoped the range of this person’s possible manifestations well, and the second because any new encounter is likely to share much in common with a previous encounter.
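The practical convergence of the two accounts can be shown with a toy recognizer. Below, a prototype-with-variance model stores per-dimension means and spreads, while an exemplar model stores every encounter and averages similarity across them; on synthetic ‘encounters’, both accept a new photo of the learned person and reject a stranger. Everything here, the scoring rules included, is an illustrative assumption, not the canonical model from either literature.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 50

# A well-learned face: many encounters with one identity, each with
# photo-to-photo variation.
identity = rng.normal(size=dim)
encounters = identity + rng.normal(scale=0.7, size=(200, dim))

# Prototype-with-variance: store per-dimension mean and spread.
mu, sigma = encounters.mean(axis=0), encounters.std(axis=0)

def prototype_score(x):
    """Negative mean squared z-score: higher = more 'this person'."""
    return -np.mean(((x - mu) / sigma) ** 2)

# Exemplar account: store every encounter, average similarity to all.
def exemplar_score(x, c=1.0):
    d = np.linalg.norm(encounters - x, axis=1)
    return float(np.mean(np.exp(-c * d / np.sqrt(dim))))

probe_same = identity + rng.normal(scale=0.7, size=dim)          # same person
probe_other = rng.normal(size=dim) + rng.normal(scale=0.7, size=dim)  # stranger

print(prototype_score(probe_same) > prototype_score(probe_other))
print(exemplar_score(probe_same) > exemplar_score(probe_other))
```

With enough stored experience, the two accounts make the same recognition decisions; they differ in what is stored (summary statistics vs individual episodes), which is exactly why behavioural data alone struggle to separate them.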

In general, models of face processing have tended not to emphasize episodic aspects of recognition. Bruce and Young’s notion of pictorial codes does, in principle, offer a way of understanding how we can remember the details of a particular encounter with someone. However, this has not been the main focus of theoretical development within the field, and subsequent accounts, such as the interactive activation and competition (IAC) model (Burton, Bruce, & Hancock, 1999; Burton, Bruce, & Johnston, 1990; Young & Burton, 1999) or Haxby’s formulation (Haxby, Hoffman, & Gobbini, 2000), offer no way to capture our memory for a particular meeting, and how someone looked at the time. This is perhaps surprising, since one of the most-investigated phenomena informing face models is repetition priming: faces are recognized more easily (faster, more accurately) if they have been seen earlier in the experiment (Bruce & Valentine, 1985; Ellis, Young, & Flude, 1990; Ellis, Young, Flude, & Hay, 1987). It is very well known that this effect is modulated by similarity between prime and test instances, such that repetition of the identical image produces most priming, whereas non-identical images produce reliable, but smaller, effects. While accounts of this phenomenon have been provided in structural terms (e.g., Bruce, Burton, Carson, Hanna, & Mason, 1994; Ellis, Flude, Young, & Burton, 1996), it is possible that episodic accounts could also be recruited here.

In considering these issues, it is very useful to consider the neurophysiological evidence on face representation. The Bruce and Young model, and particularly its distinction between pictorial, structural, and semantic codes, has been influential in neurophysiological research using event-related brain potentials (ERPs) and functional magnetic resonance imaging (fMRI), as well as developments of these techniques, such as fMR-adaptation (e.g., Davies-Thompson, Gouws, & Andrews, 2009). Work using fMRI is extensively reviewed elsewhere in this volume (Natu & O’Toole, 2011), and so we will concentrate here on ERP research.

The mid-1990s saw two new developments in face ERP research. First, ERP studies were published that used a priming approach to reveal different stages in familiar face recognition (Begleiter, Porjesz, & Wang, 1995; Schweinberger, Pfutze, & Sommer, 1995). These found that a right-lateralized posterior temporal ERP (∼200–350 ms) was modulated by face repetitions. The effect – later termed N250r (‘r’ for repetition) – was consistently larger for familiar than unfamiliar faces and was originally speculated to reflect FRU activation. Second, a right-lateralized occipitotemporal N170 (∼150–200 ms) selective to faces (compared to other visual stimuli) was demonstrated (Bentin, Allison, Puce, Perez, & McCarthy, 1996; Botzel, Schulze, & Stodieck, 1995). More recent research includes ERP correlates of face perception in the P1 component (∼80–140 ms; see Herrmann, Ehlis, Ellgring, & Fallgatter, 2005), the occipitotemporal P2 (∼180–240 ms; Latinus & Taylor, 2006; Stahl, Wiese, & Schweinberger, 2010), and later ERPs in an N400-like component (∼300–600 ms; see Eimer, 2000; Schweinberger, 1996; Wiese & Schweinberger, 2008). We will focus on the N250r component here, as it appears to be the earliest component with sensitivity to individual face recognition.

The N250r is highly sensitive to repetitions of familiar faces (Schweinberger, Pickering, Jentzsch, & Burton, 2002). Corresponding effects also appear in intracranial recordings (Puce, Allison, & McCarthy, 1999) and magnetoencephalography (MEG) (Schweinberger, Kaufmann, Moratti, Keil, & Burton, 2007). When the primes are task-irrelevant, this ERP response can be highly selective for repetitions of faces, while being completely absent for repetitions of pictures of objects (Neumann, Mohamed, & Schweinberger, 2011; Schweinberger, Huddy, & Burton, 2004). The N250r is elicited by repetitions across different face images but is consistently larger for repetitions of identical images (Schweinberger, Pickering, Burton, & Kaufmann, 2002), mirroring the performance facilitation found in face priming (Ellis et al., 1987). Thus, the representations that mediate the N250r exhibit a degree of image specificity.

When repetitions are immediate (i.e., without intervening faces between repetitions), unfamiliar faces also elicit an N250r (Itier & Taylor, 2004), though a consistently smaller one compared to familiar faces (Begleiter et al., 1995; Herzmann, Schweinberger, Sommer, & Jentzsch, 2004; Schweinberger et al., 1995). The N250r to unfamiliar faces can be eliminated by backward masking of the prime (Dorr, Herzmann, & Sommer, 2011) or by other faces intervening between prime and target face (Pfutze, Sommer, & Schweinberger, 2002). Although well preserved under those conditions, the N250r to familiar faces is also a somewhat transient phenomenon: while surviving a few intervening stimuli (Pfutze et al., 2002), the N250r is reduced with increasing numbers of faces held in working memory (Langeslag, Morgan, Jackson, Linden, & Van Strien, 2009), and is very nearly eliminated at very long lags between repetitions, despite the fact that behavioural priming persists under those circumstances (Schweinberger et al., 2002; but see also Graham & Dawson, 2005). Moreover, though one repetition may be insufficient to elicit long-term effects, an N250 has now been established as a correlate of repeated face learning (Gordon & Tanaka, 2011; Kaufmann, Schweinberger, & Burton, 2009; Tanaka, Curran, Porterfield, & Collins, 2006). Collectively, these observations suggest that the N250(r) reflects the reactivation of a facial representation in working memory (Schweinberger & Burton, 2003).

How can these findings inform our understanding of the FRU concept? Compared to other cognitive modelling, these findings force us to address the sensitivity of perception to specific instances of face recognition. It certainly seems to be the case that to understand face recognition effects over short intervals, one must incorporate the strong effects of particular instances. A recent perceptual encounter with a specific photo of Harrison Ford, such as in short-term priming experiments, may temporarily exert a disproportionately strong influence on the active representation. However, it is important to note that ‘instance’ here is not restricted exclusively to repetitions of identical stimuli. Hole, George, Eaves, and Rasek (2002) demonstrated that linear distortion (stretching) does not much affect face recognition, and this has also been demonstrated in the neurophysiological literature: Bindemann, Burton, Leuthold, and Schweinberger (2008) showed that, compared with identical-image priming, prime faces with massive linear distortions elicit equivalent N250r repetition effects. So, the effects of specific instances must be understood with a more abstract conception of an instance than a simple iconic representation. This brings us back to Bruce and Young’s notion of a pictorial code: although low level, a representation of a specific encounter has some degree of abstraction. Perhaps for the case of familiar faces, Hole’s important finding will require further examination to establish what exactly ‘counts’ as an instance of face recognition, and more particularly, what may change between presentations in order for the encounter to have the same effect as simple repetition.

It seems clear that our earlier attempts to understand FRUs were not sufficiently complex, and missed highly important information – both variability, and recency of specific experience. Within-person variability is critical in trying to understand face recognition. If our theoretical accounts ignore it, they overlook the self-evident fact that people do not always look the same, and possibly ignore critical data that will help to solve the problem.

Multimodal integration

So far, we have focused entirely on recognizing people via their faces. Our emphasis has been on ‘telling faces together’ rather than telling them apart: we have argued that a key problem in face recognition is how we learn to combine very different superficial images into a robust representation of a single person. In fact, it is possible that our experimental procedures make this a more difficult problem than it is in daily life. For example, when we meet new people, we typically see them over a range of viewing angles, expressions, and lighting changes – all dimensions that can change over just a few seconds. During this process, we have very good evidence that a person retains the same identity, while giving rise to different visual experiences (Bruce, 1994). We should also acknowledge that one’s representation of a new person is almost never limited to the face. Instead, we hear people’s voices, observe their manner of movement and dress, and learn material facts about them. These sources of information may combine to support the integration of a single-person representation over a range of variability. For this reason, it may be that the relatively novel field of multimodal integration is an important topic of study, even for those whose primary interest lies in face recognition.


In this final section, we will consider interactions between visual face information and auditory voice information. In the Bruce and Young framework, the earliest point of convergence at which person recognition, as opposed to face recognition, is achieved is the PIN level, after extensive unimodal processing. However, recent research supports the idea that multisensory influences may penetrate to an earlier perceptual processing stage. During communication, dynamic facial and vocal signals are perceived simultaneously – auditory and visual signals from speaking faces stand in a systematic and precise spatio-temporal relationship. This presents an opportunity to exploit the regularities between these two streams.

In fact, audiovisual integration (AVI) in speech perception, as exemplified by the McGurk illusion, may be the best known example of cross-modal stimulus identification (Calvert, Brammer, & Iversen, 1998). More recently, mounting evidence has suggested that multisensory integration is invoked for the perception of non-verbal social signals in faces and voices (Campanella & Belin, 2007; Schweinberger, Robertson, & Kaufmann, 2007) and involves multiple brain regions, including those originally thought to be restricted to unimodal processing (Ghazanfar & Schroeder, 2006; von Kriegstein, Kleinschmidt, Sterzer, & Giraud, 2005).

Several studies of face-voice integration have used either static faces (Hagan et al., 2009; Joassin et al., 2011) or unimodal stimuli (von Kriegstein et al., 2005). Our own present experimental approach, similar to the McGurk paradigm, demonstrates that audiovisual correspondence in time is required for optimal perceptual integration. Audio-visual correspondence entails crossmodally synchronous neuronal encoding and may trigger the brain to attribute multimodal stimuli to the same underlying event (Welch & Warren, 1980). In line with this idea, we showed systematic benefits and costs for the recognition of familiar voices when these were combined with time-synchronized articulating faces, of corresponding or non-corresponding speaker identities, respectively. Moreover, these effects were clear for familiar but not unfamiliar speakers, suggesting that they depend on an established multimodal representation of a person’s identity. Crucially, the effects were reduced or eliminated when voices were combined with static faces (Schweinberger et al., 2007). Other data by Robertson and Schweinberger (2010) suggest that AVI in person recognition is optimal with a slight auditory lag relative to the facial articulation. A temporal window for AVI in person recognition was found between approximately 100 ms auditory lead and 300 ms auditory lag, which was qualitatively similar but quantitatively extended relative to similar studies on AVI in speech perception (cf. Munhall, Gribble, Sacco, & Ward, 1996; van Wassenhove, Grant, & Poeppel, 2007). While the above studies used personally familiar speakers as stimuli, recent research (von Kriegstein et al., 2008) raises the possibility that multimodal audiovisual representations of a person’s identity may be established quickly within one experimental session. At present, we are unaware of direct comparisons with personally familiar speakers, or of evidence on the rate of acquisition of audio-visual representations of speaker identity. However, understanding the acquisition of representations of familiar people through learning is now a key issue in face research (Kaufmann et al., 2009; Tanaka et al., 2006), and we anticipate that future research will see a similar focus on learning in audio-visual person perception too.
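The asymmetric temporal window described above (roughly 100 ms auditory lead to 300 ms auditory lag) can be written down explicitly. The snippet below is purely illustrative, using the window boundaries as reported in the text; the function name and sign convention are our own.

```python
# Illustrative only: classify audio-visual asynchronies against the
# integration window for person recognition reported in the text
# (approx. 100 ms auditory lead to 300 ms auditory lag;
# Robertson & Schweinberger, 2010). Negative = audio leads the face.

def within_avi_window(asynchrony_ms: float,
                      max_lead_ms: float = 100.0,
                      max_lag_ms: float = 300.0) -> bool:
    """True if the asynchrony falls inside the integration window."""
    return -max_lead_ms <= asynchrony_ms <= max_lag_ms

# The window is asymmetric: a 200 ms auditory lag integrates,
# while a 200 ms auditory lead does not.
print(within_avi_window(200.0))   # prints True
print(within_avi_window(-200.0))  # prints False
```

The asymmetry in favour of auditory lags is consistent with natural viewing, where light reaches the observer before sound from the same articulation event.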

Conclusions: Is the FRU still a useful concept?

In the 25 years since Bruce and Young published their paper, the core theoretical distinctions it draws have survived remarkably intact. This is perhaps surprising in such a popular research area. These concepts have also driven forward research, being used as sources of predictions for experimental research across many different types of methodology. Our paper is one of many that derives its core theoretical position from the Bruce and Young model. However, we are suggesting that some specific modifications may be necessary.

The key distinction between pictorial and structural codes may come under strain in the light of the research described here. We have argued that FRUs need to take into account aspects of specific experience with faces – not just the abstract record of having seen Harrison Ford on the TV, but the particular characteristics of that experience. The incorporation of variability into an FRU seems to require that pictorial, as well as structural, codes are processed specifically for each individual. Furthermore, if episodic memory for people is to be understood, and not regarded by modellers as an inconvenience, then the specifics of recent encounters need somehow to be understood, and this may involve pictorial, as well as structural, codes.

Our key concern is that accounts of face recognition need to acknowledge that people display quite extensive variability, and this is well known in daily life. How often do people remark that they look nothing like their passport photos, or reject recently taken digital images as unflattering, or poor likenesses? If we regard face recognition as a statistical process (following Bruce, 1994), then the issue of within-person variance, as compared to between-person variance, should become the focus of our studies. This cannot be achieved unless we seriously address the measurement of within-person variability, so often ignored in our experimental procedures.
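The statistical framing above can be made concrete with a minimal sketch. Everything below is a made-up illustration (the paper provides no formula or data): assuming each face image is reduced to a single descriptor value and grouped by identity, the within-person and between-person variance components can be compared directly.

```python
# Minimal sketch, assuming hypothetical per-image descriptors grouped
# by (invented) identities -- the within- vs between-person variance
# comparison suggested in the text (following Bruce, 1994).
import statistics

images = {
    "person_a": [0.9, 1.1, 1.0, 1.2],   # four photos of one person
    "person_b": [3.0, 2.8, 3.1, 2.9],   # four photos of another
}

# Within-person variance: average the variance inside each identity.
within = statistics.mean(
    statistics.pvariance(vals) for vals in images.values()
)

# Between-person variance: variance of each identity's mean descriptor.
between = statistics.pvariance(
    [statistics.mean(vals) for vals in images.values()]
)

# When between-person variance dominates, identities separate easily.
# Large within-person variance is exactly what tightly controlled
# stimulus sets (same camera, same day) conceal.
print(within < between)  # prints True
```

With ambient images, the within-person component grows substantially, and it is precisely that growth, which constrained stimulus sets suppress, that the argument says should be measured.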

At the least, this suggestion implies that we should perhaps be less constrained in choosing stimuli for some of our face recognition experiments. Across a large literature (our own work included), researchers expend considerable effort controlling image characteristics. In its most extreme version, researchers use the same image in different experimental stages (e.g., as a learning and a test item). Even when these are varied, they are often varied minimally (i.e., same camera, same day). In contrast to this, everyday experience suggests that people are recognized over massive variation. In short, we are suggesting that it is this variability that should become the focus for face recognition research. Highly constrained stimulus sets lead to a particular emphasis in research, whereas research with naturally varying images (‘ambient images’) may lead researchers to a more realistic conception of face recognition.

To be clear, our championing of more naturalistic stimulus sets is not a simplistic appeal for ecological validity. Instead, our proposal brings us back to the original notion of an FRU, a representation that ‘will respond when any view of the appropriate person’s face is seen’ (Bruce & Young, 1986). The promotion of a research focus on within-person variability is completely consistent with Bruce and Young’s approach to familiar face recognition. So is more research on face recognition using dynamic stimuli, which has been actively promoted by Vicki Bruce and her group (e.g., Lander & Bruce, 2003). Bruce and Young may not have anticipated that new audiovisual experiments would provide evidence for perceptual face-voice integration (Campanella & Belin, 2007), and hence would place certain constraints on the domain-specificity and autonomy (sensu Fodor) of face recognition – but it is clear that research on multimodal face-voice integration is still in its early phase. The question ‘How can these different photos all be recognised as Harrison Ford?’ is the problem of face recognition, and brings us back to the power of the theoretical FRU concept.


References

Begleiter, H., Porjesz, B., & Wang, W. Y. (1995). Event-related brain potentials differentiate priming and recognition to familiar and unfamiliar faces. Electroencephalography and Clinical Neurophysiology, 94, 41–49.

Bentin, S., Allison, T., Puce, A., Perez, E., & McCarthy, G. (1996). Electrophysiological studies of face perception in humans. Journal of Cognitive Neuroscience, 8, 551–565.

Bindemann, M., Burton, A. M., Leuthold, H., & Schweinberger, S. R. (2008). Brain potential correlates of face recognition: Geometric distortions and the N250r brain response to stimulus repetitions. Psychophysiology, 45, 535–544. doi:10.1111/j.1469-8986.2008.00663.x

Botzel, K., Schulze, S., & Stodieck, S. R. G. (1995). Scalp topography and analysis of intracranial sources of face-evoked potentials. Experimental Brain Research, 104, 135–143.

Bruce, V. (1994). Stability from variation: The case of face recognition. The M. D. Vernon memorial lecture. Quarterly Journal of Experimental Psychology, 47A, 5–28.

Bruce, V., Burton, M., Carson, D., Hanna, E., & Mason, O. (1994). Repetition priming of face recognition. Attention and Performance XV, 15, 179–210.

Bruce, V., Doyle, T., Dench, N., & Burton, M. (1991). Remembering facial configurations. Cognition, 38, 109–144.

Bruce, V., Henderson, Z., Greenwood, K., Hancock, P., Burton, A. M., & Miller, P. (1999). Verification of face identities from images captured on video. Journal of Experimental Psychology: Applied, 5, 339–360.

Bruce, V., Henderson, Z., Newman, C., & Burton, A. M. (2001). Matching identities of familiar and unfamiliar faces caught on CCTV images. Journal of Experimental Psychology: Applied, 7, 207–218. doi:10.1037//1076-898X.7.3.207

Bruce, V., & Valentine, T. (1985). Identity priming in the recognition of familiar faces. British Journal of Psychology, 76, 363–383.

Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327.

Burton, A. M., Bruce, V., & Hancock, P. J. B. (1999). From pixels to people: A model of familiar face recognition. Cognitive Science, 23, 1–31.

Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380.

Burton, A. M., Jenkins, R., Hancock, P. J. B., & White, D. (2005). Robust representations for face recognition: The power of averages. Cognitive Psychology, 51, 256–284. doi:10.1016/j.cogpsych.2005.06.003

Burton, A. M., Wilson, S., Cowan, M., & Bruce, V. (1999). Face recognition in poor quality video: Evidence from security surveillance. Psychological Science, 10, 243–248.

Calvert, G. A., Brammer, M. J., & Iversen, S. D. (1998). Crossmodal identification. Trends in Cognitive Sciences, 2, 247–253.

Campanella, S., & Belin, P. (2007). Integrating face and voice in person perception. Trends in Cognitive Sciences, 11, 535–543. doi:10.1016/j.tics.2007.10.001

Clutterbuck, R., & Johnston, R. A. (2002). Exploring levels of face familiarity by using an indirect face-matching measure. Perception, 31, 985–994. doi:10.1068/p3335

Clutterbuck, R., & Johnston, R. A. (2004). Matching as an index of face familiarity. Visual Cognition, 11, 857–869. doi:10.1080/13506280444000021

Clutterbuck, R., & Johnston, R. A. (2005). Demonstrating how unfamiliar faces become familiar using a face matching task. European Journal of Cognitive Psychology, 17, 97–116. doi:10.1080/09541440340000439

Craw, I. (1995). A manifold model of face and object recognition. In T. Valentine (Ed.), Cognitive and computational aspects of face recognition (pp. 183–203). London: Routledge.

Davies-Thompson, J., Gouws, A., & Andrews, T. J. (2009). An image-dependent representation of familiar and unfamiliar faces in the human ventral stream. Neuropsychologia, 47, 1627–1635. doi:10.1016/j.neuropsychologia.2009.01.017

Dorr, P., Herzmann, G., & Sommer, W. (2011). Multiple sources of priming effects for familiar faces: Analyses with backward masking and event-related potentials. British Journal of Psychology, 102, 765–782. doi:10.1111/j.2044-8295.2011.02028.x

Eimer, M. (2000). Event-related brain potentials distinguish processing stages involved in face perception and recognition. Clinical Neurophysiology, 111, 694–705. doi:10.1016/S1388-2457(99)00285-0

Ellis, A. W., Flude, B. M., Young, A. W., & Burton, A. M. (1996). Two loci of repetition priming in the recognition of familiar faces. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 295–308.

Ellis, A. W., Young, A. W., & Flude, B. M. (1990). Repetition priming and face processing: Priming occurs within the system that responds to the identity of a face. Quarterly Journal of Experimental Psychology, 42A, 495–512.

Ellis, A. W., Young, A. W., Flude, B. M., & Hay, D. C. (1987). Repetition priming of face recognition. Quarterly Journal of Experimental Psychology, 39A, 193–210.

Ellis, H. D. (1986). Processes underlying face recognition. In R. Bruyer (Ed.), The neuropsychology of face perception and facial expression (pp. 1–27). Hillsdale, NJ: Erlbaum.

Georghiades, A. S., Belhumeur, P. N., & Kriegman, D. J. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.

Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278–285. doi:10.1016/j.tics.2006.04.008

Gordon, I., & Tanaka, J. W. (2011). The role of name labels in the formation of face representations in event-related potentials. British Journal of Psychology, 102, 884–898. doi:10.1111/j.2044-8295.2011.02064.x

Graham, R., & Dawson, M. R. W. (2005). Using artificial neural networks to examine event-related potentials of face memory. Neural Network World, 15, 215–227.

Hagan, C. C., Woods, W., Johnson, S., Calder, A. J., Green, G. G. R., & Young, A. W. (2009). MEG demonstrates a supra-additive response to facial and vocal emotion in the right superior temporal sulcus. Proceedings of the National Academy of Sciences of the United States of America, 106, 20010–20015. doi:10.1073/pnas.0905792106

Hancock, P. J. B., Burton, A. M., & Bruce, V. (1996). Face processing: Human perception and principal components analysis. Memory & Cognition, 24, 26–40.

Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233.

Hay, D. C., & Young, A. W. (1982). The human face. In A. W. Ellis (Ed.), Normality and pathology in cognitive functions (pp. 173–202). London: Academic Press.

Henderson, Z., Bruce, V., & Burton, A. M. (2001). Matching the faces of robbers captured on video. Applied Cognitive Psychology, 15, 445–464. doi:10.1002/acp.718

Herrmann, M. J., Ehlis, A. C., Ellgring, H., & Fallgatter, A. J. (2005). Early stages (P100) of face perception in humans as measured with event-related potentials (ERPs). Journal of Neural Transmission, 112, 1073–1081. doi:10.1007/s00702-004-0250-8

Herzmann, G., Schweinberger, S. R., Sommer, W., & Jentzsch, I. (2004). What’s special about personally familiar faces? A multimodal approach. Psychophysiology, 41, 688–701. doi:10.1111/j.1469-8986.2004.00196.x

Hole, G. J., George, P. A., Eaves, K., & Rasek, A. (2002). Effects of geometric distortions on face-recognition performance. Perception, 31, 1221–1240. doi:10.1068/p3252

Itier, R. J., & Taylor, M. J. (2004). Effects of repetition learning on upright, inverted and contrast-reversed face processing using ERPs. NeuroImage, 21, 1518–1532. doi:10.1016/j.neuroimage.2003.12.016

Jenkins, R., & Burton, A. M. (2008). 100% accuracy in automatic face recognition. Science, 319, 435. doi:10.1126/science.1149656

Jenkins, R., & Burton, A. M. (in press). Stable face representations. Philosophical Transactions of the Royal Society, B.

Joassin, F., Pesenti, M., Maurage, P., Verreckt, E., Bruyer, R., & Campanella, S. (2011). Cross-modal interactions between human faces and voices involved in person recognition. Cortex, 47, 367–376. doi:10.1016/j.cortex.2010.03.003

Kaufmann, J. M., Schweinberger, S. R., & Burton, A. M. (2009). N250 ERP correlates of the acquisition of face representations across different images. Journal of Cognitive Neuroscience, 21(4), 625–641.

Kirby, M., & Sirovich, L. (1990). Application of the Karhunen–Loeve procedure for the characterisation of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 103–108.

Lander, K., & Bruce, V. (2003). The role of motion in learning new faces. Visual Cognition, 10, 897–912. doi:10.1080/13506280344000149

Langeslag, S. J. E., Morgan, H. M., Jackson, M. C., Linden, D. E. J., & Van Strien, J. W. (2009). Electrophysiological correlates of improved short-term memory for emotional faces. Neuropsychologia, 47, 887–896. doi:10.1016/j.neuropsychologia.2008.12.024

Latinus, M., & Taylor, M. J. (2006). Face processing stages: Impact of difficulty and the separation of effects. Brain Research, 1123, 179–187. doi:10.1016/j.brainres.2006.09.031

Marr, D. (1982). Vision. San Francisco: Freeman.

Marr, D., & Nishihara, K. (1978). Representation and recognition of the spatial organisation of three-dimensional shapes. Proceedings of the Royal Society of London, Series B, 200, 269–294.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 163–178.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P. A. Kolers & M. Wrolstad (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Munhall, K. G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the McGurk effect. Perception & Psychophysics, 58, 351–362.

Natu, V., & O’Toole, A. J. (2011). The neural processing of familiar and unfamiliar faces: A review and synopsis. British Journal of Psychology, 102, 726–747. doi:10.1111/j.2044-8295.2011.02053.x

Neumann, M. F., Mohamed, T. N., & Schweinberger, S. R. (2011). Face and object encoding under perceptual load: ERP evidence. NeuroImage, 54, 3021–3027. doi:10.1016/j.neuroimage.2010.10.075

Pfutze, E.-M., Sommer, W., & Schweinberger, S. R. (2002). Age-related slowing in face and name recognition: Evidence from event-related brain potentials. Psychology and Aging, 17, 140–160.

Phillips, P. J., Scruggs, W. T., O’Toole, A. J., Flynn, P. J., Bowyer, K. W., Schott, C. L., & Sharpe, M. (2010). FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5), 831–846. doi:10.1109/TPAMI.2009.59

Puce, A., Allison, T., & McCarthy, G. (1999). Electrophysiological studies of human face perception. III: Effects of top-down processing of face-specific potentials. Cerebral Cortex, 9, 445–458.

Robertson, D. M. C., & Schweinberger, S. R. (2010). The role of audiovisual asynchrony in person recognition. Quarterly Journal of Experimental Psychology, 63, 23–30. doi:10.1080/17470210903144376

Schweinberger, S. R. (1996). How Gorbachev primed Yeltsin: Analyses of associative priming in person recognition by means of reaction times and event-related brain potentials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1383–1407.

Schweinberger, S. R., & Burton, A. M. (2003). Covert recognition and the neural substrate for face processing. Cortex, 39, 9–30.

Schweinberger, S. R., Huddy, V., & Burton, A. M. (2004). N250r – a face-selective brain response to stimulus repetitions. NeuroReport, 15, 1501–1505. doi:10.1097/01.wnr.0000131675.00319.42

Schweinberger, S. R., Kaufmann, J. M., Moratti, S., Keil, A., & Burton, A. M. (2007). Brain responses to repetitions of human and animal faces, inverted faces, and objects – an MEG study. Brain Research, 1184, 226–233. doi:10.1016/j.brainres.2007.09.079

Schweinberger, S. R., Pfutze, E.-M., & Sommer, W. (1995). Repetition priming and associative priming of face recognition: Evidence from event-related potentials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 722–736.

Schweinberger, S. R., Pickering, E. C., Burton, A. M., & Kaufmann, J. M. (2002). Human brain potential correlates of repetition priming in face and name recognition. Neuropsychologia, 40, 2057–2073.

Schweinberger, S. R., Pickering, E. C., Jentzsch, I., Burton, A. M., & Kaufmann, J. M. (2002). Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Cognitive Brain Research, 14, 398–409.

Schweinberger, S. R., Robertson, D., & Kaufmann, J. M. (2007). Hearing facial identities. Quarterly Journal of Experimental Psychology, 60, 1446–1456. doi:10.1080/17470210601063589

Stahl, J., Wiese, H., & Schweinberger, S. R. (2010). Learning task affects ERP-correlates of the own-race bias, but not recognition memory performance. Neuropsychologia, 48, 2027–2040. doi:10.1016/j.neuropsychologia.2010.03.024

Tanaka, J. W., Curran, T., Porterfield, A. L., & Collins, D. (2006). Activation of preexisting and acquired face representations: The N250 event-related potential as an index of face familiarity. Journal of Cognitive Neuroscience, 18, 1488–1497. doi:10.1162/jocn.2006.18.9.1488

Troje, N., & Bulthoff, H. (1995). Face recognition under varying pose: The role of texture and shape. Vision Research, 36, 1761–1771.

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71–86.

van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45, 598–607. doi:10.1016/j.neuropsychologia.2006.01.001

von Kriegstein, K., Dogan, O., Gruter, M., Giraud, A. L., Kell, C. A., Gruter, T., . . . Kiebel, S. J. (2008). Simulation of talking faces in the human brain improves auditory speech recognition. Proceedings of the National Academy of Sciences of the United States of America, 105, 6747–6752. doi:10.1073/pnas.0710826105

von Kriegstein, K., Kleinschmidt, A., Sterzer, P., & Giraud, A. L. (2005). Interaction of face and voice areas during speaker recognition. Journal of Cognitive Neuroscience, 17, 367–376.

Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.

Wiese, H., & Schweinberger, S. R. (2008). Event-related potentials indicate different processes to mediate categorical and associative priming in person recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1246–1263. doi:10.1037/a0012937

Young, A. W., & Burton, A. M. (1999). Simulating face recognition: Implications for modelling cognition. Cognitive Neuropsychology, 16, 1–48.

Zaki, S. R., Nosofsky, R. M., Stanton, R. D., & Cohen, A. L. (2003). Prototype and exemplar accounts of category learning and attentional allocation: A reassessment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1160–1173. doi:10.1037/0278-7393.29.6.1160

Zhao, W., Chellappa, R., Phillips, P. J., & Rosenfeld, A. (2003). Face recognition: A literature survey. ACM Computing Surveys, 35, 399–458.

Received 12 November 2010; revised version received 16 February 2011