
Infant and Child Development
Inf. Child Dev. 21: 555–578 (2012). Published online 2 May 2012 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/icd.1757

Distinct Facial Characteristics Differentiate Communicative Intent of Infant-Directed Speech


Kate Georgelas Shepard*, Melanie J. Spence and Noah J. Sasson
School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA

Adults and infants can differentiate communicative messages using the nonlinguistic acoustic properties of infant-directed (ID) speech. Although the distinct prosodic properties of ID speech have been explored extensively, it is currently unknown whether the visual properties of the face during ID speech similarly convey communicative intent and thus represent an additional source of social information for infants during early interactions. To examine whether the dynamic facial movement associated with ID speech confers affective information independent of the acoustic signal, adults’ differentiation of the visual properties of speakers’ communicative messages was examined in two experiments in which the adults rated silent videos of approving and comforting ID and neutral adult-directed speech. In Experiment 1, adults differentiated the facial speech groups on ratings of the intended recipient and the speaker’s message. In Experiment 2, an original coding scale identified facial characteristics of the speakers. Discriminant correspondence analysis revealed two factors differentiating the facial speech groups on various characteristics. Implications for perception of ID facial movements in relation to speakers’ communicative intent are discussed for both typically and atypically developing infants. Copyright © 2012 John Wiley & Sons, Ltd.

Key words: infant-directed speech; facial speech; face perception; emotion perception

*Correspondence to: Kate Shepard, School of Behavioral and Brain Sciences, Mailstop GR41, The University of Texas at Dallas, Richardson, TX 75080, USA. E-mail: kateshepard@utdallas.edu

This research was completed as a partial fulfilment of the qualifying thesis requirement for the doctoral degree in Psychological Sciences for the first author.


INTRODUCTION

Adults communicate with infants using infant-directed (ID) speech, regardless of their prior experience with babies (Jacobson, Boersma, Fields & Olson, 1983). ID speech, compared with adult-directed (AD) speech, has higher fundamental frequency (F0), more variation in frequency range, longer pauses and shorter utterances (Fernald & Simon, 1984), which directly impact the prosody, or the rhythm and intonation patterns, of speech. Adults use ID speech for a variety of functions, such as modulating infants’ affect and attention in the early months of life and facilitating language learning later on (Fernald, 1992). Adults communicate distinct intents or messages to infants, such as approving, using different prosodic properties (Papousek, Papousek & Symmes, 1991); these intents are distinguished by adult listeners on the basis of the acoustic prosodic properties in the absence of the linguistic content of the speech (Fernald, 1989). Moreover, 6-month-old infants categorize the acoustic signals of different intents (Moore, Spence & Katz, 1997; Spence & Moore, 2003), which suggests that preverbal infants may use nonlinguistic properties of speech to facilitate their interpretation of caregivers’ messages.

A significant but understudied property of speech is the visual component of ID speech, or the facial movements and expressions that accompany the acoustic signal (Chong, Werker, Russell & Carroll, 2003). Speakers often use facial expressions to further communicate their intended message, and adults may be more visually expressive when addressing children than when addressing adults (Swerts & Krahmer, 2010). A good listener is able to integrate facial cues with the auditory signal to interpret the intent of the message, and failure to do so is often considered detrimental to the social interaction, as seen in certain social disorders (e.g. autism spectrum disorders). A casual observer of a mother speaking to her infant will quickly note the exaggerated, engaging expressions portrayed in the mother’s face (Werker & McLeod, 1989). The facial movements that accompany ID speech may portray distinct facial expressions that capture infants’ attention differently than AD faces do (Chong et al., 2003), and ID faces may be especially salient to young infants because most of their early social experiences involve close face-to-face interactions with their caregivers.

While speaking to an infant, an adult’s face often displays such characteristics as exaggerated smiles, wide eyes and raised eyebrows (Werker & McLeod, 1989). Chong and her colleagues (2003) quantified facial expressions made by mothers during interactions with their infants, which differed significantly from the mothers’ adult-oriented facial expressions. Mothers from either Chinese-speaking or English-speaking backgrounds were video-recorded while speaking to their infants. Naïve raters identified three distinct facial expressions, or ID expressions, used by all 20 mothers in the study, regardless of language background. These findings suggest that distinct facial characteristics are portrayed during mother–infant interactions; whether these vary depending on the speaker’s communicative intent or spoken message has not been examined.

Although the unimodal auditory signal of ID speech has been thoroughly investigated (e.g. Cooper, Abraham, Berman & Staska, 1997; Fernald, 1985, 1989, 1992, 1993; Golinkoff & Alioto, 1995; Jacobson et al., 1983; Kitamura & Burnham, 2003; Kitamura, Thanavishuth, Burnham & Luksaneeyanawin, 2002; Papousek, Bornstein, Nuzzo, Papousek & Symmes, 1990; Papousek et al., 1991), variations in ID facial movements and expressions are less understood. Fernald (1989), for example, demonstrated that the prosody, or melodic rhythm, of the acoustic speech signal carries the intended message in the absence of the linguistic content of the message. Alternatively, Graf et al. (2002, as cited in Blossom & Morgan, 2006) suggested that the visual prosody of the rhythmic movements of the face and head during speech production may highlight the speech signal. It is well known that visual speech perception affects auditory speech perception even in young infants, such as during a test of the McGurk effect (Burnham & Dodd, 2004; McGurk & MacDonald, 1976), which suggests that infants are capable of integrating auditory and visual information. Further, speech movements made during ID speech, such as vowel-specific exaggerations of the lips, differ from those of AD speech (Green, Nip, Wilson, Mefferd & Yunusova, 2010), which may facilitate infants’ visual perception of the acoustic signal.

Beyond the visual linguistic properties, Walker-Andrews (1986) found that 7-month-old infants matched the happy or angry affective visual properties of a face to the corresponding auditory speech signal while watching the speaker’s face with the mouth occluded. Thus, the infants demonstrated integration of the nonlinguistic characteristics of the face with the acoustic speech signal, an indication that facial expressions, as portrayed in areas such as the eyes, may have provided affective communicative information to the infants.

Despite these demonstrations of infants’ integration of audiovisual information, little is known concerning the degree to which speakers modulate nonlinguistic visual properties of the face when communicating distinct affective messages. Modulation of more general actions has been demonstrated, which may be analogous to modifications in ID speech (Meyer, Hard, Brand, McGarvey & Baldwin, 2011). The term motionese has been used to describe mothers’ actions during interactions with their infants; when showing a novel object to an infant or an adult, mothers’ ID actions are more exaggerated in motion, simplistic and repetitive compared with their AD actions (Brand, Baldwin & Ashburn, 2002). Such actions may be particularly important in caregivers’ use of acoustic packaging, or the multimodal alignment of speech with actions (Hirsh-Pasek & Golinkoff, 1996; Meyer et al., 2011). Meyer and colleagues (2011) found, for example, that mothers’ action demonstrations to their infants were often accompanied by action-describing utterances, providing multimodal input that may facilitate infants’ understanding of goal-directed actions. The combination of visual and auditory stimulation may provide salient redundant multimodal information to infants, as proposed by the intersensory redundancy hypothesis (Bahrick, Lickliter & Flom, 2004). Yet, the relative contribution of visual facial modifications in relation to specific ID messages remains unclear.

Infants’ prelinguistic differentiation of ID speech intents may be supported by the dynamic visual properties of the ID face, along with or in place of the acoustic message. The current study seeks to identify the distinct properties of facial movements portrayed during adults’ production of ID speech to examine whether adults vary facial movement during ID speech depending on communicative intent. In Experiment 1, adults were asked to distinguish silent videos of women speaking in AD speech and in approving (e.g. ‘Good girl!’) or comforting (e.g. ‘Poor baby’) ID speech; in Experiment 2, the specific facial characteristics that may have supported those distinctions were quantified. Thus, the current study differs from and extends upon previous reports examining visual properties of faces during ID speech (e.g. Chong et al., 2003) by (1) assessing whether the facial expressions associated with ID speech can be categorized by adults in the absence of the acoustic signal and (2) quantifying whether these characteristics differ from those activated during AD speech and vary depending upon the affective ID intent of the speaker.

In Experiment 1, naïve adults rated silent videos of women talking to infants or adults on the basis of the speaker’s message (approving, comforting or neither) and the speaker’s intended recipient (infant or adult). It was hypothesized that the visual properties of ID interactions provide the listener with relevant social information beyond the acoustic signal. Although adults can distinguish ID communicative intent on the basis of the prosodic acoustic properties without linguistic information (Fernald, 1989), support for this hypothesis would suggest that the visual ID signal also independently communicates information about the speaker’s intent. That is, the visual properties of ID speech may carry the intended message much as the acoustic prosodic properties do in the absence of linguistic information (Fernald, 1989).

Experiment 2 examined the specific characteristics of the silent facial speech videos using a coding scale developed specifically for this study. The Facial Speech Coding Scale (FSCS) was created to identify specific qualities of dynamic facial speech that are captured in video stimuli by rating the intensity of facial characteristics, such as wide eyes, on a scale of 0 (none) to 3 (very wide). Existing coding scales (e.g. the Facial Action Coding System; Ekman & Friesen, 1978) were inappropriate for the specific aim of this study because they identify specific muscle movements during affective expression rather than quantify facial movements during facial speech. The new coding scale was developed for the purpose of identifying visual prosodic properties that may exist during fluid facial speech. It was hypothesized that distinct facial characteristics support the nonlinguistic differentiation of ID speech intents.

EXPERIMENT 1

The purpose of Experiment 1 was to assess whether naïve adults could differentiate silent videos of approving and comforting ID speech based on visual cues alone. Silent clips of neutral AD speech were also included to test the distinctions between infant and adult directedness and between affective and neutral facial speech qualities. Differentiation of the three facial speech groups would establish that the visual characteristics of the speaking face support the listener’s interpretation of the message being portrayed, beyond its acoustic and linguistic properties.

Method

Participants

Adult participants included 28 university students who were enrolled in undergraduate psychology courses and received credit for research participation. Four participants could not provide rating data because of computer error. The final 24 participants included nine male students, aged 18 to 31 years (M = 22.56 years), and 15 female students, aged 18 to 25 years (M = 22.13 years). Participants included one African American person, four Hispanic persons, five Asian persons, 10 Caucasian persons and four individuals of mixed or other races. A few experience-based questions were asked to ensure that the adult sample had previous experience with infants and children, which would provide validation for their knowledge of the differences between ID and AD speech. Two participants (8%) had children (aged 6 months to 4 years), whereas 17 (71%) of the participants reported having had experience with children. Experiences ranged from caring for younger siblings to working in a child care setting. Thirteen (76%) of the 17 participants reported having had experience with children under the age of 3 years.


Stimuli

Two sets of stimuli were used in the current set of experiments.

Infant-directed stimuli

The ID stimuli consisted of video recordings of the faces of adult women watching videos of infants performing various activities (Atchison, Spence & Touchstone, 2009). The adult women (n = 79) were recruited from core undergraduate psychology courses and the surrounding community through an online sign-up system or word of mouth. Participants were invited to provide realistic samples of ‘how adults speak to infants’ for use in infant research. Participants spoke English as a primary language.

Participants were video recorded using a Sony TRV530 Digital Handycam video camera and audio recorded through a separate microphone. The women sat approximately 2 ft (0.61 m) from a 55-inch Sony LCD projection television on which videos of infants were displayed, and they wore hidden in-the-ear headphones so they could hear the infants in the videos. Participants were encouraged to speak to the infants as if they were actually interacting, using approving and comforting messages. Approving and comforting ID speech are two well-validated and distinct affective categories that communicate positive intent, are discriminable on fundamental frequency and frequency variability, and are regularly heard by infants at about 6 months of age (Fernald, 1992; Kitamura & Burnham, 2003; Moore et al., 1997). Approving ID speech samples were collected while participants encouraged infants in the video segments to perform tasks like walking or blowing raspberries, whereas comforting ID speech samples were elicited during video segments in which infants were crying or falling asleep.

Final recordings were selected by the original authors (Atchison et al., 2009) on the basis of the visibility of each woman’s head, neck and top of the shoulders, as well as the appropriate rising pitch contours of her approving utterances and falling contours of her comforting utterances. A rating study was conducted on 210 video clips of approving and comforting utterances to reduce the final stimulus set (Atchison, 2010). From the ratings, 20 pairs of videos from 10 female speakers (seven Caucasian, one African American, one Asian and one person of another race) were selected by the original authors. All women were in their early to mid-twenties. Thus, the final set of ID stimuli used in the current study consisted of 40 stimuli from 10 women, with each speaker providing two approving and two comforting videos.

When the audio was retained in the original clips, approving ID speech videos had a mean frequency of 309.76 Hz (SD = 37.81), and comforting ID speech videos had a mean frequency of 248.05 Hz (SD = 43.84). These frequencies reflect Fernald’s (1989) finding that approving utterances have higher frequencies than comforting utterances. The videos ranged in length from 1.43 to 3.60 s (approving videos, M = 2.07, SD = 0.53; comforting videos, M = 2.05, SD = 0.52). The current study of ID facial speech presented the 20 pairs of videos without audio.

Adult-directed stimuli

The main goal of Experiment 1 was to determine whether naïve adult raters could differentiate meaningful ID speech videos in the absence of auditory information. As a secondary goal, adults’ ability to perceive infant directedness in the videos was assessed by including 15 AD facial speech videos. The AD videos were selected from the DOD/DARPA Human ID Project database (O’Toole et al., 2005), a database of videos and images of people portraying a variety of facial expressions and facial orientations. Within this database are videos of women recorded while viewing videos or interacting with live persons in an AD manner. The AD videos were randomly chosen from the neutral speaking clips of women in the database; these videos did not contain auditory information.

All of the AD videos chosen depicted recordings of female speakers (n = 14) whose face, neck and top of the shoulders were visible; one of the women appeared in two of the videos as a result of differing hairstyles that significantly altered her appearance. The women, who were in their early to mid-twenties, included nine Caucasian women, four Asian women and one Hispanic woman. The women were answering questions during a conversation with another adult. Some of the video durations lasted up to 30 s, so the videos were clipped into smaller segments using Apple QuickTime Pro v 7.6.5, ensuring each video ended at an appropriate time during the speaker’s utterance (e.g. between words). The final 15 videos ranged in length from 1.78 to 4.12 s (M = 3.11, SD = 0.64).

Comparability of infant-directed and adult-directed stimuli

The ID stimuli were elicited to approximate the images of the existing DOD/DARPA Human ID Project database by having each woman wear a grey smock while sitting in front of a grey backdrop, as was done for the videos in the DOD/DARPA database. This eliminated differences in clothing and background, although subtle differences in lighting resulted in some videos in both stimulus sets having slightly lighter or darker backgrounds. However, the videos were presented successively, which likely minimized the detection of subtle background lighting differences. Sample images from the current ID set and the existing AD set are depicted in Figure 1.

Figure 1. Still frames excerpted from the video stimuli used in Experiments 1 and 2. The top image depicts an infant-directed speaker, and the bottom image depicts an adult-directed speaker.


MEASURES

Two Likert-type rating scales were developed for the current rating study. The Speaker’s Message (SM) scale was developed to assess adults’ interpretation of the speaker’s message while viewing the silent facial speech videos. This scale ranged from 1 (definitely comforting) through 3 (neither approving nor comforting) to 5 (definitely approving). Approving messages were described as ‘the type of speech one uses to encourage or praise someone for something done well. For example, someone just won a race, so you say, “Congratulations, you did great!”’ Comforting messages were described as ‘the type of speech one uses to soothe or calm someone who is upset. For example, someone is sad, so you say, “Don’t worry, you’ll be okay.”’

The Infant-Directed/Adult-Directed (IDAD) scale was developed to assess adults’ interpretation of whether the speaker was addressing an infant or an adult. This scale ranged from 1 (definitely infant directed) through 3 (neither adult nor infant directed) to 5 (definitely adult directed). Infant directed was described as ‘the type of speech one uses when talking to an infant (a baby between the ages of newborn and 1 year)’. Adult directed was described as ‘the type of speech one uses when talking to an adult, like the way the experimenter is speaking to you’. A score of 0 was used to denote ‘I don’t know’ on both scales.

PROCEDURE

Adult participants attended one of four small group sessions during which they provided informed consent, completed a short questionnaire about demographic information and experience with infants and children, and received a packet for recording ratings. A Microsoft PowerPoint presentation displayed the rating instructions and three practice videos to be rated. Following practice, the 55 videos (40 ID videos, 15 AD videos) were presented in random order (Randomizer.org). Each silent video was played for 5 s and presented twice in succession to allow for better familiarization with the stimuli. Two randomized PowerPoint presentations were created, so about half of the raters (n = 11) viewed random condition 1 and the other half (n = 13) viewed random condition 2.

RESULTS AND DISCUSSION

Results were analysed by computing the average rating across raters for each video on each of the two scales for the two conditions. In total, ratings of 0 (‘I don’t know’) were given 129 times out of 2640 ratings (or 4.8%); these ratings were excluded from the averages. Next, the two sets of averages from conditions 1 and 2 were averaged to compute the final mean rating score for each of the 55 videos. Thus, each video had one mean rating score for the SM scale and one mean rating score for the IDAD scale. The mean rating scores were analysed using t-tests to examine differences between the ID and AD videos. An alpha level of 0.05 was used for all statistical tests.
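To make the aggregation concrete, the snippet below sketches the two steps described above: per-video means that exclude 0 (‘I don’t know’) responses, followed by the group comparisons. The data layout, variable names and simulated ratings are our own illustrative assumptions, not the authors’ code or data.

```python
# A minimal sketch of the Experiment 1 analysis, assuming ratings are stored
# as a (videos x raters) integer array in which 0 codes "I don't know".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(55, 24))      # simulated 1-5 ratings with some 0s

def mean_rating(video_ratings):
    """Average one video's ratings after dropping 0 ('I don't know')."""
    valid = video_ratings[video_ratings != 0]
    return valid.mean()

video_means = np.array([mean_rating(v) for v in ratings])

# Hypothetical group layout: first 20 approving ID, next 20 comforting ID, last 15 AD.
approving, comforting, ad = video_means[:20], video_means[20:40], video_means[40:]

t, p = stats.ttest_ind(approving, comforting)    # independent-samples t-test, df = 38
t_mid, p_mid = stats.ttest_1samp(ad, 3.0)        # AD videos vs. the scale midpoint of 3
print(f"approving vs comforting: t = {t:.2f}, p = {p:.4f}")
print(f"AD vs midpoint of 3: t = {t_mid:.2f}, p = {p_mid:.2f}")
```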

Speaker’s Message scale

Group means for the SM scale for the three types of videos are reported in Table 1. The mean rating scores of the 20 approving ID videos, 20 comforting ID videos and 15 AD videos were compared. Approving ID videos (M = 4.05, SD = 0.45) were rated significantly more approving than the comforting ID videos (M = 1.81, SD = 0.57), t(38) = 13.65, p < 0.001, two-tailed, and the AD videos (M = 2.93, SD = 0.37), t(33) = 7.73, p < 0.001, two-tailed. The comforting ID videos were rated significantly more comforting than the AD videos, t(33) = −6.59, p < 0.001, two-tailed. These results indicate that the adult raters distinguished the speaker’s message in the silent ID videos, despite the inaudible presentation of the ID speech. Further, a single-sample t-test against the scale midpoint (a rating of 3 on the SM scale) indicated that the AD videos were perceived as neither approving nor comforting, t(14) = 0.72, p = 0.48, two-tailed.

Table 1. Speaker’s message and infant-directed/adult-directed ratings for ID and AD videos

                          Speaker’s message             Infant or adult directed
Facial speech video       Mean (SD)    Max    Min       Mean (SD)    Max    Min
ID (n = 40)               2.93 (1.24)  4.72   1.16      2.30 (0.66)  3.81   1.12
  Approving (n = 20)      4.05 (0.45)  4.72   3.17      2.23 (0.67)  3.81   1.12
  Comforting (n = 20)     1.81 (0.57)  2.67   1.16      2.37 (0.66)  3.50   1.21
AD (n = 15)               2.93 (0.37)  4.09   2.44      4.16 (0.33)  4.77   3.60

Note: Speaker’s message ratings are based on a 5-point scale (1 = definitely comforting, 3 = neither approving nor comforting, 5 = definitely approving). Infant-directed (ID)/adult-directed (AD) ratings are based on a 5-point scale (1 = definitely infant directed, 3 = neither adult nor infant directed, 5 = definitely adult directed).

Infant-Directed/Adult-Directed scale

Group means for the IDAD scale are reported in Table 1. The mean rating scores of the 40 ID videos were compared with the mean rating scores of the 15 AD videos. The hypothesis that adults would rate the ID videos (M = 2.30, SD = 0.66) as more infant directed than the AD videos (M = 4.16, SD = 0.33) was supported, t(53) = −10.33, p < 0.001, two-tailed. These results suggest that adult raters were able to infer the intended recipient of the speaker’s message on the basis of visual information alone, although the visual information used by the adults to make this inference was not assessed until Experiment 2.

Within the ID category, there were no differences between IDAD ratings of the 20 approving (M = 2.23, SD = 0.67) and 20 comforting (M = 2.37, SD = 0.66) ID videos, t(38) = −0.68, p = 0.50, two-tailed, suggesting that the adult raters did not perceive one type of ID speech to be more infant directed than the other.

CONCLUSIONS

The purpose of Experiment 1 was to determine whether naïve adults could differentiate silent videos of women speaking approving and comforting ID speech and could distinguish the ID intents from neutral AD speech. Adult raters differentiated the speakers’ intent in the silent ID videos, and the AD videos were appropriately rated as ‘neither approving nor comforting’ because the AD speakers were speaking in a neutral tone to a fellow adult. Adult raters interpreted the approving ID speakers as talking approvingly and the comforting ID speakers as talking in a comforting manner. Because the videos were played silently, one can assume that the adult raters used some combination of visual cues to interpret the speaker’s message.


The adult raters may have read the lips of the speakers, as adults may do when hearing speech in noise, such as at a noisy restaurant. Because ID speech is spoken more slowly and with longer duration of syllables and sounds (Fernald & Simon, 1984; Thiessen, Hill & Saffran, 2005), the words of the ID speakers may have been visually explicit enough for the adult raters to interpret. However, aside from the exaggerated lip movements present in ID faces (Green et al., 2010), there may also be exaggerated emotional facial expressions in ID faces (Chong et al., 2003), just as ID speech exaggerates the audible expression of emotion (Trainor, Austin & Desjardins, 2000). Such facial exaggerations may facilitate the differentiation of ID from AD faces and may also facilitate the differentiation of specific intents, such as approving and comforting properties of the face. Experiment 2 indirectly assessed the possibility that differentiation of the silent videos was simply a result of lip reading by examining the specific featural characteristics of the faces in the ID and AD videos that adults may have used in their differentiation of the stimuli.

EXPERIMENT 2

Although the adult raters clearly distinguished approving ID, comforting ID and neutral AD speech videos in Experiment 1, the specific visual cues used to make these distinctions were unclear. Experiment 2 assessed the specific facial characteristics portrayed in the silent facial speech videos to identify the precise features the adults may have used to distinguish the facial speech intents. A coding scale was developed for this experiment to capture the degree to which dynamic facial characteristics deviated from neutral features. As mentioned, existing coding scales, such as the Facial Action Coding System (Ekman & Friesen, 1978), were not suitable because they identify specific muscle movements during affective expression rather than capture adults’ interpretations of the configurations of the faces at a qualitative level. Thus, the FSCS was established to measure the visual properties of the face during a social interaction, such as the affective quality of the mouth (smiling, frowning, neutral) or eyes (smiling, sad, neutral).

It was hypothesized from qualitative inspection that the approving ID videos would portray more up–down head movements (or nods), a greater degree of eyebrow raising, wider eyes that appeared to be ‘smiling’, a greater degree of smiling in the lips and greater teeth visibility than the comforting ID videos and the AD videos. On the other hand, the comforting ID videos were expected to portray more side-to-side head movements (or head shaking), a greater degree of furrowed eyebrows, sad eyes, rounded lips and a greater degree of frowning than the other videos. If naïve adults detected these characteristics and discriminated intent based upon them, it would suggest that distinct facial characteristics are portrayed during the production of ID speech signals that express different intents, above and beyond information communicated by the acoustic signal.

Method

Participants

Three female adult raters (M = 23.33 years, SD = 2 years) were trained to use the FSCS. Training took place over the course of several weeks during the scale development; however, the raters remained blind to specific hypotheses about the facial speech videos. The raters were also blind to the specific facial speech intent (e.g. approving ID) portrayed by each female speaker.


Protocol

The FSCS divided the facial stimulus into six areas: head movement/orientation, forehead/eyebrows, eyes, nose, mouth/lips/teeth and chin. These areas were further divided into 11 features comprising a total of 35 characteristics to be rated. The final scale and coding guidelines are presented in Figure 2. Each of the 11 features in the far left column of Figure 2 was divided into descriptions of characteristics that might be present during facial speech. The bolded descriptions represented nine neutral characteristics that were coded as 1 if present (e.g. relaxed forehead) or 0 if another characteristic was present (e.g. if the forehead was wrinkled [1], the bolded Relaxed characteristic was rated as not present [0]). Thus, the neutral categories were mutually exclusive of the other descriptions within each feature category. The remaining 26 characteristics were coded on a Likert-type rating scale that represented the intensity of each characteristic, such that 1 indicated ‘slightly present’, 2 indicated ‘present’ and 3 indicated ‘greatly present’. Thus, a very wide smile was coded as 3, whereas a slightly tilted head was coded as 1. An item was coded as 0 if that characteristic was not portrayed.
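As an illustration of the scale’s structure (our sketch, not the published instrument), an FSCS record could be represented as a mapping from characteristics to codes, with the two coding rules enforced separately; all feature names below are invented stand-ins.

```python
# Sketch of an FSCS rating record: neutral characteristics are dichotomous (0/1),
# the remaining characteristics take intensity codes 0 (absent) to 3 (greatly present).
NEUTRAL_FEATURES = {"forehead_relaxed", "lips_neutral"}            # illustrative subset of 9
INTENSITY_FEATURES = {"eyes_wide", "lips_smiling", "head_tilted"}  # illustrative subset of 26

def validate_record(record: dict) -> None:
    """Raise if any code falls outside the range its feature type allows."""
    for feature, code in record.items():
        if feature in NEUTRAL_FEATURES and code not in (0, 1):
            raise ValueError(f"{feature}: neutral characteristics are coded 0 or 1")
        if feature in INTENSITY_FEATURES and code not in (0, 1, 2, 3):
            raise ValueError(f"{feature}: intensity characteristics are coded 0-3")

validate_record({"forehead_relaxed": 0, "eyes_wide": 2, "lips_smiling": 3})
```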

Stimuli

The video stimuli used in Experiment 1 were randomly inserted into a PowerPoint presentation for the current experiment. The 55 facial speech videos included 20 approving ID videos, 20 comforting ID videos and 15 AD videos. The PowerPoint presentation was designed so that each rater could watch a video as many times as needed while coding the speaker’s facial features, which improved the accuracy of the raters’ coding.

Inter-rater reliability

After training on the final coding scale, each rater coded three randomly sampled videos from the stimulus set. Inter-rater reliability was calculated by separating the dichotomous ratings of the neutral characteristics (n = 9) from the Likert-scale ratings of the remaining characteristics (n = 26). Exact agreement was 78% on the dichotomous ratings of the neutral characteristics. Agreement within one point was deemed appropriate for the Likert-scale ratings, because it was important that raters perceived a relatively similar degree of variation from neutral in the feature characteristics. Raters had 94% agreement on these ratings. The raters discussed differences in the ratings and reached verbal agreements on how to score the remaining videos before proceeding with coding the final videos.

Inter-rater reliability was also calculated on the final ratings of 12 of the 55 videos (21.8%). Reliability of the dichotomous neutral ratings was 86%, and reliability of the Likert-scale ratings within one point was 95%.
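The two agreement statistics reported above can be computed mechanically; the sketch below shows one way to do so, assuming each rater’s codes for the same videos are stacked into an array (the arrays shown are toy values, not the study’s data).

```python
# Exact agreement for the dichotomous neutral items; within-one-point agreement
# for the 0-3 intensity items, pooled over all rater pairs and items.
import itertools
import numpy as np

def pairwise_agreement(codes, tol=0):
    """codes: (n_raters, n_items) array. Proportion of rater-pair comparisons
    whose codes differ by at most `tol` points."""
    hits = total = 0
    for a, b in itertools.combinations(range(codes.shape[0]), 2):
        hits += int(np.sum(np.abs(codes[a] - codes[b]) <= tol))
        total += codes.shape[1]
    return hits / total

neutral = np.array([[1, 0, 1], [1, 0, 0], [1, 0, 1]])      # 3 raters x 3 toy items
intensity = np.array([[2, 3, 0], [1, 3, 1], [2, 2, 0]])
print(pairwise_agreement(neutral))                         # exact agreement
print(pairwise_agreement(intensity, tol=1))                # agreement within one point
```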

Coding procedure

During coding, the videos repeated until the rater stopped the playback. The raters were instructed to focus on one area of the face at a time. To avoid misinterpreting one area of the face by viewing another area simultaneously, raters were trained to cover noncoded features of the face on the computer screen with a sheet of paper as they rated specific regions. For example, because a smiling mouth might affect the rater’s perception of the eyes, the mouth was covered while rating the eyes.

Figure 2. Facial Speech Coding Scale and coding guidelines. Six facial areas were divided into 11 features with a total of 35 characteristics to be rated.

Because facial speech videos are variable in movement (for example, the speaker’s mouth might portray a slight smile while speaking and then a wider smile in between words), the coders were instructed to choose the maximum level of intensity for each characteristic that appeared throughout the entire video clip. Coding the maximum level of intensity would provide greater differentiation of the videos, while considering infants’ sensitivity to the stimuli. For example, if a speaker’s teeth were only slightly visible during the first second of the video but became more visible as she smiled at the end of the clip, teeth visibility was rated as more visible. A visual guide to the coding scale was provided to the raters with pictures of examples of feature characteristics (e.g. a picture of rounded lips, furrowed eyebrows).

Analysis plan

The FSCS data were analysed using discriminant correspondence analysis (DiCA; Abdi, 2007; for a detailed description, see Williams, Abdi, French & Orange, 2010). DiCA analyses group variability in qualitative data by combining discriminant analysis, which categorizes observations into a priori groups (e.g. the three ID and AD facial speech groups), and correspondence analysis, which analyses individual variability in nominal data (Abdi, 2007). DiCA provided a visual depiction of the group differences based on the FSCS ratings, but it was first necessary to preprocess the FSCS data to eliminate outliers and missing data.

Data preprocessing

The nine neutral characteristics were used to assess inter-rater reliability, as aforementioned; the neutral properties were not analysed further, as DiCA accounted for neutral qualities by analysing ratings of 0 on the remaining facial characteristics in each facial region. Two variables were removed from the data set because they were not used by raters (Wrinkled Nose) or were used only once by one rater (Head Orientation Down).

Additionally, to attenuate the effect of a great amount of variation in only a few variables, which might inaccurately alter the results of the DiCA, some of the remaining 24 variables were collapsed into fewer rating choices than the original four choices of 0, 1, 2 or 3. For example, only three of the 55 videos received a rating of 2 on the Head Movement Up–Down variable, and none of the videos received a rating of 3. Thus, this variable was collapsed to ratings of just 0 or 1 (‘no head movement up–down’ or ‘slight head movement up–down’).

Similarly, five additional variables were collapsed to two rating choices (Head Orientation to Side [0, 1], Sad Eyes [0, 2], Mouth Movement [1, 2], Lips Rounded [0, 1] and Lips Frowning [0, 1]). Thirteen of the variables were collapsed to three rating choices, either because they did not receive a rating of 3 from one or more raters, which suggested inter-rater disagreement on the intensity of the characteristics (Head Movement to Side [0, 1, 2], Head Movement Tilted [0, 1, 2], Head Orientation Tilted [0, 1, 2], Forehead Wrinkled [0, 1, 2], Eyebrows Furrowed [0, 1, 2], Eyebrows Moving [0, 1, 2], Eyes Squinting [0, 1, 2], Eyes Wide [0, 1, 2], Eyes Smiling [0, 1, 2] and Chin Wrinkled [0, 1, 2]), or because they did not receive a rating of 0 (Mouth Movement [1, 2, 3], Teeth Visible [1, 2, 3] and Chin Moving [1, 2, 3]).
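A sketch of this collapsing step is shown below; the per-variable mappings are illustrative only (the paper specifies a different collapse for each variable, as listed above).

```python
# Merge sparsely used rating levels before the DiCA so that no category is
# nearly empty. Mappings here are examples, not the paper's exact choices.
COLLAPSE = {
    "head_movement_up_down": {0: 0, 1: 1, 2: 1, 3: 1},  # four levels -> two (0, 1)
    "eyes_wide":             {0: 0, 1: 1, 2: 2, 3: 2},  # four levels -> three (0, 1, 2)
}

def collapse(variable: str, code: int) -> int:
    mapping = COLLAPSE.get(variable)
    return mapping[code] if mapping else code           # uncollapsed variables keep 0-3

print(collapse("head_movement_up_down", 2))  # -> 1
print(collapse("eyes_wide", 3))              # -> 2
```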

The remaining five variables retained the original four rating choices of 0, 1, 2 and 3 (Head Orientation Up, Eyebrows Raised, Iris Movement, Blinking and Lips Smiling). Four of the variables (Mouth Movement, Mouth Open, Teeth Visible and Chin Moving) were rated as ‘always present’ in the facial speech videos (i.e. these variables were never rated as 0), which is likely due to the speech movements made by the speakers.

A preprocessing step known as nominal coding was completed to perform analyses on the Likert-scale data. In this step, the Likert-scale data were transformed to binary codes, with 0s and 1s representing each coder’s rating for each variable. For example, the Iris Movement responses ranged from 0 (no movement) to 3 (much movement). These scores were transformed into binary codes such that a score of 0 was represented by the values 1 0 0 0 and a score of 3 was represented as 0 0 0 1.
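This transformation, sometimes called complete disjunctive coding, is easy to express directly; a minimal sketch follows (the variable name is ours).

```python
# Nominal coding: each integer rating becomes an indicator vector with a 1 in
# the slot for the observed level, e.g. 0 -> 1 0 0 0 and 3 -> 0 0 0 1.
import numpy as np

def disjunctive(codes, n_levels=4):
    """codes: 1-D array of integer ratings in [0, n_levels)."""
    out = np.zeros((len(codes), n_levels), dtype=int)
    out[np.arange(len(codes)), codes] = 1
    return out

iris_movement = np.array([0, 3, 1])
print(disjunctive(iris_movement))
# [[1 0 0 0]
#  [0 0 0 1]
#  [0 1 0 0]]
```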

The final data set included 24 nominalized variables, or 71 variable rating choices, for each rater (K, L and S). Thus, a total of 213 variable ratings were analysed in the DiCA. Note that each video (n = 55) received three ratings (i.e. one from each rater). Each of the three ratings was included in the final analysis for two purposes: (1) to retain the ordinal-level Likert-scale choices, as opposed to averaging the data, which would diminish the interpretability of the ratings, and (2) to retain each rater’s individual responses, which accounts for slight variation in viewers’ interpretations of the speakers’ faces while also assessing similarity in the faces despite such slight variation.

Discriminant correspondence analysis

Discriminant correspondence analysis assessed group variability by analysing the group counts for each variable. The analysis computed new variables called factor scores, and the factor scores were plotted onto a factor space that represented the spatial relationship of the observations and variables. The first dimension, or projection, accounted for the largest percentage of the variance explained by the ratings of the videos; the second factor accounted for the additional independent portion of the total variance of the factor space. There were only two factors for this data set, so the two dimensions accounted for the total variance of the factor space.

To interpret the factors, observations (i.e. videos) and variables were identified as being important to each factor using a parameter called the contribution. If an element contributed more than the expected value, the element was considered a significant contributor to the factor. There were three a priori groups, so a group that contributed more than 1/3 (33.3%) to a factor was considered a significant contributor to that factor. Likewise, each of the 213 variables was considered a significant contributor to a factor if its contribution was more than 1/213 (0.5%).
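For readers who want the mechanics, the sketch below is our reconstruction of the core computation (following Abdi, 2007, in outline), not the authors’ code: correspondence analysis of a groups-by-categories count table via the SVD of the standardized residuals, followed by the contribution criterion described above. The toy counts are invented.

```python
# Correspondence analysis of a 3-groups x 3-categories toy table, with row and
# column contributions compared against the 1/(number of elements) thresholds.
import numpy as np

N = np.array([[30., 5., 10.],
              [ 4., 28., 9.],
              [ 6., 7., 25.]])

P = N / N.sum()                          # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)      # row (group) and column (category) masses
S = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)
U, d, Vt = np.linalg.svd(S, full_matrices=False)
keep = d > 1e-9                          # drop the null trivial dimension
U, d, Vt = U[:, keep], d[keep], Vt[keep]

F = np.diag(r**-0.5) @ U * d             # row (group) factor scores
G = np.diag(c**-0.5) @ Vt.T * d          # column (category) factor scores
eig = d**2                               # variance accounted for by each factor

ctr_rows = (r[:, None] * F**2) / eig     # contributions sum to 1 within a factor
ctr_cols = (c[:, None] * G**2) / eig
print(ctr_rows[:, 0] > 1 / N.shape[0])   # groups contributing above 1/3 to Factor 1
print(ctr_cols[:, 0] > 1 / N.shape[1])   # categories above 1/(n categories)
```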

Discriminant correspondence analysis provides a descriptive analysis of the data using a fixed-effect model, whereas other techniques are used to interpret the inferential analysis of the data (random effects). Specifically, the fixed-effect model indicated how accurately the analysis classified the facial speech videos within the sample by assigning each observation to its closest facial speech group in the factor space. A confusion matrix identified the number of observations that were correctly classified according to the model.
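The fixed-effect classification described here reduces to a nearest-barycenter rule in the factor space; a sketch follows, with simulated factor scores standing in for the DiCA projection.

```python
# Assign each video's factor scores to the nearest group barycenter and tally
# a confusion matrix (rows = assigned group, columns = actual group).
import numpy as np

def confusion(video_scores, labels, barycenters):
    n_groups = barycenters.shape[0]
    counts = np.zeros((n_groups, n_groups), dtype=int)
    for scores, actual in zip(video_scores, labels):
        assigned = np.argmin(np.linalg.norm(barycenters - scores, axis=1))
        counts[assigned, actual] += 1
    return counts

rng = np.random.default_rng(1)
barycenters = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])  # toy group centres
labels = rng.integers(0, 3, 55)                                # toy group membership
video_scores = barycenters[labels] + rng.normal(0, 0.5, (55, 2))
print(confusion(video_scores, labels, barycenters))
```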

On the other hand, a random-effect model used the bootstrap resampling technique to show the stability and spread of the group means by sampling each observation with replacement many times (1000 times in this analysis) and plotting the results on a new factor space with 95% confidence intervals. If the confidence ellipses did not overlap, the groups of facial speech videos were considered significantly different from each other according to the model. Additionally, a jackknife, or leave-one-out, procedure provided additional significance testing by evaluating how well the model generalized to the evaluation of new facial speech videos. In this technique, one video was removed at a time from the DiCA procedure, and the model was recreated without that video. The removed video was then projected into the new DiCA model, and the jackknife procedure was repeated for each video in the original model. The jackknife confusion matrix identified the number of observations that were correctly classified according to the newly created model.
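Both resampling checks can be sketched in a few lines; here the refitting of the full DiCA is deliberately simplified to recomputing barycenters, so this is an outline of the logic rather than the authors’ procedure.

```python
# Bootstrap the group means (the basis for confidence ellipses) and run a
# leave-one-out (jackknife) classification on stand-in factor scores.
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(size=(55, 2))            # stand-in DiCA factor scores
labels = rng.integers(0, 3, 55)              # stand-in group labels

# Bootstrap: resample each group with replacement 1000 times, keep the means.
boot = {}
for g in range(3):
    members = np.flatnonzero(labels == g)
    boot[g] = np.array([scores[rng.choice(members, size=members.size)].mean(axis=0)
                        for _ in range(1000)])
# 95% bounds per factor (the paper draws ellipses; these are axis-aligned bounds).
print({g: np.percentile(m, [2.5, 97.5], axis=0).round(2) for g, m in boot.items()})

# Jackknife: hold out one video, recompute barycenters, classify the held-out video.
correct = 0
for i in range(len(scores)):
    mask = np.arange(len(scores)) != i
    bary = np.array([scores[mask & (labels == g)].mean(axis=0) for g in range(3)])
    correct += int(np.argmin(np.linalg.norm(bary - scores[i], axis=1)) == labels[i])
print(f"leave-one-out accuracy: {correct / len(scores):.2f}")
```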


Results

Figure 3 displays the DiCA map created from the analysis, with barycenters, or centre points, of the three facial speech groups projected onto the factor space. Two factors were produced from the analysis. The first factor accounted for 64.2% of the variance and appeared to differentiate the ID facial speech groups based on facial characteristics related to emotional expression. Thus, Factor 1 was named the Intent factor. The second factor accounted for the remaining 35.8% of the variance and appeared to differentiate eye and head movements. This factor was named the Movement factor.

Intent factor

A total of 52 variables contributed above the expected contribution (more than 0.5%) to the first factor, together accounting for 84% of the contribution to Factor 1. Because each of the three raters’ scores was included in the analysis, the variables were grouped according to whether at least two of the raters’ variable ratings contributed above the expected value. That is, if rater K’s rating for one facial characteristic (e.g. No Furrowed Brow, or a rating of 0) was above the expected contribution level, but neither rater L’s nor rater S’s ratings for the same variable rating contributed above the expected level, the variable rating was not included as a significant contributor to the factor.

When the variables were grouped on the basis of at least two of the three raters agreeing on the variables’ ratings, the final 15 grouped variables accounted for 75% of the contribution to Factor 1. For example, the variable Slightly Furrowed Brow (a rating of 1) separately contributed 0.8% for rater K and 1.0% for rater S; when combined, the variable Slightly Furrowed Brow contributed 1.8% to the first factor. This variable and the remaining 14 variables are listed in Table 2 and displayed in Figure 4, the DiCA map with the significant contributing variables to the Intent factor (horizontal projection) plotted onto the factor space.

Figure 3. Discriminant correspondence analysis of facial speech videos. The facial speech videos and the barycenters of the three groups (approving ID speech, comforting ID speech and neutral AD speech) have been projected as supplementary elements onto the factor space; Factor 1 accounted for 64.2% of the variance and Factor 2 for 35.8%.


Table 2. Variables contributing above average to the Intent factor (grouped across raters)

Positive valence
Variable             Position   Contribution (%)
Eyes not sad         −0.39      3.8
Eyes smiling         −0.91      6.3
Eyes slight smile    −0.66      1.6
Eyes slight wide     −0.61      1.3
Lips smiling         −0.87      6.1
Lips great smile     −1.04      3.9
Some teeth           −0.61      5.7
Total contribution              28.7

Negative valence
Variable             Position   Contribution (%)
Brow slight furrow   0.96       1.8
Brow furrow          1.25       4.3
Eyes sad             1.03       10.0
Eyes not smiling     0.71       5.9
Eyes squinting       1.27       2.0
Lips slight frown    1.00       5.4
Lips not smiling     0.71       9.3
Few teeth            0.73       7.6
Total contribution              46.3

Figure 4. Intent factor (horizontal projection) with the three facial speech groups (square markers) plotted in relation to the 15 variables that contributed above the expected value with agreement between at least two raters. The closer proximity of a variable to a facial speech group indicates a greater likelihood of that variable being observed in that group of videos (e.g. lips that were smiling were likely to be observed in the approving ID videos, as indicated by the close proximity of the Lips Smiling variable to the approving ID marker).


The projected variables were related to facial expressions that may indicate either positive or negative emotional valence, which could be indicative of the speakers’ intent. Specifically, eyes rated as ‘smiling’ and ‘slightly wide’ were represented on the positive valence side of the dimension (i.e. the left side), whereas eyes rated as ‘sad’ and brows rated as ‘furrowed’ fell on the opposing side of the dimension. Similarly, lips rated as ‘smiling’ or ‘greatly smiling’ were represented on the positive valence side of Factor 1, whereas lips rated as ‘not smiling’ or ‘slightly frowning’ were on the opposing end of the dimension. Teeth visibility increases with smiling, and indeed, ‘some teeth visible’ was on the positive end of the projection, whereas ‘few teeth visible’ was a significant contributor to the negative end.

Indeed, the facial speech videos that depicted the speakers’ intent were grouped according to the dichotomized emotional characteristics on the first projection. The contributions of the facial speech groups were such that the approving ID speech group accounted for 42.2% of the variance and the comforting ID speech group accounted for 56.4% of the variance. The neutral AD speech category did not contribute above the expected value to Factor 1, explaining only 1.4% of the variance for that factor, although the neutral quality of the speech fell within the centre, or neutral, range of the Intent factor.

Movement factor

A total of 65 variables contributed above the expected value to the second factor, for a total of 77.6% of the contribution to the factor. As previously described, the contributing variables were grouped on the basis of whether at least two raters’ variable ratings contributed above the expected value, which resulted in 14 variables accounting for 43.3% of the contribution (more than half of the total above-expected contribution of 77.6%). For example, the variable Blinking Eyes contributed 2%, 1.6% and 2.5% for raters K, L and S, respectively; when grouped, the variable contributed a total of 6.1% to the second factor. This variable and the remaining 13 variables are listed in Table 3, and Figure 5 illustrates the DiCA map with the Movement factor (vertical projection) and its contributing variables.

The contributing variables were related to eye movements and to head movement and orientation, especially as portrayed by speakers in the AD speech group, which accounted for 71.3% of the variance of the Movement factor, versus contributions of 21.5% and 7.2% from the approving and comforting ID speech groups, respectively. That is, the variables that indicated movement, such as Head Movement Tilting, were positioned closer to the AD speech group in the upper portion of the vertical projection. This portion of the projection accounted for exaggerated movements of the head, more blinking of the eyes and greater movement of the irises. In contrast, the lower portion of the dimension included ratings of no head movement, less blinking and less iris movement. The projection also differentiated head orientation, with tilted and side-turned head orientations plotted in the top portion and upward head orientation plotted in the bottom portion.

Table 3. Variables contributing above average to the Movement factor (grouped across raters)

Less movement
Variable                   Position   Contribution (%)
Not blinking               −0.61      5.5
Iris not moving            −0.33      4.2
Headmove not tilting       −0.22      1.6
Headorient slight up       −0.60      3.2
Headorient up              −0.69      2.0
Total contribution                    16.5

Greater movement
Variable                   Position   Contribution (%)
Brow moving                0.47       1.2
Blinking                   0.49       1.8
Great blinking             0.94       6.1
Iris moving                0.80       4.2
Headmove tilting           0.87       4.1
Headmove slight up–down    0.40       1.9
Headorient slight side     1.01       3.6
Headorient tilting         0.49       1.3
Headorient not up          0.25       2.6
Total contribution                    26.8


Figure 5. Movement factor (vertical projection) with the three facial speech groups (square markers) plotted in relation to the 14 variables that contributed above the expected value with agreement by at least two raters. The closer proximity of a variable to a facial speech group indicates a greater likelihood of that variable being observed in that group of videos.


Statistical significance

Fixed-effect model. The confusion matrix in Table 4 shows the number of videos that were correctly classified into the predefined facial speech groups by the DiCA model. All 20 approving and all 20 comforting ID speech videos were classified correctly by the model, and only one of the 15 AD speech videos was misclassified (as a comforting ID speech video) on the basis of the facial speech ratings. The 95% tolerance intervals drawn as ellipses around the facial speech videos in Figure 3 illustrate the separability of the three facial speech groups. Slight overlap of the neutral AD and comforting ID groups suggests that these two groups were less separable from each other, perhaps because of the similarity of neutral and saddened facial features. Regardless, the fixed-effect model was very accurate in differentiating the three groups.

Random-effect model. Using sampling with replacement, the bootstrapping technique determined whether the group differences indicated by the DiCA analysis were statistically significant. According to the bootstrapping analysis, the three facial speech groups were statistically differentiated by the model. Figure 6 illustrates the results of the bootstrapping with 95% confidence ellipses. The ellipses did not overlap, which indicates that the facial speech groups were statistically different from each other, p < 0.05, based on the random-effect model.

Additionally, the jackknifing procedure assessed the accuracy of the DiCA model in assigning new observations (or videos) within our sample. Results of the jackknifing procedure suggest that the model was accurate in assigning videos to the three facial speech groups. As displayed in Table 5, all but one of the approving ID speech videos were correctly classified by the random-effect model.


Figure 6. Bootstrapping results with 1000 new observations plotted and encircled by 95% confidence ellipses.

Table 4. Fixed-effect model: discriminant correspondence analysis assignment of videos within the sample to the approving infant-directed (ID), comforting ID and adult-directed (AD) speech groups

                        Actual group
Assigned group          Approving ID   Comforting ID   AD speech
Approving ID            20             0               0
Comforting ID           0              20              1
AD speech               0              0               14


In the comforting ID speech group, only three of the 20 videos (15%) were misclassified, whereas four of the 15 AD speech videos (27%) were misclassified. Five of the seven misclassified videos in these two groups were incorrectly classified as approving ID videos.

The random-effect model was more accurate at differentiating approving ID videos from the other two facial speech groups, whereas the comforting ID and neutral AD videos that were misclassified were likely to be classified as approving ID videos. The approving ID group tended to portray more positive facial features, such as smiling eyes and lips (see Figure 4); if a comforting ID or neutral AD video received a rating of having smiling eyes or lips, the DiCA model may have misclassified the video into the approving ID group. Conversely, an approving ID video would be unlikely to portray frowning lips or sad eyes because of the nature of the intended message (i.e. to provide positive reinforcement or encouragement), which explains why the approving ID videos were not misclassified as comforting ID videos.

In general, the DiCA model created by the analysis was accurate in assigning the sample of 55 videos to their respective facial speech groups (fixed-effect results). Further, the random-effect model indicated relatively good separability (bootstrapping) and accuracy (jackknifing) in assigning new samples of videos to the groups.


Table 5. Random-effect model: discriminant correspondence analysis assignment of new videos to the a priori groups

                        Actual group
Assigned group          Approving ID   Comforting ID   AD speech
Approving ID            19             2               3
Comforting ID           0              17              1
AD speech               1              1               11

ID, infant directed; AD, adult directed.


CONCLUSIONS

The purpose of this experiment was to determine whether specific facial characteristics distinguished approving ID, comforting ID and neutral AD facial speech. Results supported the differentiation of the three groups based on the DiCA model. As illustrated by the Intent factor of the DiCA model, the FSCS ratings differentiated the meaningful ID facial speech videos on the basis of facial characteristics portraying positive and negative emotional valence, such as smiling lips and eyes in the approving videos and frowning lips and sad eyes in the comforting videos. Neutral AD speech was distinguished from the ID groups on the basis of more general movements, which was captured by the Movement factor of the DiCA model.

The approving ID videos were expected to portray raised eyebrows, wide eyes, smiling eyes and lips and greater teeth visibility than the comforting ID videos; these expectations were supported, as the brows, eyes, lips and teeth visibility differed by ID group on the Intent factor. However, the approving ID videos were also expected to portray more up–down head movements, and this characteristic was not captured by the first projection of the DiCA model. The comforting ID group expectations were also supported, including sad eyes and rounded and frowning lips, although side-to-side head movements were not characteristic of the comforting group.

Unexpectedly, the AD facial speech group, but not the ID facial speech groups, was characterized by more movements, such as blinking, iris movement and tilting motions of the head (see Table 3, Figure 5). These movements were contributors to the Movement factor of the DiCA model, which differentiated the AD group from the ID groups. The greater movement portrayed during AD interactions may have resulted from at least two possibilities. First, previous research has found a positive correlation between speech rate and head movement in adult–adult conversation (Hadar, 1991). The adult speakers in the AD facial speech videos were likely speaking at faster rates than the ID speakers, because ID speech is typically spoken more slowly than AD speech (Fernald & Simon, 1984). The faster rate of AD speech may have produced more head movement, or movements that were more easily perceived by raters than those in the slower ID speech.

Secondly, and as a limitation of the current study, the video stimuli were selected from two broader stimulus sets, each collected in a different manner. Speaking clips from the AD set were recorded during live adult–adult interactions, whereas the ID clips were recorded while women spoke to infants depicted in a video. The ID speakers may have moved less than the AD speakers because their communication attempts, including body language, were not directly perceived by a live human being. However, the ID videos could not be created in the same manner as the AD videos because of the interference of a live infant's vocalizations and/or crying. Additionally, it is important to note that the ID stimuli were not originally collected for the purpose of comparison with the AD stimuli but rather were intended for use in infant categorization studies of approving and comforting ID speech. Future work concerned with these differences may choose to eliminate them by ensuring all stimuli are generated under similar conditions, perhaps during live face-to-face interactions, to capture ecologically valid stimuli. However, the DiCA model supported the primary aim of the experiment by successfully differentiating the two ID speech intents based on specific facial characteristics. Because the approving and comforting ID speech videos were generated under the same parameters, facial characteristics alone, and not methodological confounds, accounted for the reported discriminations of the ID speech intents.

Although the current study indicates that facial characteristics produced during ID speech of varying intent can be differentiated from those used in neutral AD speech, future work may benefit from assessing differences between approving and comforting AD speech in comparison with both neutral AD speech and the meaningful ID speech groups used in the current study. Specific ID and AD intent differences within a category (e.g. approving) would likely elicit differences in the degree of variability in facial characteristics, with the expectation that ID faces would be more exaggerated than AD faces (Swerts & Krahmer, 2010).

The significant findings from the current study demonstrate the utility of the FSCS for future sets of dynamic facial speech stimuli, although further fine-tuning of the rating scale may be warranted; as mentioned in the data preprocessing steps, some ratings were not applicable to the facial speech videos or were not mutually understood by the raters. Additionally, this experiment extended the use of DiCA by applying the technique to the development of a coding scale and to capturing minute differences in stimuli, which may be useful in future work. The FSCS is available upon request from the corresponding author.
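The paper does not detail its preprocessing rules, but as an illustration of the kind of fine-tuning suggested above, a hypothetical filter might drop FSCS items that two raters rarely agreed on or frequently marked as not applicable (all names, codes and thresholds here are invented for illustration):

    import numpy as np

    def usable_items(r1, r2, min_agree=0.7, max_na=0.2, na_code=-1):
        # r1, r2: videos-by-items rating matrices from two raters.
        applicable = (r1 != na_code) & (r2 != na_code)
        n_app = np.maximum(applicable.sum(axis=0), 1)
        agree = ((r1 == r2) & applicable).sum(axis=0) / n_app
        na_rate = 1 - applicable.mean(axis=0)
        # Keep items with good agreement that were usually applicable.
        return (agree >= min_agree) & (na_rate <= max_na)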

GENERAL DISCUSSION

The current set of experiments sought to illuminate differences in adults' facial movements during their interactions with infants. Experiment 1 provided the basis for adults' distinction of silent ID facial speech that depicted different intents. Adults not only distinguished the approving and comforting ID intents from each other but also accurately discriminated approving and comforting messages from neutral AD facial speech. However, the adult raters in the first experiment may have used lip movement cues to interpret the speaker's message, as opposed to relying solely on intent-related facial characteristics. Thus, Experiment 2 further assessed the specific facial characteristics that may have supported the differentiation of the silent videos.

As demonstrated by the results of Experiment 2, the ID face may provide visual cues about the emotion or intent of the speaker through varying facial movements, which suggests additional functions of ID speech that have not previously been reported. Although it is well established that adults differentiate speakers' communicative intent using prosodic cues alone (Fernald, 1989), the current study extends these findings by reporting that adults also differentiate intent using only visual cues. Additionally, this experiment successfully builds upon the work of Chong and colleagues (2003) by quantifying differences in ID faces based on the speakers' intent.

The current results suggest that specific facial expressions may be dependent on the speakers' intent, which is consistent with the conclusions drawn by Chong and her colleagues. The main conclusion to be drawn from the current set of findings is that facial characteristics, above and beyond the auditory speech signal, may depict speakers' intent, which may facilitate infants' prelinguistic perception of caregivers' communicative attempts. This process may have important developmental ramifications.

During early social-communicative development, infants may use a combination of the audible speech signal and the visual components of ID speech, including emotive feature displays (e.g. smiling lips) as well as speech-specific movements (e.g. lip rounding). For example, Chong and her colleagues (2003) noted that mothers' facial expressions were almost always accompanied by vocalizations, which suggests that infants' interactions with their mothers are rarely unimodal. Thus, the visual information depicted by the ID face may prove to be more meaningful than classical ID speech work might suggest, especially considering that infants' earliest social encounters are likely to be multimodal, including both visual and auditory stimulation from the caregiver (McCartney & Panneton, 2005; Walker-Andrews & Bahrick, 2001). A focus on the visual properties of ID speech may encourage the revisiting of previous infant perception work that relied heavily on the acoustic signal alone, with the inclusion of ecologically valid stimuli that examine infants' perception of the audiovisual components of speech interactions.

Infants' experience with certain facial expressions may facilitate their ability to recognize those expressions; for example, 3-month-olds' preference for smiling faces was related to their mothers' encouragement of their attention to her own smiling face at home (Kuchuk, Vibbert & Bornstein, 1986). Indeed, at this young age, infants may be more sensitive to familiarity than emotional valence when learning to discriminate facial expressions (Walker-Andrews, Krogh-Jespersen, Mayhew & Coffield, 2011). Between the ages of 4 and 9 months, however, infants are able to discriminate still images of angry, surprised, fearful, happy and neutral facial expressions portrayed by unfamiliar faces, and their behavioural responses may differ when viewing certain expressions (Serrano, Iglesias & Loeches, 1992, 1995), perhaps indicating an early understanding of affective meanings in faces. By the age of 12 months, infants are able to incorporate both vocal and facial emotional cues produced by their mothers during a visual cliff task, although facial cues alone were less effective than multimodal cues in affecting the infants' behaviour (Vaish & Striano, 2004). The coupling of facial emotional expressions with vocal expressions of emotion, as portrayed during ID speech, may contribute to such emotional competence in infants.

Infants' ability to detect the visual distinctions in facial speech intents would further support Walker-Andrews' (1986) finding that infants match vocal affect with facial affect. Facial characteristics may enhance the speech signal by portraying the speakers' emphasis or intent in a multimodal output that is both seen and heard, which, according to the intersensory redundancy hypothesis (Bahrick et al., 2004), may enhance infant attention and learning. The intersensory redundancy of the emotional qualities of ID facial speech may facilitate social–emotional development as the infant learns and, later, mimics the affective properties of the voice and face.

Whereas the ability to integrate auditory and visual social information supports infants' development as sophisticated social beings, this audiovisual integration may be less accessible to infants from clinical populations, such as those with depressed mothers who may be less socially expressive or interactive. Depressed mothers' production of ID speech is less variable in pitch (Kaplan, Bachorowski, Smoski & Zinser, 2001), which raises the question of the degree to which their production of ID facial expressions is similarly muted. If infants of depressed mothers typically hear an impoverished auditory ID signal, the visual cues depicted by their mothers' faces may also be lacking in ID quality.


Likewise, infants with hearing impairments or other developmental delays, such as speech delay, may depend more heavily on the presence of redundant sensory information provided by both the face and voice, as their perception of nonlinguistic cues may be of greater importance than the linguistic content of the speech. Whereas future work should focus on detailing the relative importance and perception of the ID face in typically developing infants, additional avenues for research include clinical populations in which the redundancy of audiovisual information provided by the ID face may facilitate the interpretation of caregivers' messages. The speech signal may become more salient when accompanied by engaging, emotive facial characteristics, and the accompaniment of affective facial qualities with the voice may provide stimulating face-to-face interactions that encourage infants' social-communicative development. Additionally, future work should assess whether intent-specific facial modifications, such as those identified in Experiment 2, are displayed during both speech and nonspeech interactions with infants. It may be the case that adults' facial expressions differ systematically during ID interactions even in the absence of speech.

By reporting that ID speech is accompanied by facial characteristics that convey distinct affective messages independent of auditory information, the current study highlights an additional communicative component of ID speech that warrants future investigation in both typically and atypically developing populations. Researchers are encouraged to investigate how facial properties expressed during ID speech of varying intent are processed by infants of different ages and whether these properties facilitate aspects of social, emotional and linguistic development.

ACKNOWLEDGEMENTS

We are grateful for the contributions of the thesis committee, including Alice O'Toole and Christine Dollaghan, and to Hervé Abdi and Derek Beaton for their guidance with data analysis. We would also like to thank Kristin Atchison, Lindsey Collins, Lisa Keylon, Sarah Salomon and Kaia Wakamiya for their assistance in data collection.

REFERENCES

Abdi, H. (2007). Discriminant correspondence analysis. In N. Salkind (Ed.), Encyclopedia of measurement and statistics. Thousand Oaks, CA: Sage.

Atchison, K. K. (2010). Development of infant-directed speech categorization: Effects of facial vocal synchrony. Dissertation Abstracts International: Section B: The Sciences and Engineering, 71(1-B), 690.

Atchison, K. K., Spence, M. J., & Touchstone, E. W. (2009, April). Categorization of synchronous infant-directed speech by 4- and 6-month-old infants. Paper presented at the biennial meeting of the Society for Research in Child Development, Denver, CO.

Bahrick, L. E., Lickliter, R., & Flom, R. (2004). Intersensory redundancy guides the development of attention, perception, and cognition in infancy. Current Directions in Psychological Science, 13(3), 99–102.

Blossom, M., & Morgan, J. L. (2006). Does the face say what the mouth says? A study of infants' sensitivity to visual prosody. Paper presented at the 30th annual Boston University conference on language development, Somerville, MA.

Brand, R. J., Baldwin, D. A., & Ashburn, L. A. (2002). Evidence for 'motionese': Modifications in mothers' infant-directed action. Developmental Science, 5(1), 72–83.


Burnham, D., & Dodd, B. (2004). Auditory-visual speech integration by prelinguistic infants: Perception of an emergent consonant in the McGurk effect. Developmental Psychobiology, 45(4), 204–220.

Chong, S. C. F., Werker, J. F., Russell, J. A., & Carroll, J. M. (2003). Three facial expressions mothers direct to their infants. Infant and Child Development, 12, 211–232.

Cooper, R., Abraham, J., Berman, S., & Staska, M. (1997). The development of infants' preference for motherese. Infant Behavior & Development, 20(4), 477–488.

Ekman, P., & Friesen, W. V. (1978). Facial Action Coding System: Investigator's guide. Part two: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.

Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior & Development, 8, 181–195.

Fernald, A. (1989). Intonation and communicative intent in mothers' speech to infants: Is the melody the message? Child Development, 60, 1497–1510.

Fernald, A. (1992). Meaningful melodies in mothers' speech to infants. In H. Papoušek, U. Jurgens & M. Papoušek (Eds.), Nonverbal vocal communication: Comparative and developmental approaches (pp. 262–282). New York: Cambridge University Press.

Fernald, A. (1993). Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages. Child Development, 64, 657–674.

Fernald, A., & Simon, T. (1984). Expanded intonation contours in mothers' speech to newborns. Developmental Psychology, 20(1), 104–113.

Golinkoff, R. M., & Alioto, A. (1995). Infant-directed speech facilitates lexical learning in adults hearing Chinese: Implications for language acquisition. Journal of Child Language, 22(3), 703–726.

Graf, P. H., Cosatto, E., Strom, V., & Huang, F. J. (2002). Visual prosody: Facial movements accompanying speech. Paper presented at the fifth IEEE international conference on automatic face and gesture recognition, Washington, D.C.

Green, J. R., Nip, I. S. B., Wilson, E. M., Mefferd, A. S., & Yunusova, Y. (2010). Lip movement exaggerations during infant-directed speech. Journal of Speech, Language, and Hearing Research, 53, 1529–1542.

Hadar, U. (1991). Body movement during speech: Period analysis of upper arms and head movement. Human Movement Science, 10, 419–446.

Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The origins of grammar: Evidence from early language comprehension. Cambridge, MA: MIT Press.

Jacobson, J. L., Boersma, D. C., Fields, R. B., & Olson, K. L. (1983). Paralinguistic features of adult speech to infants and small children. Child Development, 54, 436–442.

Kaplan, P. S., Bachorowski, J.-A., Smoski, M. J., & Zinser, M. (2001). Role of clinical diagnosis and medication use in effects of maternal depression on infant-directed speech. Infancy, 2(4), 537–548.

Kitamura, C., & Burnham, D. (2003). Pitch and communicative intent in mother's speech: Adjustments for age and sex in the first year. Infancy, 4(1), 85–110.

Kitamura, C., Thanavishuth, C., Burnham, D., & Luksaneeyanawin, S. (2002). Universality and specificity in infant-directed speech: Pitch modifications as a function of infant age and sex in a tonal and non-tonal language. Infant Behavior & Development, 24, 372–392.

Kuchuk, A., Vibbert, M., & Bornstein, M. H. (1986). The perception of smiling and its experiential correlates in three-month-old infants. Child Development, 57, 1054–1061.

McCartney, J. S., & Panneton, R. (2005). Four-month-olds' discrimination of voice changes in multimodal displays as a function of discrimination protocol. Infancy, 7(2), 163–182.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

Meyer, M., Hard, B., Brand, R. J., McGarvey, M., & Baldwin, D. A. (2011). Acoustic packaging: Maternal speech and action synchrony. IEEE Transactions on Autonomous Mental Development, 3(2), 154–162.

Moore, D. S., Spence, M. J., & Katz, G. S. (1997). Six-month-olds' categorization of natural infant-directed utterances. Developmental Psychology, 33(6), 980–989.

O'Toole, A. J., Harms, J., Snow, S. L., Hurst, D. R., Pappas, M. R., & Ayyad, J. H. (2005). A video database of moving faces and people. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 812–816.


Papoušek, M., Bornstein, M. H., Nuzzo, C., Papoušek, H., & Symmes, D. (1990). Infant responses to prototypical melodic contours in parental speech. Infant Behavior & Development, 13, 539–545.

Papoušek, M., Papoušek, H., & Symmes, D. (1991). The meanings of melodies in motherese in tone and stress languages. Infant Behavior & Development, 14(4), 415–440.

Serrano, J. M., Iglesias, J., & Loeches, A. (1992). Visual discrimination and recognition of facial expressions of anger, fear, and surprise in 4- to 6-month-old infants. Developmental Psychobiology, 25(6), 411–425.

Serrano, J. M., Iglesias, J., & Loeches, A. (1995). Infants' responses to adult static facial expressions. Infant Behavior & Development, 18, 477–482.

Spence, M. J., & Moore, D. S. (2003). Categorization of infant-directed speech: Development from 4 to 6 months. Developmental Psychobiology, 42(1), 97–109.

Swerts, M., & Krahmer, E. (2010). Visual prosody of newsreaders: Effects of information structure, emotional content, and intended audience on facial expressions. Journal of Phonetics, 38, 197–206.

Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). Infant-directed speech facilitates word segmentation. Infancy, 7(1), 53–71.

Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science, 11(3), 188–195.

Vaish, A., & Striano, T. (2004). Is visual reference necessary? Contributions of facial versus vocal cues in 12-month-olds' social referencing behavior. Developmental Science, 7(3), 261–269.

Walker-Andrews, A. S. (1986). Intermodal perception of expressive behaviors: Relation of eye and voice? Developmental Psychology, 22(3), 373–377.

Walker-Andrews, A. S., & Bahrick, L. E. (2001). Perceiving the real world: Infants' detection of and memory for social information. Infancy, 2(4), 469–481.

Walker-Andrews, A. S., Krogh-Jespersen, S., Mayhew, E. M. Y., & Coffield, C. N. (2011). Young infants' generalization of emotional expressions: Effects of familiarity. Emotion, 11(4), 842–851.

Werker, J. F., & McLeod, P. J. (1989). Infant preference for both male and female infant-directed talk: A developmental study of attentional and affective responsiveness. Canadian Journal of Psychology, 43(2), 230–246.

Williams, L. J., Abdi, H., French, R., & Orange, J. B. (2010). A tutorial on Multi-Block Discriminant Correspondence Analysis (MUDICA): A new method for analyzing discourse data from clinical populations. Journal of Speech, Language, and Hearing Research, 53, 1372–1393.
