
Speech Intelligibility

The focus of this discussion will be on the measurement of speech intelligibility for clinical populations such as: dysarthric speakers, deaf and hearing impaired speakers, kids or adults with speech sound disorders, and speakers of English as a second language. For these kinds of speakers, a case can be made that speech intelligibility is the single most important measure. The central purpose of speech is to convey information from the speaker to the listener. This requires that the words uttered by the speaker are recovered accurately by the listeners.

Speech intelligibility is not the only thing that matters; the naturalness of speech, for example, is also quite important. But a good case can be made that speech intelligibility is of central importance.

Definitions of intelligibility [emphasis added]:
The quality of language that is comprehensible. http://www.thefreedictionary.com/speech+intelligibility
The term intelligibility refers to 'speech clarity' or the proportion of a speaker's output that a listener can readily understand. www.speech-language-therapy.com/index.php?option=com_content&view=article&id=29:admin&catid=11:admin&Itemid=117
Intelligibility is a measure of how comprehensible speech is, or the degree to which speech can be understood. Intelligibility is affected by spoken clarity, explicitness, lucidity, comprehensibility, perspicuity, and precision. https://en.wikipedia.org/wiki/Intelligibility_(communication)
Degree to which the speaker's intended message is recovered by the listener (Kent et al., 1989, Journal of Speech and Hearing Disorders, 54, 482-499).

Start by taking a close look at these definitions. Point #1: Does an utterance need to be understandable or comprehensible to be intelligible? Example 1: Colorless green ideas sleep furiously. Do you understand what it means? If this utterance were spoken clearly by a non-disordered, native English speaker and correctly transcribed by a neurologically intact native speaker in a quiet listening environment, would it be reasonable to say that the utterance was intelligible? In my opinion, the answer is yes.

Example 2: The velocity function is the 1st derivative of the displacement function; the acceleration function is the 2nd derivative of the displacement function. Assume that this utterance is spoken clearly in a quiet listening environment, and that it is accurately transcribed by a listener. Would the utterance necessarily be understood? Maybe, maybe not. Let's assume not. Would it be reasonable to say that the utterance is intelligible? I say yes.

Point #1 continued. Example 3: Imagine that we asked listeners to transcribe (or repeat) nonsense syllables; e.g., ba, foo, blop, poot. These utterances are not comprehensible; this is why they are called nonsense syllables. If listeners are able to repeat these utterances accurately, would it be reasonable to say that they are intelligible? In my opinion, the answer is yes. Moral: The term that should be used in these definitions is RECOGNITION, not understandability or comprehensibility: do listeners recognize the speech sounds that are spoken?

Aside: Nonsense utterances are sometimes used to test intelligibility. Why? Nonsense utterances directly test the intelligibility of speech with almost no influence of language: no syntax, no semantics, no lexicon. (The only part of the language system that comes into play is phonology: the test utterances conform to English phonotactic* rules; e.g., utterances like svek [svɛk], ngah [ŋɑ], or bih [bɪ] are not used because they violate English phonotactic rules.)

*Aside: For those who are not familiar with the concept of a phonotactic rule: Phonotactic rules are one type (out of three) of phonological rule. Phonotactic rules specify permissible and impermissible combinations of speech sounds. They are language specific, and all languages have them. Some examples of English phonotactic rules:
English words cannot begin with /ʃt/; e.g., stot /stɑt/ is not an English word, but it could be. On the other hand, a word such as shtot /ʃtɑt/ is not permitted; i.e., it violates an English phonotactic rule.
English words can begin with /m/ or /n/, but they cannot begin with /ŋ/.
English words cannot end in lax vowels (e.g., /ɪ/, /ɛ/, /ʊ/). For example, /di/ is a possible English word (/i/ is a tense vowel), but not /dɪ/ (a lax vowel); /fu/ is a possible English word (/u/ is a tense vowel), but not /fʊ/, etc.
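The three example rules can be written down as a small check. This is only a sketch under stated assumptions: the function name, the representation of an utterance as a list of phoneme symbols, and the short lax-vowel set are all illustrative choices, not a complete model of English phonotactics.

```python
# Illustrative only: a tiny subset of English phonotactic rules.
# LAX_VOWELS is an assumed, incomplete set chosen to match the examples.
LAX_VOWELS = {"ɪ", "ɛ", "ʊ"}
BANNED_ONSETS = [("ʃ", "t"), ("ŋ",)]  # sequences that may not begin a word


def phonotactic_violations(phonemes):
    """Return a list describing which example rules the phoneme sequence breaks."""
    violations = []
    for onset in BANNED_ONSETS:
        # Compare the first len(onset) phonemes against the banned onset.
        if tuple(phonemes[:len(onset)]) == onset:
            violations.append("cannot begin with /" + "".join(onset) + "/")
    if phonemes and phonemes[-1] in LAX_VOWELS:
        violations.append("cannot end in lax vowel /" + phonemes[-1] + "/")
    return violations


print(phonotactic_violations(["s", "t", "ɑ", "t"]))  # stot: no rule broken
print(phonotactic_violations(["ʃ", "t", "ɑ", "t"]))  # shtot: banned onset
print(phonotactic_violations(["d", "ɪ"]))            # ends in a lax vowel
```

A check like this could be used to screen candidate nonsense items before they go into a test list, so that only phonotactically legal items are presented to listeners.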

Point #1, the bottom line: I am arguing against the definitions of intelligibility that include understanding or comprehension. So, what definition should be used? My opinion is that for applications in this field we need a definition that focuses explicitly on the transmission of SPEECH information (i.e., not language and not meaning). My proposal: An utterance is intelligible to the degree that the speaker and the listener agree on what was said. Intelligibility is maintained to the degree that the listener recognizes the words and/or speech sounds that were intended by the talker.

For SLPs who work with dysarthric speakers, deaf speakers, kids with speech-sound disorders, etc., it is not a crazy idea to assume that variations in intelligibility arise mainly from the person who is talking. But, that does not mean that the listener plays no role in explaining variations in intelligibility. More soon.

Note: This concept, that intelligibility requires agreement between the talker and the listener, is a simple but important idea. Quiz: If the talker intends to say one thing but the listener hears something else, what might be responsible for the error?
a. the speaker
b. the listener
c. the transmission channel (room acoustics, electronics, etc.)
d. It is not possible to know.
e. all of the above
f. some combination of the above
g. all of the below

Point #2: Does speech intelligibility characterize: (a) the speaker, (b) the listener, or (c) the transmission channel (room acoustics and any electronics between the speaker & listener; more on this soon)? Short answer: Yes. This is a pretty big deal, so we'll spend a little time talking about it.

The 1st scientists to take a serious interest in speech intelligibility were not phoneticians or SLPs. They were communications engineers: the folks at Bell Labs. Their problem:
talker > telephone system > listener
Stated more generally:
talker > transmission channel > listener

The phone system (as it looked in the 1st half of the 20th century; it's way more complicated now):
talker > mic > amp > BP filter (~300-4000 Hz) > conversion to FM radio signal > more amps > many miles of cable > switching network > more amps > more cables > conversion from FM signal back to sound > another amp > earphone > listener
The phone company is interested in the stuff in between the talker and the listener: the transmission channel.

Short form of the telephone system:
talker > transmission channel > listener
Now, the reason for all this: If the communication engineer finds that intelligibility isn't ideal, he/she will assume that the problem lies with:
a. the talker
b. the transmission channel
c. the listener

Recall the question we started with: Does speech intelligibility characterize: (a) the talker, (b) the listener, or (c) the transmission channel? To the communications engineer, the answer is (c) the transmission channel. Why? (1) The talker and listener are unremarkable (i.e., ordinary talker, ordinary listener), and (2) the transmission channel is the only part of the system they have any control over.

How does an SLP answer the same question? To the SLP, does speech intelligibility characterize: (a) the speaker, (b) the listener, or (c) the listening environment/transmission channel (explanation soon)? For an answer, let's look again at one of the definitions of intelligibility we saw earlier: Intelligibility is affected by spoken clarity, explicitness, lucidity, comprehensibility, perspicuity, and precision. Does this description assume that intelligibility characterizes the speaker, the listener, or the transmission channel?

It absolutely does: The terms spoken clarity, explicitness, lucidity, comprehensibility, perspicuity, and precision all refer explicitly to the talker, not the listener, not the communication channel. The assumption is made that: (1) the communication channel is unremarkable (e.g., a quiet room, live voice or very simple electronics), and (2) the listener is unremarkable (normal hearing adult). These are not crazy ideas. Q: Does this mean that the listener doesn't matter? A: No.

The most important idea that needs to be understood here is a pretty simple one: The listener's familiarity with the talker can make a big difference. Listeners are remarkably good at adapting or accommodating to speech that is distorted in a variety of different ways. There are all kinds of excellent literature on this topic, but ordinary experience with daily life is enough to make the point.

Example 1: The speech of very young kids.
a. Gay Bardino
b. You need nukin.

Example 2: Accented speech: the case of Jimmy Brinegar.
http://www.deeplake.com/content/sounds/koth/boomhauer/dogs.wav
http://www.deeplake.com/content/sounds/koth/boomhauer/seinfeld.wav
Example 3: Speech distorted by bad electronics.
Moral: The goal is to find a measure of speech intelligibility that characterizes the talker, but the listener does matter. Why does the ability of listeners to accommodate to atypical speech matter? If intelligibility improves throughout the course of treatment, who is getting better: the talker or the listener? It's a very simple problem. Solution? Ask someone other than the clinician to do the listening. Practical??

Two very different general approaches are used to assess speech intelligibility:
1. Subjective estimates made by clinicians (by far the most common): 50% intelligible, 80% intelligible, ... Kent calls this method scaling (e.g., Kent et al., 1989, JSHD, 54, 482-499).
2. Objective measurement: usually the percentage of words that are accurately recognized by a listener. Kent calls this method item identification (Kent et al., 1989).
Subjective estimates are easy to make, but: (a) reliability is imperfect, (b) estimates can change as clinicians become more familiar with the talker. (The same can be true of item identification, depending on how it's done.)
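To make the item-identification idea concrete, here is a minimal sketch of how a percent-words-correct score might be computed from what the talker intended and what the listener wrote down. The function names and the normalization rules (lowercasing, dropping punctuation) are assumptions made for illustration; published tests have their own scoring conventions, and this does not reproduce any of them.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation so 'Window.' and 'window' count as a match."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace() or ch == "'")
    return kept.split()


def percent_words_correct(intended, transcribed):
    """Percentage of the intended words that the listener recovered, in order."""
    target = normalize(intended)
    heard = normalize(transcribed)
    if not target:
        raise ValueError("intended utterance has no words")
    # Align the two word sequences and count the words that line up.
    matcher = SequenceMatcher(a=target, b=heard, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(target)


score = percent_words_correct("The boy fell from the window.",
                              "the boy fell on the window")
print(round(score, 1))  # 5 of the 6 intended words were recovered
```

Note that the score is computed against the intended words, which fits the proposal that intelligibility is the degree to which talker and listener agree on what was said.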

What should be used as speech material? Big surprise: (a) there are many choices, (b) the choice of speech material matters. 1. Conversational speech: From one point of view, this is a great choice; it's exactly what you want to know. How well would the talker be expected to do in ordinary conversation? Obvious problem: The topic of conversation will vary all over the place, making it impossible to get an intelligibility measure that is standardized in any way, either across different clients or even within the same talker across time.

2. Words: Standard word lists can be used. There are many of these word lists available. Word lists are very easy to score and, of course, they are standard across talkers. Some word intelligibility tests provide multiple word lists with equivalent intelligibility. This is a big deal: Listeners may still be adapting to the speech of the talker, but at least they will not be as likely to learn the word lists. (This can still present a problem if the word lists are used frequently.)

3. Sentences: Standard lists of sentences can also be used. There are many of these available as well. Sentences can be very useful since there are some talkers who can speak intelligibly with single words but may have greater difficulty with more complicated utterances.

Effects of Predictability

Speech is massively redundant, which means that listeners do not need to catch every little acoustic-phonetic detail in order to recognize what is being said. This applies to both words and sentences, but especially to sentences. All else being equal, as predictability goes up, intelligibility goes up. Bone-headed simple example: Mary had a little [wildly distorted something-or-other]. There's no mystery about the missing word.

Striking demo from Warren (Science, 167, 392-393). The state governors met with their respective legislatures convening in the capital city. Warren entirely deleted one of the speech sounds (the [s] of legislatures) and replaced it with a cough. The [s] was gone. Out of 20 listeners, 19 did not notice that anything was missing; one listener thought that a sound was missing but guessed wrong about which one. Q: How did listeners hear a sound that wasn't there? A: Their brains created it.

What is the relevance of this to intelligibility testing? Pretty simple: One speaker is 75% intelligible, another is 50% intelligible, but different sentence intelligibility tests were used. Does that comparison mean anything? It's hard to know for sure, but probably not. Example: HINT sentences: TIMIT sentences:
