The focus of this discussion will be on the measurement of
speech intelligibility for clinical populations such as dysarthric
speakers, deaf and hearing-impaired speakers, kids or adults with
speech sound disorders, and speakers of English as a second
language. For these kinds of speakers, a case can be made that
speech intelligibility is the single most important measure. The
central purpose of speech is to convey information from the speaker
to the listener. This requires that the words uttered by the
speaker be recovered accurately by the listener.
Slide 4
Speech intelligibility is not the only thing that matters; the
naturalness of speech, for example, is also quite important. But a
good case can be made that speech intelligibility is of central
importance.
Slide 5
Definitions of intelligibility [emphasis added]:
The quality of language that is comprehensible.
http://www.thefreedictionary.com/speech+intelligibility
The term intelligibility refers to 'speech clarity' or the
proportion of a speaker's output that a listener can readily
understand.
www.speech-language-therapy.com/index.php?option=com_content&view=article&id=29:admin&catid=11:admin&Itemid=117
Intelligibility is a measure of how comprehensible speech is, or
the degree to which speech can be understood. Intelligibility is
affected by spoken clarity, explicitness, lucidity,
comprehensibility, perspicuity, and precision.
https://en.wikipedia.org/wiki/Intelligibility_(communication)
Degree to which the speaker's intended message is recovered by the
listener (Kent et al., 1989, Journal of Speech and Hearing
Disorders, 54, 482-499).
Slide 6
Start by taking a close look at these definitions. Point #1: Does
an utterance need to be understandable or comprehensible to be
intelligible? Example 1: Colorless green ideas sleep furiously. Do
you understand what it means? If this utterance were to be spoken
clearly by a non-disordered, native English speaker and correctly
transcribed by a neurologically intact native speaker in a quiet
listening environment, would it be reasonable to say that the
utterance was intelligible? In my opinion, the answer is yes.
Slide 7
Example 2: The velocity function is the 1st derivative of the
displacement function; the acceleration function is the 2nd
derivative of the displacement function. Assume that this utterance
is spoken clearly in a quiet listening environment, and that it is
accurately transcribed by a listener. Would the utterance
necessarily be understood? Maybe, maybe not. Let's assume not. Would
it be reasonable to say that the utterance is intelligible? I say
yes.
Slide 8
Point #1 continued. Example 3: Imagine that we asked listeners
to transcribe (or repeat) nonsense syllables; e.g., ba, foo, blop,
poot. These utterances are not comprehensible; that is why they are
called nonsense syllables. If listeners are able to repeat these
utterances accurately, would it be reasonable to say that they are
intelligible? In my opinion, the answer is yes. Moral: The term
that should be used in these definitions is RECOGNITION, not
understandability or comprehensibility: do listeners recognize the
speech sounds that are spoken?
Slide 9
Aside: Nonsense utterances are sometimes used to test
intelligibility. Why? Nonsense utterances directly test the
intelligibility of speech with almost no influence of language: no
syntax, no semantics, no lexicon. (The only part of the language
system that comes into play is phonology: the test utterances
conform to English phonotactic* rules; e.g., utterances like svek
[svɛk], ngah [ŋɑ], or bih [bɪ] are not used because they violate
English phonotactic rules.)
Slide 10
*Aside: For those who are not familiar with the concept of a
phonotactic rule: Phonotactic rules are one type (out of three) of
phonological rule. Phonotactic rules specify permissible and
impermissible combinations of speech sounds. They are language-specific,
and all languages have them. Some examples of English
phonotactic rules: English words cannot begin with /ʃt/; e.g., stot
/stɑt/ is not an English word, but it could be. On the other hand,
a word such as shtot /ʃtɑt/ is not permitted; i.e., it violates an
English phonotactic rule. English words can begin with /m/ or /n/,
but they cannot begin with /ŋ/. English words cannot end in lax
vowels (e.g., /ɪ/, /ɛ/, /ʊ/). For example, /di/ is a possible
English word (/i/ is a tense vowel), but not /dɪ/ (/ɪ/ is a lax vowel);
/fu/ is a possible English word (/u/ is a tense vowel), but not /fʊ/,
etc.
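Because phonotactic rules are just statements about which sound sequences are allowed, the handful mentioned above can be written as simple checks on a transcription. The sketch below is a toy under stated assumptions: input is a broad IPA string with one character per segment (no stress marks, length marks, or diacritics), and it encodes only the rules named in these two slides (no word-initial /ʃt/, /sv/, or /ŋ/; no word-final lax vowel). The function names are hypothetical, and a real phonotactic grammar of English has many more constraints.

```python
# Toy phonotactic checker: encodes ONLY the constraints mentioned in these slides.
# Assumes a broad IPA transcription with one character per segment.

BANNED_ONSETS = ("ʃt", "sv", "ŋ")  # word-initial sequences English does not allow
LAX_VOWELS = {"ɪ", "ɛ", "ʊ"}       # lax vowels, which cannot end an English word


def phonotactic_violations(ipa: str) -> list[str]:
    """Describe each listed constraint that the transcription violates."""
    problems = []
    for onset in BANNED_ONSETS:
        if ipa.startswith(onset):
            problems.append(f"begins with /{onset}/")
    if ipa and ipa[-1] in LAX_VOWELS:
        problems.append(f"ends in lax vowel /{ipa[-1]}/")
    return problems


def is_possible_english_word(ipa: str) -> bool:
    """Passing is necessary, not sufficient: only the listed rules are checked."""
    return not phonotactic_violations(ipa)


if __name__ == "__main__":
    for form in ["stɑt", "ʃtɑt", "di", "dɪ", "fu", "fʊ", "ŋɑ", "svɛk", "bɪ"]:
        print(form, is_possible_english_word(form), phonotactic_violations(form))
```

The check only rules forms out; a form that passes (like stot) is merely a candidate, which is the same sense in which the slide says stot "could be" an English word.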
Slide 11
Point #1, the bottom line: I am arguing against the definitions
of intelligibility that include understanding or comprehension. So,
what definition should be used? My opinion is that for applications
in this field we need a definition that focuses explicitly on the
transmission of SPEECH information (i.e., not language and not
meaning). My proposal: An utterance is intelligible to the degree
that the speaker and the listener agree on what was said.
Intelligibility is maintained to the degree that the listener
recognizes the words and/or speech sounds that were intended by the
talker.
Slide 12
For SLPs who work with dysarthric speakers, deaf speakers, kids
with speech-sound disorders, etc., it is not a crazy idea to assume
that variations in intelligibility arise mainly from the person who
is talking. But that does not mean that the listener plays no role
in explaining variations in intelligibility. More soon.
Slide 13
Note: This concept that intelligibility requires agreement
between the talker and the listener is a simple but important idea.
Quiz: If the talker intends to say one thing but the listener hears
something else, what might be responsible for the error?
a. the speaker
b. the listener
c. the transmission channel (room acoustics, electronics, etc.)
d. It is not possible to know.
e. all of the above
f. some combination of the above
g. all of the below
Slide 14
Point #2: Does speech intelligibility characterize: (a) the
speaker, (b) the listener, or (c) the transmission channel (room
acoustics and any electronics between the speaker and listener;
more on this soon)? Short answer: Yes. This is a pretty big deal,
so we'll spend a little time talking about it.
Slide 15
The first scientists to take a serious interest in speech
intelligibility were not phoneticians or SLPs. They were
communications engineers: the folks at Bell Labs. Their problem:
talker > telephone system > listener. Stated more generally:
talker > transmission channel > listener. Now, the reason for
all this: If the communication engineer finds that intelligibility
isn't ideal, where will he/she assume the problem lies?
a. the talker
b. the transmission channel
c. the listener
Slide 16
The phone system (as it looked in the first half of the 20th
century; it's way more complicated now): talker > mic > amp
> band-pass filter (~300-4000 Hz) > conversion to FM radio signal
> more amps > many miles of cable > switching network >
more amps > more cables > conversion from FM signal back to
sound > another amp > earphone > listener. The phone
company is interested in the stuff in between the talker and the
listener: the transmission channel.
Slide 17
Short form of the telephone system: talker > transmission
channel > listener. Now, the reason for all this: If the
communication engineer finds that intelligibility isn't ideal,
where will he/she assume the problem lies?
a. the talker
b. the transmission channel
c. the listener
Slide 18
Recall the question we started with: Does speech
intelligibility characterize: (a) the talker, (b) the listener, or
(c) the transmission channel? To the communications engineer, the
answer is (c) the transmission channel. Why? (1) The talker and
listener are unremarkable (i.e., ordinary talker, ordinary
listener), and (2) the transmission channel is the only part of the
system the engineer has any control over.
Slide 19
How does an SLP answer the same question? To the SLP, does
speech intelligibility characterize: (a) the speaker, (b) the
listener, or (c) the listening environment/transmission channel
(explanation soon)? For an answer, let's look again at one of the
definitions of intelligibility we saw earlier: Intelligibility is
affected by spoken clarity, explicitness, lucidity,
comprehensibility, perspicuity, and precision. Does this
description assume that intelligibility characterizes the speaker,
the listener, or the transmission channel?
Slide 20
It absolutely does; it assumes that intelligibility characterizes
the talker. The terms spoken clarity, explicitness,
lucidity, comprehensibility, perspicuity, and precision all refer
explicitly to the talker, not the listener, not the communication
channel. The assumption is made that: (1) the communication channel
is unremarkable (e.g., a quiet room, live voice or very simple
electronics), and (2) the listener is unremarkable (normal-hearing
adult). These are not crazy ideas. Q: Does this mean that the
listener doesn't matter? A: No.
Slide 21
The most important idea that needs to be understood here is a
pretty simple one: The listener's familiarity with the talker can make a
big difference. Listeners are remarkably good at adapting or
accommodating to speech that is distorted in a variety of different
ways. There is a large body of excellent literature on this topic, but
ordinary experience with daily life is enough to make the
point.
Slide 22
Example 1: The speech of very young kids.
a. Gay Bardino
b. You need nukin.
Slide 23
Example 2: Accented speech (the case of Jimmy Brinegar).
http://www.deeplake.com/content/sounds/koth/boomhauer/dogs.wav
http://www.deeplake.com/content/sounds/koth/boomhauer/seinfeld.wav
Example 3: Speech distorted by bad electronics. Moral: The goal is
to find a measure of speech intelligibility that characterizes the
talker, but the listener does matter. Why does the ability of
listeners to accommodate to atypical speech matter? If
intelligibility improves over the course of treatment, who is
getting better: the talker or the listener? It's a very simple
problem. Solution? Ask someone other than the clinician to do the
listening. Practical??
Slide 24
Two very different general approaches are used to assess speech
intelligibility:
1. Subjective estimates made by clinicians (by far the most
common): 50% intelligible, 80% intelligible, ... Kent calls this
method scaling (e.g., Kent et al., 1989, JSHD, 54, 482-499).
2. Objective measurement: usually the percentage of words that are
accurately recognized by a listener. Kent calls this method item
identification (Kent et al., 1989).
Subjective estimates are easy to make, but: (a) reliability is
imperfect, and (b) estimates can change as clinicians become more
familiar with the talker. (The same can be true of item
identification, depending on how it's done.)
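Item identification comes down to simple arithmetic: the number of target words the listener recognized, divided by the number of target words, times 100. The sketch below shows one way to compute it. The scoring conventions are assumptions for illustration, not any published test's rules: case and punctuation are ignored, only exact word matches earn credit, and word order is ignored within sentences. The function names are hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, ignoring punctuation (an assumed convention)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list_score(targets: list[str], responses: list[str]) -> float:
    """Percent of single-word items whose response exactly matches the target word."""
    if not targets or len(targets) != len(responses):
        raise ValueError("need one response for each target item")
    correct = sum(normalize(t) == normalize(r) for t, r in zip(targets, responses))
    return 100.0 * correct / len(targets)


def sentence_score(target: str, response: str) -> float:
    """Percent of the target sentence's words that appear in the listener's transcription.

    Word order is ignored, and a repeated target word is credited at most as many
    times as it occurs in the target (multiset intersection).
    """
    target_words = Counter(normalize(target))
    if not target_words:
        raise ValueError("target sentence has no words")
    matched = target_words & Counter(normalize(response))
    return 100.0 * sum(matched.values()) / sum(target_words.values())


print(word_list_score(["ship", "bead", "fan", "dog"], ["sip", "bead", "fan", "dog"]))  # 75.0
print(sentence_score("Mary had a little lamb.", "mary had a little lamp"))  # 80.0
```

Exact matching is deliberately strict; a clinic could instead accept homophones or score at the phoneme level, which would change the numbers. Whichever rule is chosen has to be held constant across sessions for the scores to be comparable.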
Slide 25
What should be used as speech material? Big surprise: (a) there
are many choices, and (b) the choice of speech material matters.
1. Conversational speech: From one point of view, this is a great
choice; it's exactly what you want to know. How well would the talker
be expected to do in ordinary conversation? Obvious problem: The
topic of conversation will vary all over the place, making it
impossible to get an intelligibility measure that is standardized
in any way, either across different clients or even within the same
talker across time.
Slide 26
2. Words: Standard word lists can be used. There are many of
these word lists available. Word lists are very easy to score and,
of course, they are standard across talkers. Some word
intelligibility tests provide multiple word lists with equivalent
intelligibility. This is a big deal: Listeners may still be
adapting to the speech of the talker, but at least they will not be
as likely to learn the word lists. (This can still present a
problem if the word lists are used frequently.)
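One simple way to act on the point about multiple equivalent word lists is to rotate through them so that no list repeats until every other list has been used. The sketch below does that; the list labels, the per-round reshuffling, and the rule against giving the same list in two consecutive sessions are illustrative assumptions, not part of any particular test's instructions, and `list_schedule` is a hypothetical name.

```python
import random


def list_schedule(list_ids: list[str], n_sessions: int, seed: int = 0) -> list[str]:
    """Pick an equivalent word list for each session.

    Every list is used once per round before any list repeats, and the order is
    reshuffled each round so listeners cannot anticipate which list comes next.
    """
    if not list_ids:
        raise ValueError("need at least one word list")
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    schedule: list[str] = []
    while len(schedule) < n_sessions:
        round_order = list(list_ids)
        rng.shuffle(round_order)
        # Avoid the same list in two consecutive sessions across a round boundary.
        if schedule and len(round_order) > 1 and round_order[0] == schedule[-1]:
            round_order[0], round_order[-1] = round_order[-1], round_order[0]
        schedule.extend(round_order)
    return schedule[:n_sessions]


print(list_schedule(["A", "B", "C", "D"], 10))
```

Rotation spreads repeats as far apart as the number of lists allows, but it cannot remove list learning entirely; with only a few lists and frequent testing, each list will still come back fairly often.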
Slide 27
3. Sentences: Standard lists of sentences can also be used.
There are many of these available as well. Sentences can be very
useful since some talkers can speak intelligibly in single words
but have greater difficulty with longer, more complex
utterances.
Slide 28
Effects of predictability: Speech is massively redundant, which
means that listeners do not need to catch every little
acoustic-phonetic detail in order to recognize what is being said.
This applies to both words and sentences, but especially to
sentences. All else being equal, as predictability goes up,
intelligibility goes up. Bone-headed simple example: Mary had a
little [wildly distorted something-or-other]. There's no mystery
about the missing word.
Slide 29
Striking demo from Warren (1970, Science, 167, 392-393): "The state
governors met with their respective legislatures convening in the
capital city." Warren entirely deleted one of the speech sounds (the
[s] of "legislatures") and replaced it with a cough. The [s] was
gone. Out of 20 listeners, 19 did not notice that anything was
missing; one listener thought that a sound was missing but guessed
wrong about which one. Q: How did listeners hear a sound that wasn't
there? A: Their brains created it.
Slide 31
What is the relevance of this to intelligibility testing?
Pretty simple: One speaker is 75% intelligible, another is 50%
intelligible, but different sentence intelligibility tests were
used. Does that comparison mean anything? It's hard to know for
sure, but probably not. Example: HINT sentences: TIMIT
sentences: