
Vocabulary partitioned speech recognition apparatus


5,075,896

43.72.Ne CHARACTER AND PHONEME RECOGNITION BASED ON PROBABILITY CLUSTERING

Lynn D. Wilcox and A. Lawrence Spitz, assignors to Xerox Corporation

24 December 1991 (Class 382/39); filed 25 October 1989

This recognition technique, applicable to either written characters or spoken phonemes, keeps the IDs of clusters in the phonetic candidate space instead of making specific phoneme choices immediately. In this way, more classification information about the best-fitting candidate is retained while maintaining the low bandwidth between recognition components needed for modularity and flexibility.--DLR

5,146,503

43.72.Ne SPEECH RECOGNITION

Ian R. Cameron and Paul C. Millar, assignors to British Telecommunications public limited company

8 September 1992 (Class 381/43); filed in the United Kingdom 28 August 1987

This method combines the time/spectral analysis matrices from multiple repetitions of a given utterance by one or more speakers to select a representative feature matrix for the word. Representative patterns are selected by computing the distances from each pattern to each other pattern of the same word, and to all patterns of words which compete within the same syntax. A variety of token pronunciations is assured by combining "list" style readings with utterances prompted as answers to questions.--DLR
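The selection step can be illustrated with a minimal sketch. It assumes each token has already been reduced to a fixed-length feature vector and that Euclidean distance is an acceptable measure. The scoring rule used here (mean distance to same-word tokens minus mean distance to competing-word tokens) and the name `select_representative` are illustrative assumptions, not the criterion stated in the patent.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_representative(same_word, competitors):
    """Return the index of the token in same_word that is, on average,
    closest to the other tokens of its word and farthest from the
    tokens of competing words (assumed scoring rule)."""
    best_index, best_score = None, None
    for i, token in enumerate(same_word):
        others = [t for j, t in enumerate(same_word) if j != i]
        within = (sum(distance(token, t) for t in others) / len(others)
                  if others else 0.0)
        between = (sum(distance(token, c) for c in competitors) / len(competitors)
                   if competitors else 0.0)
        score = within - between  # lower is better
        if best_score is None or score < best_score:
            best_index, best_score = i, score
    return best_index

# Hypothetical tokens: three readings of one word, one reading of a competitor.
tokens_yes = [[1.0, 1.0], [1.2, 0.9], [3.0, 3.0]]
tokens_no = [[5.0, 5.0]]
chosen = select_representative(tokens_yes, tokens_no)
```

With no competing words the rule reduces to picking the medoid of the word's own tokens.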

5,136,653

43.72.Ne ACOUSTIC RECOGNITION SYSTEM USING ACCUMULATE POWER SERIES

Ryohei Kumagai et al., assignors to Ezel, Incorporated

4 August 1992 (Class 381/43); filed 11 January 1989

The patent describes the application of certain high-speed, two-dimensional image processing hardware to phonetic recognition. The system relies on local (small domain) logic elements to form a binary quantization of the log power levels of the input. Median filtering is used to smooth the log power data and a differentiation produces a two-dimensional pattern of power dips. A two-dimensional associative access system allows retrieval of written characters or other patterns based on the power dip patterns.--DLR
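The smoothing and differentiation steps can be sketched in one dimension. The patent works on a two-dimensional pattern in image hardware; the version below handles a single power track in software and is only an illustration. The window width, the rule for marking a dip (the last frame before power rises again after a fall), and the function names are assumptions.

```python
import math
from statistics import median

def median_smooth(values, width=3):
    """Median filter with a window that shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out

def power_dips(power, width=3):
    """Return a binary list with 1 at the last frame of each dip floor."""
    log_power = [math.log10(p) for p in power]          # log power levels
    smooth = median_smooth(log_power, width)            # median filtering
    diff = [b - a for a, b in zip(smooth, smooth[1:])]  # differentiation
    dips = [0] * len(power)
    falling = False
    for i, d in enumerate(diff):
        if d < 0:
            falling = True
        elif d > 0:
            if falling:
                dips[i] = 1  # power rises again after a fall: a dip ends here
            falling = False
    return dips

# Hypothetical power track with a single dip in the middle.
track = [100, 100, 10, 1, 10, 100, 100]
marks = power_dips(track)
```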

5,136,654

43.72.Ne VOCABULARY PARTITIONED SPEECH RECOGNITION APPARATUS

William F. Ganong, III, et al., assignors to Kurzweil Applied Intelligence, Incorporated

4 August 1992 (Class 381/41); filed 19 October 1989

The search time needed to locate an isolated-word, vector-quantized spectral feature pattern in a large-vocabulary reference space is reduced by partitioning the reference space. A lookup table of interframe distances between all entries in a frame codebook simplifies the calculation of interpattern distances. The partitioning method iteratively adjusts partition boundaries so as to roughly balance partition sizes while minimizing the distances within each partition to a selected representative pattern. Recognition proceeds by finding distances from a new input to all of the partition representatives, ordering the partitions by this distance, then searching partitions in order until a match criterion is met.--DLR
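The recognition pass described above can be sketched as follows. The sketch assumes the partitions have already been built, treats patterns as plain feature vectors compared by Euclidean distance rather than codebook lookups, and uses a simple distance threshold as the match criterion. The function name `recognize` and the data layout are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(pattern, partitions, threshold):
    """partitions: list of (representative, [(word, ref_pattern), ...]).
    Returns (best_word, number_of_partitions_searched)."""
    # Order the partitions by distance from the input to each representative.
    ordered = sorted(partitions, key=lambda part: distance(pattern, part[0]))
    best_word, best_dist, searched = None, math.inf, 0
    for _, members in ordered:
        searched += 1
        for word, ref in members:
            d = distance(pattern, ref)
            if d < best_dist:
                best_word, best_dist = word, d
        if best_dist <= threshold:  # assumed match criterion
            break
    return best_word, searched

# Hypothetical two-partition vocabulary.
partitions = [
    ([0.0, 0.0], [("one", [0.0, 0.0]), ("two", [1.0, 0.0])]),
    ([10.0, 10.0], [("nine", [10.0, 10.0]), ("ten", [11.0, 10.0])]),
]
result = recognize([10.8, 10.1], partitions, threshold=0.5)
```

Here the input is matched after searching only the nearest partition; a stricter threshold would force the search to continue into the next one.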

5,144,672

43.72.Ne SPEECH RECOGNITION APPARATUS INCLUDING SPEAKER-INDEPENDENT DICTIONARY AND SPEAKER-DEPENDENT

Shoji Kuriki, assignor to Ricoh Company, Limited

1 September 1992 (Class 381/41); filed in Japan 5 October 1989

As I understand it, this isolated-word speech recognizer includes the usual speaker-dependent set of reference vocabulary patterns, but also maintains a set of speaker-independent patterns formed by averaging multiple dependent patterns. Recognition matches are performed on the independent patterns after amplitude normalization to the maximum value in the time/spectral matrix making up each pattern. How can this be an improvement over a traditional system with separate pattern sets for each speaker?--DLR
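The two operations mentioned, averaging and normalization to the matrix maximum, can be sketched briefly. The sketch assumes all time/spectral matrices for a word have the same shape and are stored as lists of rows; the function names are hypothetical and the patent's actual time alignment is not modeled.

```python
def average_patterns(patterns):
    """Element-wise mean of equally shaped time/spectral matrices."""
    n = len(patterns)
    rows, cols = len(patterns[0]), len(patterns[0][0])
    return [[sum(p[r][c] for p in patterns) / n for c in range(cols)]
            for r in range(rows)]

def normalize_to_max(matrix):
    """Scale a matrix so that its largest element becomes 1.0."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [list(row) for row in matrix]  # all-zero pattern: leave as-is
    return [[value / peak for value in row] for row in matrix]

# Hypothetical one-frame, two-channel patterns from two speakers.
independent = average_patterns([[[2, 4]], [[4, 8]]])
normalized = normalize_to_max(independent)
```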

5,151,940

43.72.Ne METHOD AND APPARATUS FOR EXTRACTING ISOLATED SPEECH WORD

Makoto Okazaki and Koji Eto, assignors to Fujitsu Limited

29 September 1992 (Class 381/43); filed in Japan 24 December 1987

This method detects the beginning and end of an utterance in continuous input, based on power levels in each of two overlapping frequency bands. The patent refers to power in a low band below 3 kHz as "vowel power," and power in a high band above 1 kHz as "consonant power." Each band power value is compared to a distinct threshold. A simple decision tree looks at the durations of the intervals in which the power levels were above or below threshold.--DLR
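A rough sketch of this kind of endpoint detector follows. It assumes per-frame band powers have already been computed. The duration rules (an utterance starts after `min_speech` consecutive active frames and ends after `min_silence` consecutive inactive frames) stand in for the patent's decision tree, whose details are not given here; all names and parameter values are hypothetical.

```python
def find_utterance(low_power, high_power, low_thresh, high_thresh,
                   min_speech=3, min_silence=4):
    """Return (start_frame, end_frame) of the first utterance, or None."""
    # A frame is active if either band exceeds its own threshold.
    active = [lo > low_thresh or hi > high_thresh
              for lo, hi in zip(low_power, high_power)]
    start, last_active = None, None
    run_start, run_len, silence = None, 0, 0
    for i, is_active in enumerate(active):
        if is_active:
            if run_len == 0:
                run_start = i
            run_len += 1
            silence = 0
            if start is None and run_len >= min_speech:
                start = run_start  # long enough to count as speech
            if start is not None:
                last_active = i
        else:
            run_len = 0
            if start is not None:
                silence += 1
                if silence >= min_silence:  # long enough pause: utterance over
                    return start, last_active
    return (start, last_active) if start is not None else None

# Hypothetical frames: a consonant onset, a vowel, a short gap, a final burst.
vowel = [0, 0, 5, 5, 5, 0, 5, 0, 0, 0, 0, 0]
consonant = [0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
endpoints = find_utterance(vowel, consonant, low_thresh=1, high_thresh=1)
```

The short gap at frame 5 is bridged because it is shorter than `min_silence`, so the final burst stays inside the utterance.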

5,179,624

43.72.Ne SPEECH RECOGNITION APPARATUS USING NEURAL NETWORK AND FUZZY LOGIC

Akio Amano et al., assignors to Hitachi, Limited

12 January 1993 (Class 395/2); filed in Japan 7 September 1988

This phonetics-learning speech recognizer uses fuzzy logic to select a preliminary group of candidate phonetic categories for each input phonetic segment. Another fuzzy logic system makes a final choice from among the candidates. Whenever correct recognition results are made known to the system, a neural network sensing the fuzzy weightings is trained to the new settings. The result is to reshape the fuzzy logic components.--DLR

J. Acoust. Soc. Am., Vol. 95, No. 2, February 1994, Review of Acoustical Patents, p. 1185
