David J. Attwater - Ipswich, GB Michael D. Edgington - Bridgewater MA, US Peter J. Durston - Suffolk, GB
Assignee:
British Telecommunications public limited company - London
International Classification:
G10L 15/28
US Classification:
704255, 704257, 704245, 704238
Abstract:
In this invention, dialogue states for a dialogue model are created using a training corpus of example human-human dialogues. Dialogue states are modelled at the turn level rather than at the move level, and are derived from the training corpus. The range of operator dialogue utterances is actually quite small in many services and therefore may be categorized into a set of predetermined meanings. This is an important assumption which is not true of general conversation, but is often true of conversations between telephone operators and people. Phrases are specified which have specific substitution and deletion penalties; for example, the two phrases “I would like to” and “can I” may be specified as a possible substitution with a low or zero penalty. Thus common equivalent phrases are given low substitution penalties. Insignificant phrases such as ‘erm’ are given low or zero deletion penalties.
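The penalty scheme the abstract describes can be sketched as a weighted edit distance over phrase tokens. The penalty values, phrase lists, and function names below are illustrative assumptions, not the patented method:

```python
# Minimal sketch: phrase-level alignment with custom substitution and
# deletion penalties. Equivalent phrases get low/zero substitution cost;
# filler phrases get low/zero deletion cost. All values are illustrative.

SUBSTITUTION_PENALTIES = {
    frozenset({"i would like to", "can i"}): 0.0,  # equivalent phrases
}
DELETION_PENALTIES = {
    "erm": 0.0,  # insignificant filler
}
DEFAULT_SUB = 1.0
DEFAULT_DEL = 1.0

def substitution_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return SUBSTITUTION_PENALTIES.get(frozenset({a, b}), DEFAULT_SUB)

def deletion_cost(a: str) -> float:
    return DELETION_PENALTIES.get(a, DEFAULT_DEL)

def alignment_cost(seq_a, seq_b) -> float:
    """Dynamic-programming edit distance over phrase tokens."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + deletion_cost(seq_a[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + deletion_cost(seq_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + deletion_cost(seq_a[i - 1]),   # delete from A
                d[i][j - 1] + deletion_cost(seq_b[j - 1]),   # insert from B
                d[i - 1][j - 1] + substitution_cost(seq_a[i - 1], seq_b[j - 1]),
            )
    return d[n][m]
```

Under these penalties, "erm, I would like to book" aligns with "can I book" at zero total cost, since the filler is freely deletable and the two lead-in phrases are declared equivalent.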
Methods And Apparatus For Formant-Based Voice Systems
In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
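The abstract leaves the comparison unspecified. One hedged reading is a greedy search: model each candidate feature as a component signal and select the combination that best matches the voice signal under mean-squared error. The error metric, combination rule, and names below are all assumptions for illustration:

```python
# Hedged sketch of the selection step: candidate "features" are modelled
# as component signals, and the comparison between a combination of
# candidates and the voice signal is assumed to be mean-squared error.

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combine(features):
    # Combination rule assumed here: element-wise sum of the components.
    return [sum(vals) for vals in zip(*features)]

def select_features(signal, candidates, max_features):
    """Greedily add the candidate that most reduces error vs. the signal."""
    selected = []
    while len(selected) < max_features:
        best, best_err = None, None
        for cand in candidates:
            if cand in selected:
                continue
            err = mse(combine(selected + [cand]), signal)
            if best_err is None or err < best_err:
                best, best_err = cand, err
        if best is None:
            break
        selected.append(best)
    return selected
```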
Methods And Apparatus For Replaceable Customization Of Multimodal Embedded Interfaces
Daniel Roth - Boston MA, US William Barton - Harvard MA, US Michael Edgington - Bridgewater MA, US Laurence Gillick - Newton MA, US
Assignee:
Voice Signal Technologies, Inc. - Woburn MA
International Classification:
G06F017/28
US Classification:
704005000
Abstract:
According to certain aspects of the invention, a mobile voice communication device includes a wireless transceiver circuit for transmitting and receiving auditory information and data, a processor, and a memory storing executable instructions which, when executed on the processor, cause the mobile voice communication device to provide a selectable personality associated with a user interface to a user of the mobile voice communication device. The executable instructions implement on the device a user interface that employs different user prompts having the selectable personality, wherein each selectable personality of the different user prompts is defined and mapped to data stored in at least one database in the mobile voice communication device. The mobile voice communication device may include a decoder that recognizes a spoken user input and provides a corresponding recognized word, and a speech synthesizer that synthesizes a word corresponding to the recognized word. The device includes user-selectable personalities that are either transmitted wirelessly to the device, transmitted through a computer interface, or provided on memory cards to the device.
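The personality-to-prompt mapping described above can be sketched as a simple lookup keyed by the selected personality. The personality names and prompt data here are invented for the example:

```python
# Illustrative sketch: each selectable personality maps to prompt data
# stored on the device, and the UI resolves prompts through that mapping.

PROMPT_DB = {
    "formal": {"greeting": "Good afternoon. How may I assist you?"},
    "casual": {"greeting": "Hey! What can I do for you?"},
}

class UserInterface:
    def __init__(self, personality: str):
        # The personality is user-selectable and could be replaced by data
        # delivered wirelessly, over a computer interface, or on a memory card.
        self.personality = personality

    def prompt(self, key: str) -> str:
        return PROMPT_DB[self.personality][key]
```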
Michael Edgington - Bridgewater MA, US Laurence Gillick - Newton MA, US Igor Zlokarnik - Natick MA, US
Assignee:
Voice Signal Technologies, Inc. - Woburn MA
International Classification:
G10L 15/08
US Classification:
704240000
Abstract:
A method of extracting a subset of speech units from a larger set of speech units for use by a speech synthesizer in synthesizing speech, wherein the speech units are stored in a compressed encoded representation that was generated by a codec, the method comprising: selecting members of the subset of speech units based on an overall cost associated with using the speech synthesizer to synthesize a test set of speech, wherein the overall cost includes at least one error introduced by using the codec to decode the stored representations of the speech units; and storing the selected subset of speech units on a speech-enabled device.
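One way to picture the selection criterion is a cost per unit that folds in the distortion introduced by the codec round-trip, with the subset chosen to minimize total cost over a test set. The cost model, field names, and numbers below are illustrative assumptions, not the claimed method:

```python
# Hedged sketch: each stored unit carries a precomputed codec distortion
# (error between the raw unit and its encode/decode round-trip). The
# "overall cost" for a test target is assumed to be target mismatch plus
# that codec error; the lowest-cost units within a size budget are kept.

def unit_cost(unit, target):
    return abs(unit["pitch"] - target["pitch"]) + unit["codec_error"]

def select_subset(units, test_targets, budget):
    """Keep the `budget` units with the lowest total cost over the test set."""
    def total_cost(u):
        return sum(unit_cost(u, t) for t in test_targets)
    return sorted(units, key=total_cost)[:budget]
```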
Using Codec Parameters For Endpoint Detection In Speech Recognition
Michael D. Edgington - Bridgewater MA, US Stephen W. Laverty - Worcester MA, US Gunnar Evermann - Boston MA, US
Assignee:
Nuance Communications, Inc. - Burlington MA
International Classification:
G10L 15/00
US Classification:
704231, 704E15001
Abstract:
Systems, methods and apparatus for determining an estimated endpoint of human speech in a sound wave received by a mobile device having a speech encoder for encoding the sound wave to produce an encoded representation of the sound wave. The estimated endpoint may be determined by analyzing information available from the speech encoder, without analyzing the sound wave directly and without producing a decoded representation of the sound wave. The encoded representation of the sound wave may be transmitted to a remote server for speech recognition processing, along with an indication of the estimated endpoint.
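The key point is that the encoder already exposes per-frame parameters, so the endpoint can be estimated without decoding the audio. A hedged sketch, assuming a frame-energy parameter and a simple low-energy "hangover" rule (threshold and hangover length are illustrative):

```python
# Sketch: estimate the speech endpoint from per-frame parameters available
# at the encoder (here an assumed frame-energy value), without producing a
# decoded representation of the sound wave.

def estimate_endpoint(frame_energies, threshold=0.1, hangover=5):
    """Return the index of the first frame of the first run of `hangover`
    consecutive low-energy frames, i.e. the estimated end of speech."""
    quiet = 0
    for i, energy in enumerate(frame_energies):
        quiet = quiet + 1 if energy < threshold else 0
        if quiet >= hangover:
            return i - hangover + 1
    return None  # no endpoint detected yet
```

The estimated endpoint index could then accompany the encoded frames sent to the remote recognition server, as the abstract describes.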
Configurable Speech Recognition System Using Multiple Recognizers
Michael Newman - Somerville MA, US Anthony Gillet - Wilmington MA, US David Mark Krowitz - Reading MA, US Michael D. Edgington - Bridgewater MA, US
Assignee:
Nuance Communications, Inc. - Burlington MA
International Classification:
G10L 19/00 G10L 15/00
US Classification:
704201, 704E19001, 704E15047
Abstract:
Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
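One plausible instance of such a policy, sketched here with an invented result format and threshold (the patent's actual combination logic is not specified in the abstract): accept the fast local result when its confidence is high enough, otherwise defer to the remote recognizer.

```python
# Hedged sketch of a latency-vs-accuracy combination policy. Results are
# assumed to be (hypothesis, confidence) pairs; `remote` may be None when
# the policy skipped the network request to save resources.

def combine_results(local, remote, local_confidence_threshold=0.8):
    if local[1] >= local_confidence_threshold:
        return local[0]          # confident local result: lowest latency
    if remote is not None:
        return remote[0]         # fall back to the remote recognizer
    return local[0]              # best available
```

Raising `local_confidence_threshold` shifts the trade-off toward accuracy (more remote requests); lowering it reduces perceived latency and network usage.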
Configurable Speech Recognition System Using Multiple Recognizers
Michael Newman - Somerville MA, US Anthony Gillet - Wilmington MA, US David Mark Krowitz - Reading MA, US Michael D. Edgington - Bridgewater MA, US
Assignee:
Nuance Communications, Inc. - Burlington MA
International Classification:
G10L 15/00
US Classification:
704231, 704E15001
Abstract:
Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.