Eric Cosatto - Highlands NJ, US; Hans Peter Graf - Lincroft NJ, US; Fu Jie Huang - Plainsboro NJ, US
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 15/26, G06T 15/70, H04M 9/475
US Classification:
704/235, 704/260, 345/473, 348/515
Abstract:
A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system uses a database of n-phones as the smallest selectable unit, where n is greater than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, a coarticulation parameter, and the speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to obtain the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and constructs the video sequence by searching the video frame lattice for the path that minimizes the sum of the target costs and joint costs over the sequence.
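The lattice search described in the abstract is a dynamic-programming (Viterbi-style) minimum-cost path problem. The Python sketch below is illustrative only and is not the patented implementation. It assumes each candidate's target cost is already computed (the abstract does not say how phonetic distance, coarticulation, and speech rate are combined), resamples each candidate by picking evenly spaced frame indices, and uses the Euclidean distance between hypothetical per-frame feature vectors as the joint cost. All names (`Candidate`, `FrameNode`, `resample_indices`, `build_lattice`, `joint_cost`, `viterbi`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Features = Tuple[float, ...]  # hypothetical per-frame visual features (e.g. mouth-shape parameters)


@dataclass
class Candidate:
    unit_id: str            # hypothetical label of a database n-phone
    frames: List[Features]  # the candidate's recorded frames, as feature vectors
    target_cost: float      # precomputed target cost of this candidate for its target n-phone


@dataclass
class FrameNode:
    label: str              # e.g. "B[1]" = frame 1 of candidate B
    features: Features
    target_cost: float


def resample_indices(n_src: int, n_target: int) -> List[int]:
    """Pick n_target source-frame indices spread evenly over n_src frames."""
    return [(i * n_src) // n_target for i in range(n_target)]


def build_lattice(segments: Sequence[Tuple[int, Sequence[Candidate]]]) -> List[List[FrameNode]]:
    """One lattice column per target frame; one node per candidate in each column."""
    lattice: List[List[FrameNode]] = []
    for n_frames, candidates in segments:
        # Resample every candidate to the target segment's frame count.
        sampled = [(c, resample_indices(len(c.frames), n_frames)) for c in candidates]
        for i in range(n_frames):
            column = [FrameNode(f"{c.unit_id}[{idx[i]}]", c.frames[idx[i]], c.target_cost)
                      for c, idx in sampled]
            lattice.append(column)
    return lattice


def joint_cost(a: FrameNode, b: FrameNode) -> float:
    # Assumption: joint cost = Euclidean distance between adjacent frames' features.
    return math.dist(a.features, b.features)


def viterbi(lattice: List[List[FrameNode]]) -> Tuple[List[int], float]:
    """Return the chosen node index in each column and the minimum total cost."""
    cost = [node.target_cost for node in lattice[0]]
    backpointers: List[List[int]] = []
    for t in range(1, len(lattice)):
        prev = lattice[t - 1]
        new_cost, ptrs = [], []
        for node in lattice[t]:
            # Best predecessor = lowest accumulated cost plus joint cost into `node`.
            j = min(range(len(prev)), key=lambda k: cost[k] + joint_cost(prev[k], node))
            new_cost.append(cost[j] + joint_cost(prev[j], node) + node.target_cost)
            ptrs.append(j)
        cost = new_cost
        backpointers.append(ptrs)
    # Backtrack from the cheapest node in the final column.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for ptrs in reversed(backpointers):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Two target n-phones of 2 frames each, with two made-up candidates per n-phone.
    segments = [
        (2, [Candidate("A", [(0.0,), (1.0,)], 0.5), Candidate("B", [(5.0,), (5.0,)], 0.1)]),
        (2, [Candidate("C", [(1.0,), (1.0,)], 0.2), Candidate("D", [(5.0,), (6.0,)], 0.0)]),
    ]
    lattice = build_lattice(segments)
    path, total = viterbi(lattice)
    print([lattice[t][j].label for t, j in enumerate(path)], round(total, 3))
    # prints ['B[0]', 'B[1]', 'D[0]', 'D[1]'] 1.2
```

Because each column holds one node per candidate, this sketch lets the path switch between candidates at any frame; the joint cost is what discourages visually abrupt switches.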
System And Method For Triphone-Based Unit Selection For Visual Speech Synthesis
Eric Cosatto - Highlands NJ, US; Hans Peter Graf - Lincroft NJ, US; Fu Jie Huang - Middletown NJ, US
Assignee:
AT&T Corp. - New York NY
International Classification:
G10L 21/06, G10L 21/00
US Classification:
704/235, 704/270
Abstract:
A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system uses a database of n-phones as the smallest selectable unit, where n is greater than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, a coarticulation parameter, and the speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to obtain the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and constructs the video sequence by searching the video frame lattice for the path that minimizes the sum of the target costs and joint costs over the sequence.
System And Method For Triphone-Based Unit Selection For Visual Speech Synthesis
Eric Cosatto - Highlands NJ, US; Hans Peter Graf - Lincroft NJ, US; Fu Jie Huang - Middletown NJ, US
Assignee:
AT&T Intellectual Property II, L.P. - Atlanta GA
International Classification:
G10L 11/00, G06T 13/00, G06K 9/00
US Classification:
704/235, 704/258, 704/270, 345/473, 382/100, 382/118
Abstract:
A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system uses a database of n-phones as the smallest selectable unit, where n is greater than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, a coarticulation parameter, and the speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to obtain the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and constructs the video sequence by searching the video frame lattice for the path that minimizes the sum of the target costs and joint costs over the sequence.