A dynamic programming method for temporal registration of three-dimensional tongue surface motion from multiple utterances
|Article code||Publication year||English article||Persian translation||Word count|
|24849||2002||9-page PDF||Place an order||4022 words|
Publisher: Elsevier - Science Direct
Journal : Speech Communication, Volume 38, Issues 1–2, September 2002, Pages 201–209
This study proposes a new method to reconstruct three-dimensional (3D) tongue surface motion during speech using only a few sections of the tongue measured with ultrasound imaging. Reconstruction of static 3D tongue surfaces has been reported previously; this is the first report of reconstruction of 3D tongue surface motion using ultrasound imaging. To temporally align data from multiple scan locations, a dynamic programming (DP) algorithm was applied to the acoustic signals recorded simultaneously with the ultrasound images, lining up the tokens collected from different repetitions. Reconstruction error was evaluated using a pseudo-motion measurement of known 3D tongue shapes. The average error was 0.39 mm, which is within the ultrasound measurement error of 0.5 mm.
Ultrasound imaging has been used to assess three-dimensional (3D) tongue surface shapes of English consonants and vowels (Stone and Lundberg, 1996). In that method, a series of static 2D contours was spatially aligned to reconstruct a detailed 3D tongue surface. A special transducer collected 60 ultrasound scans in a polar sweep of 60° in 10 s. This method is suitable for sustained vowels and consonants, but 10 s is too slow to capture tongue motion. Lundberg and Stone (1999) used the same data sets (Stone and Lundberg, 1996) to determine a minimal number of slices, or optimized sparse set, for reconstruction of 3D static tongue surfaces without significantly reducing the reconstruction quality. The results showed that 5–6 coronal slices were adequate to reconstruct 3D tongue surfaces, i.e., the 3D tongue surface could be reconstructed by collecting a few 2D tongue contours at the optimized locations.
In order to reconstruct 3D tongue surface motion during speech, we must collect multiple 2D data sets at different scan locations. Each data set is a sequence of 2D tongue contours and contains the 2D tongue motion at that specific location. The premise is that a number of 2D data sets can be combined later into a single 3D tongue motion by spatial and temporal alignment. The spatial positions of the different scans can be aligned using the pre-measured location of the transducer. For the temporal alignment, we must consider that subjects vary in speaking rate and articulation across multiple repetitions. Speaking rate differences are even more likely when the repetitions are separated in time by other speech materials, as is the case with multiple ultrasound data sets. A time-warping algorithm is therefore needed to align the temporal variations among multiple repetitions.
In automatic speech recognition, the dynamic programming (DP) algorithm has been used successfully to eliminate the effect of large variations in speaking rate and of inter- and intra-speaker variation (Itakura, 1975; Sakoe, 1979; Sakoe and Chiba, 1978; Rabiner and Juang, 1993; Ney and Ortmanns, 1999). The DP algorithm finds the optimal time registration between two repetitions by minimizing the total distance between their acoustic features. Dang et al. (1997) used an X-ray microbeam system to measure the position of 8 points on the tongue surface during speech. Five metal pellets glued along the mid-sagittal tongue and three metal pellets glued on the para-sagittal tongue (1 cm from the mid-sagittal plane) were tracked separately by the system. Time differences between the data sets were synchronized by using spectrograms of the speech signals. Strik and Boves (1991) applied the DP algorithm to time-alignment and averaging of repeated physiological signals to improve the signal-to-noise ratio. Their results showed that the DP algorithm was able to correct the timing differences among the repetitions.
In this study, ultrasound imaging is used to reconstruct 3D tongue motion during normal speech from eight 2D ultrasound slices, five coronal and three sagittal, collected at different scan angles. The slices were aligned manually on the computer using the pre-measured transducer locations. The sagittal slices were also used to retrieve tongue tip information. For each section, the acoustic signal was recorded simultaneously with the ultrasound images. To temporally align data from multiple scans, a DP algorithm based on Rabiner and Juang (1993) was applied to the acoustic signals to line up the tokens collected from different repetitions. The reconstruction error of the proposed method was evaluated with the 3D tongue shapes of Lundberg and Stone (1999).
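As a rough illustration of the temporal alignment idea, the following is a minimal sketch of a textbook dynamic time warping procedure, not the specific DP variant used in the study. The Euclidean frame distance, the three-way local step pattern, and the names `frame_distance`, `dtw_align`, and `warp_to_reference` are all assumptions made for the example; the source does not state which acoustic features or path constraints were used.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two acoustic feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(reference, token):
    """Align `token` to `reference` by minimizing the accumulated frame distance.

    Returns (total_cost, path), where path is a list of (i, j) pairs meaning
    reference frame i is matched to token frame j.
    """
    n, m = len(reference), len(token)
    inf = float("inf")
    # cost[i][j] holds the minimum accumulated distance of any path ending at (i, j).
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = frame_distance(reference[i], token[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best_prev = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best_prev

    # Backtrack from the last cell to the first to recover the warping path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def warp_to_reference(path, frames):
    """Pick one token frame per reference frame (the first match on the path).

    `frames` stands for any per-frame data recorded with the token, for
    example the 2D tongue contour extracted from each ultrasound image.
    """
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [frames[first_match[i]] for i in sorted(first_match)]


# Toy one-dimensional "acoustic features": the token is a slower repetition.
reference = [[0.0], [1.0], [2.0], [3.0]]
token = [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]]
contours = ["c0", "c1", "c2", "c3", "c4", "c5"]  # placeholder contour labels

total_cost, path = dtw_align(reference, token)
aligned = warp_to_reference(path, contours)
print(total_cost)  # 0.0
print(aligned)     # ['c0', 'c2', 'c3', 'c5']
```

In this sketch the warping path is computed on the acoustic frames only and then reused to select the contour recorded with each frame, which mirrors the idea of aligning ultrasound data through the simultaneously recorded acoustic signal. Keeping the first matched frame is a simplification; averaging or interpolating the matched frames would be an equally plausible choice.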
Conclusion
The goal of our research is to measure and model 3D tongue motion. This experiment provided a way to gather detailed information about the performance of the DP algorithm used for 3D tongue motion reconstruction. It also gave us an approximate evaluation of the proposed method. The study established a reasonable method to reconstruct 3D tongue surface movement during speech using only a few sections of ultrasound images. The spatial position of the coronal contours was checked against the three sagittal sections, and the result showed that the relative shape of the coronal tongue changed consistently with the sagittal changes. Tongue surface reconstruction quality can be improved with a finely tuned surface smoothing algorithm. The 3D tongue surface sequence can be converted to a movie file containing the acoustic signal of the reference token, which facilitates observation of tongue motion during speech. The method thus provides a visualization tool for investigating tongue movement during speech. In future work, a 3D motion model is expected to extract tongue shape features from the 3D tongue motion data.