Language is the most natural tool for human communication [1]. As the economy grows steadily and globalisation accelerates, many people choose to learn a language besides their mother tongue. Pronunciation plays an important role in the acquisition of a second language: incorrect pronunciation can impede learning and make it hard for learners to communicate in the language. The main cause of pronunciation errors in a foreign language is interference from the mother tongue. As second language learning has grown in popularity, research on learning patterns and pronunciation evaluation has expanded. In the late 1960s, researchers began to pay attention to Second Language Acquisition (SLA), a field that focuses on the nature and process of such acquisition [2].
Domestic studies on second language phonetic acquisition include the following. Li Hongyin (1995) used contrastive analysis to examine Thai students' errors in the initials, finals and tones of Chinese [3]. Fu Shimei and Zhang Weijia (2004) tested Vietnamese learners of primary, intermediate and advanced Chinese proficiency to analyse their errors in the perception and pronunciation of Chinese initial consonants. Wang Maolin and Sun Yuqing (2007) used experimental phonetics to analyse the difficulties that Indonesian Chinese students face in pronouncing Chinese compound vowels [5]. In 2010, Deng Dan analysed the errors of American students learning Mandarin in three kinds of compound vowels [6]. In 2015, Wu Sicong analysed the errors of Vietnamese students in acquiring initial consonant + [ie], [iou], [ia] and [ua], and suggested teaching methods for future Chinese teaching [7, 8].
In recent years, with the development of science and technology, online education has become increasingly popular. Favourable conditions such as powerful cloud-based computing resources, widespread mobile smart devices and rapidly developing speech processing technologies have made Computer Assisted Language Learning (CALL) increasingly welcome [9]. In addition, as China plays a more important role in the international community, many countries have turned their attention to China and begun to study Chinese culture and the Chinese language. There is now a surge of enthusiasm for learning Chinese all over the world, and Chinese is expected to become a bridge of communication among people around the globe [10]. With the development of computer technology and the arrival of the information age, China is also actively promoting the informationisation of minority languages, and relevant research institutions have sprung up rapidly over the past few years. As phonetics and computer technology have developed, great progress has been made in Uyghur speech synthesis and recognition. However, there is still a long way to go in research on speech recognition technology and computer-aided language learning systems for SLA.
This study is learner-centred. It collects recordings of Putonghua pronunciation from learners in ethnic minority areas, analyses the acoustic differences among Mandarin learners in pronouncing Putonghua diphthongs (double vowels), and examines the learners' language learning methods and processes in order to identify the causes of these differences. The remainder of the paper is organised as follows: Section 2 introduces the database; Section 3 analyses the experimental results; Section 4 deals with the similarity analysis; Section 5 concludes and discusses future work.
A total of 40 subjects took part in the diphthong acquisition experiment, 30 of whom are learners. The other 10 are speakers from Beijing who speak standard Mandarin; their average age is 23.
According to their Putonghua proficiency, the 30 learners are divided into three groups representing different learning stages. Proficiency is measured by the MHK score from a Putonghua proficiency test platform, which is regarded as highly authoritative; the Chinese proficiency of ethnic minorities in Xinjiang is mainly assessed by the MHK score. The detailed grouping is shown in Table 1.
Table 1. Details of subjects' grouping.
Group | Number of subjects | MHK level | Learning stage |
Primary level group | 10 | Level 3, Grade B | Beginner |
Intermediate level group | 10 | Level 3, Grade A | Intermediate |
Advanced level group | 10 | Level 4, Grade A | Advanced |
Standard group (reference group) | 10 | – | Standard |
The experimental speech is recorded in a dedicated recording studio using laptop computers, external sound cards, microphones and connecting cables. The external sound card controls the volume, reduces noise and monitors for burst (clipping) sounds. The recording programme is written in MATLAB, and the sampling rate is 16 kHz. Recording lasted from March 2017 to October 2017.
The reading materials comprise 500 Chinese monosyllables and 300 continuous Chinese utterances. The 500 monosyllables include monosyllables that contain diphthongs. Each subject requires 4–5 h of recording.
After the subjects' speech is collected, it is processed in the following steps:
(1) Speech annotation: the Putonghua speech of the learners and standard speakers is force-aligned. The automatic segmentation system used in this study segments speech at the phoneme level, with an accuracy of 93.92% within a 20 ms error tolerance [11]. Because this method targets recordings of standard speakers, its annotation error on learners' speech is large and must be checked manually. To ensure accurate speech feature parameters and reliable results, we check the annotations repeatedly until the labelling accuracy exceeds 95%.
(2) Extracting the subjects' speech: from the annotated monosyllables, the number of diphthong tokens extracted from each group is about 380 for [ai], 180 for [ei], about 435 for [ao], about 340 for [ou], about 75 for [ia], about 240 for [ie], about 100 for [ua], about 365 for [uo] and about 105 for [üe]. After the vowel data are extracted from the speech annotation files, we calculate the acoustic feature parameters of the vowels.
(3) Formant extraction: the Praat speech analysis software is used to extract the first two formants (F1, F2), which are the formant parameters used in this study. Because diphthongs change continuously, we extract the formants at 10 equally spaced points for each diphthong. To reduce the error across pronunciation samples, we convert the formant frequencies (Hz) into Bark values, an auditory scale. The conversion uses the general Hz-to-Bark formula.
(4) Draw the formant trajectory map and acoustic vowel diagram, and use the analysis of variance and Euclidean distance to analyse the experimental results.
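The formant processing in step (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the Hz-to-Bark conversion shown is Traunmüller's approximation, chosen here as an assumption because the paper's own formula is not reproduced in the text, and the function names `hz_to_bark` and `sample_equidistant` are hypothetical. The formant track is assumed to be a list of frequency values already exported from Praat.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark.

    Assumption: Traunmueller's approximation is used here; the paper
    may have used a different Hz-to-Bark formula.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def sample_equidistant(track, n_points=10):
    """Return n_points values at equal intervals along a formant track.

    The first and last samples coincide with the first and last frames;
    positions between frames are linearly interpolated.
    """
    if len(track) < 2 or n_points < 2:
        raise ValueError("need at least two frames and two sample points")
    step = (len(track) - 1) / (n_points - 1)
    samples = []
    for k in range(n_points):
        pos = k * step
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        samples.append(track[lo] * (1.0 - frac) + track[hi] * frac)
    return samples


def diphthong_trajectory_bark(f1_track_hz, f2_track_hz, n_points=10):
    """Build a list of (F1, F2) points in Bark for one diphthong token."""
    f1 = [hz_to_bark(f) for f in sample_equidistant(f1_track_hz, n_points)]
    f2 = [hz_to_bark(f) for f in sample_equidistant(f2_track_hz, n_points)]
    return list(zip(f1, f2))
```

Sampling before converting keeps the interpolation in Hz; converting first and interpolating in Bark would be an equally defensible choice and gives nearly identical values for closely spaced frames.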
This experiment studies the process by which learners acquire Putonghua diphthongs. The flow chart of the whole experimental design is shown in Figure 1.
For female learners, the starting point of [ai] is higher, F(1,171) = 25.3,
For female learners, the starting point of [ei] is higher, F(1,85) = 10.3,
For female learners, the starting point of [ao] is further forward, F(1,197) = 20.4,
For female learners, the starting point of [ou] is higher, F(1,137) = 25.9,
For female learners, the starting point of [ia] is slightly higher, F(1,31) = 7.2,
For female learners, the starting point of [ie] is further forward, F(1,95) = 6.5,
For female learners, the end point of [ua] is essentially the same as that of female standard speakers, while the starting point is slightly further back, F(1,39) = 4.9,
For female learners, the starting position is further forward, F(1,157) = 8.6,
In general, female learners show obvious differences in the pronunciation of [ai], [ao], [ou] and [ua], while male learners show obvious differences in [ai], [ao], [ou], [ua] and [uo].
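The F values reported in this section are analysis-of-variance statistics. As a rough sketch of how such a value arises when learner tokens are compared with standard-speaker tokens on one formant measure, the function below computes a one-way ANOVA F statistic in plain Python. The function name `one_way_anova` is hypothetical, and the assumption that each test compares exactly two groups on a single measure is inferred from the reported degrees of freedom (for example, F(1,171) is consistent with two groups and 173 tokens), not stated by the authors.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numeric observations, one list per group.
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A call such as `one_way_anova([learner_f1_onsets, standard_f1_onsets])`, with two hypothetical lists of Bark values, returns the F statistic together with its two degrees of freedom; the p-value would then be read from the F distribution.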
Figures 4 and 5 are the vowel charts for intermediate level learners and standard speakers. For female learners, the starting position of [ai] is slightly higher, F(1,189) = 40.6,
For female learners, the starting point of [ei] is slightly higher, F(1,75) = 41.2,
For female learners, the starting point of [ao] is further forward, F(1,215) = 92.5,
Female learners pronounce [ou] essentially the same as female standard speakers; only the end position is further forward, F(1,165) = 5.7,
For female learners, the starting point of [ia] is further forward, F(1,39) = 4.8,
For female learners, the starting point of [ie] is further forward, F(1,75) = 6.5,
For female learners, the starting point of [ua] is essentially the same as that of female standard speakers, while the end position is slightly higher, F(1,49) = 27.7,
For female learners, the starting position of [uo] is slightly further back, F(1,155) = 14.2,
For female learners, the starting point of [üe] is essentially the same as that of female standard speakers, while the end position is slightly further back, F(1,47) = 23.1,
In general, female learners show obvious differences in the pronunciation of [ai], [ao], [ie] and [ua], while male learners show obvious differences in [ai], [ao], [ie], [ua] and [uo].
Figures 6 and 7 are the acoustic vowel charts for advanced level learners and standard speakers. Form 3 shows the similarity of Euclidean Distance.
For female learners, the starting point of [ai] is further back, F(1,170) = 29.6,
For female learners, the end tongue position of [ei] is slightly higher, F(1,85) = 5.1,
For female learners, the end tongue position of [ao] is further forward, F(1,196) = 10.53,
Female learners pronounce [ou] essentially the same as female standard speakers; only the end position is further forward, F(1,136) = 5.7,
For female learners, the starting point of [ia] is further forward, F(1,31) = 4.8,
For female learners, the end tongue position of [ie] is further forward, F(1,31) = 7.41,
When female learners pronounce [ua], [uo] and [üe], there is no obvious difference in the starting point or the end point. When male learners pronounce [ua], the starting position is further back, F(1,37) = 7.42,
Generally speaking, female learners differ significantly in [ai] and [ua], while male learners differ significantly in [ai], [ei], [ua] and [uo].
Euclidean distance is the most intuitive way to calculate distance, and it is also one of the most commonly used measures of similarity [1]. Its magnitude reflects the degree of similarity between two items: the smaller the Euclidean distance, the smaller the difference between them [12, 13]. The Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$.
Taking formants as the acoustic feature, this paper calculates the Euclidean distance between the vowels of learners and standard speakers to compare and analyse the acquisition trajectory of Putonghua diphthongs. Tables 2–4 show the Euclidean distances between the diphthongs of standard speakers and those of learners at the primary, intermediate and advanced levels, respectively.
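The distance computation can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: it assumes each segment of a diphthong is summarised as a mean (F1, F2) point in Bark per group before the distance is taken, the helper names `euclidean_distance` and `mean_point` are hypothetical, and the numeric values are invented for the example and are not data from this study.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    dims = len(points[0])
    return tuple(sum(pt[i] for pt in points) / n for i in range(dims))


# Hypothetical (F1, F2) onset values in Bark, for illustration only.
learner_onsets = [(7.1, 11.9), (6.9, 12.1)]
standard_onsets = [(7.6, 11.5), (7.4, 11.7)]

# Smaller values mean the learner group is closer to the standard group.
distance = euclidean_distance(mean_point(learner_onsets), mean_point(standard_onsets))
```

The same function applies unchanged to the ending segment, which would give one distance for the start and one for the end of each diphthong.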
Table 2. Euclidean distance between primary level learners and standard speakers.
[ai] | 0.72 | [ai] | 0.44 | ||
[ei] | 0.92 | [ei] | 0.82 | 0.21 | |
[ao] | 0.71 | 0.39 | [ao] | 0.94 | 0.28 |
[ou] | 0.61 | 0.67 | [ou] | 0.72 | |
[ia] | 0.68 | 0.79 | [ia] | 0.39 | 0.58 |
[ie] | 0.88 | 0.01 | [ie] | 0.31 | |
[ua] | [ua] | 0.62 | |||
[uo] | 0.42 | 0.45 | [uo] | ||
[üe] | 0.68 | 0.76 | [üe] | 0.25 | 0.33 |
Table 3. Euclidean distance between intermediate level learners and standard speakers.
[ai] | 0.83 | [ai] | |||
[ei] | 0.51 | 0.61 | [ei] | 0.87 | 0.31 |
[ao] | 0.93 | 0.27 | [ao] | 0.72 | 0.83 |
[ou] | 0.52 | 0.39 | [ou] | 0.23 | |
[ia] | 0.63 | [ia] | 0.26 | 0.62 | |
[ie] | 0.54 | 0.37 | [ie] | 0.29 | |
[ua] | [ua] | 0.68 | |||
[uo] | 0.21 | 0.62 | [uo] | 0.87 | |
[üe] | 0.40 | 0.19 | [üe] | 0.67 |
Table 4. Euclidean distance between advanced level learners and standard speakers.
[ai] | [ai] | 0.84 | |||
[ei] | 0.65 | 0.91 | [ei] | 0.84 | 0.85 |
[ao] | 0.53 | [ao] | |||
[ou] | 0.57 | 0.64 | [ou] | 0.58 | |
[ia] | 0.98 | 0.62 | [ia] | 0.95 | 0.99 |
[ie] | 0.52 | 0.80 | [ie] | 0.93 | 0.83 |
[ua] | [ua] | ||||
[uo] | 0.36 | 0.68 | [uo] | ||
[üe] | 0.38 | 0.70 | [üe] | 0.91 | 0.34 |
As can be seen from Table 2, primary-level learners show little difference from standard speakers in [ao] and [üe] when acquiring Putonghua diphthongs. Female learners differ greatly in the ending segments of the falling (front-prominent) diphthongs [ai] and [ei]: they omit part of the sound and pronounce only the more sonorous element, so that the diphthong sounds like the monophthong [a] or [e]. They also differ greatly in the rising (back-prominent) diphthong [ua], which they find harder to pronounce. Male learners differ greatly in the falling diphthongs [ai] and [ou]. Their initial segments of the rising diphthongs [ua] and [uo] are also quite different, and some male learners pronounce only the more sonorous element.
It can be seen from Table 3 that, among intermediate-level learners acquiring Putonghua diphthongs, female learners show little difference in [ao], [ou], [ie], [uo] and [üe]. Their main differences lie in the ending segment of the falling diphthong [ai] and the initial segment of the diphthong [ia]. Both the initial and ending segments of [ua] differ clearly, so they find it difficult to pronounce. Male learners show small differences in [ei], [ao] and [ia]. Their main difficulty is the falling diphthong [ai]. The initial segments of [ua] and [uo] are quite different, and their pronunciation resembles that of monophthongs.
It can be seen from Table 4 that, among advanced-level learners acquiring Putonghua diphthongs, female learners differ greatly in [ai] and [ua]. Male learners differ greatly in [ai], [ao], [ou], [ua] and [uo] and have difficulty pronouncing them.
To sum up, learners at all three proficiency levels, whether male or female, acquire [ua] worst; their pronunciation of it differs significantly from that of standard speakers. The diphthong [üe] is acquired best. At all three levels, male learners perform worse than female learners; in particular, male learners at all three levels have difficulty pronouncing [uo]. As their Putonghua proficiency improves, learners' pronunciation quality and acquisition steadily improve.
SLA is an active research topic. Both at home and abroad, considerable progress has been made in SLA research, technically and theoretically. This paper explores the learning trajectory and pronunciation quality of Putonghua diphthongs using phonetic analysis.
Following the approach of experimental phonetics, this paper studies the patterns of phonetic learning through acoustic analysis. Formant frequency parameters are extracted from the speech signal to obtain acoustic vowel charts of the Putonghua diphthongs produced by learners. By calculating the difference (Euclidean distance) between learners and standard speakers, the paper explores how Putonghua diphthongs develop and change across learning stages. Visualising the diphthong space of learners at different stages makes it possible to observe directly the dynamic characteristics of its spatial structure and the acquisition trajectory of the diphthongs. The study, which examines the pronunciation quality of Putonghua diphthongs produced by Uyghur learners, shows that different diphthongs follow different learning trajectories. As learners' proficiency in Putonghua improves, the overall structure of their diphthong space moves closer to the standard spatial structure of Putonghua diphthongs. Observation of the acoustic vowel charts further supports contrastive analysis theory: the interlanguage moves dynamically towards the target language. When Uyghur students pronounce Putonghua diphthongs, they may be affected by interference from their mother tongue, so their vowel space is relatively high and slightly to the right. Learners at all three levels, whether male or female, acquire [ua] worst, and their pronunciation of it differs significantly from that of standard speakers. At all three levels, male learners perform worse than female learners; in particular, male learners at all three levels have difficulty pronouncing [uo].