A contrastive study on the production of double vowels in Mandarin

Language is the most natural tool for human communication [1]. As the economy grows steadily and globalisation accelerates, many people choose to learn a language besides their mother tongue. Pronunciation plays an important role in the acquisition of a second language: incorrect pronunciation can impede learning and make it hard to communicate in the language. The main cause of pronunciation errors in a foreign language is interference from the mother tongue. As second language learning has grown in popularity, research on learning patterns and pronunciation evaluation has expanded. In the late 1960s, researchers began to pay attention to Second Language Acquisition (SLA), a field that focuses on the nature and process of such acquisition [2].

Domestic studies on second language phonetic acquisition include the following. Li Hongyin (1995) used contrastive analysis to examine Thai students’ errors in the initials, finals and tones of Chinese [3]. Fu Shimei and Zhang Weijia (2004) tested Vietnamese learners of primary, intermediate and advanced Chinese proficiency to analyse their errors in the perception and pronunciation of Chinese initial consonants. Wang Maolin and Sun Yuqing (2007) used experimental phonetics to analyse the difficulties that Indonesian Chinese students face in pronouncing Chinese compound vowels [5]. Deng Dan (2010) analysed American students’ errors in three kinds of Mandarin compound vowels [6]. Wu Sicong (2015) analysed the errors of Vietnamese students in acquiring initial consonant + [ie], [iou], [ia] and [ua], and suggested teaching methods for future Chinese teaching [7, 8].

In recent years, online education has become increasingly popular. Favourable conditions such as powerful cloud-based computing resources, widespread mobile smart devices and rapidly developing speech processing technologies have made Computer Assisted Language Learning (CALL) increasingly attractive [9]. In addition, as China plays an increasingly important role in the international community, many countries have turned their attention to China and begun to take an interest in Chinese culture and the Chinese language. There is now a surge of enthusiasm for learning Chinese around the world, and Chinese is expected to become a bridge of communication among people across the globe [10]. With the development of computer technology and the arrival of the information age, China is also actively promoting the informatisation of minority languages, and many relevant research institutions have been set up in recent years. As phonetics and computer technology have developed, considerable progress has been made in Uyghur speech synthesis and recognition. However, there is still a long way to go in research on speech recognition and computer-aided language learning systems for SLA.

This study is learner-centred. It collects recordings of Putonghua produced in ethnic minority areas and analyses the acoustic differences among Mandarin learners in producing Putonghua double vowels (diphthongs), with the aim of better understanding their learning methods and processes and identifying the causes of these differences. The remainder of the paper is organised as follows: Section 2 introduces the database; Section 3 analyses the experimental results; Section 4 presents the similarity analysis; Section 5 summarises the work and discusses prospects.

Experimental design

2.1 Situation of the subjects

A total of 40 subjects took part in the diphthong acquisition experiment. Thirty are learners; the other 10 are standard speakers from Beijing who speak standard Mandarin, with an average age of 23.

The 30 learners are divided into three groups according to their Putonghua proficiency, representing different learning stages. Proficiency is measured by the learners’ score on the MHK, an authoritative Putonghua proficiency test that is the main measure of Chinese proficiency for ethnic minorities in Xinjiang. The detailed grouping is shown in Table 1.

Table 1

Details of subjects’ grouping.

Grouping of subjects	Number of subjects	MHK results	Proficiency
Primary level group	10	Grade 3B	Beginner
Intermediate level group	10	Grade 3A	Intermediate
Advanced level group	10	Grade 4A	Advanced
Standard group (reference group)	10	–	Standard

2.2 Speech acquisition

The speech was recorded in a dedicated recording studio using laptop computers, external sound cards, microphones and connecting data cables. The external sound card controls the volume, reduces noise and monitors burst sounds. The recording programme was written in Matlab, with a sampling frequency of 16 kHz. Recording lasted from March 2017 to October 2017.

The reading materials comprised 500 Chinese monosyllables and 300 continuous Chinese utterances. The 500 monosyllables include syllables containing double vowels. Each subject required 4–5 h of recording.

2.3 Voice data processing

After the subjects’ speech was collected, it was processed in the following steps:

(1) Speech labelling: the Putonghua speech of learners and standard speakers is force-aligned. The automatic segmentation system used in this study segments speech at the phoneme level, with an accuracy of 93.92% within a 20 ms error tolerance [11]. Because the system was designed for recordings of standard speakers, its labelling error on learners’ speech is large and the labels must be checked manually. To ensure accurate speech feature parameters and reliable results, the labels were checked repeatedly until the labelling accuracy exceeded 95%.

(2) Extracting the subjects’ speech: from the labelled monosyllables, the number of double vowels extracted for each group is about 380 for ai /ai/, 180 for ei /ei/, about 435 for ao /au/, about 340 for ou /ou/, about 75 for ia /iA/, about 240 for ie /iɛ/, about 100 for ua /uA/, about 365 for uo /uo/ and about 105 for üe /yɛ/. After the vowel data are extracted from the speech annotation files, the acoustic feature parameters of the vowels are calculated.

(3) Formant extraction: the Praat speech analysis software is used to extract the first two formants (F1, F2), which are the formant parameters used in this study. Because a double vowel changes continuously, the formants are extracted at 10 equally spaced points in each double vowel. To reduce the error across pronunciation samples, the formant frequencies (Hz) are converted to Bark values. The Bark scale is an auditory (perceptual) frequency scale. The conversion formula is

(1) $\mathrm{Bark} = 7\ln\left\{\left(f/650\right) + \left[\left(f/650\right)^{2} + 1\right]^{1/2}\right\}$

where f is the formant frequency in Hz.

(4) Plotting and analysis: formant trajectories and acoustic vowel charts are drawn, and the experimental results are analysed using analysis of variance (ANOVA) and Euclidean distance.
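The Bark conversion of Eq. (1) and the choice of 10 equally spaced measurement points can be sketched as follows. This is a minimal illustration, not the authors’ script: the function names, the segment boundaries and the 700 Hz input are hypothetical, and whether the 10 points include the segment boundaries is an assumption, since the text does not say.

```python
import math
from typing import List


def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark using Eq. (1)."""
    x = f_hz / 650.0
    return 7.0 * math.log(x + math.sqrt(x * x + 1.0))


def sample_times(start: float, end: float, n_points: int = 10) -> List[float]:
    """Return n_points equally spaced times from start to end (inclusive).

    Including both boundaries is an assumption made for this sketch.
    """
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


# Hypothetical diphthong segment lasting from 0.10 s to 0.28 s.
times = sample_times(0.10, 0.28)

# Hypothetical F1 value of 700 Hz converted to Bark.
f1_bark = hz_to_bark(700.0)
```

Formant values measured at each time in `times` (for example with Praat) would be passed through `hz_to_bark` before plotting or statistical testing. The expression inside the logarithm is the inverse hyperbolic sine, so Eq. (1) is equivalent to 7 asinh(f/650).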

Flow chart and result analysis of the experiment

This experiment studies the process by which learners acquire Putonghua diphthongs. The flow chart of the whole experimental design is shown in Figure 1.

Figure 2. Acoustic vowel chart of falling diphthongs for primary level learners and standard speakers.

Figure 3. Acoustic vowel chart of rising diphthongs for primary level learners and standard speakers.

3.1 Analysis of acoustic vowels

3.1.1 Experiment on the acquisition of Putonghua diphthongs by primary level learners

Relative to standard speakers of the same sex, the starting point of [ai] for female learners is higher (F(1,171) = 25.3, p < 0.001) and the end point is further forward (F(1,171) = 20.3, p < 0.001). For male learners, the starting point of [ai] is higher (F(1,155) = 26.05, p < 0.001) and further back (F(1,155) = 77.13, p < 0.001).
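The statistics in this section all have one numerator degree of freedom, which is what a one-way ANOVA comparing two groups (learners and standard speakers) on a single formant value would give. The sketch below shows how such an F value and its degrees of freedom are computed. It assumes this two-group design, which the text does not spell out, and the Bark values are hypothetical, not data from the study.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical F1 values (Bark) at the starting point of one diphthong.
learners = [6.1, 6.4, 6.3, 6.6, 6.2]
standard = [7.0, 7.2, 6.9, 7.3, 7.1]

f_stat, df1, df2 = one_way_anova([learners, standard])
print(f"F({df1},{df2}) = {f_stat:.2f}")
```

With two groups the first degree of freedom is always 1 and the second is the total number of tokens minus 2, so a value such as F(1,171) would correspond to 173 tokens under this assumption.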

For [ei], the starting point is higher for female learners (F(1,85) = 10.3, p = 0.002) and for male learners (F(1,71) = 19.83, p < 0.001).

For female learners, the starting point of [ao] is further forward (F(1,197) = 20.4, p < 0.001) and somewhat higher (F(1,197) = 29.7, p < 0.001), and the end point is further forward (F(1,197) = 43.2, p < 0.001). For male learners, the starting point of [ao] is higher (F(1,173) = 163.6, p < 0.001), and the end point is also higher (F(1,173) = 17.5, p < 0.001).

For female learners, the starting point of [ou] is higher (F(1,137) = 25.9, p < 0.001) and the end point is somewhat further back (F(1,137) = 8.01, p = 0.005). For male learners, the starting point of [ou] is higher (F(1,135) = 37.6, p < 0.001) and the end point is further back (F(1,135) = 13.5, p = 0.0003).

For female learners, the starting point of [ia] is a little higher (F(1,31) = 7.2, p = 0.01), and there is a slight deviation at the end point (F(1,31) = 7.8, p = 0.009). For male learners, the end point of [ia] is slightly further forward (F(1,27) = 10.2, p < 0.001).

For female learners, the starting point of [ie] is further forward (F(1,95) = 6.5, p = 0.01) and a little higher (F(1,73) = 9.9, p = 0.002). The end point is a little further back (F(1,75) = 82.2, p < 0.001) but is otherwise basically the same as that of female standard speakers. For male learners, the starting point of [ie] is somewhat further forward (F(1,95) = 7.5, p = 0.007), and the end point is further forward (F(1,95) = 30.01, p < 0.001) and lower (F(1,95) = 95.7, p < 0.001).

For female learners, the end point of [ua] is basically the same as that of female standard speakers, while the starting point is a little further back (F(1,39) = 4.9, p < 0.03). For male learners, the starting point of [ua] is lower (F(1,39) = 19.7, p < 0.001) and further back (F(1,39) = 12.8, p < 0.001), and the end point is lower (F(1,39) = 10.4, p = 0.002).

For female learners, the starting point of [uo] is further forward (F(1,157) = 8.6, p = 0.004) and the end point is a little higher (F(1,157) = 9.1, p = 0.003). For male learners, the starting point of [uo] is a little higher (F(1,145) = 38.2, p < 0.001) and further back (F(1,145) = 22.04, p < 0.001).

For female learners, the starting point of [üe] is further forward (F(1,49) = 7.9, p = 0.007). For male learners, the starting point of [üe] is a little higher (F(1,41) = 333.9, p < 0.001) and further forward (F(1,41) = 1044, p < 0.001), and the end point is a little higher (F(1,41) = 243.1, p < 0.001) and further forward (F(1,41) = 588, p < 0.001).

In general, female learners differ clearly from standard speakers in the pronunciation of [ai], [ao], [ou] and [ua], and male learners in the pronunciation of [ai], [ao], [ou], [ua] and [uo].

3.1.2 Experiment on the acquisition of Mandarin diphthongs by intermediate level learners

Figures 4 and 5 show the acoustic vowel charts for intermediate level learners and standard speakers. For female learners, the starting point of [ai] is a little higher (F(1,189) = 40.6, p < 0.001) and further back (F(1,189) = 24.6, p < 0.001), and the end point is a little lower (F(1,189) = 31.9, p < 0.001) and further forward (F(1,189) = 20.3, p < 0.001). For male learners, the starting point of [ai] is a little higher (F(1,189) = 54.1, p < 0.001) and further back (F(1,189) = 103, p < 0.001), and the end point is a little higher (F(1,189) = 71.1, p < 0.001).

Figure 4. Acoustic vowel chart of falling diphthongs for intermediate level learners and standard speakers.

Figure 5. Acoustic vowel chart of rising diphthongs for intermediate level learners and standard speakers.

For female learners, the starting point of [ei] is a little higher (F(1,75) = 41.2, p < 0.001) and the end point is further back (F(1,75) = 5.1, p = 0.03). For male learners, the starting point of [ei] is a little higher (F(1,75) = 11.6, p < 0.001) and further back (F(1,75) = 17.8, p < 0.001), and the end point is further forward (F(1,75) = 10.09, p = 0.002).

For female learners, the starting point of [ao] is further forward (F(1,215) = 92.5, p < 0.001) and a little higher (F(1,215) = 16.5, p < 0.001), and the end point is further forward (F(1,215) = 17.4, p < 0.001); the rise and fall, however, are basically the same as those of female standard speakers. Male learners pronounce [ao] very differently from male standard speakers. At the starting point, the first formant gives F(1,207) = 39.4, p < 0.001 and the second formant F(1,207) = 2.8, p = 0.098; at the end point, the first formant gives F(1,207) = 9.3, p = 0.003 and the second formant F(1,207) = 8.3, p = 0.004.

Female learners pronounce [ou] basically the same as female standard speakers, except that the end point is further forward (F(1,165) = 5.7, p = 0.02). For male learners, the starting point of [ou] is a little higher (F(1,165) = 37.6, p < 0.001) and further forward (F(1,165) = 8.81, p = 0.003), while the end point is basically the same as that of male standard speakers.

For female learners, the starting point of [ia] is further forward (F(1,39) = 4.8, p = 0.04), and the end point is basically the same as that of female standard speakers. Male learners pronounce [ia] basically the same as male standard speakers, except that the end point is a little higher (F(1,39) = 10.2, p = 0.003).

For female learners, the starting point of [ie] is further forward (F(1,75) = 6.5, p = 0.01) and a little higher (F(1,73) = 9.9, p = 0.002). The end point is a little further back (F(1,75) = 82.2, p < 0.001), and the rise and fall are basically the same as those of female standard speakers. For male learners, the starting point of [ie] is further forward (F(1,111) = 34.1, p < 0.001), and the end point is further forward (F(1,111) = 53.1, p < 0.001) and a little lower (F(1,111) = 14.9, p < 0.001).

For female learners, the starting point of [ua] is basically the same as that of female standard speakers, and the end point is a little higher (F(1,49) = 27.7, p < 0.001). For male learners, the starting point of [ua] is a little lower (F(1,45) = 8.7, p = 0.005) and a little further back (F(1,45) = 7.8, p = 0.008), while the end point is basically the same as that of male standard speakers.

For female learners, the starting point of [uo] is a little further back (F(1,155) = 14.2, p < 0.001), and the end point is basically the same as that of female standard speakers. Male learners pronounce [uo] basically the same as male standard speakers, except that the end point is further forward (F(1,149) = 16.3, p < 0.001).

For female learners, the starting point of [üe] is basically the same as that of female standard speakers, and the end point is a little further back (F(1,47) = 23.1, p < 0.001). For male learners, the starting point of [üe] is the same as that of male standard speakers, and the end point is a little lower (F(1,47) = 5.7, p = 0.002).

In general, female learners differ clearly from standard speakers in the pronunciation of [ai], [ao], [ie] and [ua], and male learners in the pronunciation of [ai], [ao], [ie], [ua] and [uo].

3.1.3 Experiment on the acquisition of Mandarin diphthongs by advanced level learners

Figures 6 and 7 show the acoustic vowel charts for advanced level learners and standard speakers. Table 4 shows the corresponding Euclidean distances.

Figure 6. Acoustic vowel chart of falling diphthongs for advanced level learners and standard speakers.

Figure 7. Acoustic vowel chart of rising diphthongs for advanced level learners and standard speakers.

For female learners, the starting point of [ai] is further back (F(1,170) = 29.6, p < 0.001), and the end point is a little higher (F(1,170) = 13.28, p < 0.001) and further back (F(1,170) = 8.76, p = 0.0035). For male learners, the starting tongue position is a little higher (F(1,146) = 10.12, p = 0.00178) and further back (F(1,146) = 38.54, p < 0.001), and the end tongue position is further forward (F(1,146) = 11.8, p < 0.001).

For female learners, the end tongue position of [ei] is a little higher (F(1,85) = 5.1, p = 0.026). For male learners, the end tongue position of [ei] is further forward (F(1,70) = 24.57, p < 0.001) and further back (F(1,70) = 31.27, p < 0.001).

For female learners, the end tongue position of [ao] is further forward (F(1,196) = 10.53, p = 0.0013). For male learners, the starting tongue position is further back (F(1,162) = 20.6, p < 0.001), while the end point is further forward (F(1,162) = 19.94, p < 0.001).

Female learners pronounce [ou] basically the same as female standard speakers, except that the end point is further forward (F(1,136) = 5.7, p = 0.02). For male learners, the starting point of [ou] is a little higher (F(1,132) = 37.6, p < 0.001) and further forward (F(1,165) = 8.81, p = 0.003).

For female learners, the starting point of [ia] is further forward (F(1,31) = 4.8, p = 0.04), while the end point is further back (F(1,31) = 7.41, p = 0.01). For male learners, both the starting tongue position (F(1,26) = 5.8, p = 0.03) and the end tongue position (F(1,26) = 6.27, p = 0.01) of [ia] are further forward.

For female learners, the end tongue position of [ie] is further forward (F(1,31) = 7.41, p = 0.01). For male learners, the starting point is further forward (F(1,26) = 19.8, p < 0.001).

When female learners pronounce [ua], [uo] and [üe], there is no obvious difference at either the starting point or the end point. For male learners, the starting point of [ua] is further back (F(1,37) = 7.42, p = 0.0098), and the end tongue position is lower (F(1,37) = 4.857, p = 0.03) and further forward (F(1,37) = 11.36, p = 0.001). The starting tongue position of [uo] for male learners is a little lower (F(1,145) = 4.857, p = 0.03) and further forward (F(1,145) = 11.36, p = 0.001). The starting point of [üe] for male learners is further forward (F(1,41) = 8.03, p = 0.0071).

Generally speaking, female learners differ significantly in [ai] and [ua], while male learners differ significantly in [ai], [ei], [ua] and [uo].

Similarity analysis

Euclidean distance is the most easily understood way of calculating distance, and it is also a commonly used measure of similarity [1]. The size of the Euclidean distance reflects the degree of similarity between two items: the smaller the Euclidean distance, the smaller the difference between them [12, 13]. The Euclidean distance between two points a(x₁,y₁) and b(x₂,y₂) in the two-dimensional plane is given by Eq. (2):

(2) $d_{12} = \sqrt{(x_{1} - x_{2})^{2} + (y_{1} - y_{2})^{2}}$

Taking formants as the acoustic feature, this paper calculates the Euclidean distance between the vowels of learners and standard speakers to compare and analyse the acquisition trajectory of Putonghua double vowels. Tables 2–4 show the Euclidean distance between the double vowels of learners at primary, intermediate and advanced levels and standard speakers, respectively.
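A distance of the kind reported in Tables 2–4 can be sketched as below, treating each vowel segment as a point (F1, F2) in the Bark plane. This is an illustration under assumptions: the paper does not state exactly which points are compared, so using one mean (F1, F2) pair per group and segment is a guess, and the coordinates are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two points in the plane, as in Eq. (2)."""
    (x1, y1), (x2, y2) = a, b
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


# Hypothetical mean (F1, F2) values in Bark for the initial segment
# of one diphthong, for a learner group and the standard group.
learner_initial = (6.8, 11.2)
standard_initial = (7.4, 10.4)

distance = euclidean_distance(learner_initial, standard_initial)
print(round(distance, 2))
```

A smaller value means that the learner group’s segment lies closer to the standard speakers’ segment in the vowel space.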

Table 2

Euclidean distance between primary level learners and standard speakers.

Female	Initial	Ending	Male	Initial	Ending
[ai]	0.72	1.13	[ai]	1.21	0.44
[ei]	0.92	1.15	[ei]	0.82	0.21
[ao]	0.71	0.39	[ao]	0.94	0.28
[ou]	0.61	0.67	[ou]	1.25	0.72
[ia]	0.68	0.79	[ia]	0.39	0.58
[ie]	0.88	0.01	[ie]	0.31	1.33
[ua]	2.04	1.12	[ua]	1.81	0.62
[uo]	0.42	0.45	[uo]	1.18	1.02
[üe]	0.68	0.76	[üe]	0.25	0.33

Table 3

Euclidean distance between intermediate learners and standard speakers.

Female	Initial	Ending	Male	Initial	Ending
[ai]	0.83	1.36	[ai]	1.22	1.01
[ei]	0.51	0.61	[ei]	0.87	0.31
[ao]	0.93	0.27	[ao]	0.72	0.83
[ou]	0.52	0.39	[ou]	1.11	0.23
[ia]	1.30	0.63	[ia]	0.26	0.62
[ie]	0.54	0.37	[ie]	0.29	1.46
[ua]	2.96	1.41	[ua]	1.93	0.68
[uo]	0.21	0.62	[uo]	1.21	0.87
[üe]	0.40	0.19	[üe]	0.67	1.07

Table 4

Euclidean distance between advanced level learners and standard speakers.

Female	Initial	Ending	Male	Initial	Ending
[ai]	1.02	1.52	[ai]	1.27	0.84
[ei]	0.65	0.91	[ei]	0.84	0.85
[ao]	1.01	0.53	[ao]	1.02	1.98
[ou]	0.57	0.64	[ou]	1.09	0.58
[ia]	0.98	0.62	[ia]	0.95	0.99
[ie]	0.52	0.80	[ie]	0.93	0.83
[ua]	2.17	1.35	[ua]	2.56	1.16
[uo]	0.36	0.68	[uo]	1.43	1.27
[üe]	0.38	0.70	[üe]	0.91	0.34

As can be seen from Table 2, when primary-level learners acquire Putonghua double vowels, [ao] and [üe] differ little from the standard. Female learners show a large difference in the ending segments of the falling (front-prominent) diphthongs [ai] and [ei]: they drop part of the vowel and pronounce only the more sonorous part, so that the result sounds like the single vowel [a] or [e]. There is also a large difference in the rising (back-prominent) diphthong [ua], which they find more difficult to pronounce. Male learners show large differences in the falling diphthongs [ai] and [ou]. The initial segments of the rising diphthongs [ua] and [uo] are quite different, and some male learners pronounce only the clear, sonorous part.

It can be seen from Table 3 that, among intermediate-level learners, female learners show little difference in the pronunciation of [ao], [ou], [ie], [uo] and [üe]. Their main differences lie in the ending segment of the falling diphthong [ai] and the initial segment of the double vowel [ia]. Both the initial and ending segments of [ua] differ markedly, so it is difficult for them to pronounce. For male learners, the differences in [ei], [ao] and [ia] are small. Their main difficulty is the falling diphthong [ai]. The initial segments of [ua] and [uo] are quite different, and their pronunciation resembles that of single vowels.

It can be seen from Table 4 that, among advanced-level learners, female learners show large differences in the pronunciation of [ai] and [ua]. Male learners show large differences in [ai], [ao], [ou], [ua] and [uo], which they have difficulty pronouncing.

To sum up, learners at all three proficiency levels, whether male or female, acquire the double vowel [ua] worst: it differs significantly from that of standard speakers. The double vowel [üe] is acquired best. At all three levels, male learners perform worse than female learners; in particular, male learners at all three levels have difficulty with [uo]. Across the three levels, as Putonghua proficiency improves, the quality of learners’ pronunciation and their acquisition improve.
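As a rough check of the [ua] observation, the vowels in Table 2 can be ranked by distance. The values below are copied from Table 2; summing the initial and ending distances into one score per vowel is an assumption made only for this sketch and is not a measure used in the paper.

```python
# (initial, ending) Euclidean distances copied from Table 2.
table2_female = {
    "[ai]": (0.72, 1.13), "[ei]": (0.92, 1.15), "[ao]": (0.71, 0.39),
    "[ou]": (0.61, 0.67), "[ia]": (0.68, 0.79), "[ie]": (0.88, 0.01),
    "[ua]": (2.04, 1.12), "[uo]": (0.42, 0.45), "[üe]": (0.68, 0.76),
}
table2_male = {
    "[ai]": (1.21, 0.44), "[ei]": (0.82, 0.21), "[ao]": (0.94, 0.28),
    "[ou]": (1.25, 0.72), "[ia]": (0.39, 0.58), "[ie]": (0.31, 1.33),
    "[ua]": (1.81, 0.62), "[uo]": (1.18, 1.02), "[üe]": (0.25, 0.33),
}


def rank_by_total_distance(table):
    """Sort vowels from largest to smallest initial + ending distance."""
    return sorted(table, key=lambda vowel: sum(table[vowel]), reverse=True)


female_ranked = rank_by_total_distance(table2_female)
male_ranked = rank_by_total_distance(table2_male)

# Vowel furthest from the standard speakers in each group.
print(female_ranked[0], male_ranked[0])  # prints: [ua] [ua]
```

The same function can be applied to the values in Tables 3 and 4 to compare the ranking across proficiency levels.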

Summary and prospects

SLA is an active research topic. Both in China and abroad, substantial technical and theoretical progress has been made. This paper explores the learning trajectory and pronunciation quality of Putonghua double vowels using phonetic analysis.

Based on experimental phonetics, the patterns of phonetic learning are studied through acoustic analysis. Characteristic parameters of the speech signal (formant frequencies) are extracted to obtain acoustic vowel charts of the Putonghua double vowels acquired by learners. By calculating the difference (Euclidean distance) between learners and standard speakers, this paper explores how Putonghua diphthongs develop and change across learning stages. Visualising the double vowel space of learners at different stages of Putonghua learning makes it possible to observe directly the dynamic characteristics of the spatial structure of the acquired double vowels and their acquisition trajectory.

This paper studies the pronunciation quality of Putonghua double vowels produced by Uyghur learners on the basis of phonetic analysis. The study shows that different double vowels follow different learning trajectories. As learners’ proficiency in Putonghua improves, the overall structure of their double vowels moves closer to the standard spatial structure of Putonghua double vowels. Observation of the acoustic vowel charts further supports contrastive analysis theory: the interlanguage moves dynamically towards the target language. When Uyghur students pronounce Putonghua double vowels, they may be affected by interference from their mother tongue, so their vowel space is relatively high and shifted slightly to the right. Learners at all three levels, whether male or female, acquire the double vowel [ua] worst, differing significantly from standard speakers. At all three levels, male learners perform worse than female learners; in particular, male learners at all three levels have difficulty with [uo].

eISSN: 2444-8656
Language: English

Publication timeframe: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics


Published online: 08 Apr 2022

Pages: 655 - 664

Received: 28 Mar 2021

Accepted: 27 Sep 2021

DOI: https://doi.org/10.2478/amns.2022.1.00006

Keywords
Formant frequency characteristics, acoustic analysis, ANOVA data testing, cross language contrastive analysis

© 2022 Kai Yu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
