Correctness and Fluency across Speech Styles: A Study on the Speech Errors made by Cantonese Speakers

INTRODUCTION

Background

Over the past decades, speech production and processing have become well-established topics in linguistics. Much work has been done to relate the evidence found in speech errors to the mental processing that underlies discourse production. With the rapid growth of corpus linguistics, slips in raw spoken data have become a gateway to studying language performance and actual language use. Speech errors among children and patients with language disorders have also played a significant role in research on early language acquisition and clinical linguistics.

Literature Review

I. Speech errors

According to Boomer and Laver (1968, p. 123), speech errors, commonly known as slips of the tongue, can be defined as an ‘involuntary deviation in performance from the speaker’s current phonological, grammatical or lexical intention’. Further features such as nonhabitual (Dell, 1986) and non-pathological (Zhang, 1990) were later added to emphasize that speech errors concern one’s language performance rather than one’s language competence. Scholars have investigated the psycholinguistic effects on slips (MacKay, 1970; Shattuck-Hufnagel, 1987) and their relationship with speech production models (Dell, 1986; Fromkin, 1973; Levelt, 1989). Speech errors can occur at all morphological, phonological, semantic and syntactic levels, from the segment to the constituent (Carroll, 2008).

Fromkin (1973) categorized speech errors into eight main types: shift, exchange, anticipation, perseveration, addition, deletion, substitution and blend (see Appendix B for explanations). Shen (1992) and Chen (2001) classified slips in Chinese similarly. Errors can also be divided into selection and assemblage errors (Aitchison, 2008). Although the error types are alike cross-linguistically, their patterns can differ because of different phonological constructions, syntactic structures and typology. Previous work on Cantonese, a contour tone language (Alderete et al., 2019), indicates that tone slips can also be expected in various forms, such as substitutions and blends.

II. Hesitations

The occurrence of pauses relates strongly to the cognitive demand of speech planning and encoding. Henderson et al. (1966) first described production as a cycle of hesitation and fluency. In spontaneous speech, hesitations can be divided into unfilled pauses (silent pauses) and filled pauses (fillers, parenthetical remarks, vowel lengthening) (Boomer, 1965; see Appendix B for commonly used Cantonese fillers). The length and frequency of pauses, especially unfilled ones, have been used as a parameter of a speaker’s fluency or proficiency (Kowal et al., 1975) and also influence speech comprehension by hearers.

III. Self-repairs

Speakers can monitor their own utterances and detect errors in them. Levelt (1983) proposed a theoretical framework for the corrections speakers make to their misarticulations, known as self-repairs. He proposed that the procedure consists of three parts: self-interruption, the use of editing expressions (e.g. 吖aa1 ‘ah’, 唔係m4hai6 ‘no’ and 即係zik1hai6 ‘that is/I mean’ in Cantonese) and the repair itself. Repairs can be further distinguished as instant repairs (returning to a single error and replacing it with the proper word), anticipatory retracing (repeating the material articulated before the error) and fresh starts (abandoning the initial syntactic structure and forming a new utterance).

Aims of the study

So far, limited research has examined slips of the tongue in adult discourse in Cantonese, an under-studied language (Alderete, 2022). Differences between careful and casual speech registers have been demonstrated in many languages, especially Indo-European ones. Nevertheless, it has been argued that current research, skewed toward ‘major’ languages, might not reflect genuine linguistic diversity (Anand et al., 2011). More could be done to test existing hypotheses in the Cantonese setting. The present study aims to discover patterns of misarticulations, pauses and self-corrections in both planned and spontaneous speech by L1 Hong Kong Cantonese speakers. This paper attempts to answer the following questions:

1. What are the distributions of speech errors across casual speech and careful speech? What are the reasons behind them?

2. What are the distributions of hesitations and self-repairs across casual speech and careful speech? What are the reasons behind them?

3. Do the patterns suggest certain features of mental processing or language-specific characteristics of Cantonese?

METHODOLOGY

Data collection

Eight participants were recruited to provide casual and careful speech data (see Appendix C). Their demographics function as controlled variables. All of them are in the same age group (18-22 years old) and have similar educational backgrounds (undergraduate students born, raised and educated in Hong Kong). Gender is evenly distributed (four males, four females).

All participants were required to complete three tasks (see Appendix A). The sessions were conducted via the video conferencing tool Zoom. In addition to the required audio, which participants recorded on their own mobile devices, the Zoom recording function was activated to provide a backup.

Task 1 was a reading-aloud text expected to take about two minutes to articulate (see Appendix A.1). The text, which contains 400 Chinese characters, is an excerpt from Hong Kong Diploma of Secondary Education (HKDSE) Liberal Studies material on the social issue ‘Conscientious Consumption’. This task examines participants’ articulation and fluency in written Chinese (書面語) in a formal, academic-like context, in which they were expected to be more cautious and attentive in their verbal presentation. The text has a type-token ratio (TTR) of approximately 60% (162 types, 272 tokens), indicating the higher lexical density and syntactic complexity typical of a written text. However, its complexity is only at high-school level and lower than that of university-level academic texts, which usually have a TTR of around 70%.
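As a rough illustration of how a type-token ratio of this kind is obtained, the following Python sketch divides the number of distinct tokens by the total number of tokens. It assumes the text has already been segmented into word tokens (the segmentation procedure is not described here), and the function name `type_token_ratio` and the toy token list are hypothetical.

```python
def type_token_ratio(tokens):
    """Return the type-token ratio (distinct tokens / all tokens) as a percentage."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty token list")
    return 100 * len(set(tokens)) / len(tokens)


# Toy example with a hand-segmented phrase (not taken from the Task 1 text).
sample = ["我", "鍾意", "唱歌", "我", "鍾意", "行山"]
print(round(type_token_ratio(sample), 1))  # 4 types / 6 tokens -> 66.7

# The counts reported for the Task 1 text: 162 types over 272 tokens.
print(round(100 * 162 / 272, 1))  # -> 59.6
```

Because TTR depends on text length, values are most comparable between samples of similar size.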

Task 2 was a timed picture-elicited narration (see Appendix A.2). Subjects were given 10 seconds to look at a picture before narrating a story about it for about three minutes. The picture was selected from a test paper designed for the Territory-wide System Assessment (TSA) at primary-six level. The imaginative stories, produced in spoken Cantonese (口語), allow patterns in a casual and informal narrative speech style to be studied.

Task 3 elicited the most casual type of speech production in colloquial Cantonese (see Appendix A.3). Participants were asked to produce a 3-minute speech on what they enjoy doing in their free time. They were given no time to plan their answers, and neither personal details nor academic knowledge was required to respond to this general question. The resulting discourse was spontaneous and free-flowing, which makes it useful for assessing fluency in everyday language use.

Data analysis

The data served as a mini corpus. All recordings were transcribed in the CHAT format using the Computerized Language ANalysis (CLAN) tool developed for the TalkBank corpus project (MacWhinney, 2000). The study is primarily quantitative. The annotated speech errors, hesitations and self-repairs were categorized and counted for each of the three levels of speech formality and complexity. Where the distributions showed a consistent pattern, a detailed qualitative analysis was added to investigate the possible reasons behind individual slips. The overall TTR of the speeches was also computed and compared between tasks to confirm the potential relation between lexical density and fluency.

RESULTS

The data collected confirm the lexical density, and thus the complexity, of the three tasks. Task 1, with a TTR of about 60% as mentioned above, is the most complex and formal style of speech. Task 2, with an average TTR of 43.76%, ranks second in terms of formality (see Figure 1). The average TTR in Task 3 is 38.54%, confirming that it contains the most casual and informal speech. The following sections refer to the three levels as formal (Task 1), informal (Task 2) and casual (Task 3).

Figure 1 Type-token ratio (TTR) of Tasks 2 and 3 in descending order

(Data from Appendix D Table 1)
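The task averages quoted above can be reproduced from the per-participant values in Appendix D Table 1. The sketch below is only a check of that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Per-participant TTR values (%) copied from Appendix D Table 1,
# in participant order 001-008.
ttr_task2 = [46.3, 39.3, 45.5, 42.1, 42.3, 50.9, 40.8, 42.9]
ttr_task3 = [39.0, 37.4, 41.9, 33.9, 34.8, 39.5, 39.5, 42.3]

avg_task2 = mean(ttr_task2)
avg_task3 = mean(ttr_task3)
print(f"Task 2 mean TTR: {avg_task2:.2f}%")  # 43.76%
print(f"Task 3 mean TTR: {avg_task3:.2f}%")  # 38.54%

# Number of participants whose Task 2 TTR exceeds their Task 3 TTR.
higher_in_task2 = sum(t2 > t3 for t2, t3 in zip(ttr_task2, ttr_task3))
print(f"Task 2 > Task 3 for {higher_in_task2} of {len(ttr_task2)} participants")
```

In this table every participant has a higher TTR in Task 2 than in Task 3, so the difference in the means is not driven by a single speaker.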

Speech errors

Across all speech styles, substitution is the most frequent error type (27%), while exchange is the least common (1%) (see Figure 2). When the speech styles are compared, however, casual speech surprisingly contained the most speech errors (48.1%; 115 occurrences) and formal speech the fewest (15.5%; 37 occurrences). The distribution of error types also varies (see Figure 3). Deletion is the major source of errors in casual, spontaneous speech (25.2%; 29 occurrences), followed by shift and blend. In the formal context, substitution errors were dominant (59.4%; 22 occurrences). In informal speech, deletion and substitution were equally frequent (28.7% each; 25 occurrences each). The data suggest that substitution errors become more prominent as the speech style becomes more formal, whereas deletion errors become more numerous as it becomes more casual (see Appendix E for the complete list of speech errors divided by tasks).

Figure 2 Distribution of different types of speech errors

Figure 3 Distribution of speech errors across the three tasks

(Data from Appendix D Table 2)
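For readers who wish to recompute the distributions, the sketch below derives each task's share of all errors and its most frequent error type from the counts in Appendix D Table 2. It is an illustrative calculation only; the data structure and names are assumptions, and percentages obtained this way follow the tabulated counts.

```python
# Counts per error type for (Task 1, Task 2, Task 3), from Appendix D Table 2.
errors = {
    "shift": (1, 10, 24),
    "exchange": (0, 2, 0),
    "anticipation": (2, 3, 3),
    "perseveration": (0, 1, 4),
    "addition": (5, 15, 17),
    "deletion": (6, 25, 30),
    "substitution": (22, 25, 16),
    "blend": (1, 6, 21),
}
tasks = ("Task 1", "Task 2", "Task 3")

task_totals = [sum(counts[i] for counts in errors.values()) for i in range(3)]
grand_total = sum(task_totals)

for i, task in enumerate(tasks):
    share = 100 * task_totals[i] / grand_total
    # max() returns the first type listed when two types tie (as in Task 2).
    top = max(errors, key=lambda name: errors[name][i])
    top_share = 100 * errors[top][i] / task_totals[i]
    print(f"{task}: {task_totals[i]} errors ({share:.1f}% of all), "
          f"most frequent: {top} ({top_share:.1f}% of the task)")
```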

Hesitations

Short pauses and unfilled pauses were the most frequent in all speech styles (see Figures 4 and 5), while very long pauses were the least common. No filled pauses were found in formal speech. The distribution of filled hesitations in informal and casual speech was similar. In total, casual speech contained the most hesitations (640 occurrences), whereas formal speech contained only a limited number of pauses (131 occurrences). Typical fillers include 誒 (e6), 呢 (ne1), 即係 (zik1hai6), 咁 (gam2), ‘hm’, ‘em’ and ‘um’. Their positions vary: 誒 (e6), 即係 (zik1hai6) and 咁 (gam2) are mostly utterance-initial, 呢 (ne1) is utterance-final, and ‘hm’, ‘um’, ‘er’ and ‘em’ usually occur within the utterance. Another observation worth noting is the presence of paralinguistic hesitations, including the ‘tsk’ sound, throat-clearing, chuckles, laughs and gasps.

Figure 4 Distribution of Pauses by length

(Data from Appendix D Table 3)

Figure 5 Distribution of Pauses by type

(Data from Appendix D Table 4)

Self-repairs

Self-repairs are the corrections speakers make to their own misarticulations. As Figure 6 shows, instant repairs were the most frequent type across the three tasks (50.7%; 77 occurrences), followed by anticipatory retracing (32.9%; 50 occurrences) and fresh starts (16.4%; 25 occurrences). No fresh starts were observed in formal speech. The total number of self-repairs is lowest in the formal speech style (8.55%; 13 occurrences), primarily because there were fewer speech errors to repair. The two less formal styles are similar (78 occurrences in Task 2 and 61 in Task 3), with Task 2 accounting for slightly more (51.3% of all self-repairs).

Figure 6 Distribution of self-repairs

(Data from Appendix D Table 5)

DISCUSSION

Speech errors

Overview

The data collected appear to be inconsistent with the dominant finding that formal speech styles contain more speech errors (Carroll, 1986). Most analyses attribute that finding to nervousness when addressing interlocutors in formal contexts, also known as situational anxiety. One explanation for Task 1 containing the fewest errors in this study is that the online elicitation did not fully recreate a formal setting, which minimized errors caused by nervousness. The following sections analyze the predominant speech errors in spontaneous speech, the distinct distribution of substitutions, and tonal substitutions in Cantonese.

Speech errors in Casual Speech (Deletions and Shifts)

As the dominant error type in casual and informal speech, deletion errors, including phoneme and morpheme/lexical deletions, reflect the speakers’ goal of delivering continuous speech (Morrison & Shriberg, 1992). They often occur when speakers are in the fluent phase of the cycle, in which they may speed up slightly to express their intended messages. The word onset effect (Shattuck-Hufnagel, 1987) was also observed in the data (see Table 1). For instance, the onset glottal fricative [h] is deleted in 喺(hai2), 係(hai6), 之後(zi1hau6), 就係(zau6hai6) and 即係(zik1hai6). As Shattuck-Hufnagel (1987) suggested, the rationale is that word-onset segments are less crucial in determining phrasal prosody than other segments, such as the peak of the rhyme. In other words, hearers have little difficulty comprehending a lexical item even if its onset is reduced. The effect is limited in careful speech, however, as speakers focus more on their articulation than on self-expression.

Table 1 Examples of the word onset effect

Produced form	Intended form
之au6	之後(hau6)
ak1	得(dak1)
ui5	會(wui5)
o2	咗(zo2)
就ai6	就係(hai6)
即ai6	即係(zik1hai6)
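The onset deletions in Table 1 share a simple shape: the produced syllable equals the intended Jyutping syllable with its initial consonant removed. The sketch below checks that relation for romanized syllables. The onset inventory in the regular expression and the function names are assumptions made for illustration, not part of the annotation scheme used in this study.

```python
import re

# Jyutping onsets; two-letter onsets are tried first. The lookahead requires a
# following vowel letter so that syllabic nasals such as "m4" or "ng5" are kept.
ONSET = re.compile(r"^(ng|gw|kw|[bpmfdtnlgkhwzcsj])(?=[aeiouy])")


def strip_onset(syllable):
    """Remove the initial consonant of a Jyutping syllable, if it has one."""
    return ONSET.sub("", syllable, count=1)


def is_onset_deletion(produced, intended):
    """True if `produced` is `intended` minus its onset consonant."""
    return produced != intended and produced == strip_onset(intended)


# (produced, intended) syllables from Table 1.
pairs = [("au6", "hau6"), ("ak1", "dak1"), ("ui5", "wui5"),
         ("o2", "zo2"), ("ai6", "hai6")]
for produced, intended in pairs:
    print(produced, intended, is_onset_deletion(produced, intended))

# An onset substitution (heoi5 for keoi5) is not classified as a deletion.
print(is_onset_deletion("heoi5", "keoi5"))  # False
```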

Shifts are widespread in spontaneous speech at both the phrasal and the lexical level (see Table 2). In most cases, the word order of sentences containing shifts is ill-formed; that is, they depart from the canonical Cantonese SVO order (Matthews & Yip, 2013). Such errors suggest that semantic meaning is established before the syntactic structure is constructed, as in the serial model of mental processing (Fromkin, 1973). For example, phrasal shifts can result in OSV word order (e.g. 冇事發生囉我得閒嗰時係會). The OSV order is also known as topicalization, which is reported to be processed more quickly (Matthews & Yeung, 2001). The faster processing required in spontaneous speech could account for the high frequency of shift errors. Shifts seldom occur in formal speech, where reading material is provided.

Table 2 Examples of shifts

Phrasal shift:

冇事發生囉(我得閒嗰時係會)	我得閒嗰時係會冇事發生囉
(喺佢哋後面)佢哋擺低哂佢哋自己嘅&em隨身財物	佢哋擺低哂佢哋自己嘅&em隨身財物喺佢哋後面

Lexical shift:

(行山)又冇乜 (再) 點去	又冇乜點再去行山
都幾無奈 (好似)	都好似幾無奈

Distribution of Substitutions

Substitutions accounted for the largest proportion of speech errors in careful speech (59.4%), while their percentages are relatively low in informal (28.7%) and casual (14.8%) speech compared with other error types. Overall, semantic substitutions (59.3%) are somewhat more frequent than phonological ones (see Table 3 for examples). When a lexical item or phoneme is processed, its semantic associates are also activated; one of them sometimes receives higher activation than the intended item or unit and is therefore inserted into the upcoming slot (Wan & Jaeger, 1998). For instance, 嚟(lai4) and 來(loi4) are semantic equivalents used in spoken Cantonese and written Chinese respectively. They are also phonologically close, differing only in the peak diphthongs: [ɐi] for lai4 and [ɔi] for loi4. Several phonological substitution errors can be related to the irregularity of, and mismatch between, the phonetic and semantic radicals of Chinese characters (Hsiao & Shillcock, 2006). In such errors, the pronunciation of the phonetic radical was activated during processing instead of the irregular, less frequent but correct pronunciation of the whole character. For example, the pronunciation of the phonetic radical 將(zoeng1) was substituted for the correct pronunciation of 鏘(coeng1).

Table 3 Examples of substitutions

Produced form	Intended form

Semantic substitutions:

以嚟	以來
購物	購買
蔬菜	蔬果

Phonological substitutions:

心有餘kwai3	心有餘悸(gwai3)
鏗zoeng1	鏗鏘(coeng1)
heoi5哋	佢(keoi5)哋

Tonal substitutions in Cantonese

Tonal substitutions comprised 46% of the phonological substitutions in the collected data. Alderete et al. (2019) performed a large-scale study of Cantonese tone errors and concluded that tone errors are mostly contextual and are phonologically selected together with segments. Certain tones are easily confused with specific other tones. Their study estimated the phonetic distance between Cantonese tones and thus how commonly such tone errors occur. The results of the present study correspond with the confusion matrix they proposed: tone 2 is frequently intruded on by tones 6, 3 and 5, while tone 4 is usually substituted by tone 3 (see Table 4).

Table 4 Examples of Tonal substitutions

Produced form	Intended form
社wui5	社會(wui2)
jyu3 購買	如(jyu4)購買
個(go3)	嗰(go2)
係(hai6)	喺(hai2)
係(hai6)	喺(hai2)
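The tone pairs in Table 4 can be tallied mechanically, because the tone is the final digit of each Jyutping syllable. The following sketch counts (intended tone, produced tone) pairs for the five examples above. It illustrates the bookkeeping only and is not the procedure used by Alderete et al. (2019); the names are hypothetical.

```python
import re
from collections import Counter

# (produced, intended) Jyutping syllables from Table 4.
pairs = [("wui5", "wui2"), ("jyu3", "jyu4"), ("go3", "go2"),
         ("hai6", "hai2"), ("hai6", "hai2")]


def tone(syllable):
    """Return the tone number (1-6) at the end of a Jyutping syllable."""
    match = re.search(r"[1-6]$", syllable)
    if match is None:
        raise ValueError(f"no tone digit in {syllable!r}")
    return int(match.group())


# Tally how often each intended tone was replaced by each produced tone.
confusions = Counter((tone(intended), tone(produced))
                     for produced, intended in pairs)
for (intended, produced), n in sorted(confusions.items()):
    print(f"intended tone {intended} -> produced tone {produced}: {n}")
```

For these five examples, tone 2 is replaced by tones 3, 5 and 6, and tone 4 by tone 3, in line with the pattern described above.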

Hesitations and Self-repairs

The results illustrate the relationship between the casualness and the fluency of speech. They are in line with earlier findings that hesitations and self-repairs occur more frequently in spontaneous speech than in careful speech (Shriberg, 2001), indicating that spontaneous speech flows less fluently. Both filled pauses (with hesitation markers) and unfilled pauses appear regularly in the following environments (see Table 5):

Table 5 Environments of Filled and Unfilled Pauses and Examples

Pause type and environments	Examples
Short pauses (.): after a long utterance; before a high-frequency word (with more word choices); before or after adjectives; after speech errors (especially those with self-repairs)	希望佢哋可以盡快(.)幫你搵番部手機同埋你哋啲貴重物品啦(.) …衣著係灰色嘅(.)鴨嘴帽… … &誒紫 [//] 淺紫色嘅… …<覺得咁辛苦>[//]即係喺咁辛苦嘅壓力之下… <咁因(為)> [//] (.) 咁每次… <幫佢>[//](.)幫佢收埋…
Long pauses (..): after a long utterance; before or after conjunctions; before verb phrases	…亦都好鍾意演唱一啲唔同音樂劇嘅音樂啦(..) (..)daai6[: 但係]啲人都指出佢冇工作證咁樣做… 因為(..)點都會有少少嘢做嘅其實(.) 因為 (.) &誒呢個假期… 因為 &=gasps我比較鍾意… (.) &誒 set@s:eng up@s:eng 咗… 同朋友或者屋企人即係相聚… 又要&=gasps 預備上堂嘅嘢…
Very long pauses (...): before or after articulating a rare/low-frequency content word; after a long and semantically dense utterance; before beginning a new topic	…將呢個嘅賊人(.)繩之(.)於法(...) 咁而我最鍾意演奏嘅都係浪漫時期嘅音樂啦,包括有蕭邦啊&誒李斯特啊(..)等等咁樣啦(...) …啲人講點樣可以打得好啲咁樣(...),再(..)得閒啲即ai6 [: 即係]將上述嘅嘢全部做哂… 同自己相處 ,(..)係喇(..),跟住(..)&誒(..)得閒<我仲會>[//]我會打排球 &hm 咁 &=clears:throat 我自己&呢點解會鍾意唱歌&呢 …
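Counts of pauses and repairs like those reported in the Results can be obtained by searching the transcripts for their annotation markers. The sketch below counts the markers visible in the Table 5 examples: (.), (..), (...), [//], &-prefixed fillers and &= events. The filler list in the pattern is an assumption based on the fillers named in the Results, and the function is an illustration rather than a substitute for CLAN's own tools.

```python
import re

# Marker conventions as they appear in the transcribed examples of Table 5.
MARKERS = {
    "short pause": re.compile(r"\(\.\)"),
    "long pause": re.compile(r"\(\.\.\)"),
    "very long pause": re.compile(r"\(\.\.\.\)"),
    "retracing [//]": re.compile(r"\[//\]"),
    # Assumption: only the fillers listed here count as filled pauses.
    "filled pause": re.compile(r"&(?:誒|呢|hm|em|um|er)"),
    "paralinguistic event": re.compile(r"&=[A-Za-z:]+"),
}


def count_markers(utterance):
    """Count each annotated marker type in one transcribed utterance."""
    return {name: len(rx.findall(utterance)) for name, rx in MARKERS.items()}


# Excerpt from the very-long-pause examples in Table 5.
example = "跟住(..)&誒(..)得閒<我仲會>[//]我會打排球 &hm 咁 &=clears:throat"
for name, n in count_markers(example).items():
    print(f"{name}: {n}")
```

For this excerpt the sketch finds two long pauses, two filled pauses, one retracing marker and one paralinguistic event.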

Self-repairs appeared more frequently in informal and casual speech because more speech errors occurred there. Spontaneous speech requires immediate retrieval from the speaker’s mental lexicon to produce utterances from scratch. This processing is far more complex and mentally demanding than formal speech based on reading-aloud material, which only requires speakers to retrieve the phonetic segments (and tones) associated with the written characters. Speakers usually repair an error instantly, since correcting it with a fresh start is more mentally demanding (see Appendix E Table 4 for examples).

CONCLUSION

The present study investigated the patterns of speech errors, hesitations and self-repairs across careful and casual speech styles in Cantonese. The patterns largely agree with those reported for English. Although the interaction between speech correctness and formality was inconsistent with previous scholarly work, a positive correlation between casualness and disfluency was found. The findings suggest that deletion errors increase with casualness, while substitutions become more prominent with formality. However, future work should verify this with more data and better-controlled variables in data collection, such as carefully planned laboratory conditions and a more careful selection of tasks, in order to minimize the effect of individual differences. Cantonese-specific tonal substitutions were also briefly investigated, and future research could compare them with those in other contour tone languages.

References

Aitchison, J. (2008). The Articulate Mammal: An Introduction to Psycholinguistics (5th edition). London and New York: Routledge.

Alderete, J., Chan, Q., & Yeung, H. H. (2019). Tone slips in Cantonese: Evidence for early phonological encoding. Cognition, 191, 103952.

Alderete, J. (2022). Cross-Linguistic Trends in Speech Errors: An Analysis of Sub-Lexical Errors in Cantonese. Language and Speech. DOI: 10.1177/00238309211071045.

Anand, P., Chung, S., & Wagers, M. (2011). Widening the net: Challenges for gathering linguistic data in the digital age (NSF SBE 2020: Future research in the social, behavioral, & economic sciences). https://www.nsf.gov/sbe/sbe_2020/Abstracts.pdf

Boomer, D. S. (1965). Hesitation and grammatical encoding. Language and Speech, 8(3), 148-158.

Boomer, D. S. & Laver, J. D. M. (1968). Slips of the tongue. In V. A. Fromkin (Ed.), Speech errors as linguistic evidence (pp. 120-131). The Hague: Mouton.

Carroll, D. W. (2008). Psychology of Language. Belmont, California: Wadsworth/Thomson Learning.

Chen, Y. (2002). Xiandai hanyu yuwu [Speech errors in modern Chinese]. Harbin, China: Heilongjiang Renmin Press.

Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283-321.

Fromkin, V. A. (Ed.) (1973). Speech Errors As Linguistic Evidence. The Hague: Mouton.

Fromkin, V. A. (Ed.) (1980). Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. New York: Academic Press.

Henderson, A., Goldman-Eisler, F., & Skarbek, A. (1966). Sequential temporal patterns in spontaneous speech. Language and Speech, 9(4), 207-216.

Hong Kong Examinations and Assessment Authority (2021). Territory-wide System Assessment Question Papers (Primary 6). Retrieved from https://www.bca.hkeaa.edu.hk/web/Common/res/2021priPaper/P6Chi/2021_6CSY2.pdf

Hsiao, J. H. W., & Shillcock, R. (2006). Analysis of a Chinese phonetic compound database: Implications for orthographic processing. Journal of Psycholinguistic Research, 35(5), 405-426.

Kowal, S., O’Connell, D. C., & Sabin, E. J. (1975). Development of temporal patterning and vocal hesitations in spontaneous narratives. Journal of Psycholinguistic Research, 4(3), 195-207.

Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition, 14(1), 41-104.

Levelt, W. J. M. (1989). Speaking: from intention to articulation. Cambridge, Mass: MIT Press.

MacKay, D. G. (1970). Spoonerisms: The structure of errors in the serial order of speech. Neuropsychologia, 8, 323–350.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd edition). Mahwah, NJ: Lawrence Erlbaum Associates.

Matthews, S., & Yeung, L. Y. Y. (2001). Processing motivations for topicalization in Cantonese. In K. Horie & S. Sato (Eds.), Cognitive-functional Linguistics in an East Asian Context (pp. 81–102). Tokyo: Kurosio.

Matthews, S., & Yip, V. (2013). Cantonese: A comprehensive grammar. Routledge.

Morrison, J. A., & Shriberg, L. D. (1992). Articulation testing versus conversational speech sampling. Journal of Speech, Language, and Hearing Research, 35(2), 259-273.

Shattuck-Hufnagel, S. (1987). The role of word onset consonants in speech production planning: New evidence from speech error patterns. In E. Keller & M. Gopnik (Eds.), Motor and sensory processes of language (pp. 17–51). Lawrence Erlbaum.

Shen, J. (1992). Kouwu leilie [Examples of speech errors]. Zhongguo Yuwen, 1992(4), 306-316.

Shriberg, E. (2001). To ‘errrr’ is human: Ecology and acoustics of speech disfluencies. Journal of the International Phonetic Association, 31(1), 153–169.

Tong Six (January 19, 2022). Yishi de liangzhi xiaofei xingqi [The rising of Conscientious Consumption in Food and Clothing]. In Tong Six, 453, p.6. Hong Kong: Hong Kong Economic Times.

Wan, I. P., & Jaeger, J. (1998). Speech errors and the representation of tone in Mandarin Chinese. Phonology, 15(3), 417-461.

Zhang, N. (1990). Yuyan yanjiu yu kouwu [Language research and speech errors]. Waiguo yu, 4, 29-33.

APPENDICES

Appendix B

Table 1 Major types of speech errors (Carroll, 2008, p. 195)

Table 2 Commonly used Cantonese fillers (Matthews & Yip, 2013)

咁 (gam2) ‘so’, ‘you see’	誒 (e6) ‘eh’
嗱 (naa4) ‘you see’	嗯 (ng6) ‘hm’
即係 (zik1hai6) ‘that is’, ‘I mean’

Appendix C

Table 1 Recording details and demographic information of participants

	Recording duration			Demographics
Participants	Task 1	Task 2	Task 3	age	gender
001	02:06	03:08	02:58	18	female
002	01:55	02:30	02:25	22	male
003	02:08	03:04	02:50	21	female
004	01:55	03:34	03:10	21	female
005	02:05	02:58	03:23	21	male
006	01:46	03:05	02:50	21	male
007	01:46	03:02	03:04	21	female
008	01:49	02:54	02:53	20	male

Appendix D

Table 1 Type-token ratio (TTR) of Tasks 2 and 3 among participants

Participants	Task 2	Task 3
001	46.3%	39.0%
002	39.3%	37.4%
003	45.5%	41.9%
004	42.1%	33.9%
005	42.3%	34.8%
006	50.9%	39.5%
007	40.8%	39.5%
008	42.9%	42.3%

Table 2 Distribution of Speech errors

Types of speech errors	Task 1	Task 2	Task 3	Total
Shift	1	10	24	35
Exchange	0	2	0	2
Anticipation	2	3	3	8
Perseveration	0	1	4	5
Addition	5	15	17	37
Deletion	6	25	30	61
Substitution	22	25	16	63
Blend	1	6	21	28
Total number of speech errors	37	87	115	239

Table 3 Distribution of Pauses by length

	Task 1	Task 2	Task 3
Short pauses	106	264	279
Long pauses	22	102	83
Very long pauses	3	25	16
Total number of pauses	131	391	378

Table 4 Distribution of Pauses by type

	Task 1	Task 2	Task 3
Unfilled pauses	131	391	378
Filled pauses	0	225	262
Total number of pauses	131	616	640

Table 5 Distribution of self-repairs

	Task 1	Task 2	Task 3
Instant repairs	8	37	32
Anticipatory retracing	5	27	18
Fresh starts	0	14	11
Total number of self-repairs	13	78	61


Kassey Chang