Scale-free and Small-world Networks in Word Associations

Introduction

The theoretical linguist Noam Chomsky (1957) described a language as a set of sentences, potentially infinite in number, constructed from a finite set of linguistic units. These combinations are not formed randomly or unsystematically, however, because real-life language use is constrained by lexico-syntactic rules and collocations. With the rise of quantitative methodology in linguistics, a growing body of statistical evidence has shown that co-occurring words carry mutual information. This report reviews the article ‘The small world of human language’ (Ferrer i Cancho & Solé, 2001) and considers how its findings, which originate in physics, interact with disciplinary knowledge in linguistics.

Overview

Ferrer i Cancho and Solé (2001) proposed that word associations can be represented as a graph of word interactions and analyzed as a complex network, specifically as a small-world (SW) network and a scale-free (SF) network. Language data from the British National Corpus (BNC) showed the statistical properties that characterize such networks: a high clustering coefficient and a short average distance for SW networks, and growth and a power-law degree distribution for SF networks (a relation that linguists often associate with Zipf’s Law). The connectivity distribution of the kernel word network (KWN) was then used to examine word associations in a smaller lexicon. The authors also suggested that disconnecting the most connected words could be related to symptoms of both Broca’s aphasia and Wernicke’s aphasia.

SW properties of Human Languages

The primary quantitative approach to modelling the network of a lexicon is to assume a link between every pair of co-occurring words. If the links are treated as edges in a network, the words are its nodes. The shorter the distance between two nodes, the more strongly the words are correlated and the more likely they are to co-occur. Most links between words that appear at a distance of one or two positions reflect syntactic relations, such as adjectival noun phrases (e.g. ‘green tree’), determiner phrases (e.g. ‘the cat’) or predicates (e.g. ‘buy food’). The graphs built from consecutive words and all the words connected to them show SW properties: they are highly clustered, and their average path length is small. These features contrast with random graphs (non-clustered, with a small average path length) and regular lattices (clustered, with a large average path length). The study computed these measures for two networks derived from the BNC, the unrestricted word network (UWN) and the restricted word network (RWN). The results were consistent: both networks have an average minimum distance below 3, whereas comparable random graphs would be above 3. The clustering coefficients are 0.687 and 0.437, respectively, far higher than the value of 1.55 × 10^-4 in random graphs. This implies that any word in the lexicon can be reached from any other in fewer than three links on average.
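The two SW measures discussed above can be illustrated on a toy corpus. The following is a minimal sketch, not the procedure used in the reviewed article: the function names, the `max_distance` parameter and the three example sentences are assumptions made for illustration, and the reviewed study applied further criteria to the BNC that are not reproduced here.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(sentences, max_distance=2):
    """Link two different words when they occur within max_distance positions."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Look ahead at the next max_distance words only.
            for other in words[i + 1 : i + 1 + max_distance]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return dict(graph)


def clustering_coefficient(graph):
    """Average, over all nodes, of the share of neighbour pairs that are linked."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2 * linked / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy corpus; a real analysis would read a corpus such as the BNC.
toy_sentences = ["the green tree grows", "the cat buys food", "the green cat sleeps"]
toy_graph = build_cooccurrence_graph(toy_sentences)
print("clustering coefficient:", clustering_coefficient(toy_graph))
print("average path length:", average_path_length(toy_graph))
```

On a real corpus, a high clustering coefficient combined with a short average path length would be the SW signature described above. A toy corpus of three sentences is far too small to show it reliably.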

SF properties of Human Languages

Ferrer i Cancho and Solé (2001) reported that the scale-free property corresponds to robustness when randomly chosen nodes are removed and to high fragility when highly connected nodes are perturbed. Their analysis gave degree-distribution exponents of γ=-1.50 and γ=-2.70 for the UWN and the RWN, respectively. The RWN value is close to that of the Barabási-Albert model (γ_BA=-3), which derives scale-free distributions from growth with preferential attachment. Under this rule, a new node (word) attaches to an existing node (word) with a probability proportional to that node’s degree. This agrees with actual language use, as neologisms usually obey syntactic rules and tend to occur with highly connected words such as function words. For example, ‘coronacation’ (a blend of ‘corona’ and ‘vacation’), a neologism that emerged during the COVID-19 pandemic, follows the constraints of its noun stem ‘vacation.’ The high-degree, frequently collocated verb ‘have’ is therefore more likely to connect to it than other verbs (e.g. ‘do’) or adverbials (e.g. ‘happily’), as in ‘I’m having a coronacation’.
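The preferential attachment rule can be made concrete with a small simulation. This is a minimal sketch of Barabási-Albert style growth under stated assumptions: the function names, the parameter `m` (links added per new node), the fully connected starting core and the fixed random seed are all choices made for illustration and are not taken from the reviewed article.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per link it has, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:  # m distinct existing nodes
            chosen.add(rng.choice(stubs))
        for old in chosen:
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


def degrees(edges):
    """Count the number of links attached to each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


degree = degrees(preferential_attachment(2000))
ranked = sorted(degree.values(), reverse=True)
print("five highest degrees:", ranked[:5])
print("median degree:", ranked[len(ranked) // 2])
```

In such a simulation a few early nodes typically accumulate far more links than the median node. This loosely parallels the way function words come to dominate the connectivity of a lexicon.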

            Ferrer i Cancho and Solé (2001) further examined the network properties using the kernel word network (KWN), which represents the smaller lexicon of high-frequency words shared by general speakers in the community. It consists of the 5,000 most connected words in the RWN. This network arguably gives a more realistic picture of everyday language use, since it is in line with Zipf’s Law and excludes less commonly used terminology and jargon. With a power-law tail exponent of γ_KWN=-3.07, every word in the KWN is connected on average to 24% of the remaining kernel words. The authors considered that this lexicon of a few thousand words could express everything or almost everything in everyday language. This is compatible with the rule of thumb in linguistics and language pedagogy that 20% of the words in a language make up 80% of its texts, which rests on the power-law relation in Zipf’s Law.
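The idea of a kernel network can be sketched as follows: keep only the most connected words and measure how densely they are linked to one another. This is an illustrative sketch only. The function names, the alphabetical tie-breaking and the five-word toy graph are assumptions, and the reviewed article may have selected its kernel words differently.

```python
def kernel_network(graph, size):
    """Keep the `size` highest-degree words and only the links among them."""
    # Ties in degree are broken alphabetically (an arbitrary choice).
    top = sorted(graph, key=lambda word: (-len(graph[word]), word))[:size]
    kernel = set(top)
    return {word: graph[word] & kernel for word in top}


def mean_connected_fraction(graph):
    """Average share of the other words that each word is linked to."""
    n = len(graph)
    if n < 2:
        return 0.0
    return sum(len(neighbours) for neighbours in graph.values()) / (n * (n - 1))


# Hypothetical toy network, stored as word -> set of linked words.
toy_graph = {
    "the": {"cat", "dog", "of", "tree"},
    "of": {"the", "cat", "dog"},
    "cat": {"the", "of", "dog"},
    "dog": {"the", "of", "cat"},
    "tree": {"the"},
}
kernel = kernel_network(toy_graph, size=3)
print("kernel words:", sorted(kernel))
print("mean connected fraction:", mean_connected_fraction(kernel))
```

Applied to a network of realistic size with `size=5000`, `mean_connected_fraction` would be the counterpart of the 24% figure quoted above.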

            The finding is useful for language acquisition and pedagogy, as it points to a more effective and realistic approach of ‘user-specific language study,’ which prioritizes a list of kernel words that are essential for daily conversation. It is impossible to acquire 100% of the vocabulary of a language, and it is not always necessary to know every word in order to comprehend a text or a speech. Knowledge of the high-degree words that cover 80% to 90% of a text would be sufficient.

Further Applications of Graph Theory

            Although this line of research has yet to become a major topic in physics or linguistics, the findings are significant for understanding the mental processing of speech, speech production and speech complexity. Given the proper syntactic structure, a word could be retrieved and articulated more quickly during speech production if it is closely connected to the previously produced word. For example, given a structure of a verb phrase (VP) followed by an adjective phrase (AdjP), if the VP is ‘is’, retrieving the adjective ‘special’ would be quicker than retrieving ‘extraordinary’, as ‘special’ is connected to ‘is’ at a shorter distance. Conversely, when the hearer comprehends a word, a frequently used word at a shorter distance would be easier and faster to process mentally. In other words, the higher the degree of a lexical item, the higher its availability for production and comprehension. Overall, this phenomenon could be regarded as a frequency or recency effect.

            From the quantitative evidence provided by the complex network properties, it could be inferred that the more low-degree words a discourse uses, the harder it is to articulate or comprehend. The concept is practical for speakers or writers who want to produce speeches or texts that are mentally more accessible to the audience. For instance, the literary genre of stream of consciousness is often demanding to comprehend because of the longer path length between concepts (nodes, or words). The distance between ‘sofa’ and ‘house’ is presumably shorter than that between ‘sofa’ and ‘war.’

            Finally, Ferrer i Cancho and Solé (2001) related the SW properties to the symptoms of aphasia. A highly clustered network with short path lengths allows the lexicon to be navigated quickly during speech, and disturbances of this navigation may appear as the deficits seen in aphasia. Such deficits could be characterized as the removal of highly connected words (nodes) from speech, mainly function words (e.g. ‘and,’ ‘the,’ ‘of’). Patients with Broca’s aphasia tend to omit those lexical items, leading to long pauses and non-fluent speech. Patients with Wernicke’s aphasia, on the other hand, remain fluent but substitute high-degree words with ungrammatical or inappropriate lexical items. If a system were developed to build a network from an individual’s speech, the SW properties could possibly be applied to support the diagnosis of aphasia.
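The fragility of a network against the loss of its most connected words can be illustrated by removing nodes and checking how much of the network remains connected. The following is a toy sketch and not a clinical or diagnostic procedure: the function names and the seven-word network are assumptions made for illustration.

```python
from collections import defaultdict, deque


def graph_from_edges(edges):
    """Build an undirected network stored as word -> set of linked words."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def remove_nodes(graph, removed):
    """Return a copy of the network without the given words and their links."""
    removed = set(removed)
    return {node: neighbours - removed
            for node, neighbours in graph.items() if node not in removed}


def highest_degree_nodes(graph, n):
    """The n most connected words (ties broken alphabetically)."""
    return sorted(graph, key=lambda word: (-len(graph[word]), word))[:n]


def largest_component_size(graph):
    """Number of words in the largest connected part of the network."""
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


# Hypothetical toy network in which two function words act as hubs.
toy = graph_from_edges([
    ("the", "cat"), ("the", "dog"), ("the", "tree"),
    ("the", "of"), ("of", "food"), ("of", "house"),
])
hub = highest_degree_nodes(toy, 1)
print("most connected word:", hub)
print("largest component after removing it:",
      largest_component_size(remove_nodes(toy, hub)))
print("largest component after removing 'cat':",
      largest_component_size(remove_nodes(toy, ["cat"])))
```

Removing the hub splits the toy network into several pieces, whereas removing a low-degree word leaves the rest connected. This mirrors the contrast between fragility to targeted removal and robustness to random removal described for SF networks.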

Discussion

To extend the study by Ferrer i Cancho and Solé (2001) with a corpus-linguistic approach, the Lancaster Corpus of Mandarin Chinese (LCMC) (McEnery et al., 2003) is used in place of the BNC to examine whether a similar degree distribution is found in a language with a different typology, lexical composition and syntactic rules.

Table 1 The ten words with the highest degree

LCMC: 的, 是, 在, 和, 他, 不, 我, 个, 有, 这
BNC (Ferrer i Cancho & Solé, 2001): and, the, of, in, a, to, ’s, with, by, is

The ten most connected words in the LCMC differ markedly from those in the BNC in that they include a copula (是), pronouns (他, 我) and possessive ‘have’ (有) (see Table 1). The connectivity distribution of the KWN in the LCMC is nevertheless similar to that in the BNC, suggesting a power-law relation (see Figure 1).

Figure 1 Connectivity distribution for the kernel word network (KWN) in LCMC
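A connectivity distribution of this kind is usually inspected on log-log axes, where a power law appears as a straight line whose slope is the exponent γ. The following is a minimal sketch of that check, assuming a plain list of node degrees as input. The function names are illustrative, the demonstration uses synthetic values rather than LCMC data, and a least-squares fit on log-log values is only a rough estimator of the exponent.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)}, the share of nodes that have exactly k links."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: count / total for k, count in sorted(counts.items())}


def loglog_slope(distribution):
    """Least-squares slope of log P(k) against log k (a rough estimate of γ)."""
    points = [(math.log(k), math.log(p))
              for k, p in distribution.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Synthetic check: values that follow k^-3 exactly should give a slope near -3.
synthetic = {k: k ** -3.0 for k in range(1, 50)}
print("estimated exponent:", loglog_slope(synthetic))
```

With real corpus data, the degrees of the kernel words would be passed to `degree_distribution` first, and the fit would normally be restricted to the tail of the distribution.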

Conclusion

            The application of graph theory and complex networks in linguistics has yet to be fully explored. The discoveries could be promising, especially for developing the computational networks used in natural language processing. The evidence on degree distributions and word distances, together with the KWN methodology, could also lead to a better understanding of how the lexicon is composed and, in turn, to better-formed linguistic theories.

References

Ferrer i Cancho, R., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482), 2261-2265.

Chomsky, N. (1957). Syntactic Structures (Janua Linguarum, Series Minor 4). The Hague: Mouton.

McEnery, T., Xiao, R., & Mo, L. (2003). Aspect Marking in English and Chinese: Using the Lancaster Corpus of Mandarin Chinese for Contrastive Language Study. Literary and Linguistic Computing, 18(4), 361-378.