class: inverse, center, middle <center>Virtual Psycholinguistics Forum, 10th August 2022 ## An Introduction to Network Analysis for Psycholinguists <br> ### Part 1: Introduction <br> ### Cynthia S. Q. Siew .large[Department of Psychology, National University of Singapore] Email: cynthia@nus.edu.sg Website: http://hello.csqsiew.xyz --- ## Outline <big> 1. What is Network Science? 2. What is a network made up of? 3. The mental lexicon as a network 4. Examples of language networks --- class: center, middle, inverse layout: false ## 1. What is Network Science? -- <big> Network Science uses techniques from Mathematics, Sociology, Physics, Computer Science, and other fields to model a complex system. A Complex System consists of independent units that interact locally in simple ways to produce unexpected global behavior. --- ### History of Network Science #### The birth of graph theory The [Seven Bridges of Königsberg](https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg) Problem: Is it possible to only cross each bridge *once* while visiting all islands? .center[![](img/bridges.png)] -- <br> Euler showed that there was no solution. He showed this by developing a new technique for analyzing networks (graph theory). --- ### History of Network Science #### Many fields use network analysis - 1930s: Sociometry (Moreno) - 1940-60s: Substantial interest in social psychology (Festinger, Milgram, Newcomb) - 1970s-present: Economic sociology (Granovetter, White, Uzzi) - 1990s: Social capital research (Putnam, Burt) - 2000s-present: Physicists jump in (Watts, Barabasi, Newman) --- ### What is a network and why is it cool Networks are .blue[simple]. - they are made up of two components: **Nodes** and **Edges** - nodes/vertices = individual units - edges/links/connections = relationships between units Networks are .green[everywhere] (once you know what to look for). - biological, economic, communication, technological, social, cultural networks <center>![:scale 45%](img/internet-net.png) .caption[Network of the internet, Mark Newman] --- ### What is a network and why is it cool Networks are .purple[useful]. - networks reveal *patterns* in systems - networks describe *relationships* and interactions - networks can *predict* "behaviors" or processes of the system Networks are .red[powerful]. - when combined with mathematical and discipline-specific theory, computing resources, and availability of big data <center>![:scale 40%](img/lotr.png) .caption[[Social network](https://moviegalaxies.com/) of LOTR characters.] --- class: center, middle, inverse layout: false ## 2. What is a network made up of? --- ### The Building Blocks of Networks .pull-left[ All networks have the following: - a set of nodes - nodes = vertices - a set of relations between the nodes - edges = links = connections Representational formats - <small>graphical approaches - pros: intuitive, visually compelling - cons: cannot do much with a picture - mathematical representations - pros: very powerful and informative because of its explicitness - cons: looks scary to people</small> ] .pull-right[ <center>![:scale 50%](img/communities.png)</center> ```r network <- matrix( data = sample(0:1, size = 25, replace = TRUE), nrow = 5, ncol = 5) network ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 1 1 0 0 ## [2,] 1 1 1 1 1 ## [3,] 0 0 1 0 1 ## [4,] 1 0 1 0 1 ## [5,] 0 1 0 1 0 ``` ] --- ### Caveat about edge types In network science, there are four types of edges: | |Undirected|Directed| |--|--|--| |Unweighted| facebook | twitter, instagram | |Weighted| facebook + number of interactions | twitter + likes | <br> ❗ In this tutorial we will deal mostly with simple networks that have **undirected, unweighted** edges. But you should keep in mind that edges can carry additional information as well and there are techniques for analyzing this information. --- class: center, middle, inverse layout: false ## How are networks mathematically represented? --- ### Adjacency Matrix Each row/column = 1 node Each cell = presence or absence of edge .pull-left[ ``` ## 1 2 3 4 5 6 7 8 9 10 ## 1 0 1 0 0 1 1 0 0 0 0 ## 2 1 0 0 0 0 0 1 0 0 0 ## 3 0 0 0 1 0 0 0 1 0 0 ## 4 0 0 1 0 1 0 0 0 0 0 ## 5 1 0 0 1 0 0 0 0 0 1 ## 6 1 0 0 0 0 0 1 0 0 1 ## 7 0 1 0 0 0 1 0 1 0 0 ## 8 0 0 1 0 0 0 1 0 1 0 ## 9 0 0 0 0 0 0 0 1 0 1 ## 10 0 0 0 0 1 1 0 0 1 0 ``` ] .pull-right[ ![](part1-intro_files/figure-html/unnamed-chunk-4-1.png)<!-- --> ] --- ### Edge List Each row = 1 edge between node X and node Y .pull-left[ <table class="table table-striped table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> V1 </th> <th style="text-align:right;"> V2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 6 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 21 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 7 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> ] .pull-right[ ![](part1-intro_files/figure-html/unnamed-chunk-6-1.png)<!-- --> ] --- class: center, middle, inverse layout: false ## 3. The Mental Lexicon as a Network --- ## Representation and Process in Psycholinguistics Psycholinguistics is the study of **psychological processes** related to understanding, producing, remembering, acquiring language. But what is language? Language is a **system** of symbols and rules used for human communication. -- <br> <br> <center>If psycholinguists care about how people learn, recognize, and produce words and sentences, they probably also care about how linguistic units are represented in long-term memory. How can we leverage on network science methods to study memory structures that we can't directly observe? --- ### The Mental Lexicon as a Network The *mental lexicon* is generally accepted as the set of words and concepts known to an individual that is stored in that person's long-term memory. *Semantic memory* is the part of long-term memory that stores general facts and information about the world. There is a lot of structure within these systems! Linguistic units are not independent of each other, and have complex, varied relationships with each other. ![:scale 30%](img/speech1.png) ![:scale 30%](img/speech2.png) ![:scale 30%](img/communities.png) --- ### The Building Blocks of Language Networks What counts as a node? - phonemes, letters, words, syllables, morphemes, sentences - levels of analysis in psycholinguistics - needs to be consistently implemented across the same network -- What counts as an edge? - usually depicts *similarity* or *overlap* in features or *association* between words (a meaningful relation between linguistic units) - ideally theoretically driven (but could be exploratory as well) - what is a meaningful relationship to model? - what is the question you want to ask of the network? <center>![:scale 30%](img/network1.png) --- class: center, middle, inverse layout: false ## 4. Examples of Language Networks --- ### Language networks based on form-similarity Phonological network (Vitevitch, 2008) - nodes = words - edges = phonological neighbors - two words are phonological neighbors if they differ by the substitution, deletion, or addition of a single **phoneme** (Luce & Pisoni, 1998) - phonological neighbors of the word 'speech' include 'peach', 'speed', 'speak' -- Orthographic network (Siew, 2018) - nodes = words - edges = orthographic neighbors - two words are orthographic neighbors if they differ by the substitution, deletion, or addition of a single **letter** - orthographic neighbors of the word 'cat' include 'hat', 'cap', 'at' .footnote2[ Vitevitch, M. S. (2008). What can graph theory tell us about word learning and lexical retrieval?. 51(2): 408–422. Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19(1), 1-36. Siew, C. S. Q. (2018). The orthographic similarity structure of English words. Insights from network science. Applied Network Science. 3:13. ] --- ### Language networks based on semantics Free association network (De Deyne et al., 2020) - nodes = words/concepts - edges = associations - free associations: given the cue word "cat", you respond with the words "cat", "bone", "bark" -- Feature network (Engelthaler & Hills, 2017) - nodes = words/concepts - edges = overlap in semantic features - feature listing task: list the features associated with the concept "cat", you respond with "is an animal", "it meows", "has fur" (McRae et al., 2005) .footnote2[ De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods, 51(3), 987-1006. Engelthaler, T., & Hills, T. T. (2017). Feature biases in early word learning: Network distinctiveness predicts age of acquisition. Cognitive Science, 41, 120-140. McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4), 547-559. ] --- ### Language networks based on co-occurrences Co-occurrence network (Ferrer-i-Cancho & Sole, 2001) - nodes = words - edges = co-occurrence in corpora (i.e., the order of words) - given the sentence "I am Sam", the words "I" and "am" co-occur and are connected and the words "am" and "Sam" co-occur and are connected .footnote2[ Cancho, R. F. I., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482), 2261-2265. ] --- ### Language network structure affects processing #### Semantic network Well-connected words in the semantic network are better learned, recalled, and retrieved than words that are less well-connected. - this holds even after taking word frequency into account <center> ![:scale 30%](img/bark.png) ![:scale 30%](img/dark.png) </center> .footnote2[ Mak, M. H., & Twitchell, H. (2020). Evidence for preferential attachment: Words that are more well connected in semantic networks are better at acquiring new links in paired-associate learning. Psychonomic Bulletin & Review, 27(5), 1059-1069. De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods, 51(3), 987-1006. ] --- ### Language network structure affects processing #### Phonological network Words with higher clustering coefficients are more slowly recognized than words with lower clustering coefficients in spoken word recognition tasks. - even though the *number* of phonological neighbors is the same <center> ![:scale 60%](img/hcc-lcc.jpg) </center> .footnote2[ Chan, K. Y., & Vitevitch, M. S. (2009). The influence of the phonological neighborhood clustering coefficient on spoken word recognition. Journal of Experimental Psychology: Human Perception and Performance, 35(6), 1934-1949. Chan, K. Y., & Vitevitch, M. S. (2010). Network structure influences speech production. Cognitive Science, 34(4), 685-697. ] --- ### Where do we get the data to make language networks? Dictionaries, thesauri, corpora, psycholinguistic megastudies, databases... You could also collect your own data to create networks. In most cases, the data is already in a format suitable for network analysis (e.g., [free association](https://smallworldofwords.org/en/project/research) and [feature listing](https://doomlab.shinyapps.io/double_words/) data are already in a form of an **edge list**). If not, the data can be easily converted into network formats with some data wrangling. Some resources: - [cooccurNet](https://github.com/csqsiew/cooccurNet): A R shiny application that generates a network of word cooccurrences given a sample text. The user can visualize the network and download a simple edge list of this network for further analysis. This application was created as a teaching tool for PL4246, Networks in Psychology. - [langnetr](https://github.com/csqsiew/langnetr): The `langnetr` package was built to help the user easily convert lists of words into network objects based on an edit distance of 1. This package would work for phonological transcriptions and orthographic representations of words (of alphabetic languages).